2.2 Bayesian parameter estimation

Once the measurement data are available, the next step is to estimate the model parameters. Among the many parameter estimation methods, Bayesian inference is explained in this section. In the following explanation, Θ represents the random variable of the unknown model parameter, and Y represents the random variable of the degradation feature. An upper-case letter denotes a random variable, while a lower-case letter denotes a realization of the random variable. Bayesian inference estimates the degree of belief in a hypothesis based on collected evidence. Bayes [12] formulated the degree of belief using the identity in conditional probability:

$$P(\Theta \cap Y) = P(\Theta | Y)P(Y) = P(Y|\Theta)P(\Theta) \tag{4}$$

where $P(\Theta|Y)$ is the conditional probability of Θ given Y. In the case of estimating the model parameter using measured data, the conditional probability of Θ given the measured data Y can be written as


Table 1. Measurement data (relative capacity) for the battery degradation example.

Figure 2. True degradation curve and measured data for the relative capacity.

$$P(\Theta|Y) = \frac{P(Y|\Theta)P(\Theta)}{P(Y)}\tag{5}$$


Prognostics 102: Efficient Bayesian-Based Prognostics Algorithm in MATLAB

DOI: http://dx.doi.org/10.5772/intechopen.82781


where P(Θ) is the prior probability of parameter Θ, which represents the preexisting knowledge of the parameter. P(Θ|Y) is the posterior probability of parameter Θ after updating the prior with measurement data Y. P(Y|Θ) is the likelihood function, or the probability of obtaining data Y for a given parameter Θ. The measurement data affect the posterior probability through the likelihood function. The denominator, P(Y), is the marginal probability of Y and acts as a normalizing constant. The above equation can be used to improve the knowledge of P(Θ) when additional information Y is available.
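As a minimal numerical sketch of Eq. (5), consider a discrete parameter with two states; the states, prior, and likelihood values below are purely hypothetical and chosen only to illustrate the update:

```python
# Discrete illustration of Bayes' theorem, Eq. (5), with made-up numbers:
# a cell is either "healthy" or "degraded" (the parameter Theta),
# and a capacity test either passes or fails (the data Y).
prior = {"healthy": 0.7, "degraded": 0.3}          # P(Theta)
likelihood = {"healthy": 0.1, "degraded": 0.8}     # P(Y = fail | Theta)

# Marginal P(Y = fail): the normalizing constant in Eq. (5).
p_fail = sum(likelihood[s] * prior[s] for s in prior)

# Posterior P(Theta | Y = fail) by Eq. (5).
posterior = {s: likelihood[s] * prior[s] / p_fail for s in prior}

print(posterior)  # the failing test shifts belief toward "degraded"
```

Note how the evidence (a failed test) moves probability mass from the prior toward the state that explains the data better, exactly the mechanism Eq. (5) formalizes.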

If the Bayes' theorem in Eq. (5) is to be used for identifying unknown model parameters, it is convenient to express the theorem in the form of a probability density function (PDF) [13], which is the form used in the present paper. Let $f_{\Theta}(\theta)$ be the PDF of model parameter Θ. When there is more than one model parameter, $f_{\Theta}(\theta)$ is the joint PDF of the multiple parameters. If the health monitoring system measures a degradation feature Y, the measurement variability can be represented using the PDF $f_Y(y)$. Then, the conditional PDFs between Θ and Y can be related to the joint PDF and the marginal PDFs, $f_{\Theta}(\theta)$ and $f_Y(y)$, as

$$f_{\Theta Y}(\theta, y) = f_{\Theta}(\theta | Y = y)\, f_Y(y) = f_Y(y|\Theta = \theta)\, f_{\Theta}(\theta) \tag{6}$$

When Θ and Y are independent, the joint PDF can be written as $f_{\Theta Y}(\theta, y) = f_{\Theta}(\theta)\, f_Y(y)$, in which case Bayesian inference cannot be used to improve the probability distribution of $f_{\Theta}(\theta)$. Similar to Eq. (5), Eq. (6) can be used to obtain Bayesian inference in the form of a PDF as [14]

$$f\_{\Theta}(\theta|Y=y) = \frac{f\_Y(y|\Theta=\theta)f\_{\Theta}(\theta)}{f\_Y(y)}\tag{7}$$

Since the denominator $f_Y(y)$ is a constant and the integral of $f_{\Theta}(\theta|Y=y)$ is one from the property of a PDF, the denominator in Eq. (7) can be considered as a normalizing constant. Similar to Eq. (5), $f_{\Theta}(\theta|Y=y)$ is the posterior PDF of parameter Θ, which is updated from the prior PDF $f_{\Theta}(\theta)$ with the likelihood function $f_Y(y|\Theta=\theta)$, the probability density value of the measured data y given model parameter Θ = θ. The process of updating the posterior distribution $f_{\Theta}(\theta|Y=y)$ of the model parameter using the measured data y is called Bayesian inference.

Bayesian inference can be extended to the case when many data are available. In general, the posterior PDF can be obtained either by applying all data simultaneously or by iteratively applying one datum at a time. Although the two approaches are theoretically equivalent, they end up as numerically different methods. For example, the particle filter method uses a single measurement to update the posterior distribution, and the previous posterior distribution is used as a prior distribution for the following measurement. On the other hand, the Bayesian method, which is used in this paper, uses all measurement data together to build a single posterior distribution. Let $\mathbf{y} = \{y_1, y_2, \ldots, y_{N_{\text{data}}}\}$ be the vector of $N_{\text{data}}$ measurements. In such a case, the Bayes' theorem can be written as

$$f\_{\Theta}(\boldsymbol{\theta}|\boldsymbol{Y}=\mathbf{y}) = \frac{1}{K} \prod\_{i=1}^{N\_{\text{data}}} \left[ f\_{\boldsymbol{Y}}(\boldsymbol{y}\_{i}|\boldsymbol{\Theta}=\boldsymbol{\theta}) \right] f\_{\Theta}(\boldsymbol{\theta}) \tag{8}$$

where K is the product of all marginal PDFs; it can be treated as a normalizing constant that makes the integral of the posterior PDF equal to one. It is noted that the total likelihood function is the product of the likelihood functions of the individual data, which is then multiplied by the prior PDF and normalized to yield the posterior PDF.
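The equivalence between the batch update of Eq. (8) and the measurement-by-measurement update mentioned above can be checked numerically on a toy one-parameter grid; the Gaussian likelihood form anticipates Eq. (10), and all numbers below are hypothetical:

```python
import math

# Toy check that the two update orders agree: applying all data at once
# (batch, as in Eq. (8)) versus one measurement at a time, where each
# posterior becomes the next prior. One parameter theta on a discrete grid.
grid = [i * 0.01 for i in range(101)]              # theta in [0, 1]
data = [0.42, 0.47, 0.40]                          # made-up measurements
s = 0.05                                           # known noise std (toy value)

def lik(y, theta):
    """Gaussian likelihood of one measurement (same form as Eq. (10))."""
    return math.exp(-0.5 * ((y - theta) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def normalize(w):
    total = sum(w)
    return [wi / total for wi in w]

# Batch: multiply all likelihoods, then normalize once.
batch = normalize([math.prod(lik(y, th) for y in data) for th in grid])

# Sequential: update with one measurement at a time.
seq = [1.0 / len(grid)] * len(grid)                # uniform prior
for y in data:
    seq = normalize([p * lik(y, th) for p, th in zip(seq, grid)])

max_diff = max(abs(a - b) for a, b in zip(batch, seq))
print(max_diff)  # agrees up to floating-point round-off
```

The two grids coincide because the product in Eq. (8) can be accumulated in any order; the numerical differences between methods mentioned in the text arise from how the posterior is represented (samples versus a closed form), not from the ordering itself.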

In contrast to the traditional least-squares method, the Bayes' theorem can estimate not only the best values of the parameters but also the uncertainty structure of the identified parameters. Since this uncertainty structure is derived from those of the prior distribution and the likelihood function, the uncertainty of the posterior distribution is directly related to that of the likelihood and the prior distribution.

In the Bayesian method, it is assumed that the user knows the prior distribution of the model parameters and the distribution type of the measurement noise. In this paper, it is assumed that the prior distribution is a uniform distribution with a lower and an upper bound. It is also assumed that the measurement noise follows a Gaussian distribution; that is, $\varepsilon \sim N(0, s^2)$, where s is the standard deviation of the noise. However, users can change these assumptions easily. For example, the case when the noise in the data follows a lognormal distribution is considered in the crack growth problem in Section 4. In most cases, since the standard deviation of the noise is unknown, it should be a part of the unknown model parameters. In the case of the battery model, therefore, the vector of unknown model parameters is defined as Θ = {b, s}. By assuming that the two model parameters are statistically independent, the prior joint PDF of the two parameters can be defined as

$$f_{\Theta}(\theta) = f(b) \times f(s), \quad f(b) \sim U(0,\ 0.05), \; f(s) \sim U(10^{-5},\ 0.1) \tag{9}$$
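A minimal sketch of the prior in Eq. (9), with the bounds taken directly from the equation; the helper names are illustrative only:

```python
import random

# Sketch of the prior in Eq. (9): independent uniform priors on the
# degradation parameter b and the noise standard deviation s.
B_LO, B_HI = 0.0, 0.05
S_LO, S_HI = 1e-5, 0.1

def prior_pdf(b, s):
    """Joint prior f(b) * f(s); zero outside the uniform bounds."""
    if B_LO <= b <= B_HI and S_LO <= s <= S_HI:
        return (1.0 / (B_HI - B_LO)) * (1.0 / (S_HI - S_LO))
    return 0.0

def sample_prior(rng):
    """Draw one sample theta = (b, s) from the prior."""
    return rng.uniform(B_LO, B_HI), rng.uniform(S_LO, S_HI)

rng = random.Random(0)
b, s = sample_prior(rng)
print(prior_pdf(b, s))   # constant density inside the box, zero outside
```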

Once the prior distribution is determined, it is necessary to build the likelihood function using the measurement data in order to yield the posterior distribution in Eq. (8). The likelihood function is the PDF value of obtaining the measured data $y_k$ for given model parameters θ = {b, s}. Since the measured data are fixed, the likelihood function is a function of the model parameters, which makes it different from a PDF. If the model prediction is close to the measured data, the likelihood is large, while it is small when the two values differ significantly. In order to build the likelihood, it is necessary to measure degradations at different times. Since the measured degradation data $(y_k, t_k)$, $k = 1, 2, \ldots, N_{\text{data}}$, are given at discrete times, the degradation model is also evaluated at the same discrete times as $\tilde{y}_k(b) = \tilde{y}(t_k; b)$, $k = 1, 2, \ldots, N_{\text{data}}$. Since the times of the measurement and the model are synchronized, $\tilde{y}_k(b)$ is only a function of the model parameter b. The measured data $y_k$ include the random noise that is governed by s, while the model prediction $\tilde{y}_k(b)$ depends on b. Then, the likelihood function of the k-th measured data can be defined as

$$f_Y(y_k|\theta) = \frac{1}{s\sqrt{2\pi}} \exp\left[-\frac{1}{2s^2}\left(y_k - \tilde{y}_k(b)\right)^2\right], \quad k = 1, 2, \ldots, N_{\text{data}} \tag{10}$$

As shown in Eq. (8), the likelihoods of multiple data can be multiplied to obtain the posterior distribution. With $N_{\text{data}}$ data, $\mathbf{y} = \{y_1, y_2, \ldots, y_{N_{\text{data}}}\}$, the posterior joint PDF can be calculated by multiplying all likelihood functions and the prior PDF as

$$f_{\Theta}(\theta|\mathbf{y}) = \frac{1}{K s^{N_{\text{data}}}} \exp\left[-\frac{1}{2s^2} \sum_{k=1}^{N_{\text{data}}} \left(y_k - \tilde{y}_k(b)\right)^2\right] f_{\Theta}(\theta) \tag{11}$$


where K is again a normalizing constant.
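The posterior of Eq. (11) can be sketched as a log-density function, which avoids numerical underflow when many likelihoods are multiplied. Since the battery degradation model is not restated in this section, an exponential form ỹ(t; b) = exp(−bt) and the data values below are assumed purely as stand-ins:

```python
import math

# Unnormalized log-posterior of theta = (b, s) from Eq. (11). The
# degradation model y~(t; b) = exp(-b*t) and the data are assumed
# stand-ins for illustration only.
t_data = [0, 5, 10, 15, 20, 25]                    # hypothetical cycle times
y_data = [1.00, 0.99, 0.95, 0.93, 0.91, 0.88]      # hypothetical capacities

def model(t, b):
    return math.exp(-b * t)

def log_posterior(b, s):
    """log of Eq. (11) up to the constant K; uniform prior from Eq. (9)."""
    if not (0.0 < b < 0.05 and 1e-5 < s < 0.1):
        return -math.inf                           # outside the prior support
    sse = sum((y - model(t, b)) ** 2 for t, y in zip(t_data, y_data))
    return -len(y_data) * math.log(s) - sse / (2 * s ** 2)

# The posterior prefers parameters whose predictions track the data.
print(log_posterior(0.005, 0.01) > log_posterior(0.04, 0.01))
```

Because the uniform prior is constant inside its bounds, it only enters the log-posterior as a support check; outside the bounds the density is zero.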

#### 2.3 Markov chain Monte Carlo sampling

Bayesian parameter estimation in Eq. (11) shows the functional expression of the posterior joint PDF of the unknown model parameters. When the prior is conjugate to the likelihood, the posterior distribution can be expressed in the form of a standard probability distribution. In general cases, however, the posterior distribution is expressed as a product of complex functions, such as the posterior PDF shown in Eq. (11).

The posterior PDF is then used to calculate the degradation trend and predict the RUL. For complex nonlinear models, it is difficult to propagate uncertainty in the parameters to the degradation model. Instead, samples of model parameters are generated from the posterior distribution, and the degradation model with the threshold in Eq. (2) is used to propagate these samples to calculate the samples of the end of life, and thus, the samples of the RUL in Eq. (3). Therefore, it is important to generate samples that follow the posterior distribution of parameters.

In general, the inverse cumulative distribution function (CDF) method is the easiest way of generating samples from a non-standard probability distribution, but it requires the functional expression of CDF, not PDF. For practical engineering applications, it is likely that the posterior PDF may be different from a standard probability distribution, or the posterior PDF is complicated due to the complex correlation structures between parameters. In such a case, sampling-based methods can be used to generate samples of parameters. There are many sampling methods, such as the grid approximation [15], rejection sampling [16], importance sampling [17], and the Markov Chain Monte Carlo (MCMC) method [18]. In this paper, the MCMC method using the Metropolis-Hastings (MH) algorithm is employed. MCMC is a simulation technique used to estimate quantities of interest by sampling consecutive random variables wherein the future state depends only on the current state [19].

Figure 3. Markov chain Monte Carlo sampling using random walk.

The MCMC sampling method uses a Markov chain model in a random walk, where the distribution of the next sample depends only on the current sample (see Figure 3). As the algorithm generates more and more samples, the samples more closely approximate the posterior PDF. Specifically, starting with an arbitrary initial sample (the current sample), a new candidate sample is drawn from a proposal distribution centered at the current sample. In this paper, a uniformly distributed proposal distribution is used. Therefore, the user provides the initial sample of the parameters and the width of the proposal distribution. At the i-th iteration, the current sample $\theta^{(i-1)}$ is available, and the new candidate sample $\theta^*$ is drawn from the following proposal distribution that is uniformly distributed and centered at $\theta^{(i-1)}$:

$$g\left(\theta^* \,\middle|\, \theta^{(i-1)}\right) \sim U\left[\theta^{(i-1)} - w,\ \theta^{(i-1)} + w\right] \tag{12}$$

where w is the user-provided width of the proposal distribution. It is noted that the proposal distribution is symmetric; that is, $g(\theta^*|\theta^{(i-1)}) = g(\theta^{(i-1)}|\theta^*)$.

Once the candidate sample is generated, it is either accepted as a new sample or rejected based on an acceptance criterion. When accepted, the candidate sample is added to the set of samples and used in the next iteration. When rejected, the candidate sample is discarded, and the current sample is reused in the next iteration. In the original MH algorithm, it is suggested to use a function that is proportional to the posterior distribution for the acceptance/rejection test. In this paper, however, the posterior distribution is used directly, as its evaluation is not computationally expensive. Since the proposal distribution is symmetric, the following acceptance ratio can be defined:

$$Q\left(\theta^{(i-1)}, \theta^*\right) = \frac{f_{\Theta}\left(\theta^*|\mathbf{y}\right)}{f_{\Theta}\left(\theta^{(i-1)}|\mathbf{y}\right)} \tag{13}$$

The acceptance ratio compares the posterior probability of the new candidate sample against that of the current sample. If the candidate sample has a higher probability than the current sample, i.e., $Q(\theta^{(i-1)}, \theta^*) > 1$, it is always accepted as a new sample. When $0 < Q(\theta^{(i-1)}, \theta^*) < 1$; that is, when the probability of the candidate sample is lower than that of the current sample, acceptance is determined based on the ratio: a candidate with a high acceptance ratio is more likely to be accepted, while one with a low ratio is only occasionally accepted. This can be achieved by generating a sample from a uniform distribution, $u \sim U[0, 1]$; the candidate sample is accepted if $u < Q(\theta^{(i-1)}, \theta^*)$ and rejected otherwise. Intuitively, this is why the algorithm works and returns samples that follow the desired distribution $f_{\Theta}(\theta|\mathbf{y})$. In Figure 3, the two dashed circles mean that these candidate samples were not selected according to the criterion. In such a case, the current sample is selected again. This process is repeated as many times as necessary until a sufficient number of samples are obtained. Figure 4 summarizes the MCMC sampling procedure using the MH algorithm.

Figure 4. Metropolis-Hastings algorithm for generating samples from a posterior distribution.

The performance of MCMC sampling depends on the initial sample and the selection of the proposal distribution. A too-narrow proposal distribution may fail to cover the posterior distribution fully, while a too-wide distribution can yield many duplicated samples because few candidates are accepted. In addition, if the initial sample is located far away from the posterior distribution, many iterations (samples) will be required to converge to the posterior distribution. To reduce the effect of an inaccurate initial sample, an initial portion of the samples can be discarded when estimating the posterior distribution, which is called burn-in. In this paper, the first 20 percent of the samples are discarded as burn-in.
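The MH procedure described above (uniform random-walk proposal, acceptance ratio of Eq. (13), 20 percent burn-in) can be sketched as follows; to keep the example self-contained and verifiable, a simple one-dimensional Gaussian stands in for the posterior:

```python
import math
import random

# Sketch of the Metropolis-Hastings sampler: uniform random-walk proposal
# of half-width w (Eq. (12)), acceptance ratio as in Eq. (13), and the
# first 20% of samples discarded as burn-in. The target here is a simple
# stand-in posterior (a 1-D Gaussian), not the battery posterior.
def log_post(theta):
    return -0.5 * ((theta - 2.0) / 0.5) ** 2       # N(2, 0.5^2), up to a constant

def metropolis_hastings(log_post, theta0, w, n_samples, rng):
    samples = [theta0]
    for _ in range(n_samples - 1):
        cur = samples[-1]
        cand = rng.uniform(cur - w, cur + w)       # symmetric proposal, Eq. (12)
        delta = log_post(cand) - log_post(cur)     # log of Q in Eq. (13)
        if delta >= 0 or rng.random() < math.exp(delta):
            samples.append(cand)                   # accept the candidate
        else:
            samples.append(cur)                    # reject: reuse current sample
    return samples[int(0.2 * n_samples):]          # drop 20% burn-in

rng = random.Random(1)
samples = metropolis_hastings(log_post, theta0=0.0, w=1.0, n_samples=20000, rng=rng)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to the target mean of 2.0
```

Working with the log-posterior makes the acceptance test $u < Q$ numerically robust, since Q itself can underflow or overflow for many data points.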

#### 2.4 Prognostics

Once the samples of the parameters are obtained from the posterior distribution, the future damage state and the RUL can be predicted by substituting the samples of the parameters into the degradation model and estimating the RUL using Eqs. (2) and (3). In general, since the degradation model is a nonlinear implicit function of time, solving for $t_{\text{EOL}}^{(i)}$ with a given sample $b^{(i)}$ in Eq. (2) may require an iterative process. Instead, in this paper, the degradation model is evaluated at a set of discrete future times, and then the end of life is calculated using a simple interpolation. More specifically, let the set of discrete times be defined as

$$\mathbf{time} = \begin{bmatrix} t_0 & t_1 & \cdots & t_{\text{CUR}} & \cdots & t_{\text{end}} \end{bmatrix} \tag{14}$$

Prognostics 102: Efficient Bayesian-Based Prognostics Algorithm in MATLAB
DOI: http://dx.doi.org/10.5772/intechopen.82781

Then, the measurement data are available between $t_0$ and $t_{\text{CUR}}$. The Bayesian parameter estimation in the previous section uses the measurement data between $t_0$ and $t_{\text{CUR}}$ to estimate the posterior PDF of the model parameters. Using the estimated model parameters, the degradations at the future times between $t_{\text{CUR}}$ and $t_{\text{end}}$ are calculated in the prediction stage. If two consecutive degradations cross the threshold, that is,

$$\left(\tilde{y}_k(\boldsymbol{b}) - y_{\text{threshold}}\right) \cdot \left(\tilde{y}_{k+1}(\boldsymbol{b}) - y_{\text{threshold}}\right) \leq 0 \tag{15}$$

then $t_{\text{EOL}}$ lies between $t_k$ and $t_{k+1}$ and can be found by a simple linear interpolation. When the set of future times does not include the end of life, it can be extrapolated based on the trend of the data. It is also possible that the degradation model never reaches the threshold; that is, the system has an infinite life. In such a case, the sample is deleted from the calculation. Once the samples of the end of life are obtained, the samples of the RUL can be calculated using Eq. (3).
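The threshold-crossing test in Eq. (15) and the subsequent linear interpolation can be sketched as follows (a Python illustration with a hypothetical exponential capacity-fade model; the chapter's code is in MATLAB):

```python
import numpy as np

def estimate_eol(times, degradation, y_threshold):
    """Return the end of life t_EOL where the degradation crosses the threshold.

    Returns np.inf when the threshold is never reached (the crossing test in
    Eq. (15) never holds); such samples are excluded from the RUL statistics.
    """
    d = np.asarray(degradation) - y_threshold
    for k in range(len(times) - 1):
        if d[k] * d[k + 1] <= 0:  # Eq. (15): consecutive predictions straddle the threshold
            # Linear interpolation between t_k and t_{k+1}
            return times[k] + (times[k + 1] - times[k]) * d[k] / (d[k] - d[k + 1])
    return np.inf

# Hypothetical capacity-fade model y = exp(-b t) for one sample of the parameter b
t = np.linspace(0.0, 100.0, 201)
y = np.exp(-0.02 * t)             # predicted relative capacity at the future times
t_eol = estimate_eol(t, y, 0.7)   # threshold at 70% relative capacity
rul = t_eol - 10.0                # Eq. (3): RUL = t_EOL - t_CUR, with t_CUR = 10.0
```

Repeating this for every parameter sample $\boldsymbol{b}^{(i)}$ yields the set of RUL samples used in the next step.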

Once the samples of the RUL are available, the confidence interval and/or the prediction interval is used to evaluate the accuracy or precision of the RUL. The confidence interval represents the uncertainty in the estimated RUL: a confidence interval of 95% means that the true RUL will be within this interval with a probability of 95%. That is, the confidence interval tells us about the likely location of the true RUL. In the case of RUL samples, the 95% confidence interval can be calculated by taking the lower 2.5 percentile and the upper 2.5 percentile of the samples. On the other hand, the prediction interval shows the possible location of the next sample. Since the next sample has additional randomness beyond the predicted RUL, the prediction interval is calculated by adding this randomness to the data. In practice, the RUL estimate is important for scheduling maintenance; therefore, only the lower confidence/prediction bound is of interest in practical applications. Figure 5 shows a representative result of prognostics: the statistical distribution and the confidence interval of the RUL.
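The percentile computation for the 95% confidence interval can be sketched as follows (a Python sketch; the RUL samples here are synthetic, purely for illustration):

```python
import numpy as np

def rul_confidence_interval(rul_samples, level=95.0):
    """Confidence interval from RUL samples: lower 2.5 to upper 2.5 percentile for 95%."""
    alpha = (100.0 - level) / 2.0
    return (np.percentile(rul_samples, alpha),          # lower bound (of most practical interest)
            np.percentile(rul_samples, 50.0),           # median as the point estimate
            np.percentile(rul_samples, 100.0 - alpha))  # upper bound

rng = np.random.default_rng(1)
rul_samples = rng.normal(20.0, 1.0, 100000)  # hypothetical RUL samples (cycles)
lo, med, hi = rul_confidence_interval(rul_samples)
```

For conservative maintenance scheduling, only `lo` (the lower bound) would typically be reported.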

Figure 5. Statistical distribution and the confidence interval of the remaining useful life.


Fault Detection, Diagnosis and Prognosis


### 3. MATLAB implementation

In this section, the MATLAB implementation of prognostics using the Bayesian method is discussed. In the following explanation, 'line' or 'lines' in parentheses indicates the line number of the code in the Appendix. The code is divided into three parts: (1) Problem Definition (lines 2–15, 65–67), (2) Bayesian parameter estimation and MCMC sampling (lines 16–29, 60–76), and (3) post-processing for displaying results (lines 40–57). Only the Problem Definition part needs to be changed for different applications. Detailed explanations are given in the subsequent sections with an example of battery degradation.

Eq. (9). When a uniform prior distribution is used, each row contains the lower and upper bounds of the corresponding parameter.

Since the RUL is represented by $N_s$ samples, the confidence interval or the prediction interval is often used to support the decision-making process. signiLevel (line 14) is the significance level of this interval in percentage. When signiLevel = 5, the code returns the lower 5th percentile, the median, and the upper 5th percentile (i.e., the 95th percentile):

5prct: 18.7182, median: 20.381, 95prct: 22.1576

For the model definition, the degradation model $\tilde{y}(t; \boldsymbol{b})$ in Eq. (1) is defined in lines 65–67. In this equation, t is the time of measurement, and b is the model parameter as defined in line 9. The model equation needs to be defined in such a way that component-by-component operations are possible. This is because the model is evaluated at all parameter samples simultaneously.
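With element-wise operations, one call evaluates the model at every parameter sample at once, with no loop over samples (a NumPy analogue of MATLAB's `.*` and `.^` operators; the exponential model here is only illustrative):

```python
import numpy as np

# Illustrative degradation model y(t; b) = x0 * exp(-b * t); the multiplication
# and the exponential act element-wise, so b can be an array of parameter samples
def degra_model(t, b, x0=1.0):
    return x0 * np.exp(-b * t)

b_samples = np.array([0.010, 0.012, 0.015])  # three samples of the parameter b
y = degra_model(5.0, b_samples)              # one model prediction per parameter sample
```

Writing the model with scalar-only operators would instead require evaluating it one sample at a time.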

In the Bayesian parameter estimation process, the posterior distribution is expressed as the product of the prior distribution and the likelihoods of all measured data. In the MATLAB code BM.m, the degradation model and the posterior distribution are calculated in the function BMappl (lines 60–76). First, the parameter samples in param are assigned to variables using the eval function. This is why the names in ParamName in line 9 must match the actual variable names. Then, these parameters are used to calculate the degradation model.

If measurement data (measuData) is empty, then BMappl only calculates the degradation model with the given parameter samples at a given time t. This corresponds to propagating the degradation model to a future time using the parameter samples for prognostics. If measurement data are provided (lines 71–74), then BMappl calculates the values of the posterior distribution at the parameter samples. In this case, the time should be given as an array of Ndata components from the start time to the current time. The posterior distribution in Eq. (11) (line 74) is the product of the prior distribution (line 71) in Eq. (9) and the likelihoods of all measured data (lines 72–73) in Eq. (10). The calculated posterior distribution from BMappl is then used in the MCMC sampling process.
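The posterior evaluation of Eq. (11) can be sketched as follows (a Python sketch assuming a uniform prior and a Gaussian likelihood; the function name, model, and data are illustrative, not the chapter's exact code):

```python
import numpy as np

def posterior_value(b, t_meas, y_meas, model, b_lo, b_hi, sigma):
    """Unnormalized posterior: uniform prior on [b_lo, b_hi] multiplied by the
    product of Gaussian likelihoods of all measured data (Eq. (11))."""
    prior = 1.0 if b_lo <= b <= b_hi else 0.0  # uniform prior, Eq. (9)
    resid = y_meas - model(t_meas, b)          # residuals at all measurement times
    # Product of N(y_i; model(t_i, b), sigma^2) likelihoods over all data, Eq. (10)
    likelihood = np.prod(np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi)))
    return prior * likelihood

# Toy example: an exponential model and synthetic "measured" data
model = lambda t, b: np.exp(-b * t)
t_meas = np.array([0.0, 5.0, 10.0])
y_meas = model(t_meas, 0.02)
p_true = posterior_value(0.02, t_meas, y_meas, model, 0.0, 0.1, 0.05)  # at the true parameter
p_off = posterior_value(0.08, t_meas, y_meas, model, 0.0, 0.1, 0.05)   # at a poor parameter
```

The MH algorithm only ever compares such posterior values as ratios, so the normalizing constant $P(Y)$ in Eq. (5) is never needed.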

MCMC sampling using the MH algorithm starts with the initial sample provided by the user (line 19) and the corresponding value of the posterior PDF (line 20). In the code, all samples are generated first, and then nBurn = 1250 samples (line 30, the first 20 percent) are discarded as burn-in.

