Fault Detection, Diagnosis and Prognosis

Statistical process monitoring methods can be classified into three broad categories: quantitative model based methods, qualitative model based methods, and process history based methods [3–5]. Quantitative model based methods require detailed knowledge of a process in order to construct a model that can be used for monitoring, for example, Kalman filters [3], while qualitative model based methods require the presence of process engineering experts in order to develop monitoring procedures or tasks, for example, fault trees [4]. In the absence of these two requirements, and due to the complexity of many processes that require monitoring, data-based techniques are commonly used by the industry for various applications, from drug design to drinking water treatment [5–7].

Principal component analysis (PCA) is a powerful, linear data analysis technique widely used in research and industrial applications [8] for fault detection and isolation, data modeling and reconstruction, feature extraction, and noise filtration. PCA is useful for the extraction of dominant underlying information from a dataset, without any previous knowledge of the model. An example of the practical application of PCA has been discussed in [8], where data gathered from parallel sensors are used to quantify the quality of a given food sample. PCA is used to reduce the dimensionality of a dataset, whilst filtering out variability caused by noise [9]. The PCA model has been utilized in order to monitor a wide variety of processes, and has seen many extensions [10–13]. Two main fault detection statistics are typically utilized with a PCA model: Hotelling's T<sup>2</sup> statistic and the Q statistic [10]. Variations captured by the principal component space are monitored using the T<sup>2</sup> statistic, while variations in the residual space are monitored using the Q statistic [14].

On the other hand, statistical hypothesis testing methods function by using statistical techniques in order to determine if observations collected from a given process follow the null hypothesis, that is, operating under normal operating conditions, or the alternate hypothesis, that is, operating under abnormal or faulty operating conditions [15]. These faults can be of different types, such as shifts in the mean, variance, or both. The generalized likelihood ratio (GLR) technique has received a lot of attention in the process monitoring literature [10, 11, 13, 16]. The GLR method aims to maximize the detection rate for a fixed false alarm rate [15]. Therefore, an objective of this work is to provide a comparative review of the different GLR charts by utilizing examples such as the benchmark Tennessee Eastman Process (TEP) [17].

Data utilized in the construction of a PCA model may be of two types depending on the application being monitored: single-valued, and interval-valued. Single-valued data can be directly obtained from sensors measuring particular variables in a process, while interval-valued data is aggregated or artificially generated from batch single-valued measurements, thereby resulting in a range of possible measurement values for a given process variable at one time instant. The use of interval data in fault detection was originally introduced in order to reduce large datasets to a more manageable size [18], without compromising the integrity of the dataset. In addition, the use of interval data is beneficial because of its inherent ability to deal with missing values in samples, which may happen due to malfunctioning sensors or varying sampling frequencies between variables [19].

However, in cases where reducing the dataset may not be a viable option, due to a relatively limited sample size or sampling frequency, interval data can still be used by means of a moving window aggregation method. This is also true of applications where batch process monitoring is not a viable option, thereby necessitating real-time online monitoring of samples. The benchmark TEP example will be used once more in order to analyze the benefit of using this moving window approach.

#### 2. Principal component analysis (PCA)

Principal component analysis (PCA) is a linear dimensionality reduction tool used to reduce the number of variables in a dataset, whilst retaining most of the data's variability. PCA finds a new set of variables, called principal components, using a linear combination of the dataset's original cross-correlated variables [9]. The algorithm for PCA is summarized below.

#### 2.1 PCA algorithm

Given an n × p classical training dataset X (assumed mean-centered), where n is the number of sample rows and p is the number of variable columns, the PCA model is found from the eigendecomposition of the covariance matrix of X:

$$\Sigma = \frac{1}{n-1}X^{T}X = P\Lambda P^{T},$$

where the columns of P are the eigenvectors (principal components) and Λ is the diagonal matrix of the corresponding eigenvalues, sorted in descending order. Partitioning P into the l retained principal components $\hat{P}$ and the remaining components yields the two transformation matrices

$$\hat{C} = \hat{P}\hat{P}^{T}, \qquad \tilde{C} = I - \hat{P}\hat{P}^{T}.$$

$\hat{C}$ is used to find the projection of the dataset onto the PCA model, and $\tilde{C}$ is used to find the amount of deviation of the dataset from its projection onto the PCA model, also known as the matrix of residuals. For more comprehensive details, please refer to [9, 19, 20].

The training dataset X defines the system under normal or optimal operating conditions, where there are no faults and the noise is minimal. Consequently, X is used to find the PCA model, defined using the $\hat{C}$ and $\tilde{C}$ transformation matrices. The testing dataset S defines the system under unknown operating conditions, and it is monitored for faults using its respective residuals $\tilde{S} = S\tilde{C}$, as will be discussed later.
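As a concrete illustration, the model-building step above can be sketched in NumPy (a minimal sketch under the stated assumptions; `fit_pca` and its return names are illustrative, and the number of retained components l is assumed to be chosen beforehand):

```python
import numpy as np

def fit_pca(X, l):
    """Fit a PCA model on training data X (n samples x p variables).

    Returns C_hat (projection onto the l retained principal components),
    C_tilde (projection onto the residual space), and the l largest
    eigenvalues of the covariance matrix.
    """
    X = X - X.mean(axis=0)               # center the training data
    Sigma = np.cov(X, rowvar=False)      # p x p covariance matrix
    eigvals, P = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]    # reorder to descending
    P, eigvals = P[:, order], eigvals[order]
    P_hat = P[:, :l]                     # retained loadings
    C_hat = P_hat @ P_hat.T              # model projection matrix
    C_tilde = np.eye(X.shape[1]) - C_hat # residual projection matrix
    return C_hat, C_tilde, eigvals[:l]
```

Residuals of a testing dataset S would then be obtained as `(S - X.mean(axis=0)) @ C_tilde`, centering S with the training mean.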

#### 2.2 Fault detection statistics

Once the optimal number of eigenvectors, or principal components, to retain is known, fault detection is carried out by evaluating the PCA model's residuals using a detection statistic. This section will focus on briefly introducing the two most well-known statistics in the literature: the Q and T<sup>2</sup> statistics.

The Q statistic of an n × p classical residual matrix $\tilde{X}$ is defined as [11]:

$$Q\_{\mathbf{x}}[i] = \sum\_{j=1}^{p} \left(\tilde{X}\_{j}[i]\right)^{2} \tag{1}$$
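In code, Eq. (1) is simply a row-wise sum of squared residuals (a small NumPy helper; the function name is illustrative):

```python
import numpy as np

def q_statistic(X_tilde):
    """Q statistic (squared prediction error) of each sample:
    the row-wise sum of squared residuals, as in Eq. (1)."""
    return np.sum(np.asarray(X_tilde, dtype=float) ** 2, axis=1)
```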


Qx is used to find the Q-threshold value γ, which defines the maximum possible value for a testing data's Q-statistic, denoted as Qs, beyond which the sample will be declared as a fault [14, 19, 21]. The threshold is calculated using the empirical cumulative distribution function (CDF) of Qx, which is an estimate of the true CDF of its discrete values.

The fault detection performance is evaluated by comparing Q<sub>s</sub> with γ. If Q<sub>s</sub>[i] > γ, then the ith sample is declared as faulty; otherwise it is normal. There are two metrics used for benchmarking each method: the false alarm rate (FAR) and the detection rate (DR).

FAR is the average percentage of fault-free samples that were wrongfully declared as faults, while DR is the average percentage of truly faulty samples that were rightfully declared as faults. It is desirable to maximize DR for a fixed FAR in order to obtain a better fault detector.
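These quantities can be sketched as follows (a minimal illustration; `alpha`, the quantile-based threshold choice, and the function names are assumptions rather than the chapter's exact procedure):

```python
import numpy as np

def q_threshold(Q_x, alpha=0.05):
    """Threshold gamma as the (1 - alpha) quantile of the
    empirical CDF of the training statistics Q_x."""
    return np.quantile(Q_x, 1.0 - alpha)

def far_dr(Q_s, gamma, fault_mask):
    """False alarm rate and detection rate (percent) for testing
    statistics Q_s; fault_mask[i] is True where sample i is truly faulty."""
    alarms = Q_s > gamma
    far = 100.0 * np.mean(alarms[~fault_mask])  # alarms among normal samples
    dr = 100.0 * np.mean(alarms[fault_mask])    # alarms among faulty samples
    return far, dr
```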

Alternatively, the Hotelling T<sup>2</sup> statistic, which measures variations in the principal component space, can be used; it is computed as follows [22]:

$$T^2 = \mathbf{x}^T \hat{P} \hat{\Lambda}^{-1} \hat{P}^T \mathbf{x},\tag{2}$$

where $\hat{\Lambda} = \mathrm{diag}\left(\lambda\_1, \lambda\_2, \dots, \lambda\_l\right)$ is a diagonal matrix that contains the eigenvalues associated with the l retained principal components. The threshold for the T<sup>2</sup> statistic can be computed either computationally or empirically [22]. The Q statistic is often utilized by authors instead of the T<sup>2</sup> statistic, as it is better able to detect smaller faults [10, 11].
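A sketch of Eq. (2) for a single sample, assuming the retained loadings and eigenvalues are available from the PCA fit (names illustrative):

```python
import numpy as np

def t2_statistic(x, P_hat, eigvals):
    """Hotelling T^2 statistic of a sample x (Eq. (2)):
    T^2 = x^T P_hat Lambda^{-1} P_hat^T x, with Lambda = diag(eigvals)."""
    scores = P_hat.T @ np.asarray(x, dtype=float)  # scores on retained components
    return float(np.sum(scores ** 2 / np.asarray(eigvals)))
```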

#### 3. Hypothesis testing methods

Hypothesis testing methods such as the generalized likelihood ratio (GLR) have received a lot of attention in recent literature [10, 13, 23]. Hypothesis testing methods utilize fundamental statistical theory in order to determine if given data conforms to a targeted distribution, that is, a null hypothesis, or deviates from this distribution and follows an alternative distribution, that is, an alternate hypothesis [15]. In process monitoring terms, the parameters of the null and alternate hypotheses are defined using data from normal and abnormal operating conditions, respectively [1].

#### 3.1 Generalized likelihood ratio

Fault Detection of Single and Interval Valued Data Using Statistical Process Monitoring…
DOI: http://dx.doi.org/10.5772/intechopen.88217

The generalized likelihood ratio (GLR) technique defines the alternate hypothesis by parameters that can assume an infinite number of values, and is therefore called a composite hypothesis. An efficient point estimation method that utilizes the concept of maximum likelihood estimates (MLEs) is employed in order to estimate the required parameters.

The univariate GLR chart uses the concept of maximum likelihood estimates in order to maximize the detection rate for a fixed false alarm rate. The GLR process is accomplished through the following steps [15]:

1. The null and alternate hypotheses are defined, and their respective likelihood functions are derived.

2. Any unknown parameters in the alternate hypothesis are computed from the testing data using their MLEs, for example, the mean and/or variance.

3. The log likelihood ratio of the alternate to null hypotheses is then computed, and its maximum value is calculated, which maximizes the detection rate.
Univariate GLR charts can be designed based on the type of the fault that needs to be detected. Most processes experience shifts in the mean, and/or shifts in the variance, and three of these GLR charts will be explained next.

For the case when residuals are collected from processes under normal operating conditions, the likelihood function derived from a random normal distribution can be defined as follows [24]:

$$L\left(\infty,\mu\_0,\sigma\_0^2|\mathbf{x}\_1,\mathbf{x}\_2,\dots,\mathbf{x}\_k\right) = \left(2\pi\right)^{-k/2} \left(\sigma\_0^2\right)^{-k/2} \exp\left(-\frac{1}{2\sigma\_0^2} \sum\_{i=1}^k \left(\mathbf{x}\_i - \mu\_0\right)^2\right) \tag{3}$$

where μ<sub>0</sub> and σ<sub>0</sub><sup>2</sup> are the mean and variance of the process variable measured under normal operating conditions, respectively.

#### 3.1.1 Univariate GLR chart for a shift in the mean

If a shift in the mean has occurred at time τ, from μ<sub>0</sub> to μ<sub>1</sub>, the likelihood function of the alternate hypothesis is defined as follows [24]:

$$\begin{split} &L\left(\tau,\mu\_{1},\sigma\_{0}^{2}|\mathbf{x}\_{1},\mathbf{x}\_{2},\ldots,\mathbf{x}\_{k}\right) \\ &= (2\pi)^{-k/2} \left(\sigma\_{0}^{2}\right)^{-k/2} \exp\left(-\frac{1}{2\sigma\_{0}^{2}} \left(\sum\_{i=1}^{\tau} (\mathbf{x}\_{i}-\mu\_{0})^{2} + \sum\_{i=\tau+1}^{k} (\mathbf{x}\_{i}-\mu\_{1})^{2}\right)\right) \end{split} \tag{4}$$

Since the magnitude of the new mean is unknown, its MLE can be computed using the testing data as follows [24]:

$$
\hat{\mu}\_{1,\tau,k} = \frac{1}{(k-\tau)} \sum\_{i=\tau+1}^{k} \mathbf{x}\_i. \tag{5}
$$

The GLR statistic designed to specifically monitor a shift in the mean can now be computed by taking the log-likelihood ratio of Eqs. (3) and (4) [24]:

$$R\_k = \max\_{0 \le \tau < k} \frac{(k - \tau)}{2\sigma\_0^2} \left(\hat{\mu}\_{1,\tau,k} - \mu\_0\right)^2. \tag{6}$$
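Eqs. (5) and (6) can be implemented directly by scanning all candidate change points τ (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def glr_mean_shift(x, mu0, sigma0_sq):
    """GLR statistic R_k for a shift in the mean (Eqs. (5)-(6))
    over samples x_1..x_k."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    best = 0.0
    for tau in range(k):                 # candidate change points 0 <= tau < k
        mu1_hat = x[tau:].mean()         # MLE of the shifted mean (Eq. (5))
        r = (k - tau) / (2.0 * sigma0_sq) * (mu1_hat - mu0) ** 2
        best = max(best, r)
    return best
```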

The authors in [24] state that it is not necessary to store the entire length of previous historical data in order to compute the MLEs, but a window length


of about 400 is sufficient to provide reliable results. Therefore, a window length of 400 was utilized throughout this work for all GLR charts.

#### 3.1.2 Univariate GLR chart for a shift in the variance

If only a shift in the variance has occurred at time τ, from σ<sub>0</sub><sup>2</sup> to σ<sub>1</sub><sup>2</sup>, the alternate hypothesis for this case is defined as follows [25]:

$$\begin{split} &L\left(\tau,\mu\_{0},\sigma\_{1}^{2}|\mathbf{x}\_{\tau+1},\mathbf{x}\_{\tau+2},\ldots,\mathbf{x}\_{k}\right) \\ &= (2\pi)^{-k/2} \left(\sigma\_{1}^{2}\right)^{-k/2} \exp\left(-\frac{1}{2\sigma\_{1}^{2}} \sum\_{i=\tau+1}^{k} \left(\mathbf{x}\_{i}-\mu\_{0}\right)^{2}\right). \end{split} \tag{7}$$


From a quality control standpoint, we are only concerned with increases in variance, as larger variations imply that the product is being manufactured with quality further away from the targeted amount. Since the magnitude of the new variance is unknown, its MLE can be computed using the testing data as follows [25]:

$$\hat{\sigma}\_{1,\tau,k}^{2} = \max\left\{\sigma\_{0}^{2},\ \frac{1}{k-\tau} \sum\_{i=\tau+1}^{k} \left(\mathbf{x}\_{i} - \mu\_{0}\right)^{2}\right\}.\tag{8}$$

The GLR statistic designed to specifically monitor a shift in the variance can now be computed by taking the log-likelihood ratio of Eqs. (3) and (7) [25]:

$$R\_k = \max\_{0 \le \tau < k} \frac{k - \tau}{2} \left[ \frac{\hat{\sigma}\_{1,\tau,k}^2}{\sigma\_0^2} - 1 - \ln \left( \frac{\hat{\sigma}\_{1,\tau,k}^2}{\sigma\_0^2} \right) \right] \tag{9}$$
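A sketch of Eqs. (8) and (9), again scanning candidate change points (function name illustrative):

```python
import numpy as np

def glr_variance_shift(x, mu0, sigma0_sq):
    """GLR statistic R_k for an increase in variance (Eqs. (8)-(9))."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    best = 0.0
    for tau in range(k):
        s2 = np.mean((x[tau:] - mu0) ** 2)      # variance estimate after tau
        sigma1_hat = max(sigma0_sq, s2)         # MLE, increases only (Eq. (8))
        ratio = sigma1_hat / sigma0_sq
        r = (k - tau) / 2.0 * (ratio - 1.0 - np.log(ratio))
        best = max(best, r)
    return best
```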

#### 3.1.3 Univariate GLR chart for a shift in the mean and/or variance

Since it is possible for most processes to experience both shifts in the mean and variance, a GLR statistic that is capable of detecting either type of shift can be designed. The likelihood function of the alternate hypothesis for this case is defined as follows [26]:

$$\begin{split} &L\left(\tau,\mu\_{1},\sigma\_{1}^{2}|\mathbf{x}\_{1},\mathbf{x}\_{2},\ldots,\mathbf{x}\_{k}\right) \\ &= \left(2\pi\right)^{-k/2} \left(\sigma\_{0}^{2}\right)^{-\tau/2} \left(\sigma\_{1}^{2}\right)^{-(k-\tau)/2} \exp\left(-\frac{1}{2\sigma\_{0}^{2}} \sum\_{i=1}^{\tau} \left(\mathbf{x}\_{i} - \mu\_{0}\right)^{2} - \frac{1}{2\sigma\_{1}^{2}} \sum\_{i=\tau+1}^{k} \left(\mathbf{x}\_{i} - \mu\_{1}\right)^{2}\right). \end{split} \tag{10}$$

The MLE of the mean can be computed from the testing data using (Eq. (5)). However, the variance now has to be computed utilizing the MLE for the mean as well [26]:

$$S\_{\tau,k}^{2} = \frac{1}{k - \tau} \sum\_{i=\tau+1}^{k} \left(\mathbf{x}\_{i} - \hat{\mu}\_{1,\tau,k}\right)^{2}. \tag{11}$$

As previously stated, from a quality control standpoint only an increase in the variance is of concern, and the MLE for the variance can be computed as follows [26]:

$$\hat{\sigma}\_{1,\tau,k}^{2} = \max\left\{\sigma\_{0}^{2},\ S\_{\tau,k}^{2}\right\}. \tag{12}$$

If there are no shifts in the mean for testing data, the variance is computed as follows [26]:

$$S\_{0,\tau,k}^{2} = \frac{1}{k-\tau} \sum\_{i=\tau+1}^{k} \left(\mathbf{x}\_{i} - \mu\_{0}\right)^{2}.\tag{13}$$

In this case, the GLR statistic designed to simultaneously monitor shifts in both the mean and variance can be computed by taking the log-likelihood ratio of Eqs. (3) and (10), resulting in the following equation [26]:

$$R\_k = \max\_{0 \le \tau < k} \frac{k - \tau}{2} \left[ \frac{S\_{0,\tau,k}^2}{\sigma\_0^2} - \frac{S\_{\tau,k}^2}{\hat{\sigma}\_{1,\tau,k}^2} - \ln \left( \frac{\hat{\sigma}\_{1,\tau,k}^2}{\sigma\_0^2} \right) \right] \tag{14}$$
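Eqs. (11)–(14) combine into the following sketch (function name illustrative):

```python
import numpy as np

def glr_mean_variance_shift(x, mu0, sigma0_sq):
    """GLR statistic R_k for a shift in the mean and/or variance
    (Eqs. (11)-(14))."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    best = 0.0
    for tau in range(k):
        seg = x[tau:]
        mu1_hat = seg.mean()                    # MLE of shifted mean (Eq. (5))
        s2_tau = np.mean((seg - mu1_hat) ** 2)  # Eq. (11)
        sigma1_hat = max(sigma0_sq, s2_tau)     # Eq. (12)
        s2_0 = np.mean((seg - mu0) ** 2)        # Eq. (13)
        r = (k - tau) / 2.0 * (s2_0 / sigma0_sq
                               - s2_tau / sigma1_hat
                               - np.log(sigma1_hat / sigma0_sq))
        best = max(best, r)
    return best
```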

It is important to note that for this particular GLR method, two parameters, that is, the mean and the variance, have to be estimated using their MLEs, since the type of shift is unknown.

#### 3.1.4 Multivariate GLR chart for a shift in the mean

Since using a univariate GLR chart may not always be practical, Wang and Reynolds [27] introduced the multivariate GLR chart, designed to specifically monitor shifts in the process mean in multivariate applications. In this case, the GLR statistic is defined as follows:

$$R\_k = \max\_{\max(0, k - m) \le t < k} \left( \frac{k - t}{2} \left( \hat{\mu}\_{1, t, k} - \mu\_0 \right)^{T} \Sigma\_{0}^{-1} \left( \hat{\mu}\_{1, t, k} - \mu\_0 \right) \right) \tag{15}$$

where μ<sub>0</sub> is the multivariate mean vector of the process under normal operating conditions, $\hat{\mu}\_{1,t,k}$ is the MLE of a sustained process mean shift μ<sub>1</sub> at time index k over a sample window of maximum length m, and Σ<sub>0</sub> is the process covariance matrix under normal conditions [27].
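Eq. (15) can be sketched as follows, assuming the inverse covariance matrix is precomputed (function name and arguments are illustrative):

```python
import numpy as np

def glr_multivariate_mean(x, mu0, Sigma0_inv, m):
    """Multivariate GLR statistic for a mean shift (Eq. (15)).
    x: k x p array of samples; window of maximum length m."""
    x = np.asarray(x, dtype=float)
    k = x.shape[0]
    best = 0.0
    for t in range(max(0, k - m), k):
        mu1_hat = x[t:].mean(axis=0)     # MLE of the shifted mean vector
        d = mu1_hat - mu0
        best = max(best, (k - t) / 2.0 * float(d @ Sigma0_inv @ d))
    return best
```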

#### 3.2 Fault detection using PCA-based GLR

The PCA method introduced in Section 2 is commonly utilized by many industries. Therefore, it is necessary to integrate the simplicity of the PCA method with the advantages brought forward by the GLR charts, so that it can be easily applied to

monitor processes online. Figure 1 illustrates the fault detection algorithm utilized in this work.

Figure 1. PCA-based GLR fault detection algorithm.

introduced, before discussing our proposed method of integrating the moving win-

Fault Detection of Single and Interval Valued Data Using Statistical Process Monitoring…

Centers IPCA (CIPCA) was introduced by Cazes et al. [31], where the idea was to only apply PCA to the matrix of interval centers. This method focuses on the variation between the intervals of a dataset, rather than the variations within them [18, 32]. Midpoint-Radii IPCA (MRIPCA) was developed by Lauro et al. [33–36], where PCA models are separately generated for the centers and radii matrices of the interval training dataset. Finally, the Symbolic Covariance IPCA (SCIPCA) method was introduced by Le-Rademacher et al. [18, 32] as a way to better represent the

In this paper, the integration of the moving window aggregation to PCA-based GLR will be as follows. After generating an interval sample for each single-valued sample, the single-valued matrices of interval centers and radii are extracted. The matrices are then concatenated along the variables dimension, so as to maintain the number of samples, but double the number of variables. This is similar to the MRIPCA method, except it avoids the need to apply PCA twice, eliminating any

This section evaluates the performance of the three PCA-based GLR charts described in Section 3, and the moving window aggregation method discussed in Section 4. The PCA-based GLR charts are evaluated under different fault scenarios, and this is done through two illustrative examples: a simulated synthetic data set, and the benchmark Tennessee Eastman Process (TEP). Three fault detection metrics are used to evaluate the performance of each univariate chart: missed DR (which is equal to 100-DR), FAR, and average out-of-control run length (ARL1). Finally, the moving window interval aggregation method, in tandem with the PCAbased multivariate GLR chart, are analyzed using the benchmark TEP process, and the results are tabulated and compared to the single-valued multivariate GLR chart.

The purpose of this example is to utilize a simple linear model to compare and evaluate the performance of the difference PCA-based univariate GLR charts. The

t1 t2 t3 3 7

<sup>5</sup> <sup>þ</sup> noise (16)

2 6 4

�0:3441 0:4815 0:6637 �0:2313 �0:5936 0:3545 �0:5060 0:2495 0:0739 �0:5552 �0:2405 �0:1123 �0:3371 0:3822 �0:6115 �0:3877 �0:3868 �0:2045

where, t1, t2, and t3, are uniformly distributed random variables with ranges, ½ � 0; 2 , 0½ � ; 1:6 , and 0½ � ; 1:2 , respectively, while the noise follows a normal distribution

The linear model is used to generate 6000 observations, split into training and testing data sets of 3000 observations each. The training data are used to train the PCA model, while the testing data are used to evaluate the performance of all

linear data set can be generated using the following model [37]:

dow interval approach to the PCA-based GLR technique.

range and variability found in interval data.

DOI: http://dx.doi.org/10.5772/intechopen.88217

additional processing complexity.

5.1 Simulated synthetic data example

x1 x2 x3 x4 x5 x6

¼

with zero-mean and standard deviation of 0.2 [37].

113

5. Illustrative examples

PCA is utilized in order to model available data. The different GLR charts can then be applied on the residuals produced by the PCA model in order to determine if the process is operating under normal or faulty conditions. The fault detection threshold limits are obtained from an empirical distribution of the GLR statistic computed under normal operating conditions. The residual space is typically better able at detecting faults of smaller magnitude [10].

#### 4. Moving window interval data aggregation

Data utilized in the construction of a PCA model may be of two types depending on the application being monitored: single-valued, and interval-valued. Singlevalued data can be directly obtained from sensors measuring particular variables in a process, while interval-valued data is aggregated or artificially generated from batch single-valued measurements, thereby resulting in a range of possible measurement values for a given process variable at one time instant [18].

An interval is defined using a lower and upper bound, such as [a, b], where a≤b. In this work, interval data is generated by aggregating the single-valued samples in a dataset, such that the mean of each block of aggregated samples is defined as the interval center (c), and the standard deviation of each block of aggregated samples is defined as the interval radii (r). Consequently, the intervals can now be defined as [c r, c + r]. Unlike the lower and upper bounds, the centers and radii are of particular importance because they can be used to represent unique characteristics of the classical samples from which they are generated [19].

Originally, the use of interval data was motivated by the need to quickly and efficiently monitor large datasets [28], in addition to its ability to deal with missing values without the need to remove entire samples. Generating intervals by aggregation is a form of batch processing, which may not always be ideal. The ability to monitor faults in real time is typically much more desirable from a quality and safety standpoint. It also becomes impractical to use batch aggregation for processes with a low sample size or low sampling frequency.

As a result, interval data aggregation must be adapted for real-time monitoring purposes. One way to do that would be to use a moving window aggregation technique, such that any observed sample is aggregated with previously gathered samples, if any, in the defined window size. This allows for the generation and processing of interval data in real-time, without the need to wait for multiple samples to be observed before processing.
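A minimal sketch of the moving window variant follows, assuming the window simply holds the most recent samples and emits one interval per observation (the single-sample radius of zero at start-up is an assumption):

```python
# Sketch: moving-window aggregation. Each incoming sample forms an interval
# with the previous (window - 1) samples, so intervals are available in
# real time instead of waiting for a full batch.
from statistics import mean, pstdev

def moving_window_intervals(stream, window):
    history, intervals = [], []
    for x in stream:
        history.append(x)
        block = history[-window:]            # most recent samples, up to `window`
        c = mean(block)
        r = pstdev(block) if len(block) > 1 else 0.0
        intervals.append((c, r))             # interval emitted immediately
    return intervals

iv = moving_window_intervals([1.0, 2.0, 3.0, 4.0], window=3)
```

Note that consecutive intervals share samples, which is the source of the statistic "smearing" discussed below.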

As expected, however, this method suffers from some drawbacks relative to its batch aggregation counterpart. The moving window approach may cause smearing along the detection statistic, leading to higher false alarm rates and lower detection rates. This is especially true for large window sizes, as is the case for most methods which apply that approach. The problem can be mitigated by keeping the window size within reasonable limits, whilst also adjusting the detection threshold in order to meet the desired false alarm rate of the process.

#### 4.1 Integration with PCA-based GLR

Interval principal component analysis (IPCA) methods are an extension to the classical PCA method, and they have been explored in the literature for fault detection and isolation examples [29, 30]. In this work, three IPCA methods will be briefly introduced, before discussing our proposed method of integrating the moving window interval approach into the PCA-based GLR technique.

Centers IPCA (CIPCA) was introduced by Cazes et al. [31], where the idea was to only apply PCA to the matrix of interval centers. This method focuses on the variation between the intervals of a dataset, rather than the variations within them [18, 32]. Midpoint-Radii IPCA (MRIPCA) was developed by Lauro et al. [33–36], where PCA models are separately generated for the centers and radii matrices of the interval training dataset. Finally, the Symbolic Covariance IPCA (SCIPCA) method was introduced by Le-Rademacher et al. [18, 32] as a way to better represent the range and variability found in interval data.

In this work, the moving window aggregation is integrated with the PCA-based GLR technique as follows. After generating an interval sample for each single-valued sample, the single-valued matrices of interval centers and radii are extracted. The matrices are then concatenated along the variables dimension, so as to maintain the number of samples but double the number of variables. This is similar to the MRIPCA method, except that it avoids the need to apply PCA twice, eliminating any additional processing complexity.
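The concatenation step can be illustrated as follows; `augment` is a hypothetical helper name, and the subsequent PCA fit on the augmented matrix is omitted:

```python
# Sketch: build the augmented data matrix used before applying PCA-based GLR.
# Centers and radii are concatenated along the variable dimension, so n samples
# of m interval variables become n samples of 2m single-valued variables.
def augment(centers, radii):
    """centers, radii: lists of rows (one row per sample, one entry per variable)."""
    assert len(centers) == len(radii)
    return [c_row + r_row for c_row, r_row in zip(centers, radii)]

X = augment([[1.0, 2.0], [3.0, 4.0]],   # interval centers (2 samples, 2 vars)
            [[0.1, 0.2], [0.3, 0.4]])   # interval radii (same shape)
# X has 2 samples and 4 variables, and a single PCA model is fit on it.
```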

### 5. Illustrative examples


This section evaluates the performance of the three PCA-based GLR charts described in Section 3, and of the moving window aggregation method discussed in Section 4. The PCA-based GLR charts are evaluated under different fault scenarios through two illustrative examples: a simulated synthetic data set, and the benchmark Tennessee Eastman Process (TEP). Three fault detection metrics are used to evaluate the performance of each univariate chart: the missed detection rate (missed DR, equal to 100 - DR), the false alarm rate (FAR), and the average out-of-control run length (ARL1). Finally, the moving window interval aggregation method, in tandem with the PCA-based multivariate GLR chart, is analyzed using the benchmark TEP process, and the results are tabulated and compared to those of the single-valued multivariate GLR chart.
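The three metrics can be sketched as below. The exact ARL1 convention is not spelled out in the text; counting faulty samples until the first alarm is one common definition and is an assumption here.

```python
# Sketch: evaluation metrics for a fault detection chart.
# `alarms` is a boolean sequence of chart alarms per sample;
# `fault` marks which samples belong to the true faulty region.
def chart_metrics(alarms, fault):
    faulty = [a for a, f in zip(alarms, fault) if f]
    normal = [a for a, f in zip(alarms, fault) if not f]
    missed_dr = 100.0 * faulty.count(False) / len(faulty)  # missed detection rate, %
    far = 100.0 * normal.count(True) / len(normal)         # false alarm rate, %
    # ARL1 (assumed definition): faulty samples observed until the first alarm.
    arl1 = next((i + 1 for i, a in enumerate(faulty) if a), len(faulty))
    return missed_dr, far, arl1

m, f, a = chart_metrics([False, True, False, True, True],
                        [False, False, True, True, True])
```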

#### 5.1 Simulated synthetic data example

The purpose of this example is to utilize a simple linear model to compare and evaluate the performance of the different PCA-based univariate GLR charts. The linear data set can be generated using the following model [37]:

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} =
\begin{bmatrix}
-0.3441 & 0.4815 & 0.6637 \\
-0.2313 & -0.5936 & 0.3545 \\
-0.5060 & 0.2495 & 0.0739 \\
-0.5552 & -0.2405 & -0.1123 \\
-0.3371 & 0.3822 & -0.6115 \\
-0.3877 & -0.3868 & -0.2045
\end{bmatrix}
\begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix} + \text{noise} \tag{16}
$$

where t<sub>1</sub>, t<sub>2</sub>, and t<sub>3</sub> are uniformly distributed random variables with ranges [0, 2], [0, 1.6], and [0, 1.2], respectively, while the noise follows a normal distribution with zero mean and standard deviation of 0.2 [37].

The linear model is used to generate 6000 observations, split into training and testing data sets of 3000 observations each. The training data are used to train the PCA model, while the testing data are used to evaluate the performance of all techniques using three cases of faults: a shift in the mean, a shift in the variance, and a simultaneous shift in both.
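A sketch of this data-generating model, using the coefficient matrix of Eq. (16) with Python's standard library (the seed and function names are illustrative):

```python
# Sketch of Eq. (16): six observed variables formed as a linear mix of three
# uniform latent variables t1-t3, plus N(0, 0.2) measurement noise.
import random

C = [[-0.3441,  0.4815,  0.6637],
     [-0.2313, -0.5936,  0.3545],
     [-0.5060,  0.2495,  0.0739],
     [-0.5552, -0.2405, -0.1123],
     [-0.3371,  0.3822, -0.6115],
     [-0.3877, -0.3868, -0.2045]]

def generate(n, rng):
    data = []
    for _ in range(n):
        t = [rng.uniform(0, 2), rng.uniform(0, 1.6), rng.uniform(0, 1.2)]
        row = [sum(C[i][k] * t[k] for k in range(3)) + rng.gauss(0.0, 0.2)
               for i in range(6)]
        data.append(row)
    return data

rng = random.Random(42)
train_data = generate(3000, rng)  # used to fit the PCA model
test_data = generate(3000, rng)   # used to evaluate the charts
```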

| | PCA-based T<sup>2</sup> | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 95.3 | 94.5 | 00.4 | 85.1 | 31.5 |
| FAR (%) | 05.2 | 05.5 | 05.3 | 05.8 | 04.6 |
| ARL1 | 20.1 | 16.6 | 04.8 | 81.8 | 05.0 |

Table 1. Summary of fault detection results (case 1).

Five charts are evaluated and compared: the PCA-based T2 and Q charts, and the three different PCA-based univariate GLR charts. The faulty region is highlighted in light blue for all figures, and the fault detection threshold limits for all charts are represented by the red dotted line. For each case a Monte-Carlo simulation of 1000 realizations is carried out in order to obtain meaningful results, so that conclusions can be drawn.

#### 5.1.1 Case 1: a shift in the mean

For this case, a shift in the mean of 1σ was introduced between observations 1501 and 3000 in x<sub>1</sub> in the testing data set. This fault size was chosen because most conventional techniques are unable to detect a fault of this magnitude. Faults of higher magnitude would likely provide misleading results and exaggerate the robustness of the method in question, leading to a biased comparison.
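The fault injection for the three cases can be sketched as follows. `inject_fault` and its parameters are hypothetical, and doubling the variance by adding independent noise of equal variance is an assumption about how the simulation is carried out.

```python
# Sketch: inject the faults of cases 1-3 into the x1 series of the testing data.
# `sigma1` is the standard deviation of x1 in the training data (assumed known).
import random

def inject_fault(x1, sigma1, mean_shift=0.0, double_variance=False,
                 start=1500, end=3000, seed=0):
    """Return a copy of x1 with a fault in samples [start, end).
    Doubling the variance is sketched by adding independent N(0, sigma1) noise,
    since Var(x + e) = Var(x) + Var(e) = 2 * sigma1**2 when Var(e) = sigma1**2."""
    rng = random.Random(seed)
    out = list(x1)
    for i in range(start, min(end, len(out))):
        out[i] += mean_shift                       # case 1 (and 3): mean shift
        if double_variance:                        # case 2 (and 3): variance shift
            out[i] += rng.gauss(0.0, sigma1)
    return out

base = [0.0] * 3000
case1 = inject_fault(base, sigma1=1.0, mean_shift=1.0)                       # 1-sigma mean shift
case3 = inject_fault(base, sigma1=1.0, mean_shift=1.0, double_variance=True) # both shifts
```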

As can be seen in Figure 2, the T<sup>2</sup> and Q charts are unable to detect the entirety of the fault. In contrast, two GLR charts (Figure 3a and c) are able to detect most of the fault, while the GLR chart designed to monitor a shift in the variance (Figure 3b) could not detect that a shift in the mean was present.

Examining the summary of the fault detection results (Table 1), it can be observed that the GLR chart designed to monitor shifts in the mean (Figure 3a) provided the lowest missed DR and ARL1 values, compared to all other charts.

Figure 2. PCA-based T2 and Q charts (case 1).

Figure 3. PCA-based GLR charts (case 1).


The relatively high missed DR of the GLR chart designed to simultaneously monitor shifts in both the mean and variance (Figure 3c) can be attributed to the fact that two parameters need to be estimated from available data while maximizing the GLR statistic, thereby making it difficult to predict a shift in a single parameter as efficiently.

#### 5.1.2 Case 2: a shift in the variance


For this case, an increase in the variance (double that of the training data) was introduced between observations 1501 and 3000 in x<sub>1</sub> in the testing data set. This shift in the variance is too small for detection by most conventional techniques.

As can be seen in Figure 4, the T<sup>2</sup> and Q charts are unable to detect the entirety of the fault. In contrast, two GLR charts (Figure 5b and c) were able to detect most of the fault, while the GLR chart designed to monitor a shift in the mean (Figure 5a) could not detect it as effectively. Examining the summary of the results (Table 2), it can be observed that the GLR chart designed to monitor a shift in the variance (Figure 5b) provided the lowest missed DR and ARL1 values, compared to all other charts.

#### 5.1.3 Case 3: a shift in the mean and/or variance

For this case, a simultaneous shift in the mean of 1σ and an increase in the variance (double that of the training data) were introduced between observations 1501 and 3000 in x<sub>1</sub> in the testing data set.



Figure 4. PCA-based T2 and Q charts (case 2).

As can be seen in Figure 6, the T<sup>2</sup> and Q charts are once more unable to detect the entirety of the fault. Although it might seem that all three GLR charts (Figure 7) are able to detect most of the fault, upon closer inspection of the results summarized in Table 3, it can be observed that the GLR charts designed to independently detect a shift in the mean (Figure 7a) and variance (Figure 7b) are able to provide significantly lower missed DR and ARL1 values compared to the chart designed to monitor shifts in both (Figure 7c).

The main conclusion from this example is that if a process is expected to experience shifts in the mean, the variance, or both, it is more beneficial to run the PCA-based GLR charts designed to independently monitor shifts in the mean and variance as two parallel charts, rather than utilizing the GLR chart designed to simultaneously monitor both. Based on this conclusion, only the former two GLR charts will be utilized for the next example.

Figure 5. PCA-based GLR charts (case 2).


| | PCA-based T<sup>2</sup> | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 86.7 | 84.5 | 00.4 | 00.4 | 24.2 |
| FAR (%) | 05.2 | 05.2 | 04.9 | 05.3 | 05.5 |
| ARL1 | 07.5 | 06.0 | 03.2 | 03.9 | 04.9 |

Table 3. Summary of fault detection results (case 3).

Figure 6. PCA-based T2 and Q charts (case 3).


Figure 7. PCA-based GLR charts (case 3).



| | PCA-based T<sup>2</sup> | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) | PCA-based GLR (to monitor mean and/or variance) |
|---|---|---|---|---|---|
| Missed DR (%) | 90.2 | 88.6 | 47.5 | 00.7 | 33.0 |
| FAR (%) | 05.3 | 05.4 | 05.0 | 04.8 | 04.8 |
| ARL1 | 10.1 | 08.3 | 07.9 | 04.5 | 05.6 |

Table 2. Summary of fault detection results (case 2).

#### 5.2 Tennessee Eastman Process (TEP)

In order to assess the feasibility of using two separate GLR charts to monitor shifts in the process mean and variance, their performance has to be evaluated using real data. Many authors utilize the Tennessee Eastman Process (TEP) in order to evaluate the performance of their techniques [17, 38, 39]. The Tennessee Eastman Process is a realistic simulation of an actual chemical process that consists of a reactor, condenser, stripper, compressor, and separator, and is widely accepted as a benchmark for fault detection [17].

The Tennessee Eastman Process contains a bank of pre-defined faults that can be utilized by authors in order to assess the performance of their developed fault detection algorithms. More information on the Tennessee Eastman Process, the process description, and the available bank of faults is available in literature [10, 17, 21, 38, 39].

Two fault scenarios will be examined in this work: IDV 3 and IDV 11 [39]. IDV 3 is a shift in the mean of the temperature of Feed D, while IDV 11 is random variation in the reactor cooling water inlet temperature [39]. These two fault scenarios were selected because the conventional techniques are unable to provide the best possible detection. For both scenarios, the fault is introduced after 800 observations of normal operation. The performance of four charts is evaluated: the PCA-based T<sup>2</sup> and Q charts, and the PCA-based univariate GLR charts designed to independently monitor shifts in the mean and variance. The faulty region is highlighted in light blue in all figures.

#### 5.2.1 IDV 3: a step fault in the mean of the temperature of Feed D

For the case where there is a shift in the mean of the temperature of Feed D, the PCA-based T<sup>2</sup> and Q charts, and the PCA-based univariate GLR charts, are illustrated in Figures 8 and 9, respectively, and the fault detection results are summarized in Table 4.

From Figure 8 it can be observed that the T<sup>2</sup> and Q charts are unable to detect the entirety of the fault, while the GLR chart designed to monitor shifts in the mean (Figure 9a) is able to detect most of the fault and provides the lowest missed DR (Table 4). Although the T<sup>2</sup> chart returns a low ARL1 value, it does not detect the fault efficiently, and the low ARL1 value can be attributed to random noise.

Figure 11. PCA-based GLR charts (IDV 11).

| | PCA-based T<sup>2</sup> | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) |
|---|---|---|---|---|
| Missed DR (%) | 09.9 | 22.3 | 02.3 | 01.9 |
| FAR (%) | 05.1 | 05.0 | 05.0 | 05.4 |
| ARL1 | 20.0 | 24.0 | 28.0 | 24.0 |

Table 5. Summary of fault detection results (IDV 11).


Figure 8. PCA-based T2 and Q charts (IDV 3).

Figure 9. PCA-based GLR charts (IDV 3).


Table 4. Summary of fault detection results (IDV 3).


#### 5.2.2 IDV 11: random variation in the reactor cooling water inlet temperature

For the case where there is random variation in the reactor cooling water inlet temperature, the T<sup>2</sup> and Q charts, and the GLR charts are illustrated in Figures 10 and 11 respectively, and the fault detection results are summarized in Table 5.

Although it might seem like the T<sup>2</sup> and Q charts (Figure 10) are able to detect most of the fault, they still have higher missed DR than both GLR charts (Figure 11). The GLR chart designed to monitor shifts in the variance provides the lowest missed DR from the charts that were compared.

From this example we can conclude that the PCA-based GLR charts are able to provide improved fault detection results over the conventional PCA-based T<sup>2</sup> and Q charts.

Figure 10. PCA-based T2 and Q charts (IDV 11).

| | PCA-based T<sup>2</sup> | PCA-based Q | PCA-based GLR (to monitor mean) | PCA-based GLR (to monitor variance) |
|---|---|---|---|---|
| Missed DR (%) | 97.6 | 92.8 | 07.9 | 70.9 |
| FAR (%) | 04.8 | 04.5 | 05.0 | 05.4 |
| ARL1 | 02.0 | 86.0 | 84.0 | 84.0 |

Table 4. Summary of fault detection results (IDV 3).
