1. Introduction

Current technological advancements allow data to be collected from a number of different sources. The availability of abundant data from different sensors is beneficial, as these data can be used to observe trends between and within different measured process variables. This allows process models to be developed to help identify whether different processes or applications are behaving as expected [1]. Additionally, with industrial growth present in many developing countries, efficient process monitoring is essential for newer and more complex processes. Monitoring of these processes is required to ensure process safety, maintain product quality, increase economic benefits, and ensure that the process adheres to strict environmental regulation standards [2].

Statistical process monitoring methods can be classified into three broad categories: quantitative model-based methods, qualitative model-based methods, and process history-based methods [3–5]. Quantitative model-based methods require detailed knowledge of a process in order to construct a model that can be used for monitoring, for example, Kalman filters [3], while qualitative model-based methods require process engineering experts to develop monitoring procedures or tasks, for example, fault trees [4]. In the absence of these two requirements, and due to the complexity of many processes that require monitoring, data-based techniques are commonly used in industry for applications ranging from drug design to drinking water treatment [5–7].

Fault Detection of Single and Interval Valued Data Using Statistical Process Monitoring… DOI: http://dx.doi.org/10.5772/intechopen.88217

Principal component analysis (PCA) is a powerful linear data analysis technique widely used in research and industrial applications [8] for fault detection and isolation, data modeling and reconstruction, feature extraction, and noise filtration. PCA extracts the dominant underlying information from a dataset without any prior knowledge of a model. A practical application of PCA is discussed in [8], where data gathered from parallel sensors are used to quantify the quality of a given food sample. PCA reduces the dimensionality of a dataset whilst filtering out variability caused by noise [9]. The PCA model has been used to monitor a wide variety of processes and has seen many extensions [10–13]. Two main fault detection statistics are typically used with a PCA model: Hotelling's T² statistic and the Q statistic [10]. Variations captured by the principal component space are monitored using the T² statistic, while variations in the residual space are monitored using the Q statistic [14].

Statistical hypothesis testing methods, on the other hand, use statistical techniques to determine whether observations collected from a given process follow the null hypothesis, that is, normal operating conditions, or the alternative hypothesis, that is, abnormal or faulty operating conditions [15]. These faults can be of different types, such as shifts in the mean, the variance, or both. The generalized likelihood ratio (GLR) technique has received considerable attention in the process monitoring literature [10, 11, 13, 16]. The GLR method aims to maximize the detection rate for a fixed false alarm rate [15]. Therefore, an objective of this work is to provide a comparative review of the different GLR charts using examples such as the benchmark Tennessee Eastman Process (TEP) [17].

Data used in the construction of a PCA model may be of two types, depending on the application being monitored: single-valued and interval-valued. Single-valued data can be obtained directly from sensors measuring particular variables in a process, while interval-valued data are aggregated or artificially generated from batches of single-valued measurements, resulting in a range of possible measurement values for a given process variable at one time instant. The use of interval data in fault detection was originally introduced to reduce large datasets to a more manageable size [18] without compromising the integrity of the dataset. In addition, interval data are beneficial because of their inherent ability to deal with missing values in samples, which may occur due to malfunctioning sensors or varying sampling frequencies between variables [19].

However, in cases where reducing the dataset may not be a viable option, due to a relatively limited sample size or sampling frequency, interval data can still be generated using a moving window aggregation method. This is also true of applications where batch process monitoring is not viable, necessitating real-time online monitoring of samples. The benchmark TEP example will be used once more to analyze the benefit of moving window interval aggregation on the fault detection performance of PCA and GLR.

The rest of this chapter is organized as follows. Section 2 provides a more detailed introduction to PCA, along with a quick overview of the fault detection statistics used to examine the fault detection performance of the methods discussed in this chapter. Section 3 introduces hypothesis testing methods and the different GLR charts. Section 4 explains the moving window interval aggregation method, as well as its integration with PCA and GLR for the purposes of fault detection. Section 5 presents illustrative examples using simulated synthetic data and the TEP with a PCA-based GLR technique, to demonstrate the effect that using GLR and interval data has on the fault detection performance. Conclusions are presented in Section 6.

2. Principal component analysis (PCA)

Principal component analysis (PCA) is a linear dimensionality reduction tool used to reduce the number of variables in a dataset, whilst retaining most of the data's variability. PCA finds a new set of variables, called principal components, using linear combinations of the dataset's original cross-correlated variables [9].

2.1 PCA algorithm

The algorithm for PCA is summarized below. Given an n × p classical training dataset X, where n is the number of sample rows and p is the number of variable columns, the PCA model is found as follows:

1. Find the correlation matrix R of X.

2. Find the column eigenvector matrix P and the diagonal eigenvalue matrix Λ of R. Each eigenvector defines the linear combination coefficients used to find the principal components from the original variables, and each eigenvalue represents the amount of variance that its respective principal component covers in the dataset.

3. Retain the l principal components that cover the minimum desired variability in the dataset, denoted as P̂.

4. Find the predictive transformation matrix, Ĉ = P̂P̂ᵀ.

5. Find the residual transformation matrix, C̃ = I − Ĉ, where I is the p × p identity matrix.

The training dataset X defines the system under normal or optimal operating conditions, where there are no faults and the noise is minimal. Consequently, X is used to find the PCA model, defined using the Ĉ and C̃ transformation matrices. The testing dataset S defines the system under unknown operating conditions, and it is the dataset monitored for faults. Ĉ is used to find the projection of the dataset onto the PCA model, and C̃ is used to find the amount of deviation of the dataset from its projection onto the PCA model, also known as the matrix of residuals. For more comprehensive details, please refer to [9, 19, 20].
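As a concrete illustration, the PCA model construction described above, together with the T² and Q fault detection statistics, can be sketched in Python with NumPy. This is a minimal sketch, not the chapter's implementation: the function names, the 90% variability threshold, and the toy dataset are illustrative assumptions, and the control limits needed to declare a fault are omitted.

```python
import numpy as np

def fit_pca_model(X, min_variability=0.90):
    """Build a PCA model from training data X (n samples x p variables)."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    # Step 1: correlation matrix R of X.
    R = np.corrcoef(X, rowvar=False)
    # Step 2: eigenvector matrix P and eigenvalues of R (eigh: R is symmetric).
    eigvals, P = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # sort by descending variance
    eigvals, P = eigvals[order], P[:, order]
    # Step 3: retain the l components covering the desired variability.
    cum = np.cumsum(eigvals) / eigvals.sum()
    l = int(np.searchsorted(cum, min_variability)) + 1
    P_hat, lam_hat = P[:, :l], eigvals[:l]
    # Step 4: predictive transformation matrix C_hat = P_hat P_hat^T.
    C_hat = P_hat @ P_hat.T
    # Step 5: residual transformation matrix C_tilde = I - C_hat.
    C_tilde = np.eye(len(R)) - C_hat
    return dict(mu=mu, sd=sd, P_hat=P_hat, lam_hat=lam_hat,
                C_hat=C_hat, C_tilde=C_tilde, l=l)

def t2_q_statistics(model, s):
    """Hotelling's T^2 and Q statistics for one test sample s (length p)."""
    x = (s - model["mu"]) / model["sd"]          # scale with training statistics
    t = model["P_hat"].T @ x                     # scores in the PC subspace
    T2 = float(np.sum(t ** 2 / model["lam_hat"]))
    r = model["C_tilde"] @ x                     # residual vector
    Q = float(r @ r)
    return T2, Q

# Toy usage: four cross-correlated process variables (illustrative data).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
noise = rng.normal(size=(200, 4))
X_train = np.column_stack([z[:, 0],
                           2 * z[:, 0] + 0.1 * noise[:, 1],
                           z[:, 1],
                           z[:, 1] - z[:, 0] + 0.1 * noise[:, 3]])
model = fit_pca_model(X_train)
T2, Q = t2_q_statistics(model, X_train[0])
```

In practice, the retained variability threshold and the control limits against which T² and Q are compared are tuned to the application; a test sample would be flagged as faulty when either statistic exceeds its limit.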
