**1. Introduction**

238 Principal Component Analysis

Batch processes play an important role in the production and processing of low-volume, high-value products such as specialty polymers, pharmaceuticals and biochemicals. Generally, a batch process is a finite-duration process that involves charging the batch vessel with a specified recipe of materials, processing them under controlled conditions according to specified trajectories of the process variables, and discharging the final product from the vessel.

Batch processes generally exhibit variations in the specified trajectories, errors in the charging of the recipe of materials, and disturbances arising from variations in impurities. If such problems are not detected and remedied in time, the abnormal conditions will degrade the quality of at least one batch, and possibly of subsequent batches. Batch processes therefore need an effective strategy for real-time, on-line monitoring, so that faults and hidden troubles can be detected and diagnosed early, and their causes identified, before the batch is completed or subsequent batches are produced, for the sake of safety and quality.

Based on multivariate statistical analysis, several chemometric techniques have been proposed for online monitoring and fault detection in batch processes. Nomikos and MacGregor (1994, 1995) first developed a powerful approach known as multiway principal component analysis (MPCA) by extending principal component analysis (PCA) to three-dimensional batch process data. The main idea of their approach is to compress the normal batch data and extract the information it contains by projecting the process-variable trajectories onto a low-dimensional latent-variable space that summarizes both the variables and their time trajectories. Once normal batch behaviour has been established, a batch can be monitored by comparing the time progression of its projections in the reduced space with those of the normal batch data. Several studies have investigated the applications of MPCA (Chen & Wang, 2010; Jung-hui & Hsin-hung, Chen, 2006; Kosanovich et al., 1996; Kourti, 2003; Westerhuis et al., 1999).
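The projection idea behind MPCA can be illustrated with a minimal sketch. It assumes that the three-way data array is unfolded batch-wise, so that each batch becomes one row, and that ordinary PCA is then computed by singular value decomposition. The function names `mpca_fit` and `mpca_project`, the array layout (batches × variables × time) and the use of the squared prediction error as a residual measure are illustrative assumptions, not the notation of Nomikos and MacGregor.

```python
import numpy as np

def mpca_fit(X, n_components):
    """Fit a PCA model on a batch-wise unfolded three-way array.

    X has shape (batches, variables, time); this layout is an assumption.
    """
    n_batches, n_vars, n_times = X.shape
    # Batch-wise unfolding: each batch becomes one row of length n_vars * n_times.
    Xu = X.reshape(n_batches, n_vars * n_times)
    mean = Xu.mean(axis=0)
    std = Xu.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (Xu - mean) / std
    # PCA via SVD; the rows of Vt are orthonormal loading vectors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T  # loadings, shape (n_vars * n_times, n_components)
    return {"mean": mean, "std": std, "P": P}

def mpca_project(model, x_new):
    """Project one complete batch (variables x time) onto the latent space.

    Returns the score vector and the squared prediction error (SPE).
    """
    z = (x_new.reshape(-1) - model["mean"]) / model["std"]
    t = z @ model["P"]                # scores in the reduced space
    residual = z - model["P"] @ t     # part of z not explained by the model
    return t, float(residual @ residual)
```

In this sketch a new batch would be judged against normal behaviour by comparing its scores and SPE with those of the reference batches; control limits are deliberately left out.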

Many of the variables monitored in a process are not independent; in some cases they may be combinations of independent variables that are not measured directly. Independent component analysis (ICA) can extract the underlying factors or components from non-Gaussian multivariate statistical data in the process, and defines a generative model for the massive observed data, in which the variables are assumed to be linear or nonlinear mixtures of unknown latent variables called independent components (ICs) (Lee et al., 2004; Ikeda and Toyama, 2000). Whereas PCA captures the variance of the data and extracts uncorrelated latent variables from correlated data, ICA seeks to extract the separate ICs that constitute the variables. Furthermore, ICA imposes no orthogonality constraint, unlike PCA, whose direction vectors must be orthogonal. Yoo et al. (2004) extended ICA to batch processes by proposing on-line batch monitoring using multiway independent component analysis (MICA), and argued that ICA may reveal more information in non-Gaussian data than PCA.

The approach proposed by Nomikos and MacGregor (1994, 1995) is based on the strong assumption that all batches are of equal duration and synchronized. In practice, the operational period of almost every batch differs from the others because of batch-to-batch variations in impurities, initial charges of the recipe components, and heat removal capability under seasonal change, so operators have to adjust the operational time to obtain the desired product quality. There are several methods for dealing with different durations in MPCA. However, neither stretching all data to the maximum length by simply appending the last measurements, nor cutting all 'redundant trajectories' down to the minimum length, can construct the process model well. Kourti et al. (1996) used an indicator variable, which the other variables follow, to stretch or compress them, and applied it to an industrial batch polymerization process. Kassidas et al. (1998) presented an effective dynamic time warping (DTW) technique to synchronize trajectories, which flexibly transforms the trajectories for optimal modelling and monitoring with MPCA. DTW appropriately translates, expands and contracts the process measurements to generate trajectories of equal duration, based on the principle of optimality of dynamic programming, computing the distance between two trajectories while time-aligning them (Labiner et al., 1978). Chen and Liu (2000) put forward an approach that transforms all the variables in a batch into a series of orthonormal coefficients by orthonormal function approximation (OFA), and then uses those coefficients for MPCA and multiway partial least squares (MPLS) modelling and monitoring (Chen and Liu, 2000, 2001). One group of extracted coefficients can be regarded as an abbreviation of its source trajectory, and the subsequent projection by PCA can reveal the variation in the process well.

For online monitoring with MPCA, Nomikos and MacGregor (1995) presented three solutions: filling the future observations with the mean trajectories from the reference database; taking the current deviation as the predicted values for the incomplete part of the process; and partial model projection, in which the known data of the observed trajectories are projected onto the corresponding part of the loading matrix. The first two schemes estimate the future data by simply filling in hypothetical information, without considering possible subsequent variations; the third uses only part of the information in the MPCA model, projecting the observed trajectories onto the corresponding part of the MPCA loading matrix to analyze the variation of local segments. The monitoring indices may therefore be inaccurate under all three solutions. To reduce the monitoring errors, Gao and Bai (2007) developed a measure that estimates the future data of a new batch by calculating the Generalized Correlation Coefficients (GCC) between the new batch trajectory and the historical trajectories, and fills the unknown future portion of the new batch trajectory with the corresponding part of the historical trajectory with maximum GCC.

Recently, for online monitoring of batch processes, some papers have addressed GCC prediction after DTW synchronization with MPCA/MICA (Bai et al., 2009a, 2009b; Gao et al., 2008b), while other works were concerned with GCC prediction after OFA synchronization with MPCA/MICA (Bian 2008; Bian et al., 2009; Gao et al., 2008a). These examples showed that both DTW and OFA integrate well with GCC prediction and MPCA/MICA.

In this chapter, a set of online batch process monitoring approaches is discussed. In real industrial batch processes, the process data do not always follow a Gaussian distribution, and MICA may reveal more hidden variation than MPCA, despite its computational complexity; the synchronization methods DTW and OFA are each applied in compound monitoring approaches; and four solutions for the missing future values are compared in an example.

The chapter is organized as follows. Section 2 introduces the principle of DTW and the relevant synchronization method. Section 3 introduces the principle of OFA and describes how the coefficients extracted from the trajectories are used for modelling and monitoring. The three traditional solutions of Nomikos and MacGregor (1995) and GCC estimation are then discussed in Section 4. An industrial polyvinyl chloride (PVC) polymerization process is employed to illustrate the integrated approaches in Section 5. Finally, a conclusion is presented in Section 6.

On-Line Monitoring of Batch Process with Multiway PCA/ICA 241

**2. Dynamic time warping**

Dynamic time warping (DTW) is a flexible, deterministic pattern-matching method for comparing two dynamic patterns that may not be perfectly aligned and that are characterized by similar, but locally transformed, compressed and expanded, features; the patterns are warped so that similar features within them are matched (Kassidas et al., 1998). The problem can be discussed in terms of two general trajectories, *R* and *T*.

**2.1 Symmetric and asymmetric DTW algorithm**

Let *T* and *R* denote the multivariate trajectories of two batches, given as matrices of dimension *t*×*N* and *r*×*N*, respectively, where *t* and *r* are the numbers of observations and *N* is the number of measured variables. In most cases, *t* and *r* are not equal, so the two batches are not synchronized because they do not have a common length. Even if *t*=*r*, their trajectories may not be synchronized because of their different local characteristics. If one applies the monitoring scheme of MPCA (Nomikos and MacGregor, 1994) or of MICA (Yoo et al., 2004) after simply adding or deleting some measured points artificially, unnecessary variation will be included in the statistical model, and the subsequent statistical tests will not detect faulty batches sensitively.

Based on the principle of dynamic programming, DTW warps the two trajectories so that similar events are matched and a minimum distance between them is obtained, shifting, compressing or expanding some feature vectors to achieve this minimum distance (Nadler and Smith, 1993).
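The dynamic-programming idea behind DTW can be sketched in a few lines. The example below is a minimal symmetric variant under stated assumptions: the local distance is the unweighted squared Euclidean distance between observations, the allowed steps are horizontal, vertical and diagonal, and no global band constraint is applied. The function name `dtw` and these choices are illustrative and are not taken from Kassidas et al. (1998).

```python
import numpy as np

def dtw(A, B):
    """Align trajectories A (a x N) and B (b x N); return (distance, path)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    a, b = len(A), len(B)
    # Local distance between every pair of observations (squared Euclidean).
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    # D[i, j] = minimum accumulated distance of a path ending at (i, j).
    D = np.full((a, b), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(a):
        for j in range(b):
            if i == 0 and j == 0:
                continue
            best = min(
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            D[i, j] = d[i, j] + best
    # Backtrack from the end point to recover the warping path.
    i, j = a - 1, b - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((D[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return float(D[-1, -1]), path
```

Each pair `(i, j)` in the returned path matches observation `i` of the first trajectory with observation `j` of the second. One way to obtain trajectories of equal length from such a path is to resample one trajectory along the time index of the other.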
