**4. Online monitoring schemes**

#### **4.1 Traditional online monitoring schemes**

The first approach assumes that the future measurements will agree perfectly with their mean trajectories, as calculated from the reference database, and fills the unknown part of $\mathbf{x}\_{\text{new}}$ with zeros (in the mean-centered, scaled variables, zero corresponds to the mean trajectory). In other words, the batch is assumed to operate normally for the rest of its duration, with no deviations from its mean trajectories. According to the analysis of Nomikos and MacGregor (1995), the advantages of this approach are a good graphical representation of the batch operation in the *t* plots and quick detection of an abnormality in the SPE plot. Its drawback is that the *t* scores are slow to detect abnormal operation, especially at the beginning of the batch run.
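As a minimal sketch of the first approach, the function below completes a partially observed batch by appending zeros. It assumes, for illustration only, that the batch has already been unfolded into one vector ordered time interval by time interval (J variables per interval, K intervals in total) and mean-centered and scaled with the reference statistics, so that zero means "on the mean trajectory". The name `fill_zero_deviation` is hypothetical.

```python
def fill_zero_deviation(x_known, J, K):
    """Complete a scaled, time-major batch vector with zero future deviations.

    x_known : the k*J scaled values measured so far (J variables per interval)
    J, K    : number of variables and number of time intervals in a full batch
    The (K - k)*J unknown entries are set to 0, i.e. the batch is assumed to
    follow the reference mean trajectory for the rest of its duration.
    """
    if len(x_known) % J != 0:
        raise ValueError("x_known must contain whole time intervals")
    k = len(x_known) // J
    if k > K:
        raise ValueError("more intervals observed than the batch length K")
    return list(x_known) + [0.0] * ((K - k) * J)
```

For example, with J = 2 and K = 3, one observed interval `[0.5, -1.0]` is extended with four zeros.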

The second approach assumes that the deviations from the mean trajectories will remain at their current values (at time interval *k*) for the rest of the batch duration, and fills the unknown part of $\mathbf{x}\_{\text{new}}$ with the current scaled values. Although the SPE chart is less sensitive than in the first approach, the *t* scores pick up an abnormality more quickly (Nomikos and MacGregor, 1995). As a compromise between these first two approaches, Nomikos and MacGregor (1995) also suggested letting the future deviations decay linearly or exponentially from their current values to the end of the batch run.
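The second approach, and the decaying variants suggested by Nomikos and MacGregor (1995), can be sketched in the same hypothetical time-major vector layout. The linear rule below (reaching zero at interval K) and the exponential rule with a time constant `tau` are illustrative assumptions; the text does not fix these details. The function name is hypothetical.

```python
import math


def fill_future_deviations(x_known, J, K, mode="persist", tau=None):
    """Complete a scaled, time-major batch vector (k*J known entries) to K*J entries.

    mode="persist": hold the current deviations (last J values) constant
    mode="linear" : shrink them linearly so that they reach 0 at interval K
    mode="exp"    : shrink them by exp(-(t - k) / tau) for t = k+1, ..., K
    """
    if mode not in ("persist", "linear", "exp"):
        raise ValueError(f"unknown mode {mode!r}")
    if not x_known or len(x_known) % J != 0:
        raise ValueError("x_known must contain at least one whole time interval")
    k = len(x_known) // J
    if k > K:
        raise ValueError("more intervals observed than the batch length K")
    if mode == "exp" and (tau is None or tau <= 0):
        raise ValueError("exponential decay needs a positive time constant tau")
    current = x_known[-J:]  # scaled deviations at the current interval k
    future = []
    for t in range(k + 1, K + 1):
        if mode == "persist":
            w = 1.0
        elif mode == "linear":
            w = (K - t) / (K - k)  # K - k >= 1 whenever this loop runs
        else:
            w = math.exp(-(t - k) / tau)
        future.extend(w * v for v in current)
    return list(x_known) + future
```

Choosing `mode` per batch is one way to follow the later advice of combining the schemes.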

The third approach treats the unknown future observations of a batch as missing data in MPCA. The missing values do not have to be estimated explicitly: the principal-component submodel of the reference database is used so that the result is consistent with the values already measured up to the current time *k* and with the correlation structure of the variables in the database, as defined by the *p*-loading matrix of the MPCA model. MPCA projects the already known measurements $\mathbf{x}\_{\text{new},k}$ ($kJ \times 1$) into the reduced space and calculates the *t* scores at each time interval as:

$$\mathbf{t}\_{\mathcal{R},k} = \left(P\_k^\top P\_k\right)^{-1} P\_k^\top \mathbf{x}\_{\text{new},k} \tag{16}$$

On-Line Monitoring of Batch Process with Multiway PCA/ICA 251


where $P\_k \in \mathbb{R}^{kJ \times R}$ is the matrix formed by the elements of the *p*-loading vectors ($\mathbf{p}\_r$) of all $R$ principal components, from the start of the batch up to the current time interval *k*. The matrix $(P\_k^\top P\_k)^{-1}$ is well conditioned even at early times, and it approaches the identity matrix as *k* approaches the final time interval *K*, because the loading vectors $\mathbf{p}\_r$ are orthonormal (Nomikos and MacGregor, 1995). The advantage of this method is that, once about 10% of the new batch trajectory has been measured, the computed *t* scores are already close to their actual final values. However, Nomikos and MacGregor (1995) also pointed out that, because very little information is available at the early stage of the new batch run, the *t* scores can then be quite large and hard to interpret. Similarly, the third approach can be applied to the MICA model, in which the deterministic part of the independent component vector, $\hat{\mathbf{s}}\_{d,k}$, can be calculated as:

$$
\hat{\mathbf{s}}\_{d,k} = \mathbf{W}\_d \mathbf{x}\_{\text{new},k} \tag{17}
$$

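where $\mathbf{W}\_d$ is the deterministic part of $\mathbf{W}\_s$, the separating matrix of the ICA algorithm, taken over the $Jk$ columns that correspond to the measurements available up to time interval *k*.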
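The projection of Eq. 16 is an ordinary least-squares problem in the scores, so it can be sketched with the normal equations. The snippet below is a dependency-free illustration, not the authors' code: it assumes the loading matrix is stored row by row in the same time-major order as the unfolded batch, so that $P\_k$ is simply its first $kJ$ rows. `solve_small`, `project_known_part` and `deterministic_ic` (Eq. 17, with $\mathbf{W}\_d$ given as rows of length $kJ$) are hypothetical names; in practice a numerical library's least-squares solver would normally be used instead.

```python
def solve_small(A, b):
    """Solve A t = b by Gaussian elimination with partial pivoting (small R x R)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("P_k^T P_k is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        t[r] = (M[r][n] - sum(M[r][c] * t[c] for c in range(r + 1, n))) / M[r][r]
    return t


def project_known_part(P, x_known):
    """Scores t = (P_k^T P_k)^{-1} P_k^T x_new,k (Eq. 16).

    P       : loading matrix, K*J rows of R entries (time-major row order assumed)
    x_known : the k*J scaled measurements observed so far
    """
    n = len(x_known)
    Pk = P[:n]  # rows belonging to time intervals 1..k
    R = len(P[0])
    A = [[sum(Pk[i][a] * Pk[i][c] for i in range(n)) for c in range(R)] for a in range(R)]
    b = [sum(Pk[i][a] * x_known[i] for i in range(n)) for a in range(R)]
    return solve_small(A, b)


def deterministic_ic(W_d, x_known):
    """Eq. 17: s_hat_{d,k} = W_d x_new,k, with W_d given as rows of length k*J."""
    return [sum(w * x for w, x in zip(row, x_known)) for row in W_d]
```

With orthonormal loadings and a complete batch (k = K), $P\_k^\top P\_k$ is the identity and the scores reduce to $P^\top \mathbf{x}$, in line with the remark on Eq. 16.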

It is not clear which of the above schemes is most suitable for a batch process. Nomikos and MacGregor (1995) stated that each scheme suits a particular situation: the third for infrequent discontinuities, the second for persistent disturbances and the first for non-persistent disturbances. They also suggested combining these schemes for online monitoring.

#### **4.2 Online monitoring with filling similar subsequent trajectory**

Correlation coefficients (CC) are commonly used to measure the degree of correlation between two vectors; each is a single number that expresses similarity in some sense. However, a multivariable trajectory is expressed as a matrix whose columns are the variables and whose rows run along time, so the relationship between the matrices of two multivariable trajectories cannot be expressed by a single CC value. It becomes a matrix instead, and the similarity of the two trajectories cannot be judged simply by comparing CC values. To solve this problem, a Generalized Correlation Coefficient (GCC) based on the traces of covariance matrices was proposed, because the trace of a matrix, being the sum of its eigenvalues, characterizes the matrix in some respects (Gao and Bai, 2007). Suppose that **V** (*k* × *m*) is a monitoring trajectory, where *k* is the current time interval and *m* is the number of variables, and that another trajectory **Y** (*k* × *m*) from the history model database is chosen to match **V** (*k* × *m*). Their GCC is defined as:

$$\rho(\mathbf{V}, \mathbf{Y}) = \frac{\text{tr}[\text{cov}(\mathbf{V}, \mathbf{Y})]}{\sqrt{\text{tr}[\text{cov}(\mathbf{V})] \text{tr}[\text{cov}(\mathbf{Y})]}} \tag{18}$$

where *tr* denotes the trace and *ρ*(**V**, **Y**) is the GCC. The covariances cov(**V**), cov(**Y**) and cov(**V**, **Y**) in Eq. 18 are defined as follows, where *n* is the number of rows (time intervals) of each matrix:

$$\text{cov}(\mathbf{V}) = \frac{[\mathbf{V} - \mathrm{E}(\mathbf{V})]^\top [\mathbf{V} - \mathrm{E}(\mathbf{V})]}{n - 1} \tag{19}$$

250 Principal Component Analysis


$$\text{cov}(\mathbf{Y}) = \frac{[\mathbf{Y} - \mathrm{E}(\mathbf{Y})]^\top [\mathbf{Y} - \mathrm{E}(\mathbf{Y})]}{n - 1} \tag{20}$$

$$\text{cov}(\mathbf{V}, \mathbf{Y}) = \mathrm{E}\left[ [\mathbf{V} - \mathrm{E}(\mathbf{V})]^\top [\mathbf{Y} - \mathrm{E}(\mathbf{Y})] \right] \tag{21}$$

When two trajectories are aligned with each other from the start, their GCC lies in the range (0, 1], and the closer it is to 1, the more similar the two trajectories are. Care must be taken when the two trajectories are asynchronous, because their matrices then have different dimensions; this case is handled by Eq. 22.
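A minimal sketch of Eqs. 18 to 21 for two trajectories of equal size is shown below, using plain lists of rows. Eq. 21 is written as an expectation; here it is assumed, for illustration, that all three traces use the same $1/(n-1)$ normalization, so the factor cancels in Eq. 18. With that choice the Cauchy-Schwarz inequality guarantees that the magnitude of the GCC never exceeds 1. The helper names are hypothetical, and a constant trajectory (zero trace) or a single row would cause a division by zero.

```python
import math


def center_columns(M):
    """Subtract each column's mean (E(.) taken over the rows of M)."""
    n, m = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in M]


def gcc(V, Y):
    """Generalized correlation coefficient of Eq. 18 for two k x m trajectories."""
    if len(V) != len(Y) or len(V[0]) != len(Y[0]):
        raise ValueError("unequal sizes: use the pseudo covariance of Eq. 22")
    n = len(V)
    Vc, Yc = center_columns(V), center_columns(Y)
    # tr(A^T B) equals the sum of element-wise products of A and B
    tr_vy = sum(a * b for rv, ry in zip(Vc, Yc) for a, b in zip(rv, ry)) / (n - 1)
    tr_vv = sum(a * a for rv in Vc for a in rv) / (n - 1)
    tr_yy = sum(b * b for ry in Yc for b in ry) / (n - 1)
    return tr_vy / math.sqrt(tr_vv * tr_yy)
```

Computing the traces directly as sums of products avoids forming the full m x m covariance matrices, which only matters for their diagonals here.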

#### **4.3 The procedure of online monitoring of asynchronous batch**

The first step is to deal with the missing data of the online batch. The difficulty in online monitoring of an asynchronous batch lies in choosing a suitable filling scheme. As mentioned above, the traditional schemes are relatively easy to implement, whereas the GCC approach needs more computation time than the others. The ongoing new batch **V** (*k* × *m*) must be compared with many normal and abnormal batches stored in the history model database Ω, which contains matrices prepared for many different cases. Because the new batch run and a history batch run $\mathbf{N}\_i$ ($K\_n \times m$) $\in \Omega$, $i = 1, 2, \ldots, h$, where *h* is the number of history batches stored in Ω, have matrices of different dimensions, a pseudo covariance is calculated instead of Eq. 21 (Gao et al., 2008b):

Fig. 5. A sketch of GCC matching to decide the substitute for future measurements in the history model library

$$\text{psdcov}(\mathbf{N}\_i, \mathbf{V}) = [\mathbf{N}\_i - \mathrm{E}(\mathbf{N}\_i)][\mathbf{V} - \mathrm{E}(\mathbf{V})]^{\top} / \max(K\_n, k) \tag{22}$$
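Eq. 22 yields a $K\_n \times k$ matrix, which is not square when the two batches have different lengths. One possible reading, assumed here and not confirmed by the text, is to take its trace as the sum of the main diagonal, i.e. the row-by-row inner products over the first min($K\_n$, *k*) time intervals, and to keep Eqs. 19 and 20 for the two denominator terms of Eq. 18. All function names are hypothetical, and *V* needs at least two rows.

```python
import math


def center_columns(M):
    """Subtract each column's mean (E(.) taken over the rows of M)."""
    n, m = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in M]


def trace_psd_cov(N, V):
    """Assumed 'trace' of Eq. 22 for N (K_n x m) and V (k x m).

    [N - E(N)][V - E(V)]^T is K_n x k; its main diagonal holds the inner
    products of rows t = 1..min(K_n, k), and zip() stops at the shorter batch.
    """
    Nc, Vc = center_columns(N), center_columns(V)
    diag = sum(sum(a * b for a, b in zip(rn, rv)) for rn, rv in zip(Nc, Vc))
    return diag / max(len(N), len(V))


def trace_cov(M):
    """tr[cov(M)] as in Eqs. 19 and 20 (needs at least two rows)."""
    Mc = center_columns(M)
    return sum(a * a for row in Mc for a in row) / (len(M) - 1)


def pseudo_gcc(N, V):
    """Eq. 18 with the trace of Eq. 22 in the numerator (assumed combination)."""
    return trace_psd_cov(N, V) / math.sqrt(trace_cov(N) * trace_cov(V))
```

Dividing by max($K\_n$, *k*) shrinks the score when the new batch covers only a small part of the history batch, so early in a run the values stay well below 1.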


from the reaction. PVC in the solution precipitates quickly to form solid-phase PVC granules inside almost every VC monomer droplet during polymerization, because it is not soluble in water and only slightly soluble in VC.

Table 1. Polymerization reactor variables

| Variable No. | Sensor No. | Variable name | Unit |
|---|---|---|---|
| 1 | TIC-P101 | Temperature of the reactor | ℃ |
| 2 | TIC-P102 | Temperature of the reactor jacket inlet | ℃ |
| 3 | TI-P107 | Temperature of the water inlet | ℃ |
| 4 | TI-P108 | Temperature of the baffle outlet | ℃ |
| 5 | TI-P109 | Temperature of the reactor jacket outlet | ℃ |
| 6 | PIC-P102 | Pressure of the reactor | MPa |
| 7 | FIC-P101 | Flow rate of baffle water | m³/h |
| 8 | FIC-P102 | Flow rate of jacket water | m³/h |
| 9 | JI-P101 | Stirring power | kW |

Because the reaction is exothermic, the reactor temperature rises gradually, so the excess reaction heat must be removed immediately to keep the temperature constant. To cool the reactor, cooling water is pumped into the jacket surrounding it. The condenser on top of the reactor also condenses VC monomer from vapor to liquid. If the reactor temperature falls below the set point, hot water is injected into the jacket again; in this way the process is controlled automatically through its important variables. At the end of the polymerization, a little gaseous VC monomer remains. With the VC being absorbed from the exhaust-gas by-product, the polymerization does not continue until the action of the terminator.

Fig. 7. Typical profiles of the nine variables of one PVC batch (panels: Reactor Temp, Reactor jacket inlet Temp, Water inlet Temp, Baffle outlet Temp, Reactor jacket outlet Temp, Pressure of the reactor, Flow rate of baffle water, Flow rate of jacket water, Stirring power; x-axis: Sample number)

Then the trajectory $\mathbf{N}\_i$ ($K\_n \times m$) that has the largest GCC with **V** (*k* × *m*) is chosen. If $k < K\_n$, **V** is extended by appending rows $k+1$ to $K\_n$ of $\mathbf{N}\_i$; otherwise **V** is kept unchanged. Although *k* is sometimes far less than $K\_n$, Eq. 22 still reveals a covariance-like relationship between the two matrices. Hence, the lack of data in the online batch run can be overcome by filling in assumed future values in different ways.
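The selection and extension rule of this first step can be sketched independently of how the similarity is computed: the function below takes the similarity measure as a parameter, for example a GCC built on the pseudo covariance of Eq. 22. The function name, the list-of-rows layout and the toy similarity used in testing are illustrative assumptions.

```python
def fill_from_best_match(V, library, similarity):
    """Complete the ongoing batch V with the tail of its best-matching history batch.

    V          : ongoing batch, list of k rows with m variables each
    library    : history batches N_i (lists of K_n rows), i.e. the database Omega
    similarity : callable (N_i, V) -> float, larger means more similar
    Returns (index of the chosen N_i, completed trajectory).
    """
    if not library:
        raise ValueError("the history model library is empty")
    scores = [similarity(N, V) for N in library]
    best = max(range(len(library)), key=lambda i: scores[i])
    N = library[best]
    k, K_n = len(V), len(N)
    completed = [list(row) for row in V]
    if k < K_n:
        # rows k+1 .. K_n in 1-based notation are N[k:] in 0-based slicing
        completed += [list(row) for row in N[k:]]
    return best, completed
```

Passing the similarity in as a callable keeps the matching logic unchanged if the GCC is later replaced by another measure.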

The second step is data pretreatment: before synchronization, all measurements of the new batch should be scaled.
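The text does not state which statistics are used for this scaling. A common choice, assumed in the sketch below, is to standardize each of the *m* variables with a mean and standard deviation pooled over all samples of the reference batches, since time-indexed statistics are arguably not yet meaningful before the batches are aligned. Both function names are hypothetical, and a variable with zero spread is simply mapped to 0.

```python
import statistics


def fit_variable_scaling(reference_batches):
    """Per-variable mean and standard deviation pooled over all reference samples.

    reference_batches : list of batches, each a list of rows with m variables
    """
    rows = [row for batch in reference_batches for row in batch]
    m = len(rows[0])
    means = [statistics.fmean([r[j] for r in rows]) for j in range(m)]
    stds = [statistics.stdev([r[j] for r in rows]) for j in range(m)]
    return means, stds


def scale_batch(batch, means, stds):
    """Standardize every row of a (possibly incomplete) batch with stored statistics."""
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, means, stds)]
        for row in batch
    ]
```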

The third step is synchronization: either DTW (dynamic time warping) or OFA can be chosen to handle the asynchronous trajectory. After that, the new test batch resembles an offline batch and can be projected onto the MPCA/MICA model.
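Of the two synchronization options, DTW is the simpler to illustrate. The sketch below is the textbook dynamic-programming form of DTW with a Euclidean local distance between rows and the standard three-step pattern; it is not necessarily the DTW variant used by the authors, and OFA is not covered. The function name `dtw` is hypothetical.

```python
import math


def dtw(A, B):
    """Classic dynamic time warping between trajectories A (n x m) and B (p x m).

    Returns (total cost, warping path as a list of 0-based (i, j) index pairs).
    """
    n, p = len(A), len(B)
    D = [[math.inf] * (p + 1) for _ in range(n + 1)]  # cumulative cost table
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            cost = math.dist(A[i - 1], B[j - 1])  # Euclidean local distance
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, p) to (1, 1); ties favor the diagonal step
    path, i, j = [], n, p
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return D[n][p], path
```

The table costs O(n·p) time and memory. The returned path is what would be used to map the new batch onto the time axis of a reference trajectory before projection.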
