*2) Feature kernel quantification*

The feature distances calculated above are used to determine the feature weights by evaluating the Gaussian kernel

$$k\_i^f = \exp\left(\frac{-\left(d\_i^f\right)^2}{2h\_f^2}\right) \tag{2}$$

where $h\_f$ is a kernel bandwidth for feature preservation, which controls how much weight is given to nearby memory feature vectors. This leads to the vector $\mathbf{k}^f \in \mathbb{R}^{M \times 1}$. The superscript/subscript $f$ indicates the feature components.
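As a minimal sketch (the function name and array layout are our own, not from the original references), Eq. (2) can be evaluated over all $M$ feature distances at once:

```python
import numpy as np

def feature_kernel(d_f, h_f):
    """Gaussian feature kernel of Eq. (2): maps the M feature distances
    d_i^f to weights k_i^f in (0, 1]; smaller distances get larger weights."""
    d_f = np.asarray(d_f, dtype=float)
    return np.exp(-d_f**2 / (2.0 * h_f**2))
```

A larger bandwidth `h_f` spreads weight over more memory vectors (more smoothing); a smaller one concentrates it on the nearest ones.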

#### *3) Time position index identification*

Here, the time position index, that is, the temporal location within the memory data of the vector nearest to the query vector observation, is determined using a derivative-based comparator. This provides the input to the weighted-distance algorithm in *Step 4*. Directly using the derivative in the prediction to capture the temporal correlation of the data might not be a good choice because of process measurement noise; instead, the derivatives are used as a *comparator* to determine the time position index within the memory vectors to which the query data vector is nearest. The derivative-based comparator is described as follows.

The backward-difference first-order derivative approximation of the current historical measurement vector in each matrix $\mathbf{A}\_r^k$, based on $r$ data points with respect to $t$, is the element-by-element derivative:

$$\frac{\partial \mathbf{A}\_r^k}{\partial t} = \left[ \frac{\partial x\_{r,1}^k}{\partial t} \quad \frac{\partial x\_{r,2}^k}{\partial t} \quad \cdots \quad \frac{\partial x\_{r,p}^k}{\partial t} \right] \tag{3}$$

whereas that of the current query vector, $\mathbf{x}\_{q\_r}^\*$, in matrix $\mathbf{X}\_q^\*$ is the element-by-element derivative:

$$\frac{\partial \mathbf{x}\_{q\_r}^\*}{\partial t} = \left[ \frac{\partial x\_{r,1}^\*}{\partial t} \quad \frac{\partial x\_{r,2}^\*}{\partial t} \quad \cdots \quad \frac{\partial x\_{r,p}^\*}{\partial t} \right] \tag{4}$$

*Fault Detection by Signal Reconstruction in Nuclear Power Plants DOI: http://dx.doi.org/10.5772/intechopen.101276*

The first-order derivatives in Eqs. (3) and (4) are approximated from the data using finite differences. The backward finite-difference approximation is chosen because it suits real-time on-line monitoring: at every sampling time, the model uses the $r$ successive data points in the moving window, from the current measurement backward, to evaluate the derivative at the current data point.
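A minimal sketch of the backward-difference step, assuming the simplest first-order stencil over the moving window (the function name and window layout are our own assumptions):

```python
import numpy as np

def backward_difference(window, eta):
    """First-order backward-difference approximation used in Eqs. (3)-(4).

    `window` holds the last r samples of the p signals (shape r x p),
    sampled at a constant interval `eta`; the returned vector is the
    element-by-element derivative at the most recent sample."""
    window = np.asarray(window, dtype=float)
    return (window[-1] - window[-2]) / eta
```

Higher-order backward stencils over all $r$ points would follow the same pattern with different coefficients.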

From Eqs. (3) and (4), the distance between the derivative of the query vector $\mathbf{x}\_{q\_r}^\*$ and the derivative of each $k$th matrix $\mathbf{A}\_r^k$ can be calculated by Eq. (5) using the Manhattan distance (L1 norm):

$$\Delta\_k \left( \frac{\partial \mathbf{A}\_r^k}{\partial t}, \frac{\partial \mathbf{x}\_{q\_r}^\*}{\partial t} \right) = \left\| \frac{\partial \mathbf{A}\_r^k}{\partial t} - \frac{\partial \mathbf{x}\_{q\_r}^\*}{\partial t} \right\|\_1 = \sum\_{j=1}^p \left| \frac{\partial x\_{r,j}^k}{\partial t} - \frac{\partial x\_{r,j}^\*}{\partial t} \right| \tag{5}$$

This gives the derivative distance vector, $\boldsymbol{\Delta} \in \mathbb{R}^{N \times 1}$.

Then, using the minimum value in $\boldsymbol{\Delta}$, the index $i = \varepsilon$, which indicates the location in the memory data $\mathbf{X}$ of the vector to which the current query vector $\mathbf{x}\_{q\_r}$ is closest, can be obtained. The time position index is therefore the index at which the Manhattan distance between the derivative of the current query vector and those of the $r$th vectors in each $\mathbf{A}\_r^k$ is minimized, plus the overlapping length between two consecutive time windows:

$$\varepsilon = \left(\underset{k \in [1, N]}{\arg\min} \, \Delta\_k\right) + r - 1 \tag{6}$$
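Steps (5) and (6) can be sketched together as follows (function name and the precomputed-derivative inputs are our own assumptions; the index is returned in the paper's 1-based convention):

```python
import numpy as np

def time_position_index(memory_derivs, query_deriv, r):
    """Eqs. (5)-(6): Manhattan (L1) distance between the query-window
    derivative and each of the N memory-window derivatives, then the
    1-based time position index epsilon = argmin_k(Delta_k) + r - 1."""
    deltas = np.sum(np.abs(np.asarray(memory_derivs, float)
                           - np.asarray(query_deriv, float)), axis=1)
    k_star = int(np.argmin(deltas)) + 1   # 1-based arg min over k
    return k_star + r - 1, deltas
```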

#### *4) Temporal weighted-distance algorithm*

The temporal weighted-distance algorithm captures the temporal correlations in the data by calculating measures of its temporal variation. The distance, $\delta$, accounts for the time at which the query vector is observed. The algorithm calculates the temporal correlation of a query input with the memory data without directly using the query time input $t\_q$, which becomes indefinite when applied to on-line monitoring. In this way, the effect of the query time input is confined within the time duration of the historical memory data. The distance is calculated under the assumption that the time-varying historical data collected to build the model were sampled at a constant time interval, $\eta$.

Based on the time position index determined in *Step 3*, the temporal weighted-distance algorithm that captures the temporal correlation is formulated as

$$\delta\_i = \begin{cases} \delta\_\varepsilon, & i = \varepsilon \\ \delta\_\varepsilon + (i - \varepsilon)\,\eta, & i > \varepsilon \ \& \ \varepsilon \neq M \\ \delta\_\varepsilon + (\varepsilon - i)\,\eta, & i < \varepsilon \ \& \ \varepsilon \neq 1 \end{cases} \qquad i \in [1, M] \tag{7}$$

giving the weighted-distance vector $\boldsymbol{\delta} \in \mathbb{R}^{M \times 1}$:

$$\boldsymbol{\delta} = \begin{bmatrix} \delta\_1 & \cdots & \delta\_{\varepsilon-2} & \delta\_{\varepsilon-1} & \delta\_\varepsilon & \delta\_{\varepsilon+1} & \delta\_{\varepsilon+2} & \cdots & \delta\_M \end{bmatrix}^T \tag{8}$$

Once the values of $\delta\_\varepsilon$ and $\eta$ are known, the other values in Eq. (8) can be determined progressively using Eq. (7). The second and third cases in Eq. (7) follow arithmetic progressions (AP) whose first term and common difference are $\delta\_\varepsilon$ and $\eta$, respectively. A zero value for the first term, $\delta\_\varepsilon = 0$, has been recommended [55] because the distance of the nearest vector in the memory data to the query vector is close to zero, whereas the common difference can be arbitrarily selected or taken to be the sampling interval, $\eta$. The other distance values to the right and left of $\delta\_\varepsilon$ in Eq. (8) are calculated progressively using the second and third cases in Eq. (7), respectively. See Appendix C of [55] for the proof of this algorithm (Eq. (7)).
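Since both branches of Eq. (7) reduce to $\delta\_\varepsilon + |i - \varepsilon|\,\eta$, the whole vector of Eq. (8) can be built in one line; a sketch (names are our own assumptions):

```python
import numpy as np

def weighted_distance(M, eps, eta, delta_eps=0.0):
    """Eq. (7): temporal weighted distances as two arithmetic progressions
    about the 1-based time position index eps, with first term delta_eps
    (recommended 0 in [55]) and common difference eta (sampling interval)."""
    i = np.arange(1, M + 1)
    return delta_eps + np.abs(i - eps) * eta
```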

#### *5) Temporal kernel quantification*

Having determined the weighted-distance, the kernel weight can be calculated using the Gaussian kernel function:

$$k\_i^t = \exp\left(\frac{-\delta\_i^2}{2h\_t^2}\right) \tag{9}$$

where $k\_i^t$ is the $i$th kernel weight calculated from the temporal weighted-distance $\delta\_i$; the superscript/subscript $t$ indicates the temporal components; and $h\_t$ is the bandwidth for time-domain preservation, which can also serve as noise rejection and controls how much weight is given to nearby times in the memory vectors. This gives the vector $\mathbf{k}^t \in \mathbb{R}^{M \times 1}$.

#### *6) Adaptive bilateral kernel evaluation*

Depending on the magnitude of a fault occurring in the process, direct multiplication of the two kernels at $i = \varepsilon$ could yield zero, which would result in an inaccurate model prediction: the prediction would tend to follow the fault, so the fault would not be detected. To resolve this issue, achieve robust model signal reconstruction, and reduce the impact of spillover onto other signals when one or more signals is in a fault condition, Eq. (10) is formulated [54] as an adaptive combination of the kernels of Eqs. (2) and (9):

$$k\_i^{ab} = \begin{cases} k\_i^f \cdot k\_i^t, & 1 \le i \le M \ \& \ i \neq \varepsilon \\ \dfrac{k\_i^f + k\_i^t}{2}, & i = \varepsilon \end{cases} \qquad i \in [1, M] \tag{10}$$

resulting in the adaptive bilateral kernel vector $\mathbf{k}^{ab} \in \mathbb{R}^{M \times 1}$.

This reduces the dominance of one feature distance value over another when a fault occurs. The adaptive nature of Eq. (10) dynamically compensates for faulty sensor inputs to the bilateral kernel evaluation and always ensures that a larger weight is assigned to the vector in the memory data closest to the query vector, so as to guarantee an approximate signal reconstruction. It thus mitigates the degeneration of the feature kernel when a fault of high magnitude occurs.
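A sketch of the adaptive combination of Eq. (10) (function name and 0-based indexing are our own assumptions):

```python
import numpy as np

def adaptive_bilateral_kernel(k_f, k_t, eps):
    """Eq. (10): elementwise product of the feature and temporal kernels,
    except at the time position index eps (0-based here), where the
    arithmetic mean keeps the weight from vanishing when k^f degenerates
    under a large fault."""
    k_f, k_t = np.asarray(k_f, float), np.asarray(k_t, float)
    k_ab = k_f * k_t
    k_ab[eps] = 0.5 * (k_f[eps] + k_t[eps])
    return k_ab
```

The switch to the arithmetic mean at $i = \varepsilon$ is what keeps the denominator of the later weighted average nonzero even when every feature weight collapses.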

#### *7) Output estimation*

Finally, the adaptive bilateral kernel weights are combined with the memory data vectors to give the predictions as:


$$\hat{x}\_{q\_r,j}^\* = \frac{\sum\_{i=r}^{M} k\_i^{ab} \, x\_{i,j}}{\sum\_{i=r}^{M} k\_i^{ab}} \tag{11}$$

If a normalized adaptive bilateral kernel vector, $\mathbf{w} \in \mathbb{R}^{(M-r+1) \times 1}$, is defined:

$$w\_i = \frac{k\_i^{ab}}{\sum\_{m=r}^{M} k\_m^{ab}}, \tag{12}$$

then, Eq. (11) can be rewritten in matrix form to predict all the signals of the query vector simultaneously as:

$$\hat{\mathbf{x}}\_{q\_r}^\* = \mathbf{w}^T \mathbf{X} \tag{13}$$

where $\mathbf{X} \in \mathbb{R}^{(M-r+1) \times p}$.
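Eqs. (11)–(13) amount to a normalized-kernel weighted average; a minimal sketch (names and array layout are our own assumptions):

```python
import numpy as np

def reconstruct(k_ab, X):
    """Eqs. (11)-(13): normalize the adaptive bilateral kernel (Eq. (12))
    and take the weighted average of the memory data X (rows = memory
    vectors, columns = the p signals) to predict all signals at once."""
    k_ab = np.asarray(k_ab, dtype=float)
    X = np.asarray(X, dtype=float)
    w = k_ab / np.sum(k_ab)   # Eq. (12)
    return w @ X              # Eq. (13)
```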

#### *8) Fault detection*

After training the model, the root mean square error (RMSE) of the predictions on a fault-free validation dataset can be calculated from the residuals $\mathbf{e}\_{q\_r} = \mathbf{x}\_{q\_r} - \hat{\mathbf{x}}\_{q\_r}^\*$ between the actual and predicted values, and used to set the threshold limit for fault detection in each signal as follows:

$$T\_j^D = 3 \cdot \mathrm{RMSE}\_j \tag{14}$$

Because the residuals can be assumed to be Gaussian and randomly distributed with a mean of zero and a variance of $\mathrm{RMSE}\_j^2$, a constant value of 3 has been selected in [54] to minimize the false alarm rate while ensuring that a fault is detected when the residuals exceed the threshold.
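A sketch of the thresholding of Eq. (14) (function name is our own assumption):

```python
import numpy as np

def detection_threshold(residuals, c=3.0):
    """Eq. (14): per-signal threshold T_j^D = 3 * RMSE_j computed from
    the fault-free validation residuals (rows = samples, cols = signals).
    At run time, a signal is flagged when |residual| exceeds its threshold."""
    residuals = np.asarray(residuals, dtype=float)
    rmse = np.sqrt(np.mean(residuals**2, axis=0))
    return c * rmse
```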

#### **3.2 Analysis of the limitation of the AABKR**

In this section, we analyze the limitation of the AABKR described in Section 3.1 in terms of signal reconstruction from faulty sensor signals. The major limitation can be understood from the following description.

We observed that, in an extreme (worst-case) scenario where the fault deviation intensity in a faulty sensor signal is large, the feature distances grow so large that the feature kernel degenerates toward zero (i.e., $\mathbf{k}^f \approx \mathbf{0}$), so that the signal reconstructed by the AABKR model is bound to be:

$$\hat{x}\_{q,j}^\* = x\_{\varepsilon,j} \tag{15}$$

This observation can be understood better by the following analysis. Recall the reconstructed output from a weighted average of Eq. (11), re-written as:

$$\hat{\boldsymbol{x}}\_{q,j}^{\*} = \frac{\sum\_{i=1}^{M} \left( \boldsymbol{k}\_{i}^{f} \circledast \boldsymbol{k}\_{i}^{t} \right) \boldsymbol{x}\_{i,j}}{\sum\_{i=1}^{M} \left( \boldsymbol{k}\_{i}^{f} \circledast \boldsymbol{k}\_{i}^{t} \right)} \tag{16}$$

where $k\_i^f \circledast k\_i^t = k\_i^{ab}$ is the adaptive bilateral kernel evaluated at $\mathbf{x}\_i$. The symbol $\circledast$ represents the bilateral kernel combination operator of Eq. (10), which combines the feature and temporal kernels.

Applying the properties of limits to Eq. (16), we have:

$$\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \hat{x}\_{q,j}^\* \right) = \lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \frac{\sum\_{i=1}^M \left( k\_i^f \circledast k\_i^t \right) x\_{i,j}}{\sum\_{i=1}^M \left( k\_i^f \circledast k\_i^t \right)} \right) \tag{17}$$

Equation (17) can be re-written as:

$$\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \hat{x}\_{q,j}^\* \right) = \frac{\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \sum\_{i=1}^M \left( k\_i^f \circledast k\_i^t \right) x\_{i,j} \right)}{\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \sum\_{i=1}^M \left( k\_i^f \circledast k\_i^t \right) \right)} \tag{18}$$

provided that:

$$\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \sum\_{i=1}^{M} \left( k\_i^f \circledast k\_i^t \right) \right) \neq 0 \tag{19}$$

Note that, from Eq. (10), Eqs. (18) and (19) hold. Hence, the limit in Eq. (18) can be simplified as:

$$\lim\_{\mathbf{k}^{f}\to\mathbf{0}}\left(\hat{x}\_{q,j}^{\*}\right) = \frac{\lim\_{k\_{1}^{f}\to 0}\left(\left(k\_{1}^{f}\circledast k\_{1}^{t}\right)x\_{1,j}\right) + \lim\_{k\_{2}^{f}\to 0}\left(\left(k\_{2}^{f}\circledast k\_{2}^{t}\right)x\_{2,j}\right) + \cdots + \lim\_{k\_{M}^{f}\to 0}\left(\left(k\_{M}^{f}\circledast k\_{M}^{t}\right)x\_{M,j}\right)}{\lim\_{k\_{1}^{f}\to 0}\left(k\_{1}^{f}\circledast k\_{1}^{t}\right) + \lim\_{k\_{2}^{f}\to 0}\left(k\_{2}^{f}\circledast k\_{2}^{t}\right) + \cdots + \lim\_{k\_{M}^{f}\to 0}\left(k\_{M}^{f}\circledast k\_{M}^{t}\right)}\tag{20}$$

But, from the adaptive bilateral combination of Eq. (10):

$$\lim\_{k\_i^f \to 0;\ i \neq \varepsilon,\ 1 \le i \le M} \left( k\_i^f \circledast k\_i^t \right) = 0 \tag{21}$$

and

$$\lim\_{k\_i^f \to 0,\ i = \varepsilon} \left( k\_i^f \circledast k\_i^t \right) = 0.5 \tag{22}$$

Hence Eq. (20) becomes

$$\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \hat{x}\_{q,j}^\* \right) = \frac{\lim\_{k\_i^f \to 0,\ i = \varepsilon} \left( \left( k\_i^f \circledast k\_i^t \right) x\_{i,j} \right)}{\lim\_{k\_i^f \to 0,\ i = \varepsilon} \left( k\_i^f \circledast k\_i^t \right)} = x\_{\varepsilon,j} \left( \frac{0.5}{0.5} \right) \tag{23}$$

Thus,

$$\lim\_{\mathbf{k}^f \to \mathbf{0}} \left( \hat{x}\_{q,j}^\* \right) = x\_{\varepsilon,j} \tag{24}$$

This implies that, in the limit case of a faulty sensor query signal, $x\_{q,j}^\*$, the reconstructed query signal $\hat{x}\_{q,j}^\*$ equals the historical (memory) data point $x\_{\varepsilon,j}$ located at the identified time position index, $\varepsilon$.
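The collapse in Eq. (24) can be checked numerically; a minimal sketch with toy values of our own choosing, in which the feature kernel is driven to near zero:

```python
import numpy as np

# Numerical check of Eqs. (15)-(24): when a severe fault drives the
# feature kernel toward zero, the adaptive combination of Eq. (10)
# leaves non-negligible weight only at eps, so the reconstruction
# collapses to the memory point x_{eps,j}.
M, eps = 5, 2                                   # eps is 0-based here
k_f = np.full(M, 1e-300)                        # degenerate feature kernel
k_t = np.exp(-(np.arange(M) - eps)**2 / 2.0)    # temporal kernel, k_eps^t = 1
k_ab = k_f * k_t
k_ab[eps] = 0.5 * (k_f[eps] + k_t[eps])         # Eq. (10) at i = eps -> ~0.5
X = 10.0 * np.arange(M, dtype=float)            # toy single-signal memory data
x_hat = (k_ab @ X) / np.sum(k_ab)               # Eq. (11): x_hat -> X[eps]
```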


From the above analysis, it is clear that the fault detection capability of the AABKR largely depends on the accuracy of the time position index identification algorithm. If the time position index is identified correctly, the fault can be detected even when the fault deviation intensity is large and tends to infinity. However, if the index is identified wrongly, the fault may or may not be detected, depending on the identified index and the intensity of the fault. The consequence, even if the fault is detected, is that the sensor signal responsible for it might not be diagnosed correctly, which could lead to a wrong diagnosis of the system under consideration. In this regard, a more robust approach for time position index identification is proposed and discussed in the next section.
