**2. Multimodel system identification**

In this section, two classification algorithms are used to determine the number, structure, and parameters of the different models of the base. For the construction of the model's base, we propose Frequency Sensitive Competitive Learning (FSCL) and fuzzy k-means to obtain, respectively, the number of models and the operating regime of each one. In a second step, we propose a validity computation for each submodel in the model's base. The validity of each submodel is designed so that it combines two types of validity, namely a simple and a reinforced validity. The contribution of each submodel is weighted by its validity value, which is obtained via an optimization technique. On the basis of these results, one can reduce the number of submodels by discarding those that have no effect on the multimodel representation. This reduction relies on a measure of similarity between each submodel and the global one, computed via a Singular Value Decomposition (SVD) technique.

#### **2.1 Determination of the number of models**

Most existing clustering algorithms [20, 21] cannot select the appropriate number of clusters when no prior idea of that number is available. The FSCL algorithm, in contrast, automatically allocates a suitable number of units to an input data set when used for clustering. Therefore, for the determination of the model's base, we propose Frequency Sensitive Competitive Learning (FSCL). FSCL [22] is a competitive algorithm with *N* neurons, trained on a data set of *P* data vectors *x(t)*. In the FSCL, the competitive computing units are penalized in proportion to the frequency of their winning. Given an input vector *x* at each step, FSCL determines the winner by:

$$u\_i = \begin{cases} 1 & \text{if } \ i = c \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

Such that

$$\gamma\_c \left\| x - w\_c \right\|^2 = \min\_j \gamma\_j \left\| x - w\_j \right\|^2 \tag{2}$$

where:

*N*: initial estimate of the number of clusters in the given data;

*ui*, 1 ≤ *i* ≤ *k*: the *k* output units;

*wi*, 1 ≤ *i* ≤ *k*: the weight vectors, each of dimension *d*;

*xi*, 1 ≤ *i* ≤ *d*: the *d*-dimensional input vector from the data set *P*;

*wc*: the *d*-dimensional weight vector of the winning unit;

‖∗‖: the Euclidean distance;

*c*: index of the unit that wins the competition;

*γj*: conscience factor used to reduce the winning rate of frequent winners, defined as follows [23]:

$$\gamma\_j = \frac{n\_j}{\sum\_{i=1}^{k} n\_i} \tag{3}$$

where *nj* refers to the cumulative number of times node *j* has won the competition.

After selecting the winner, FSCL updates it as follows:

$$w\_c(t+1) = w\_c(t) + \alpha\_g(t)\left(x - w\_c(t)\right) \tag{4}$$

where *α<sup>g</sup>* is the learning rate, defined as follows:

$$\alpha\_g(t) = \alpha\_g^i \left(\frac{\alpha\_g^f}{\alpha\_g^i}\right)^{t/t\_{\max}} \tag{5}$$

The convergence of the FSCL algorithm to a local minimum is studied in [24]. The algorithm is parameterized by the maximum number of iterations, denoted *t*max, and by its initial and final learning rates, denoted respectively *α<sup>i</sup><sub>g</sub>* and *α<sup>f</sup><sub>g</sub>*. At convergence, some data clusters are more densely populated than others, so that some units win more often in those clusters than elsewhere. Note that when the number of units (neurons) is larger than the real number of clusters in the input data set, the extra units are gradually driven far away from the distribution of the data set. If the assumed number of clusters *c* is not equal to the true value, FSCL leads to an incorrect clustering result; the number of clusters must therefore be pre-assigned. For all the studied examples, we varied the number of clusters and the learning rate for each case study in order to determine the appropriate ones, with *α<sup>g</sup>* ∈ [0.1, 0.5].
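To make the procedure concrete, the following minimal Python sketch implements the winner selection (2), the conscience factor (3), the winner update (4), and the learning-rate schedule (5). The function name `fscl`, the initialization of the weights on randomly chosen data points, and the one-sample-at-a-time presentation order are illustrative assumptions, not prescriptions from the original algorithm.

```python
import numpy as np

def fscl(X, n_units, t_max, alpha_i=0.5, alpha_f=0.1, rng=None):
    """Frequency Sensitive Competitive Learning (a minimal sketch).

    X        : (P, d) array of data vectors.
    n_units  : initial guess of the number of clusters (N units).
    t_max    : maximum number of iterations.
    alpha_i, alpha_f : initial and final learning rates (Eq. 5).
    """
    rng = np.random.default_rng(rng)
    # Initialize the weight vectors on randomly chosen data points.
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    wins = np.ones(n_units)                     # cumulative win counts n_j

    for t in range(t_max):
        x = X[rng.integers(len(X))]             # present one input vector
        gamma = wins / wins.sum()               # conscience factors (Eq. 3)
        dists = np.sum((W - x) ** 2, axis=1)    # squared Euclidean distances
        c = np.argmin(gamma * dists)            # winner selection (Eq. 2)
        alpha = alpha_i * (alpha_f / alpha_i) ** (t / t_max)   # Eq. 5
        W[c] += alpha * (x - W[c])              # winner update (Eq. 4)
        wins[c] += 1

    return W, wins
```

Units whose win counts remain close to their initial value after training correspond to the extra units driven away from the data, so a simple threshold on `wins` can be used to read off the number of populated clusters.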

#### **2.2 Fuzzy k-means clustering**

In order to establish the operating clusters, fuzzy k-means is used. This algorithm was defined by [25] and improved by [26]. The cluster centers, and the assignment of the data set *xi* to each cluster, are determined by minimizing the following objective function:


$$J\_m = \sum\_{j=1}^{K} \sum\_{i=1}^{N} \mu\_{ij}^m \left\| \mathbf{x}\_i - c\_j \right\|^2, \quad 1 \le m < \infty \tag{6}$$

where:

*μij*: degree of membership of *xi* to cluster *j*, which stands for the local model's activation degree for that observation, with $\sum\_{j=1}^{K} \mu\_{ij} = 1$;

*N*: number of data points;

*K*: number of clusters (local models);

*xi*: the *i*th data point;

*m*: the "fuzzy exponent", a real number greater than 1 that influences the membership values and determines the overlapping shape between clusters;

*cj*: center of cluster *j*.

The algorithm consists of the following steps:

1. Initialize the membership matrix *U* = [*μij*] with random values between 0 and 1.

2. Calculate the fuzzy centers using:

$$\mathbf{c}\_{j} = \frac{\sum\_{i=1}^{N} \mu\_{ij}^{m} \mathbf{x}\_{i}}{\sum\_{i=1}^{N} \mu\_{ij}^{m}}.\tag{7}$$

3. Compute a new membership matrix *U* using:

$$\mu\_{ij} = \left[ \sum\_{r=1}^{K} \left( \frac{||\mathbf{x}\_i - c\_j||}{||\mathbf{x}\_i - c\_r||} \right)^{2/(m-1)} \right]^{-1} \tag{8}$$

4. Stop the iteration when:

$$\left\| U(k-1) - U(k) \right\| < \xi \tag{9}$$

where *ξ* is a termination criterion between 0 and 1.
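The four steps above can be summarized in the following minimal sketch; the function name `fuzzy_kmeans`, the iteration cap, and the small guard against zero distances are assumptions added for numerical robustness, not part of the algorithm as stated.

```python
import numpy as np

def fuzzy_kmeans(X, K, m=2.0, xi=1e-5, max_iter=100, rng=None):
    """Fuzzy k-means (a minimal sketch of steps 1-4, Eqs. 6-9).

    X : (N, d) data matrix,  K : number of clusters,
    m : fuzzy exponent (> 1),  xi : termination criterion.
    """
    rng = np.random.default_rng(rng)
    # Step 1: random membership matrix, rows normalized to sum to 1.
    U = rng.random((len(X), K))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        Um = U ** m
        # Step 2: fuzzy centers (Eq. 7).
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Step 3: new memberships (Eq. 8).
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        D = np.maximum(D, 1e-12)          # guard against zero distances
        U_new = 1.0 / np.sum(
            (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        # Step 4: stop when the membership matrix stops moving (Eq. 9).
        if np.linalg.norm(U_new - U) < xi:
            U = U_new
            break
        U = U_new

    return C, U
```

Each row of the returned membership matrix gives the activation degrees of the local models for one observation, which is exactly the quantity used to delimit the operating regimes.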

#### **2.3 Local linear models**

Having identified the number of clusters and the data set of each cluster, the next step focuses on obtaining the cluster submodels. This step requires two phases: the first determines the structure of each submodel and the second deals with its parameter identification. For each of the vectors representative of a cluster, a parametric estimation using the Recursive Least-Squares method is retained. An ARX model is chosen for each cluster, whose equation is given by:

$$y(k) = -\sum\_{i=1}^{n\_a} a\_i y(k-i) + \sum\_{j=1}^{n\_b} b\_j u(k-j) \tag{10}$$

where *ai* and *bj* are the parameters of the *i*th submodel, and *na* and *nb* are the lags in the output and the input, respectively. The order of each model is determined by the instrumental determinants' ratio test [27]. For every order value *d*, the instrumental determinants' ratio *RDI(d)* is computed, and the retained order *d* is the value for which the ratio *RDI(d)* increases sharply for the first time. This method consists in building an information matrix *Qd* given by:

$$Q\_d = \frac{1}{n\_d} \sum\_{k=1}^{n\_d} \begin{bmatrix} y(k) \\ \vdots \\ y(k-d+1) \\ u(k) \\ \vdots \\ u(k-d+1) \end{bmatrix} \begin{bmatrix} y(k+1) \\ u(k+1) \\ \vdots \\ y(k+d) \\ u(k+d) \end{bmatrix}^T \tag{11}$$

where *nd* is the number of observations. The instrumental determinants' ratio *RDI*(d) is given by the following relation:

$$RDI(d) = \left| \frac{\det(Q\_d)}{\det(Q\_{d+1})} \right| \tag{12}$$
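As an illustration, the sketch below computes *RDI(d)* over a range of candidate orders. It assumes the observation/instrument pairing of Eq. (11) as reconstructed above (past outputs and inputs against one-step- to *d*-step-ahead samples); the function name and the averaging over the valid time window are illustrative choices.

```python
import numpy as np

def rdi_order(u, y, d_max):
    """Instrumental determinants' ratio test (a sketch of Eqs. 11-12).

    u, y  : input/output sequences of one cluster's data set.
    d_max : largest order tested. Returns [RDI(1), ..., RDI(d_max)].
    """
    def Q(d):
        # Observation vector [y(k), ..., y(k-d+1), u(k), ..., u(k-d+1)]
        # paired with the instrument [y(k+1), u(k+1), ..., y(k+d), u(k+d)].
        rows = []
        for k in range(d - 1, len(u) - d):
            phi = np.concatenate([y[k - d + 1:k + 1][::-1],
                                  u[k - d + 1:k + 1][::-1]])
            psi = np.empty(2 * d)
            psi[0::2] = y[k + 1:k + d + 1]
            psi[1::2] = u[k + 1:k + d + 1]
            rows.append(np.outer(phi, psi))
        return np.mean(rows, axis=0)

    dets = [np.linalg.det(Q(d)) for d in range(1, d_max + 2)]
    return [abs(dets[i] / dets[i + 1]) for i in range(d_max)]   # Eq. 12
```

The retained order is read off as the first *d* for which the returned ratio jumps sharply.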

#### **2.4 Computation of validity**

In the proposed approach, the validity of each submodel is computed by the following expression:

$$v\_i^{mul} = \alpha\_i v\_i^{simp} + \beta\_i v\_i^{renf} \tag{13}$$

where *v<sup>simp</sup><sub>i</sub>* and *v<sup>renf</sup><sub>i</sub>* are the simple and reinforced validities, respectively. The simple validity is defined by:

$$v\_i^{simp}(k) = \frac{1 - r\_i^{norm}(k)}{N\_m - 1}, \quad i = 1, \ldots, N\_m \tag{14}$$

The reinforced validity is expressed by:

$$v\_i^{renf}(k) = v\_i^{simp}(k) \prod\_{\substack{j=1 \\ j \neq i}}^{N\_m} \left(1 - v\_j^{simp}(k)\right) \tag{15}$$

The term *r<sup>norm</sup><sub>i</sub>* is the normalized residue, given by:

$$r\_i^{norm}(k) = \frac{r\_i(k)}{\sum\_{j=1}^{N\_m} r\_j(k)}, \quad i = 1, \ldots, N\_m \tag{16}$$

The residue is expressed as the distance between the process output *y* and the local output *yi* of the submodel *Mi*:

$$r\_i = |y - y\_i|, \quad i \in [1, N\_m] \tag{17}$$

where *Nm* is the number of submodels in the models' base. The weighting coefficients *αi* and *βi* are computed by solving the following constrained optimization problem:

$$\min\_{\alpha\_i, \beta\_i} \left| \left( \sum\_{i=1}^{N\_m} \left( \alpha\_i v\_i^{simp} + \beta\_i v\_i^{renf} \right) y\_i \right) - y \right| \quad \text{subject to} \quad \sum\_{i=1}^{N\_m} \alpha\_i = 1, \quad \sum\_{i=1}^{N\_m} \beta\_i = 1 \tag{18}$$
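A minimal sketch of the validity chain (14)-(17) is given below. It takes *vj* in Eq. (15) to be the simple validity, which the text leaves implicit, and it treats the weights *αi* and *βi* of Eq. (13) as given inputs rather than enforcing the constraints of Eq. (18).

```python
import numpy as np

def multivalidities(y, y_sub, alpha, beta):
    """Simple, reinforced and multi-validities (a sketch of Eqs. 13-17).

    y     : measured process output at instant k,
    y_sub : length-Nm array of local submodel outputs y_i(k),
    alpha, beta : weighting vectors of Eq. (13), assumed already
                  obtained (e.g. from the optimization of Eq. (18)).
    """
    Nm = len(y_sub)
    r = np.abs(y - y_sub)                    # residues (Eq. 17)
    r_norm = r / r.sum()                     # normalized residues (Eq. 16)
    v_simp = (1.0 - r_norm) / (Nm - 1)       # simple validities (Eq. 14), sum to 1
    # Reinforced validities (Eq. 15), taking v_j as the simple validity.
    v_renf = np.array([v_simp[i] * np.prod(np.delete(1.0 - v_simp, i))
                       for i in range(Nm)])
    v_mul = alpha * v_simp + beta * v_renf   # multi-validities (Eq. 13)
    return v_mul
```

The fused multimodel output is then the validity-weighted sum of the submodel outputs, as formalized in Eq. (19) below.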


A least-squares estimation is used to compute the values of *αi* and *βi*. In fact, the multimodel output *ymul* is calculated as a fusion of the submodel outputs *yi* weighted by their respective multi-validity indexes *v<sup>mul</sup><sub>i</sub>*, as illustrated by the following expression:

$$y\_{mul}(k) = \sum\_{i=1}^{N\_m} v\_i^{mul}(k) y\_i(k) = \theta^T \rho(k) \tag{19}$$

where we introduce the following regressor vector:

$$\rho^T(k) = \left[ y\_1(k-1), \ \ldots, \ y\_i(k-1), \ \ldots, \ y\_N(k-1) \right], \quad i = 1, \ldots, N \tag{20}$$

and the parameter vector:

$$\theta^T(k) = \left[ \alpha\_1, \ \ldots, \ \alpha\_N, \ \beta\_1, \ \ldots, \ \beta\_N \right] \tag{21}$$

The recursive least squares estimate of *θ* is:

$$\begin{cases} \theta(k) = \theta(k-1) + P(k)\rho(k)e(k) \\ P(k) = P(k-1) - \frac{P(k-1)\rho(k)\rho^T(k)P(k-1)}{1 + \rho^T(k)P(k-1)\rho(k)} \\ e(k) = y(k) - \theta^T(k-1)\rho(k) \end{cases} \tag{22}$$

where *P* denotes the covariance matrix and *e* is the modeling error.
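The recursion (22) can be written compactly as follows; the initialization of *θ* at zero and of *P* at a large multiple of the identity is a common assumption rather than a choice stated in the text.

```python
import numpy as np

class RLS:
    """Recursive least squares of Eq. (22) (a minimal sketch)."""

    def __init__(self, n_params, p0=1e3):
        self.theta = np.zeros(n_params)      # parameter estimate theta(0)
        self.P = p0 * np.eye(n_params)       # large P(0): weak prior

    def update(self, rho, y):
        # e(k) = y(k) - theta^T(k-1) rho(k)
        e = y - self.theta @ rho
        # P(k) = P(k-1) - P(k-1) rho rho^T P(k-1) / (1 + rho^T P(k-1) rho)
        P_rho = self.P @ rho
        self.P -= np.outer(P_rho, P_rho) / (1.0 + rho @ P_rho)
        # theta(k) = theta(k-1) + P(k) rho(k) e(k)
        self.theta += self.P @ rho * e
        return e
```

At each instant, `update` is called with the regressor *ρ(k)* of Eq. (20) and the measured output *y(k)*. Note that plain recursive least squares does not by itself enforce the sum-to-one constraints of Eq. (18); if those are required exactly, a projection step would have to be added.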
