**8. Complex layers of linear prognostic models**

Complex layers of linear classifiers or prognostic models have been proposed as a scheme for obtaining general classification or forecasting rules designed on the basis of a small number of multidimensional feature vectors **x**<sub>j</sub> [11]. According to this scheme, when designing linear prognostic models, averaging over a small number *m* of feature vectors **x**<sub>j</sub> of dimension *n* (*m* << *n*) is replaced by averaging over collinear clusters of selected features (genes) *X*<sub>i</sub>. Such an approach to averaging can be linked to ergodic theory [17].

In the case of a small sample of multivariate vectors, the number *m* of feature vectors **x**<sub>j</sub> in the data set *C* (1) may be much smaller than the dimension *n* of these vectors (*m* << *n*). In this case, the collinear cluster *C*(**w**<sub>k</sub>\*[*r*<sub>k</sub>]) (52) may contain all feature vectors **x**<sub>j</sub> from the set *C* (1), and the vertex **w**<sub>k</sub>\*[*r*<sub>k</sub>] may have the rank *r*<sub>k</sub> equal to *m* (*r*<sub>k</sub> = *m*).

As follows from Theorem 4, if the collinearity criterion function Φ(**w**) (38) is defined on linearly independent (*Def.* 2) feature vectors **x**<sub>j</sub>, then the values Φ(**w**<sub>m</sub>(*l*)) of this function at each final vertex **w**<sub>m</sub>(*l*) (59) are equal to zero (Φ(**w**<sub>m</sub>(*l*)) = 0). Each final vertex **w**<sub>m</sub>(*l*) (59) can be reached in *m* steps *k* (*k* = 1, … , *m*) starting from the vertex **w**<sub>0</sub> = [0, ..., 0]<sup>T</sup> related to the identity matrix *B*<sub>0</sub> = *I*<sub>n</sub> = [**e**<sub>1</sub>, ..., **e**<sub>n</sub>]<sup>T</sup>.

Minimization of the collinearity criterion function Φ(**w**) (38), followed by minimization of the criterion function Ψ<sub>0</sub>(**w**<sub>L</sub>(*l*)) (75) at the final vertices **w**<sub>L</sub>(*l*) (59), allows one to determine the optimal vertex **w**<sub>L</sub>(*l*)\* (77) with the largest *L*<sub>1</sub> margin δ<sub>L1</sub>(**w**<sub>L</sub>(*l*)\*) (78). If the feature vectors **x**<sub>j</sub> (1) are linearly independent, then the optimal vertex **w**<sub>L</sub>(*l*)\* (77) is related to the optimal basis *B*<sub>L</sub>(*l*)\* = [**x**<sub>1</sub>, ..., **x**<sub>m</sub>, **e**<sub>i(m+1)</sub>, ..., **e**<sub>i(n)</sub>]<sup>T</sup>, which contains all *m* feature vectors **x**<sub>j</sub> (1) and the *n* - *m* unit vectors **e**<sub>i</sub> with indices *i* belonging to the optimal subset *I*(**w**<sub>L</sub>(*l*)\*) (71) (*i* ∈ *I*(**w**<sub>L</sub>(*l*)\*)).

The optimal basis *B*<sub>m</sub>\* = [**x**<sub>1</sub>, ..., **x**<sub>m</sub>, **e**<sub>i(m+1)</sub>, ..., **e**<sub>i(n)</sub>]<sup>T</sup> (73) is found in two stages. In the first stage, the *m* feature vectors **x**<sub>j</sub> (1) are introduced into the matrices *B*<sub>k</sub> = [**x**<sub>1</sub>, ..., **x**<sub>k</sub>, **e**<sub>i(k+1)</sub>, ..., **e**<sub>i(n)</sub>]<sup>T</sup> (*k* = 0, 1, … , *m* - 1). The inverse matrices *B*<sub>k</sub><sup>-1</sup> (49) are computed in accordance with the vector Gauss-Jordan transformation (64). In the second stage, the unit vectors **e**<sub>i(l)</sub> in the matrices *B*<sub>m</sub>(*l*) (73) are exchanged so as to minimize the *CPL* function Ψ<sub>0</sub>(**w**<sub>m</sub>(*l*)) (75) at the final vertices **w**<sub>m</sub>(*l*) (77). The optimal basis *B*<sub>m</sub>\* defines (47) the optimal vertex **w**<sub>m</sub>(*l*)\* (77), which is characterized by the largest margin δ<sub>L1</sub>(**w**<sub>m</sub>(*l*)\*) (78).
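The full two-stage search is beyond a short example, but the basic object it manipulates, a vertex attached to a basis *B*<sub>k</sub>, can be sketched numerically. This is a minimal sketch under one stated assumption about (47), which is not reproduced in this section: the vertex **w** of the basis *B* = [**x**<sub>1</sub>, ..., **x**<sub>m</sub>, **e**<sub>i(m+1)</sub>, ..., **e**<sub>i(n)</sub>]<sup>T</sup> is taken to solve *B***w** = [1, ..., 1, 0, ..., 0]<sup>T</sup>, i.e. **x**<sub>j</sub><sup>T</sup>**w** = 1 on the feature-vector rows and **e**<sub>i</sub><sup>T</sup>**w** = 0 on the unit-vector rows. All names (`vertex_of_basis`, `unit_idx`) are illustrative, not from the source.

```python
import numpy as np

def vertex_of_basis(X, unit_idx):
    """Vertex w attached to the basis B = [x_1..x_m, e_i(m+1)..e_i(n)]^T.

    Assumption (our reading of (47)): w solves B w = [1,..,1, 0,..,0]^T,
    so x_j^T w = 1 on the m feature-vector rows and w_i = 0 on the rows
    occupied by unit vectors e_i.
    """
    m, n = X.shape
    B = np.vstack([X, np.eye(n)[unit_idx]])               # n x n basis matrix
    rhs = np.concatenate([np.ones(m), np.zeros(n - m)])
    return np.linalg.solve(B, rhs)

# toy case: m = 2 feature vectors in n = 4 dimensions (m << n in practice)
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 1.0, 1.0, 0.0]])
w = vertex_of_basis(X, unit_idx=[2, 3])                   # keep e_3 and e_4
print(np.allclose(X @ w, 1.0))    # → True: both x_j lie on the hyperplanes
print(np.allclose(w[[2, 3]], 0))  # → True: unit-vector rows force zero weights
```

Exchanging which unit vectors stay in `unit_idx` is exactly what the second stage varies when minimizing Ψ<sub>0</sub>(**w**<sub>m</sub>(*l*)) over the final vertices.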

The vertexical feature subspace *F*<sub>1</sub>\*[*m*] (*F*<sub>1</sub>\*[*m*] ⊂ *F*[*n*] (1)) can be obtained on the basis of the optimal vertex **w**<sub>m</sub>(*l*)\* (77) with the largest margin δ<sub>L1</sub>(**w**<sub>m</sub>(*l*)\*) (78). The vertexical subspace *F*<sub>1</sub>\*[*m*] contains the reduced vectors **x**<sub>1,j</sub>[*m*] of dimension *m* [7]:

$$(\forall j \in \{1, \ldots, m\}) \quad \mathbf{x}\_{1,j}[m] \in F\_1{}^\*[m] \tag{80}$$

The reduced vectors **x**<sub>1,j</sub>[*m*] (80) are obtained from the feature vectors **x**<sub>j</sub> = [x<sub>j,1</sub>, ..., x<sub>j,n</sub>]<sup>T</sup> (**x**<sub>j</sub> ∈ *F*[*n*]) by ignoring those components x<sub>j,i</sub> that are related to the unit vectors **e**<sub>i</sub> in the optimal basis *B*<sub>1</sub>\* = [**x**<sub>1</sub>, ..., **x**<sub>m</sub>, **e**<sub>i(m+1)</sub>, ..., **e**<sub>i(n)</sub>]<sup>T</sup> (73). The reduced vectors **x**<sub>1,j</sub>[*m*] are represented by those *m* features *X*<sub>i</sub> (*X*<sub>i</sub> ∈ *R*<sub>1</sub>\* (54)) that are not linked to the unit vectors **e**<sub>i</sub> (*i* ∉ *I*<sub>m</sub>(*l*)\*) in the basis *B*<sub>m</sub>(*l*)\* (73) representing the optimal vertex **w**<sub>m</sub>(*l*)\* (77).
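As a hedged sketch of this reduction step (array and argument names such as `unit_idx` are illustrative, not from the source): dropping the components whose indices correspond to the unit vectors **e**<sub>i</sub> retained in the basis leaves the *m*-dimensional reduced vectors.

```python
import numpy as np

def reduce_vectors(X, unit_idx):
    """Drop the components tied to the unit vectors e_i kept in the basis,
    leaving the m features that represent the reduced vectors x_1,j[m] (80)."""
    n = X.shape[1]
    kept = [i for i in range(n) if i not in set(unit_idx)]
    return X[:, kept], kept

# m = 2 vectors with n = 4 features; e_3 and e_4 stayed in the optimal basis
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 1.0, 1.0, 0.0]])
X_red, kept = reduce_vectors(X, unit_idx=[2, 3])
print(X_red.shape)  # → (2, 2): m vectors, m selected features
print(kept)         # → [0, 1]: indices of the features X_i forming R_1*
```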

$$R\_1{}^\* = \left\{ X\_{i(1)}, \ldots, X\_{i(m)} : i(l) \notin I\_{m(l)}{}^\* \right\} \tag{81}$$

The *m* features *X*<sub>i(l)</sub> belonging to the optimal subset *R*<sub>1</sub>\* (*X*<sub>i(l)</sub> ∈ *R*<sub>1</sub>\* (81)) are related to the non-zero weights w<sub>k,l</sub>\* (w<sub>k,l</sub>\* ≠ 0) of the vector **w**<sub>k</sub>\*[*m*] = [w<sub>k,1</sub>\*, … , w<sub>k,m</sub>\*]<sup>T</sup>.

The optimal feature subset *R*<sub>1</sub>\* (81) consists of *m* collinear features *X*<sub>i</sub>. The optimal vertex **w**<sub>1</sub>\*[*m*] (Φ(**w**<sub>1</sub>\*[*m*]) = 0 (69)) in the reduced parameter space *R*<sup>m</sup> (**w**<sub>1</sub>\*[*m*] ∈ *R*<sup>m</sup>) is based on these *m* features *X*<sub>i</sub>. The reduced optimal vertex **w**<sub>1</sub>\*[*m*] with the largest margin δ<sub>L1</sub>(**w**<sub>1</sub>\*[*m*]) (78) is the unique solution of the constrained optimization problem (74). Maximizing the *L*<sub>1</sub> margin δ<sub>L1</sub>(**w**<sub>l</sub>\*) (78) leads to the first reduced vertex **w**<sub>1</sub>\*[*m*] = [w<sub>k,1</sub>\*, … , w<sub>k,m</sub>\*]<sup>T</sup> with non-zero components w<sub>k,i</sub>\* (w<sub>k,i</sub>\* ≠ 0).

The collinear interaction model among the *m* collinear features *X*<sub>i(l)</sub> from the optimal subset *R*<sub>1</sub>\* (81) can be formulated as follows (57):

$$w\_{k,1}{}^\* X\_{i(1)} + \dots + w\_{k,m}{}^\* X\_{i(m)} = 1 \tag{82}$$

The prognostic models for each feature *X*<sub>i'</sub> from the subset *R*<sub>1</sub>\* (81) may have the following form (58):

$$(\forall i' \in \{1, \ldots, m\}) \quad X\_{i'} = \alpha\_{i',0} + \alpha\_{i',1} X\_{i(1)} + \dots + \alpha\_{i',m} X\_{i(m)} \tag{83}$$

where α<sub>i',0</sub> = 1/w<sub>k,i'</sub>\*, α<sub>i',i'</sub> = 0, and (∀ *i*(*l*) ≠ *i*') α<sub>i',i(l)</sub> = -w<sub>k,i(l)</sub>\*/w<sub>k,i'</sub>\*.
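The passage from (82) to (83) is plain algebra: dividing the collinearity constraint by w<sub>k,i'</sub>\* and isolating *X*<sub>i'</sub>. A minimal sketch with toy weights (the function name is hypothetical), which also checks that the model reproduces *X*<sub>i'</sub> exactly on any point satisfying (82):

```python
import numpy as np

def prognostic_model(w, i_prime):
    """Coefficients alpha of model (83) for feature X_i', derived from (82):
    from w_1 X_1 + ... + w_m X_m = 1 one gets
    X_i' = 1/w_i' - sum_{l != i'} (w_l / w_i') X_l.
    """
    m = len(w)
    alpha0 = 1.0 / w[i_prime]
    alpha = np.array([0.0 if l == i_prime else -w[l] / w[i_prime]
                      for l in range(m)])
    return alpha0, alpha

w = np.array([0.5, -1.0, 2.0])       # toy weights w_k,l* of a vertex (82)
alpha0, alpha = prognostic_model(w, i_prime=0)

# any x on the hyperplane w^T x = 1 is reproduced exactly by model (83):
x = np.array([4.0, 3.0, 1.0])        # 0.5*4 - 1*3 + 2*1 = 1
pred = alpha0 + alpha @ x
print(np.isclose(pred, x[0]))        # → True
```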

In the case of a data set *C* with a small number *m* (*m* << *n*) of multidimensional feature vectors **x**<sub>j</sub> (1), the prognostic models (83) for individual features *X*<sub>i'</sub> can be weak. It is known that sets (ensembles) of weak models can have strong generalizing properties [4]. A set of weak prognostic models (83) for a selected feature (dependent variable) *X*<sub>i'</sub> can be implemented in the complex layer of *L* prognostic models (83) [11].

The complex layer can be built on the basis of the sequence of *L* optimal vertices **w**<sub>l</sub>\* (77) related to the *m* features *X*<sub>i</sub> constituting the subsets *R*<sub>l</sub>\* (81), where *l* = 1, ..., *L*.

$$((\mathbf{w}\_1{}^\*, R\_1{}^\*), \ldots, (\mathbf{w}\_L{}^\*, R\_L{}^\*)) \tag{84}$$

*Design assumption*: Each subset *R*<sub>l</sub>\* (81) in the sequence (84) contains the a priori selected feature (dependent variable) *X*<sub>i'</sub> and *m* - 1 other features (independent variables) *X*<sub>i(l)</sub>. The other features *X*<sub>i(l)</sub> (*X*<sub>i(l)</sub> ∈ *R*<sub>l</sub>\*) should be different in successive subsets *R*<sub>l</sub>\* (*l* = 1, ..., *L*).

The first optimal vertex **w**<sub>1</sub>\* (77) in the sequence (84) is designed on the basis of the *m* feature vectors **x**<sub>j</sub> (1), which are represented by all *n* features *X*<sub>i</sub> constituting the feature set *F*(*n*) = {*X*<sub>1</sub>, … , *X*<sub>n</sub>}. The vertex **w**<sub>1</sub>\* (77) is found by solving the constrained optimization problem (74) according to the two-stage procedure outlined earlier. This procedure yields the optimal vertex **w**<sub>1</sub>\* (77) with the largest *L*<sub>1</sub> margin δ<sub>L1</sub>(**w**<sub>1</sub>\*) (78).

The second optimal vertex **w**<sub>2</sub>\* (77) in the sequence (84) is obtained on the basis of the *m* reduced feature vectors **x**<sub>j</sub>[*n* - (*m* - 1)] (67), which are represented by the *n* - (*m* - 1) features *X*<sub>i</sub> constituting the reduced feature subset *F*<sub>2</sub>(*n* - (*m* - 1)):

$$F\_2(n - (m - 1)) = (F(n) \setminus R\_1{}^\*) \cup \{X\_{i'}\} \tag{85}$$

The *l*-th optimal vertex **w**<sub>l</sub>\* (77) in the sequence (84) is designed on the basis of the *m* reduced vectors **x**<sub>j</sub>[*n* - (*l* - 1)(*m* - 1)] (67), which are represented by the *n* - (*l* - 1)(*m* - 1) features *X*<sub>i</sub> constituting the feature subset *F*<sub>l</sub>(*n* - (*l* - 1)(*m* - 1)):

$$F\_l(n - (l - 1)(m - 1)) = (F\_{l-1}(n - (l - 2)(m - 1)) \setminus R\_{l-1}{}^\*) \cup \{X\_{i'}\} \tag{86}$$
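The update (86) is plain set algebra: remove the features consumed by the previous optimal model and re-insert the dependent variable. A sketch with hypothetical names and toy feature labels:

```python
# Sketch of the subset update (86): drop the previous optimal subset
# R_{l-1}* and re-insert the dependent variable X_i'.
def next_feature_subset(F_prev, R_prev, x_dep):
    return (F_prev - R_prev) | {x_dep}

F = {f"X{i}" for i in range(1, 11)}           # n = 10 features X_1..X_10
R1 = {"X1", "X2", "X3"}                       # m = 3 features of model 1
F2 = next_feature_subset(F, R1, x_dep="X1")   # X1 plays the role of X_i'
print(len(F2))                                # → 8 = n - (m - 1)
```

Because the dependent variable goes back in while the *m* - 1 independent features stay out, each step shrinks the pool by exactly *m* - 1 features, which is where the count *n* - (*l* - 1)(*m* - 1) comes from.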

The sequence (84) of *L* optimal vertices **w**<sub>l</sub>\* (77) related to the feature subsets *F*<sub>l</sub> (86) is characterized by non-increasing *L*<sub>1</sub> margins δ<sub>L1</sub>(**w**<sub>l</sub>\*) (78) [18].

$$
\delta\_{\rm L1}(\mathbf{w}\_1 \, ^\*) \ge \delta\_{\rm L1}(\mathbf{w}\_2 \, ^\*) \ge \dots \ge \delta\_{\rm L1}(\mathbf{w}\_L \, ^\*) \tag{87}
$$

The prognostic models (83) for the dependent feature (variable) *X*<sub>i'</sub> are designed for each feature subset *F*<sub>l</sub> (86), where *l* = 1, ..., *L* (84):

$$(\forall l \in \{1, \ldots, L\}) \quad X\_{i'}(l) = \alpha\_{i',0}(l) + \alpha\_{i',1}(l)\, X\_{i(1)}(l) + \dots + \alpha\_{i',m}(l)\, X\_{i(m)}(l) \tag{88}$$

The final forecast *X*<sub>i'</sub><sup>∧</sup> for the dependent feature (variable) *X*<sub>i'</sub> based on the complex layer of *L* prognostic models (88) can have the following form:

$$X\_{i'}{}^{\wedge} = (X\_{i'}(1) + \dots + X\_{i'}(L))/L \tag{89}$$

In accordance with Eq. (89), the final forecast *X*<sub>i'</sub><sup>∧</sup> for the feature *X*<sub>i'</sub> results from averaging the forecasts of the *L* individual models *X*<sub>i'</sub>(*l*) (88).
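A one-line sketch of the averaging rule (89), with toy forecast values standing in for the model outputs:

```python
import numpy as np

# toy outputs X_i'(1..L) of the L = 4 individual models in the complex layer
forecasts = np.array([2.1, 1.9, 2.3, 2.0])
final = forecasts.mean()              # the layer's forecast X_i'^ per (89)
print(np.isclose(final, 2.075))       # → True
```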
