**5. Parameter vertices**

The perceptron criterion function Φ<sub>p</sub><sup>k</sup>(**v**) (29) and the collinearity criterion function Φ<sub>k</sub>(**w**) (38) are convex and piecewise-linear (*CPL*). The minimum value of such a *CPL* criterion function can always be located at a parameter vertex of some convex polyhedron. We consider the parameter vertices **w**<sub>k</sub> (**w**<sub>k</sub> ∈ *R*<sup>n</sup>) related to the collinearity criterion function Φ<sub>k</sub>(**w**) (38).

*Definition* 4: The *parameter vertex* **w**<sub>k</sub> of *rank r*<sub>k</sub> (*r*<sub>k</sub> ≤ *n*) in the weight space *R*<sup>n</sup> (**w**<sub>k</sub> ∈ *R*<sup>n</sup>) is the intersection point of *r*<sub>k</sub> hyperplanes *h*<sub>j</sub><sup>1</sup> (37) defined by linearly independent feature vectors **x**<sub>j</sub> (*j* ∈ *J*<sub>k</sub>) from the data set *C* (1) and of *n* - *r*<sub>k</sub> hyperplanes *h*<sub>i</sub><sup>0</sup> (44) defined by unit vectors **e**<sub>i</sub> (*i* ∈ *I*<sub>k</sub>) [7].

The *j*-th dual hyperplane *h*<sub>j</sub><sup>1</sup> (37) defined by the feature vector **x**<sub>j</sub> (1) passes through the *k*-th *vertex* **w**<sub>k</sub> if the equation **w**<sub>k</sub><sup>T</sup>**x**<sub>j</sub> = 1 holds.

*Definition* 5: The *k*-th weight vertex **w**<sub>k</sub> of rank *r*<sub>k</sub> is *degenerate* in the parameter space *R*<sup>n</sup> if the number *m*<sub>k</sub> of hyperplanes *h*<sub>j</sub><sup>1</sup> (37) passing through this vertex (**w**<sub>k</sub><sup>T</sup>**x**<sub>j</sub> = 1) is greater than the rank *r*<sub>k</sub> (*m*<sub>k</sub> > *r*<sub>k</sub>).

The vertex **w**<sub>k</sub> can be defined by the following set of *n* linear equations:

$$(\forall j \in J\_k(\mathbf{w}\_k)) \; \mathbf{w}\_k^{\mathrm{T}} \mathbf{x}\_j = 1 \tag{45}$$

and

$$(\forall i \in I\_k(\mathbf{w}\_k)) \; \mathbf{w}\_k^{\mathrm{T}} \mathbf{e}\_i = 0 \tag{46}$$

Eqs. (45) and (46) can be represented in the following matrix form [7]:

$$\mathbf{B}\_{\mathbf{k}} \text{ } \mathbf{w}\_{\mathbf{k}} = \mathbf{1}\_{\mathbf{k}} \tag{47}$$

where **1**<sub>k</sub> = [1, …, 1, 0, …, 0]<sup>T</sup> is the vector whose first *r*<sub>k</sub> components are equal to one and whose remaining *n* - *r*<sub>k</sub> components are equal to zero.

The square matrix **B**<sub>k</sub> (47) consists of *r*<sub>k</sub> feature vectors **x**<sub>j</sub> (*j* ∈ *J*<sub>k</sub> (45)) and *n* - *r*<sub>k</sub> unit vectors **e**<sub>i</sub> (*i* ∈ *I*<sub>k</sub> (46)):

$$\mathbf{B}\_{k} = \begin{bmatrix} \mathbf{x}\_1, \dots, \mathbf{x}\_{r\_k}, \mathbf{e}\_{i(r\_k+1)}, \dots, \mathbf{e}\_{i(n)} \end{bmatrix}^{\mathrm{T}} \tag{48}$$

where the symbol **e**<sub>i(*l*)</sub> denotes the unit vector which forms the *l*-th row of the matrix **B**<sub>k</sub>. Since the feature vectors **x**<sub>j</sub> (∀*j* ∈ *J*<sub>k</sub>(**w**<sub>k</sub>) (45)) making up the first *r*<sub>k</sub> rows of the matrix **B**<sub>k</sub> (48) are linearly independent, the inverse matrix **B**<sub>k</sub><sup>-1</sup> exists:

$$\mathbf{B}\_{k}^{-1} = \begin{bmatrix} \mathbf{r}\_{1}, \dots, \mathbf{r}\_{r\_k}, \mathbf{r}\_{i(r\_k+1)}, \dots, \mathbf{r}\_{i(n)} \end{bmatrix} \tag{49}$$

The inverse matrix **B**<sub>k</sub><sup>-1</sup> (49) can be obtained starting from the unit matrix *I* = [**e**<sub>1</sub>, ..., **e**<sub>n</sub>]<sup>T</sup> and using the basis exchange algorithm [8].

The non-singular matrix **B**<sub>k</sub> (48) is the *basis* of the feature space *F*[*n*] related to the vertex **w**<sub>k</sub> = [w<sub>k,1</sub>, …, w<sub>k,n</sub>]<sup>T</sup>. Since the last *n* - *r*<sub>k</sub> components of the vector **1**<sub>k</sub> (47) are equal to zero, the following equation holds:

$$\mathbf{w}\_{k} = \mathbf{B}\_{k}^{-1} \mathbf{1}\_{k} = \mathbf{r}\_1 + \dots + \mathbf{r}\_{r\_k} \tag{50}$$

According to Eq. (50), the weight vertex **w**<sub>k</sub> is the sum of the first *r*<sub>k</sub> columns **r**<sub>i</sub> of the inverse matrix **B**<sub>k</sub><sup>-1</sup> (49).
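The computation of a vertex from Eqs. (47)-(50) can be sketched numerically. This is a minimal illustration with hypothetical data (*n* = 3, *r*<sub>k</sub> = 2); the basis exchange algorithm of [8] is replaced here by a direct matrix inversion.

```python
import numpy as np

# Hypothetical example with n = 3 features and a vertex of rank r_k = 2:
# two linearly independent feature vectors x_1, x_2 and one unit vector e_3.
x1 = np.array([1.0, 2.0, 5.0])
x2 = np.array([3.0, 1.0, -2.0])
e3 = np.array([0.0, 0.0, 1.0])

# Basis matrix B_k (48): its rows are x_1, x_2 and e_3.
B_k = np.vstack([x1, x2, e3])

# Vector 1_k (47): r_k ones followed by n - r_k zeros.
ones_k = np.array([1.0, 1.0, 0.0])

# Vertex w_k from Eq. (47): B_k w_k = 1_k.
B_inv = np.linalg.inv(B_k)
w_k = B_inv @ ones_k

# Eq. (50): w_k equals the sum of the first r_k columns of B_k^{-1}.
assert np.allclose(w_k, B_inv[:, 0] + B_inv[:, 1])
# The dual hyperplanes h_j^1 (37) pass through the vertex: w_k^T x_j = 1.
assert np.isclose(w_k @ x1, 1.0) and np.isclose(w_k @ x2, 1.0)
# Eq. (51): the component linked to the unit vector e_3 is zero.
assert np.isclose(w_k[2], 0.0)
print(w_k)
```

The last assertion previews Remark 1 below: the coordinate tied to a unit vector of the basis vanishes at the vertex.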

*Remark* 1: The *n* - *r*<sub>k</sub> components w<sub>k,i</sub> of the vector **w**<sub>k</sub> = [w<sub>k,1</sub>, …, w<sub>k,n</sub>]<sup>T</sup> (50) linked to the zero components of the vector **1**<sub>k</sub> = [1, …, 1, 0, …, 0]<sup>T</sup> (47) are equal to zero:

$$(\forall i \in \{r\_k+1, \ldots, n\}) \; w\_{k,i} = 0 \tag{51}$$

The conditions w<sub>k,i</sub> = 0 (51) result from the equations **w**<sub>k</sub><sup>T</sup>**e**<sub>i</sub> = 0 (46) at the vertex **w**<sub>k</sub>.

The *fundamental theorem of linear programming* shows that the minimum Φ<sub>k</sub>(**w**<sub>k</sub>\*) (39) of the *CPL* collinearity criterion function Φ<sub>k</sub>(**w**) (38) can always be located in one of the vertices **w**<sub>k</sub> (50) [5]. The regularized criterion function Ψ<sub>k</sub>(**w**) (42), another function of the *CPL* type, has the same property [7].
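As an illustration of how this vertex property can be exploited, the sketch below enumerates full-rank candidate vertices and evaluates the criterion at each of them. The explicit form Φ<sub>k</sub>(**w**) = Σ<sub>j</sub> |1 - **w**<sup>T</sup>**x**<sub>j</sub>| used here is only an assumption standing in for the collinearity criterion (38), which is defined earlier in the paper, and the data set is hypothetical.

```python
import itertools
import numpy as np

# Assumed stand-in for the collinearity criterion Phi_k (38):
# the sum of absolute residuals |1 - w^T x_j| over the data set.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 2.0],
              [2.0, 0.5]])

def phi(w):
    return float(np.sum(np.abs(1.0 - X @ w)))

# Enumerate vertices defined by pairs of linearly independent rows
# (rank n = 2 here, so no unit vectors are needed in the basis).
best = None
for idx in itertools.combinations(range(len(X)), 2):
    B = X[list(idx)]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                         # rows must be independent
    w = np.linalg.solve(B, np.ones(2))   # vertex from Eq. (47)
    value = phi(w)
    if best is None or value < best[0]:
        best = (value, w)

phi_min, w_star = best
print(phi_min, w_star)
```

Exhaustive enumeration is exponential in general; it only serves here to make the statement "the minimum sits at a vertex" concrete.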

We can see that all the feature vectors **x**<sub>j</sub> (1) which define hyperplanes *h*<sub>j</sub><sup>1</sup> (37) passing through the vertex **w**<sub>k</sub> are located on the hyperplane *H*(**w**<sub>k</sub>, 1) = {**x**: **w**<sub>k</sub><sup>T</sup>**x** = 1} (3) in the feature space *F*[*n*]. A large number *m*<sub>k</sub> of feature vectors **x**<sub>j</sub> (1) located on the hyperplane *H*(**w**<sub>k</sub>, 1) (3) form the *collinear cluster C*(**w**<sub>k</sub>) based on the vertex **w**<sub>k</sub> [8]:

$$C(\mathbf{w}\_k) = \left\{ \mathbf{x}\_j \in C\,(1) : \mathbf{w}\_k^{\mathrm{T}} \mathbf{x}\_j = 1 \right\} \tag{52}$$

If the vertex **w**<sub>k</sub> of rank *r*<sub>k</sub> is degenerate in the parameter space *R*<sup>n</sup>, then the collinear cluster *C*(**w**<sub>k</sub>) (52) contains more than *r*<sub>k</sub> feature vectors **x**<sub>j</sub> (1).
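A sketch of how the cluster (52) and the degeneracy test of Definition 5 might be computed; the vertex **w**<sub>k</sub> and the data set below are hypothetical.

```python
import numpy as np

# Hypothetical vertex of rank r_k = 2 and a small data set C.
w_k = np.array([0.2, 0.4, 0.0])
r_k = 2

C = [np.array([1.0, 2.0, 5.0]),    # defines the vertex
     np.array([3.0, 1.0, -2.0]),   # defines the vertex
     np.array([1.0, 2.0, 0.0]),    # also lies on H(w_k, 1)
     np.array([4.0, 4.0, 1.0])]    # off the hyperplane

# Collinear cluster C(w_k) (52): vectors satisfying w_k^T x_j = 1.
cluster = [x for x in C if np.isclose(w_k @ x, 1.0)]
m_k = len(cluster)

print(m_k)        # number of hyperplanes h_j^1 through the vertex
print(m_k > r_k)  # True here, so w_k is degenerate (Definition 5)
```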

The *k*-th vertex **w**<sub>k</sub> = [w<sub>k,1</sub>, …, w<sub>k,n</sub>]<sup>T</sup> in the parameter space *R*<sup>n</sup> (**w**<sub>k</sub> ∈ *R*<sup>n</sup>) is linked by Eq. (47) to the non-singular matrix **B**<sub>k</sub> (48). The rows of the matrix **B**<sub>k</sub> (48) can form the *basis* of the feature space *F*[*n*]. The equations **w**<sub>k</sub><sup>T</sup>**e**<sub>i</sub> = 0 (46) imply the following condition at the vertex **w**<sub>k</sub>:

$$(\forall i \in \{1, \dots, n\}) \;\; \text{if } \mathbf{e}\_i \in \mathbf{B}\_k \; (48), \text{ then } w\_{k,i} = 0 \tag{53}$$

Each feature vector **x**<sub>j</sub> from the data set *C* (1) represents *n* features *X*<sub>i</sub> belonging to the feature set *R*(*n*) = {*X*<sub>1</sub>, …, *X*<sub>n</sub>}. The *k*-th *vertexical feature subset R*<sub>k</sub>(*r*<sub>k</sub>) consists of the *r*<sub>k</sub> features *X*<sub>i</sub> that are connected to the weights w<sub>k,i</sub> different from zero (w<sub>k,i</sub> ≠ 0):

$$R\_k(r\_k) = \left\{ X\_{i(1)}, \dots, X\_{i(r\_k)} \right\} \tag{54}$$

The *k*-th *vertexical subspace F*<sub>k</sub>[*r*<sub>k</sub>] (*F*<sub>k</sub>[*r*<sub>k</sub>] ⊂ *F*[*n*]) contains the reduced vectors **x**<sub>j</sub>[*r*<sub>k</sub>] with *r*<sub>k</sub> components x<sub>j,i(*l*)</sub> (**x**<sub>j</sub>[*r*<sub>k</sub>] ∈ *F*<sub>k</sub>[*r*<sub>k</sub>]) related to the weights w<sub>k,i</sub> different from zero:

$$(\forall j \in \{1, \ldots, m\}) \; \mathbf{x}\_j[r\_k] = \begin{bmatrix} x\_{j,i(1)}, \ldots, x\_{j,i(r\_k)} \end{bmatrix}^{\mathrm{T}} \tag{55}$$

The reduced vectors **x**<sub>j</sub>[*r*<sub>k</sub>] (55) are obtained from the feature vectors **x**<sub>j</sub> = [x<sub>j,1</sub>, ..., x<sub>j,n</sub>]<sup>T</sup> belonging to the data set *C* (1) by omitting the *n* - *r*<sub>k</sub> components x<sub>j,i</sub> related to the weights w<sub>k,i</sub> equal to zero (w<sub>k,i</sub> = 0).
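The reduction (55) amounts to keeping only the coordinates with nonzero weights. A minimal sketch with hypothetical weights:

```python
import numpy as np

# Sketch of the reduction (55): keep only the components of x_j
# whose weights w_k,i are different from zero.
w_k = np.array([0.2, 0.4, 0.0, -0.5, 0.0])   # n = 5, r_k = 3
keep = np.flatnonzero(~np.isclose(w_k, 0.0)) # indices i(1), ..., i(r_k)

x_j = np.array([1.0, 2.0, 7.0, 3.0, 9.0])
x_j_reduced = x_j[keep]                      # x_j[r_k] from Eq. (55)

print(keep)          # [0 1 3]
print(x_j_reduced)   # [1. 2. 3.]
```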

We consider the optimal vertexical subspace *F*<sub>k</sub>\*[*r*<sub>k</sub>] (*F*<sub>k</sub>\*[*r*<sub>k</sub>] ⊂ *F*[*n*]) related to the reduced optimal vertex **w**<sub>k</sub>\*[*r*<sub>k</sub>] which determines the minimum Φ<sub>k</sub>(**w**<sub>k</sub>\*) (39) of the collinearity criterion function Φ<sub>k</sub>(**w**) (38). The optimal collinear cluster *C*(**w**<sub>k</sub>\*[*r*<sub>k</sub>]) (52) is based on the optimal vertex **w**<sub>k</sub>\*[*r*<sub>k</sub>] = [w<sub>k,1</sub>\*, …, w<sub>k,r<sub>k</sub></sub>\*]<sup>T</sup> with *r*<sub>k</sub> components w<sub>k,i</sub>\* different from zero (w<sub>k,i</sub>\* ≠ 0). Feature vectors **x**<sub>j</sub> belonging to the collinear cluster *C*(**w**<sub>k</sub>\*) (52) satisfy the equations **w**<sub>k</sub>\*[*r*<sub>k</sub>]<sup>T</sup>**x**<sub>j</sub>[*r*<sub>k</sub>] = 1, hence:

$$\left( \forall \mathbf{x}\_j \in C(\mathbf{w}\_k^{\*}[r\_k]) \right) \;\; w\_{k,1}^{\*} x\_{j,i(1)} + \dots + w\_{k,r\_k}^{\*} x\_{j,i(r\_k)} = 1 \tag{56}$$

where x<sub>j,i(*l*)</sub> are the components of the *j*-th feature vector **x**<sub>j</sub> related to the weights w<sub>k,i</sub> different from zero (w<sub>k,i</sub> ≠ 0).

A large number *m*<sub>k</sub> of feature vectors **x**<sub>j</sub> (1) belonging to the collinear cluster *C*(**w**<sub>k</sub>\*[*r*<sub>k</sub>]) (52) justifies the following collinear model of interaction between the selected features *X*<sub>i(*l*)</sub>, which is based on the Eqs. (56) [9]:

$$w\_{k,1}^{\*} X\_{i(1)} + \dots + w\_{k,r\_k}^{\*} X\_{i(r\_k)} = 1 \tag{57}$$

The collinear interaction model (57) allows, inter alia, the design of the following prognostic models for each feature *X*<sub>i′</sub> from the subset *R*<sub>k</sub>(*r*<sub>k</sub>) (54):

$$(\forall i' \in \{1, \dots, r\_k\}) \;\; X\_{i'} = \beta\_{i',0} + \beta\_{i',i(1)} X\_{i(1)} + \dots + \beta\_{i',i(r\_k)} X\_{i(r\_k)} \tag{58}$$

where β<sub>i′,0</sub> = 1/w<sub>k,i′</sub>\*, β<sub>i′,i′</sub> = 0, and (∀ *i*(*l*) ≠ *i*′) β<sub>i′,i(*l*)</sub> = -w<sub>k,i(*l*)</sub>\*/w<sub>k,i′</sub>\*.

Feature *X*<sub>i′</sub> is the dependent variable in the prognostic model (58); the remaining *r*<sub>k</sub> - 1 features *X*<sub>i(*l*)</sub> (*i*(*l*) ≠ *i*′) are independent variables. The family of *r*<sub>k</sub> prognostic models (58) can be designed on the basis of one collinear interaction model (57). Models (58) are better justified when the number *m*<sub>k</sub> of feature vectors **x**<sub>j</sub> (1) in the collinear cluster *C*(**w**<sub>k</sub>\*[*r*<sub>k</sub>]) (52) is large.
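The coefficients of the models (58) follow from solving the collinear model (57) for one selected feature, which is where the sign change in the β coefficients comes from. A minimal sketch with hypothetical optimal weights w<sub>k,i</sub>\*:

```python
import numpy as np

# Hypothetical optimal weights w_k,i* of the collinear model (57);
# features are indexed 0, ..., r_k - 1 here instead of i(1), ..., i(r_k).
w_star = np.array([0.5, -0.25, 1.0])
r_k = len(w_star)

def prognostic_coeffs(i):
    """Coefficients of the prognostic model (58) for the feature X_i."""
    beta0 = 1.0 / w_star[i]          # beta_{i,0} = 1 / w_k,i*
    betas = -w_star / w_star[i]      # beta_{i,l} = -w_k,l* / w_k,i*
    betas[i] = 0.0                   # X_i does not predict itself
    return beta0, betas

# Any point on the hyperplane w_star^T x = 1 is reproduced exactly
# by each of the r_k prognostic models derived from it.
x = np.array([1.0, 2.0, 1.0])        # 0.5*1 - 0.25*2 + 1.0*1 = 1
for i in range(r_k):
    beta0, betas = prognostic_coeffs(i)
    assert np.isclose(beta0 + betas @ x, x[i])
```

The loop verifies that all *r*<sub>k</sub> models agree on vectors lying in the collinear cluster, which is exactly the consistency that Eqs. (56)-(58) express.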
