**7. Small samples of multivariate feature vectors**

A small sample of multivariate vectors appears when the number *m* of feature vectors **x**<sup>j</sup> in the data set *C* (1) is much smaller than the dimension *n* of these vectors (*m* << *n*). The basis exchange algorithms allow for efficient minimization of the *CPL* criterion functions also in the case of small samples of multivariate vectors [10]. However, for small samples, some new properties of the basis exchange algorithms become more important. In particular, the regularization (42) of the *CPL* criterion functions becomes crucial. New properties of the basis exchange algorithms in the case of a small number *m* of multidimensional feature vectors **x**<sup>j</sup> (1) are discussed on the example of the collinearity criterion function Φ(**w**) (38) and the regularized criterion function Ψ(**w**) (42).

*Lemma* 4: The value Φ(**w**<sub>L</sub>) of the collinearity criterion function Φ(**w**) (38) at the final vertex **w**<sub>L</sub> (59) is equal to zero if all *m* linear Eqs. (45) are fulfilled in the vertex **w**<sub>L</sub>, which is related by Eq. (47) to the matrix *B*<sub>L</sub> = [**x**<sub>1</sub>,..., **x**<sub>m</sub>, **e**<sub>i(1)</sub>,..., **e**<sub>i(*n*-*m*)</sub>]<sup>T</sup> (48) containing the unit vectors **e**<sub>i</sub> with the indices *i* from the subset *I*<sub>L</sub> (*i* ∈ *I*<sub>L</sub>).

*Theorem* 4: If the feature vectors **x**<sup>j</sup> constituting the subset *C*<sub>k</sub> (*C*<sub>k</sub> ⊂ *C* (1)) and used in the definition of the function Φ(**w**) (38) are linearly independent (*Def.* 2), then the value Φ(**w**<sub>L</sub>) of the collinearity criterion function Φ(**w**) at the final vertex **w**<sub>L</sub> (59) is equal to zero (Φ(**w**<sub>L</sub>) = 0).

The proof of Theorem 4 can be based on the stepwise inversion of the matrices *B*<sub>k</sub> (48) [16]. The final vertex **w**<sub>L</sub> (59) can be found by inverting the related matrix *B*<sub>L</sub> = [**x**<sub>1</sub>,..., **x**<sub>r<sub>k</sub></sub>, **e**<sub>i(1)</sub>,..., **e**<sub>i(*n*-r<sub>k</sub>)</sub>]<sup>T</sup> (48).

The final vertex **w**<sub>L</sub> (59) resetting (Φ(**w**<sub>L</sub>) = 0) the criterion function Φ(**w**) (38) can be related to the optimal matrix *B*<sub>L</sub> = [**x**<sub>1</sub>,..., **x**<sub>L</sub>, **e**<sub>i(L+1)</sub>,..., **e**<sub>i(n)</sub>]<sup>T</sup> (48) built from *L* (*L* ≤ *m*) feature vectors **x**<sup>j</sup> (*j* ∈ *J*(**w**<sub>L</sub>) (45)) from the data set *C* (1) and from *n* - *L* selected unit vectors **e**<sub>i</sub> (*i* ∈ *I*(**w**<sub>L</sub>) (46)). Different subsets of the unit vectors **e**<sub>i</sub> in the final matrix *B*<sub>L</sub> (48) result in different positions of the final vertices **w**<sub>L(l)</sub> (59) in the parameter space *R*<sup>n</sup>. The criterion function Φ(**w**) (38) is equal to zero (Φ(**w**<sub>L(l)</sub>) = 0) at each of these vertices **w**<sub>L(l)</sub> (59).
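As a numerical sketch of this property (NumPy, with a small randomly generated matrix standing in for the feature vectors **x**<sup>j</sup> — an illustrative assumption, not data from the chapter), the following shows that two different choices of the unit vectors **e**<sub>i</sub> completing the basis give two different final vertices, each of which still satisfies all equations **x**<sup>jT</sup>**w** = 1 and therefore zeroes the collinearity function Φ(**w**):

```python
import numpy as np

# Hypothetical small sample: m = 2 feature vectors x_j (rows) in R^4 (m << n).
rng = np.random.default_rng(2)
X = rng.normal(size=(2, 4))

def vertex(zero_idx):
    """Final vertex for a given choice of zeroed coordinates: solve
    x_j^T w = 1 on the two active coordinates, set the rest to zero."""
    active = [i for i in range(4) if i not in zero_idx]
    w = np.zeros(4)
    w[active] = np.linalg.solve(X[:, active], np.ones(2))
    return w

w_a = vertex({2, 3})   # basis completed with unit vectors e_3, e_4
w_b = vertex({0, 1})   # basis completed with unit vectors e_1, e_2

# Both vertices fulfill every equation x_j^T w = 1, so Phi = 0 at each,
# yet their positions in the parameter space differ.
print(np.allclose(X @ w_a, 1.0), np.allclose(X @ w_b, 1.0))
print(np.allclose(w_a, w_b))
```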

The position of the final vertices **w**<sub>L(l)</sub> (59) in the parameter space *R*<sup>n</sup> depends on which unit vectors **e**<sub>i</sub> (*i* ∈ *I*<sub>L(l)</sub>) are included in the basis *B*<sub>L(l)</sub> (48), where:

$$\left(\forall l \in \{1, \ldots, l_{\max}\}\right) \; \Phi\left(\mathbf{w}_{L(l)}\right) = 0 \tag{69}$$

The maximal number *l*<sub>max</sub> (69) of different vertices **w**<sub>L(l)</sub> (59) can be large when *m* << *n*:

*Computing on Vertices in Data Mining DOI: http://dx.doi.org/10.5772/intechopen.99315*

$$l_{\max} = \frac{n!}{m! \, (n-m)!} \tag{70}$$
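The count in Eq. (70) is the binomial coefficient, so a one-line check with Python's standard library shows how quickly the number of candidate vertices grows when *m* << *n* (the example dimensions are illustrative, not taken from the chapter):

```python
from math import comb

def l_max(n: int, m: int) -> int:
    """Number of different final vertices w_L(l) (Eq. 70): the number of
    ways to choose which m of the n coordinates stay active."""
    return comb(n, m)

print(l_max(10, 2))    # 45
print(l_max(100, 5))   # 75287520 -- already huge for a modest n
```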

The choice between different final vertices **w**<sub>L(l)</sub> (59) can be based on the minimization of the regularized criterion function Ψ(**w**) (42). The regularized function Ψ(**w**) (42) is the sum of the collinearity function Φ(**w**) (38) and the weighted sum of the cost functions φ<sub>i</sub><sup>0</sup>(**w**) (43). If Φ(**w**<sub>L(l)</sub>) = 0 (38), then the value Ψ(**w**<sub>L(l)</sub>) of the criterion function Ψ(**w**) (42) at the final vertex **w**<sub>L(l)</sub> (59) can be given as follows:

$$\Psi\left(\mathbf{w}_{L(l)}\right) = \lambda \, \Sigma_i \, \gamma_i \, \varphi_i^0\left(\mathbf{w}_{L(l)}\right) = \lambda \, \Sigma_i \, \gamma_i \left|\mathrm{w}_{L(l),i}\right| \tag{71}$$

where the above sums take into account only the indices *i* of the subset *I*(**w**<sub>L(l)</sub>) of the non-zero components w<sub>L(l),i</sub> of the final vertex **w**<sub>L(l)</sub> = [w<sub>L(l),1</sub>, … , w<sub>L(l),n</sub>]<sup>T</sup> (59):

$$I\left(\mathbf{w}_{L(l)}\right) = \left\{ i : \mathbf{e}_i^T \mathbf{w}_{L(l)} \neq 0 \right\} = \left\{ i : \mathrm{w}_{L(l),i} \neq 0 \right\} \tag{72}$$

If the final vertex **w**<sub>L(l)</sub> (59) is not degenerate (*Def.* 5), then the matrix *B*<sub>L(l)</sub> (48) is built from all *m* feature vectors **x**<sup>j</sup> (*j* ∈ {1,..., *m*}) making up the data set *C* (1) and from *n* - *m* selected unit vectors **e**<sub>i</sub> (*i* ∈ *I*(**w**<sub>L(l)</sub>) (72)):

$$\mathbf{B}_{L(l)} = \left[\mathbf{x}_1, \ldots, \mathbf{x}_m, \mathbf{e}_{i(m+1)}, \ldots, \mathbf{e}_{i(n)}\right]^T \tag{73}$$
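A non-degenerate final vertex can be computed directly from the matrix in Eq. (73): the *m* rows **x**<sub>j</sub> contribute the equations **x**<sub>j</sub><sup>T</sup>**w** = 1 and the *n* - *m* unit-vector rows fix the remaining components to zero, so the vertex solves one linear system. A minimal sketch, assuming a small random matrix for the feature vectors and an arbitrary example choice of zeroed coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
X = rng.normal(size=(m, n))             # m feature vectors x_j as rows

zero_idx = [2, 3, 4]                    # example choice of n - m zeroed coords
E = np.eye(n)[zero_idx]                 # unit-vector rows e_i
B = np.vstack([X, E])                   # matrix B_L(l) of Eq. (73)
c = np.concatenate([np.ones(m), np.zeros(n - m)])

w = np.linalg.solve(B, c)               # final vertex: w = B^{-1} c

# All m equations x_j^T w = 1 hold at this vertex, so Phi(w) = 0
# (Theorem 4), and the chosen components are exactly zero.
print(np.allclose(X @ w, 1.0))          # True
print(np.allclose(w[zero_idx], 0.0))    # True
```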

The problem of the constrained minimization of the regularized function Ψ(**w**) (71) at the vertices **w**<sub>L(l)</sub> (59) satisfying the conditions Φ(**w**<sub>L(l)</sub>) = 0 (69) can be formulated in the following way:

$$\begin{aligned} &\min_l \left\{ \Psi\left(\mathbf{w}_{L(l)}\right) : \Phi\left(\mathbf{w}_{L(l)}\right) = 0 \right\} = \\ &= \min_l \left\{ \Sigma_i \, \gamma_i \left|\mathrm{w}_{L(l),i}\right| : \Phi\left(\mathbf{w}_{L(l)}\right) = 0 \right\} \end{aligned} \tag{74}$$

According to the above formulation, the search for the minimum of the regularized criterion function Ψ(**w**) (42) takes place at all such vertices **w**<sub>L(l)</sub> (59) where the collinearity function Φ(**w**) (38) is equal to zero. The regularized criterion function Ψ(**w**) (42) is defined as follows at the final vertices **w**<sub>L(l)</sub> = [w<sub>L(l),1</sub>, … , w<sub>L(l),n</sub>]<sup>T</sup> (59), where Φ(**w**<sub>L(l)</sub>) = 0:

$$\left(\forall \mathbf{w}_{L(l)}\right) \; \Psi\left(\mathbf{w}_{L(l)}\right) = \Sigma_i \, \gamma_i \left|\mathrm{w}_{L(l),i}\right| \tag{75}$$

The optimal vertex **w**<sub>L(l)</sub>* yields the minimal value Ψ(**w**<sub>L(l)</sub>*) of the *CPL* criterion function Ψ(**w**) (75) defined on such final vertices **w**<sub>L(l)</sub> (59) where Φ(**w**<sub>L(l)</sub>) = 0 (38):

$$\left(\exists \mathbf{w}_{L(l)}{}^*\right) \left(\forall \mathbf{w}_{L(l)} : \Phi\left(\mathbf{w}_{L(l)}\right) = 0\right) \; \Psi\left(\mathbf{w}_{L(l)}\right) \geq \Psi\left(\mathbf{w}_{L(l)}{}^*\right) > 0 \tag{76}$$

As in the case of the minimization of the perceptron criterion function Φ<sub>p</sub><sup>k</sup>(**v**) (29), the optimal vector **w**<sub>L(l)</sub>* (76) may be located at a selected vertex of some convex polyhedron (27) in the parameter space *R*<sup>n</sup> (**w** ∈ *R*<sup>n</sup>) [7].

If the cost parameters γ<sub>i</sub> (42) have the standard value of one ((∀*i* ∈ {1,… ,*n*}) γ<sub>i</sub> = 1), then the constrained minimization problem (74) leads to the optimal vertex **w**<sub>L(l)</sub>* with the smallest *L*<sub>1</sub> length ||**w**<sub>L(l)</sub>*||<sub>L1</sub> = |w<sub>L(l),1</sub>*| + … + |w<sub>L(l),n</sub>*|, where Φ(**w**<sub>L(l)</sub>*) = 0 (38):

$$\left(\exists \mathbf{w}_{L(l)}{}^*\right) \left(\forall \mathbf{w}_{L(l)} : \Phi\left(\mathbf{w}_{L(l)}\right) = 0\right) \; \left\|\mathbf{w}_{L(l)}\right\|_{L1} \geq \left\|\mathbf{w}_{L(l)}{}^*\right\|_{L1} \tag{77}$$

The optimal vertex **w**<sub>L(l)</sub>* with the smallest *L*<sub>1</sub> length ||**w**<sub>L(l)</sub>*||<sub>L1</sub> (77) is related to the largest *L*<sub>1</sub> margin δ<sub>L1</sub>(**w**<sub>L(l)</sub>*) (6) [11]:

$$\delta_{L1}\left(\mathbf{w}_{L(l)}{}^*\right) = 2 / \left\|\mathbf{w}_{L(l)}{}^*\right\|_{L1} = 2 / \left(\left|\mathrm{w}_{L(l),1}{}^*\right| + \dots + \left|\mathrm{w}_{L(l),n}{}^*\right|\right) \tag{78}$$

The basis exchange algorithm allows solving the constrained minimization problem (74) and finding the optimal vertex **w**<sub>L(l)</sub>* (77) with the largest *L*<sub>1</sub> margin δ<sub>L1</sub>(**w**<sub>L(l)</sub>*).
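For a small example, the constrained problem (74) with γ<sub>i</sub> = 1 can also be solved by brute force: enumerate all *l*<sub>max</sub> (70) candidate bases, compute the vertex for each, and keep the one with the smallest *L*<sub>1</sub> length. This exhaustive sketch is for illustration only (it is not the basis exchange algorithm, which avoids visiting all bases), and the random matrix is a stand-in for real data:

```python
import numpy as np
from itertools import combinations

def min_l1_vertex(X):
    """Brute-force solution of problem (74) with gamma_i = 1: among all
    vertices w with x_j^T w = 1 and exactly m active coordinates, return
    the one with the smallest L1 norm (Eq. 77)."""
    m, n = X.shape
    best_w, best_norm = None, np.inf
    for active in combinations(range(n), m):
        sub = X[:, list(active)]                 # m x m subsystem
        if abs(np.linalg.det(sub)) < 1e-12:      # degenerate basis, skip
            continue
        w = np.zeros(n)
        w[list(active)] = np.linalg.solve(sub, np.ones(m))
        l1 = np.abs(w).sum()
        if l1 < best_norm:
            best_w, best_norm = w, l1
    return best_w, best_norm

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 7))                      # m = 3 vectors in R^7

w_star, l1 = min_l1_vertex(X)
print(np.allclose(X @ w_star, 1.0))              # all x_j^T w = 1 hold
print(l1)                                        # smallest L1 length
```

The number of bases visited here is C(7, 3) = 35; for *m* << *n* this count explodes (Eq. 70), which is why an efficient exchange scheme matters.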

The Support Vector Machine (*SVM*) is the most popular method for designing linear classifiers or prognostic models with large margins [12]. According to the *SVM* approach, the optimal linear classifier or prognostic model is defined by such an optimal weight vector **w**\* that the margin δ<sub>L2</sub>(**w**\*) based on the Euclidean (*L*<sub>2</sub>) norm is maximal:

$$\delta_{L2}\left(\mathbf{w}^*\right) = 2 / \left\|\mathbf{w}^*\right\|_{L2} = 2 / \left(\left(\mathbf{w}^*\right)^T \mathbf{w}^*\right)^{1/2} \tag{79}$$

Maximization of the Euclidean margin δ<sub>L2</sub>(**w**) (79) is performed using quadratic programming [2].
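The two margin definitions (78) and (79) differ only in the norm used, which a short numerical comparison makes concrete (the example weight vector is arbitrary). Since ||**w**||<sub>L1</sub> ≥ ||**w**||<sub>L2</sub> always holds, the *L*<sub>1</sub> margin of a given vector never exceeds its Euclidean margin:

```python
import numpy as np

def margin_l1(w):
    """L1 margin of Eq. (78): 2 divided by the L1 norm of w."""
    return 2.0 / np.abs(w).sum()

def margin_l2(w):
    """Euclidean (SVM-style) margin of Eq. (79): 2 / ||w||_2."""
    return 2.0 / np.sqrt(w @ w)

w = np.array([3.0, -4.0])
print(margin_l1(w))   # 2 / 7 ~= 0.2857
print(margin_l2(w))   # 2 / 5  = 0.4
```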
