#### **1. Introduction**


Principal component analysis (PCA) is a linear method for feature extraction, also known as the Karhunen-Loève transform. PCA was first applied to face recognition by Turk and Pentland in 1991, where it became known as the eigenface method [Turk, 1991]. However, PCA has some weaknesses. First, it cannot capture even the simplest invariance of the face image when this information is not provided in the training data [Arif et al., 2008b]. Second, the result of feature extraction is a global structure [Arif, 2008]. PCA is very simple and has overcome the curse of dimensionality problem, and it has been extended by researchers into other face recognition methods such as Linear Discriminant Analysis (LDA) [Yambor, 2000; A.M. Martinez, 2003; J.H.P.N. Belhumeur, 1998], Linear Preserving Projection, also known as Laplacianfaces [Cai, 2005; Cai et al., 2006; Kokiopoulou, 2004; X. He et al., 2005], Independent Component Analysis, Kernel Principal Component Analysis [Schölkopf et al., 1998; Schölkopf, 1999], Kernel Linear Discriminant Analysis (KLDA) [Mika, 1999], and maximum feature value selection of non-linear functions based on Kernel PCA [Arif et al., 2008b]. PCA is a dimensionality reduction method based on object appearance that projects an original *n*-dimensional (rows x columns) image onto *k* eigenfaces, where *k*<<*n*. Although PCA has been developed into these methods, in some cases PCA can still outperform LDA, LPP and ICA when the sample size is small.

This chapter explains the theory of modified PCA methods derived from Principal Component Analysis. First, PCA transforms the input space into a feature space by using three non-linear functions, followed by selection of the maximum value of kernel PCA. The feature space is called the kernel of PCA [Arif et al., 2008b]. The functions used for the transformation must satisfy the Mercer kernel condition and generate a positive semi-definite matrix. Kernel PCA has been implemented to recognize face images [Arif et al., 2008b] and has been compared with methods such as Principal Component Analysis, Linear Discriminant Analysis, and Linear Preserving Projection. Second, the maximum value selection has been enhanced
and implemented to classify the smiling stage by using Kernel Laplacianlips [Mauridhi et al., 2010]. Kernel Laplacianlips transforms the input space into a feature space on the lips data, followed by PCA and LPP processing in the feature space. Kernel Laplacianlips yields a local structure in feature space, and local structure is more important than global structure. The experimental results show that Kernel Laplacianlips using selection of the non-linear function maximum value outperforms other methods [Mauridhi et al., 2010], such as Two-Dimensional Principal Component Analysis (2D-PCA) [Rima et al., 2010] and PCA+LDA+Support Vector Machine [Gunawan et al., 2009]. This chapter is composed as follows:

1. Principal Component Analysis in input space
2. Kernel Principal Component Analysis
3. Maximum Value Selection of Kernel Principal Component Analysis as Feature Extraction in Feature Space
4. Experimental Results of Face Recognition by Using Maximum Value Selection of Kernel Principal Component Analysis as Feature Extraction in Feature Space
5. The Maximum Value Selection of Kernel Linear Preserving Projection as Extension of Kernel Principal Component Analysis
6. Experimental Results of Smiling Stage Classification Based on Maximum Value Selection of Kernel Linear Preserving Projection
7. Conclusions


#### **2. Principal component analysis in input space**

Over the last two decades, many subspace algorithms have been developed for feature extraction. One of the most popular is Principal Component Analysis (PCA) [Arif et al., 2008a; Jon, 2003; A.M. Martinez and A.C. Kak, 2001; M. Kirby and L. Sirovich, 1990; M. Turk and A. Pentland, 1991]. PCA has overcome the curse of dimensionality in object recognition, since it is able to reduce the number of object characteristics dramatically. Therefore, PCA is still used today as a reference for developing feature extraction methods.

Suppose a training set contains *m* training images *X*(*k*), *k*=1..*m*, where each training image has size *h*x*w* with row index *H*=1..*h* and column index *W*=1..*w*. Each training image is represented as:

$$X = \begin{pmatrix} X\_{1,1}^{(k)} & X\_{1,2}^{(k)} & X\_{1,3}^{(k)} & \dots & X\_{1,w-1}^{(k)} & X\_{1,w}^{(k)} \\ X\_{2,1}^{(k)} & X\_{2,2}^{(k)} & X\_{2,3}^{(k)} & \dots & X\_{2,w-1}^{(k)} & X\_{2,w}^{(k)} \\ X\_{3,1}^{(k)} & X\_{3,2}^{(k)} & X\_{3,3}^{(k)} & \dots & X\_{3,w-1}^{(k)} & X\_{3,w}^{(k)} \\ X\_{4,1}^{(k)} & X\_{4,2}^{(k)} & X\_{4,3}^{(k)} & \dots & X\_{4,w-1}^{(k)} & X\_{4,w}^{(k)} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ X\_{h-1,1}^{(k)} & X\_{h-1,2}^{(k)} & X\_{h-1,3}^{(k)} & \dots & X\_{h-1,w-1}^{(k)} & X\_{h-1,w}^{(k)} \\ X\_{h,1}^{(k)} & X\_{h,2}^{(k)} & X\_{h,3}^{(k)} & \dots & X\_{h,w-1}^{(k)} & X\_{h,w}^{(k)} \end{pmatrix} \tag{1}$$

Equation (1) can be transformed into a one-dimensional matrix (row vector) by appending the (*t*+1)-th row after the *t*-th row. If *N*=1..*n* and *n*=*h*x*w*, then Equation (1) can be changed into the following equation

$$X = \begin{pmatrix} X\_{1,1}^{(k)} & X\_{1,2}^{(k)} & \dots & X\_{1,w}^{(k)} & X\_{1,w+1}^{(k)} & \dots & X\_{1,2w}^{(k)} & \dots & X\_{1,n-1}^{(k)} & X\_{1,n}^{(k)} \end{pmatrix} \tag{2}$$

To express the set of *m* training images, Equation (2) can be composed into the following equation:

$$X = \begin{pmatrix} X\_{1,1}^{(1)} & X\_{1,2}^{(1)} & \dots & X\_{1,w}^{(1)} & X\_{1,w+1}^{(1)} & \dots & X\_{1,2w}^{(1)} & \dots & X\_{1,n-1}^{(1)} & X\_{1,n}^{(1)} \\ X\_{1,1}^{(2)} & X\_{1,2}^{(2)} & \dots & X\_{1,w}^{(2)} & X\_{1,w+1}^{(2)} & \dots & X\_{1,2w}^{(2)} & \dots & X\_{1,n-1}^{(2)} & X\_{1,n}^{(2)} \\ X\_{1,1}^{(3)} & X\_{1,2}^{(3)} & \dots & X\_{1,w}^{(3)} & X\_{1,w+1}^{(3)} & \dots & X\_{1,2w}^{(3)} & \dots & X\_{1,n-1}^{(3)} & X\_{1,n}^{(3)} \\ X\_{1,1}^{(4)} & X\_{1,2}^{(4)} & \dots & X\_{1,w}^{(4)} & X\_{1,w+1}^{(4)} & \dots & X\_{1,2w}^{(4)} & \dots & X\_{1,n-1}^{(4)} & X\_{1,n}^{(4)} \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ X\_{1,1}^{(m-1)} & X\_{1,2}^{(m-1)} & \dots & X\_{1,w}^{(m-1)} & X\_{1,w+1}^{(m-1)} & \dots & X\_{1,2w}^{(m-1)} & \dots & X\_{1,n-1}^{(m-1)} & X\_{1,n}^{(m-1)} \\ X\_{1,1}^{(m)} & X\_{1,2}^{(m)} & \dots & X\_{1,w}^{(m)} & X\_{1,w+1}^{(m)} & \dots & X\_{1,2w}^{(m)} & \dots & X\_{1,n-1}^{(m)} & X\_{1,n}^{(m)} \end{pmatrix} \tag{3}$$
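As a concrete illustration of Equations (1) through (3), the short sketch below flattens each *h*x*w* training image into a 1x*n* row vector, row by row, and stacks the *m* row vectors into the *m*x*n* training matrix. It is a minimal sketch assuming NumPy; the array name `images` and the function name are illustrative only, not part of the original text.

```python
import numpy as np

def stack_training_images(images):
    """Flatten each h x w image row-by-row (Equations (1)-(2)) and
    stack the m resulting 1 x n row vectors into an m x n matrix (Equation (3))."""
    m, h, w = images.shape          # m training images of size h x w
    n = h * w
    return images.reshape(m, n)     # row-major flattening appends row t+1 after row t

# Example: 10 images of size 112 x 92 become a 10 x 10304 matrix.
# X = stack_training_images(np.random.rand(10, 112, 92))
```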

The average of the training image set (Equation (3)) can be obtained column-wise. It can be formulated by using the following equation

$$\overline{X} = \frac{\sum\_{k=1}^{m} X\_{1,N}^{(k)}}{m} \tag{4}$$

where *N*=1..*n*.


The result of Equation (4) is in row vector form, with dimension 1x*n*. It can be rewritten in the following equation

$$\overline{X} = \begin{bmatrix} \overline{X}\_{1,1} & \overline{X}\_{1,2} & \overline{X}\_{1,3} & \overline{X}\_{1,4} & \dots & \overline{X}\_{1,n-1} & \overline{X}\_{1,n} \end{bmatrix} \tag{5}$$

The result of Equation (5) can be re-formed as an original training image. To illustrate Equations (1), (2), (3) and (4), it is useful to give an example of the image average of the Olivetti Research Laboratory (ORL) face image database, as seen in Figure 1.

Fig. 1. Average of the ORL face image database using 3, 5 and 7 face images for each person

The zero-mean matrix can be calculated by subtracting Equation (5) from the face image values of the training set. In order to perform the subtraction, both the face image matrix and Equation (5) must have the same size.
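To make Equations (4) and (5) and the size-matching requirement concrete, the sketch below computes the column-wise mean of the *m*x*n* training matrix, reshapes it back to *h*x*w* (the kind of average shown in Figure 1), and replicates it *m* times so it can be subtracted from the training set. This is a minimal sketch assuming NumPy; `X`, `h`, and `w` follow the notation above, but the function name is illustrative.

```python
import numpy as np

def mean_and_zero_mean(X, h, w):
    """X is the m x n training matrix of Equation (3), with n = h * w."""
    m = X.shape[0]
    x_bar = X.sum(axis=0) / m            # Equation (4): column-wise average
    mean_image = x_bar.reshape(h, w)     # Equation (5) re-formed as an image (cf. Figure 1)
    X_rep = np.tile(x_bar, (m, 1))       # replicate the 1 x n mean m times (same size as X)
    Phi = X - X_rep                      # zero-mean matrix (see Equation (6) below)
    return x_bar, mean_image, Phi
```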


Therefore, Equation (5) can be replicated *m* times along the rows. The zero-mean matrix can be formulated by using the following equation

$$
\Phi\_M = X\_M - \overline{X} \tag{6}
$$

where *M*=1..*m*. Furthermore, the covariance matrix can be computed by using the following equation

$$C = \left(X\_M - \overline{X}\right) \cdot \left(X\_M - \overline{X}\right)^{T} \tag{7}$$

As shown in Equation (7), *C* has size *m*x*m*, where *m*<<*n*. To obtain the principal components, the eigenvalues and eigenvectors can be computed by using the following equation:

$$\begin{aligned} \mathbf{C}.\Lambda &= \lambda.\Lambda\\ \mathbf{C}.\Lambda &= \lambda.I.\Lambda\\ \left(\lambda I - \mathbf{C}\right).\Lambda &= 0\\ \text{Det}\left(\lambda I - \mathbf{C}\right) &= 0 \end{aligned} \tag{8}$$

The values of λ and Λ represent the eigenvalues and eigenvectors of *C*, respectively.

$$
\boldsymbol{\lambda} = \begin{pmatrix}
\boldsymbol{\lambda}\_{1,1} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{\lambda}\_{2,2} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{\ldots} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{\lambda}\_{M-1,M-1} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{\lambda}\_{M,M}
\end{pmatrix} \tag{9}
$$

$$
\boldsymbol{\Lambda} = \begin{pmatrix}
\boldsymbol{\Lambda}\_{1,1} & \boldsymbol{\Lambda}\_{1,2} & \dots & \boldsymbol{\Lambda}\_{1,m-1} & \boldsymbol{\Lambda}\_{1,m} \\
\boldsymbol{\Lambda}\_{2,1} & \boldsymbol{\Lambda}\_{2,2} & \dots & \boldsymbol{\Lambda}\_{2,m-1} & \boldsymbol{\Lambda}\_{2,m} \\
\dots & \dots & \dots & \dots & \dots \\
\boldsymbol{\Lambda}\_{m-1,1} & \boldsymbol{\Lambda}\_{m-1,2} & \dots & \boldsymbol{\Lambda}\_{m-1,m-1} & \boldsymbol{\Lambda}\_{m-1,m} \\
\boldsymbol{\Lambda}\_{m,1} & \boldsymbol{\Lambda}\_{m,2} & \dots & \boldsymbol{\Lambda}\_{m,m-1} & \boldsymbol{\Lambda}\_{m,m}
\end{pmatrix} \tag{10}
$$

Equation (9) can be changed into a row vector, as seen in the following equation

$$\boldsymbol{\lambda} = \begin{bmatrix} \lambda\_{1,1} & \lambda\_{2,2} & \lambda\_{3,3} & \dots & \lambda\_{m-1,m-1} & \lambda\_{m,m} \end{bmatrix}$$

To obtain the features from the most to the least dominant, the eigenvalues are sorted in descending order (λ1 ≥ λ2 ≥ λ3 ≥ ... ≥ λm), followed by the corresponding eigenvectors.
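The input-space PCA steps of Equations (6) through (10) can be summarised in a few lines of NumPy. This is a minimal sketch under the chapter's small-sample assumption (covariance of size *m*x*m* with *m*<<*n*); the function name is illustrative and not part of the original text.

```python
import numpy as np

def pca_input_space(X):
    """X: m x n training matrix (Equation (3))."""
    m = X.shape[0]
    x_bar = X.mean(axis=0)                      # Equation (4)
    Phi = X - x_bar                             # Equation (6): zero-mean matrix
    C = Phi @ Phi.T                             # Equation (7): m x m covariance, m << n
    eigvals, eigvecs = np.linalg.eigh(C)        # Equation (8): eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues in descending order
    return eigvals[order], eigvecs[:, order]    # eigenvectors follow their eigenvalues

# lam, Lam = pca_input_space(X)   # lam ~ Equation (9) diagonal, columns of Lam ~ Equation (10)
```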


#### **3. Kernel principal component analysis**

Principal Component Analysis has inspired researchers to develop it further. Kernel Principal Component Analysis (KPCA) is Principal Component Analysis in feature space [Schölkopf et al., 1998; Schölkopf et al., 1999; Arif et al., 2008b; Mauridhi et al., 2010]. Principally, KPCA works in feature space [Arif et al., 2008b]. The input space of the training set is transformed into feature space by using a Mercer kernel that yields a positive semi-definite matrix, as seen in the kernel trick [Schölkopf et al., 1998; Schölkopf et al., 1999]


$$k(X,Y) = \left(\phi(X), \phi(Y)\right) \tag{11}$$

Functions that can be used for the transformation are the *Gaussian*, *Polynomial*, and *Sigmoidal* kernels, as seen in the following equations

$$k(X,Y) = \exp\left(\frac{-\|X - Y\|^2}{\sigma}\right) \tag{12}$$

$$k(X,Y) = \left(a(X.Y) + b\right)^d\tag{13}$$

$$k(X,Y) = \tanh\left(a(X.Y) + b\right)^d \tag{14}$$
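The three kernels of Equations (12), (13) and (14) can be written directly as functions of two flattened image vectors. This is a minimal sketch assuming NumPy; the parameter values (sigma, a, b, d) are placeholders, not values prescribed by the chapter.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma)     # Equation (12)

def polynomial_kernel(x, y, a=1.0, b=1.0, d=2):
    return (a * np.dot(x, y) + b) ** d               # Equation (13)

def sigmoidal_kernel(x, y, a=1.0, b=1.0, d=1):
    return np.tanh(a * np.dot(x, y) + b) ** d        # Equation (14)
```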

#### **4. Maximum value selection of kernel principal component analysis as feature extraction in feature space**

The results of Equations (12), (13) and (14) are selected as object feature candidates [Arif et al., 2008b; Mauridhi, 2010]. The biggest of them will be employed as the feature space in the next stage, as seen in the following equation

$$F = \max(\phi\_i : R\_i \xrightarrow[k(x, y)]{} F\_i) \tag{15}$$

Each kernel function yields one feature matrix, so there are 3 feature-space matrices from the 3 kernel functions. The values at each corresponding matrix position are compared and the maximum (greatest) value is selected. The maximum value is used as the feature candidate, as represented by the following equation

$$\phi(X) = \begin{pmatrix} X\_{1,1}^{(k)} & X\_{1,2}^{(k)} & X\_{1,3}^{(k)} & \dots & X\_{1,m-1}^{(k)} & X\_{1,m}^{(k)} \\ X\_{2,1}^{(k)} & X\_{2,2}^{(k)} & X\_{2,3}^{(k)} & \dots & X\_{2,m-1}^{(k)} & X\_{2,m}^{(k)} \\ X\_{3,1}^{(k)} & X\_{3,2}^{(k)} & X\_{3,3}^{(k)} & \dots & X\_{3,m-1}^{(k)} & X\_{3,m}^{(k)} \\ X\_{4,1}^{(k)} & X\_{4,2}^{(k)} & X\_{4,3}^{(k)} & \dots & X\_{4,m-1}^{(k)} & X\_{4,m}^{(k)} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ X\_{m-1,1}^{(k)} & X\_{m-1,2}^{(k)} & X\_{m-1,3}^{(k)} & \dots & X\_{m-1,m-1}^{(k)} & X\_{m-1,m}^{(k)} \\ X\_{m,1}^{(k)} & X\_{m,2}^{(k)} & X\_{m,3}^{(k)} & \dots & X\_{m,m-1}^{(k)} & X\_{m,m}^{(k)} \end{pmatrix} \tag{16}$$
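Equations (15) and (16) amount to an element-wise maximum over the three kernel feature matrices. The sketch below builds one kernel matrix per kernel function over the training rows and keeps the largest value at each position; it is a minimal sketch using the hypothetical kernel functions defined above, not the authors' reference implementation.

```python
import numpy as np

def max_kernel_feature_space(X, kernels):
    """X: m x n training matrix; kernels: list of kernel functions k(x, y).
    Returns the m x m max-selected feature-space matrix (cf. Equation (16))."""
    m = X.shape[0]
    stacked = np.empty((len(kernels), m, m))
    for idx, k in enumerate(kernels):
        for i in range(m):
            for j in range(m):
                stacked[idx, i, j] = k(X[i], X[j])   # one feature matrix per kernel
    return stacked.max(axis=0)                       # Equation (15): keep the biggest value per position

# F = max_kernel_feature_space(X, [gaussian_kernel, polynomial_kernel, sigmoidal_kernel])
```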

The biggest value of the feature space is the most dominant feature value. As noted, the feature space seen in Equation (16) is yielded by using kernels: the training set is transformed into feature space using Equations (12), (13) and (14), followed by selection of the biggest value at the same position using Equation (15). This feature selection in kernel space is then used to determine the average, zero mean, covariance matrix, eigenvalues and eigenvectors in feature space. These values are yielded by using the kernel trick as a nonlinear component. The nonlinear component is an improvement over the linear (principal) component. So, it is clear that the biggest value of these kernels improves the PCA performance. The average value of Equation (16) can be expressed in the following equation


$$\phi\left(\overline{X}\right) = \frac{\sum\_{k=1}^{m} \phi\left(X\_{1,N}^{(k)}\right)}{m} \tag{17}$$

So, zero mean in the feature space can be found by using the following equation

$$
\phi\left(\Phi\_M\right) = \phi\left(X\_M\right) - \phi\left(\overline{X}\right) \tag{18}
$$

where *M*=1..*m*. The result of Equation (18) has size *m*x*m*. To obtain the eigenvalues and eigenvectors in feature space, it is necessary to calculate the covariance matrix in feature space, which can be computed by using the following equation

$$\phi\left(\mathbf{C}\right) = \left(\phi\left(X\right) - \phi\left(\overline{X}\right)\right) \cdot \left(\phi\left(X\right) - \phi\left(\overline{X}\right)\right)^{T} \tag{19}$$

Based on Equation (19), the eigenvalues and the eigenvectors in feature space can be determined by using the following equation

$$\begin{aligned} \phi(\mathbf{C}).\phi(\Lambda) &= \phi(\lambda).\phi(\Lambda)\\ \phi(\mathbf{C}).\phi(\Lambda) &= \phi(\lambda).I.\phi(\Lambda)\\ \left(\phi(\lambda)I - \phi(\mathbf{C})\right).\phi(\Lambda) &= 0\\ \text{Det}\left(\phi(\lambda)I - \phi(\mathbf{C})\right) &= 0 \end{aligned} \tag{20}$$

The eigenvalues and eigenvectors yielded by Equation (20) can be written in the following matrices

$$
\phi(\boldsymbol{\lambda}) = \begin{pmatrix}
\phi(\lambda\_{1,1}) & 0 & 0 & 0 & 0 \\
0 & \phi(\lambda\_{2,2}) & 0 & 0 & 0 \\
0 & 0 & \dots & 0 & 0 \\
0 & 0 & 0 & \phi(\lambda\_{M-1,M-1}) & 0 \\
0 & 0 & 0 & 0 & \phi(\lambda\_{M,M})
\end{pmatrix} \tag{21}
$$

$$
\phi(\boldsymbol{\Lambda}) = \begin{pmatrix}
\phi(\Lambda\_{1,1}) & \phi(\Lambda\_{1,2}) & \dots & \phi(\Lambda\_{1,m-1}) & \phi(\Lambda\_{1,m}) \\
\phi(\Lambda\_{2,1}) & \phi(\Lambda\_{2,2}) & \dots & \phi(\Lambda\_{2,m-1}) & \phi(\Lambda\_{2,m}) \\
\dots & \dots & \dots & \dots & \dots \\
\phi(\Lambda\_{m-1,1}) & \phi(\Lambda\_{m-1,2}) & \dots & \phi(\Lambda\_{m-1,m-1}) & \phi(\Lambda\_{m-1,m}) \\
\phi(\Lambda\_{m,1}) & \phi(\Lambda\_{m,2}) & \dots & \phi(\Lambda\_{m,m-1}) & \phi(\Lambda\_{m,m})
\end{pmatrix} \tag{22}
$$
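In feature space the same PCA steps are repeated on the max-selected kernel matrix: average (Equation (17)), zero mean (Equation (18)), covariance (Equation (19)), and eigendecomposition with a descending sort (Equations (20)-(22), and Equation (23) below). A minimal sketch, reusing NumPy on the hypothetical matrix `F` produced by the previous sketch:

```python
import numpy as np

def kpca_feature_space(F):
    """Apply the PCA steps of Equations (17)-(22) to the m x m
    max-selected kernel matrix F from the previous sketch (hypothetical)."""
    phi_bar = F.mean(axis=0)                  # Equation (17): feature-space average
    Phi = F - phi_bar                         # Equation (18): zero mean in feature space
    C = Phi @ Phi.T                           # Equation (19): feature-space covariance
    lam, Lam = np.linalg.eigh(C)              # Equation (20): eigenvalues / eigenvectors
    order = np.argsort(lam)[::-1]             # descending sort, cf. Equations (21)-(23)
    return lam[order], Lam[:, order]
```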

To obtain the values from the most to the least dominant feature, Equation (21) is sorted in descending order, followed by the corresponding eigenvectors of Equation (22) [Arif et al., 2008b; Mauridhi, 2010]. The bigger the eigenvalue in the feature space, the more dominant the corresponding eigenvector in feature space. The result of sorting Equation (21) can be shown in the following equation


$$\phi(\boldsymbol{\lambda}) = \begin{bmatrix} \phi(\lambda\_{1,1}) & \phi(\lambda\_{2,2}) & \phi(\lambda\_{3,3}) & \dots & \phi(\lambda\_{m-1,m-1}) & \phi(\lambda\_{m,m}) \end{bmatrix} \tag{23}$$

#### **5. Experimental results of face recognition by using maximum value selection of kernel principal component analysis as feature extraction in feature space**

In this chapter, the experimental results of "The Maximum Value Selection of Kernel Principal Component Analysis for Face Recognition" will be explained. We use the Olivetti-Att-ORL (ORL) [Research Center of Att, 2007] and YALE [Yale Center for Computational Vision and Control, 2007] face image databases as experimental material.

**5.1 Experimental results using the ORL face image database**

The ORL face image database consists of 40 persons, 36 of them men and the other 4 women. Each of them has 10 poses. The poses were taken at different times with various kinds of lighting and expressions (eyes open/closed, smiling/not smiling) [Research Center of Att, 2007]. The face position is frontal, with angles of 10 up to 20%. The face image size is 92x112 pixels, as shown in Figure 2.

Fig. 2. Face Images of ORL Database

The experiments are performed 5 times; in each experiment 5, 6, 7, 8 and 9 poses of each person are used for training. The rest of the set, i.e. 5, 4, 3, 2 and 1 poses, is used as the testing set [Arif et al., 2008b], as seen in Table 1.
