#### **8. Appendix A: PRINCALS**

PRINCALS by Gifi [Gifi, 1990] can handle multiple nominal variables in addition to the single nominal, ordinal and numerical variables accepted in PRINCIPALS. We denote the set of multiple variables by J<sub>M</sub> and the set of single variables, with single nominal, ordinal or numerical measurement levels, by J<sub>S</sub>. For **X** consisting of a mixture of multiple and single variables, the algorithm alternates between the estimation of **Z**, **A** and **X**<sup>∗</sup> so as to minimize

$$\theta^\* = \text{tr}(\mathbf{Z} - \mathbf{X}^\* \mathbf{A})^\top (\mathbf{Z} - \mathbf{X}^\* \mathbf{A})$$

under the restriction

$$\mathbf{Z}^{\top}\mathbf{1}\_{n} = \mathbf{0}\_{p} \qquad \text{and} \qquad \mathbf{Z}^{\top}\mathbf{Z} = n\mathbf{I}\_{p}.\tag{9}$$

For the initialization of PRINCALS, we determine initial values **Z**<sup>(0)</sup>, **A**<sup>(0)</sup> and **X**<sup>∗(0)</sup>. The values of **Z**<sup>(0)</sup> are initialized with random numbers under the restriction (9). For *j* ∈ J<sub>M</sub>, the initial value of **X**<sup>∗</sup><sub>*j*</sub> is obtained by **X**<sup>∗(0)</sup><sub>*j*</sub> = **G**<sub>*j*</sub>(**G**<sub>*j*</sub><sup>⊤</sup>**G**<sub>*j*</sub>)<sup>−1</sup>**G**<sub>*j*</sub><sup>⊤</sup>**Z**<sup>(0)</sup>, where **G**<sub>*j*</sub> is the indicator matrix of variable *j*. For *j* ∈ J<sub>S</sub>, **X**<sup>∗(0)</sup><sub>*j*</sub> is defined as the first *K<sub>j</sub>* successive integers under the normalization restriction, and the initial value of **A**<sub>*j*</sub> is calculated as **A**<sup>(0)</sup><sub>*j*</sub> = **Z**<sup>(0)⊤</sup>**X**<sup>∗(0)</sup><sub>*j*</sub>. Given these initial values, PRINCALS, as described in Michailidis and de Leeuw [Michailidis and de Leeuw, 1998], iterates the following two steps (a schematic implementation is sketched after the list):

• *Model parameter estimation step*: Calculate **Z**<sup>(*t*+1)</sup> by

$$\mathbf{Z}^{(t+1)} = p^{-1} \left( \sum\_{j \in \mathcal{J}\_M} \mathbf{X}\_j^{\*(t)} + \sum\_{j \in \mathcal{J}\_S} \mathbf{X}\_j^{\*(t)} \mathbf{A}\_j^{(t)\top} \right).$$

Columnwise center and orthonormalize **Z**<sup>(*t*+1)</sup>. For each single variable *j* ∈ J<sub>S</sub>, estimate **A**<sup>(*t*+1)</sup><sub>*j*</sub> by

$$\mathbf{A}\_{j}^{(t+1)} = \mathbf{Z}^{(t+1)\top} \mathbf{X}\_{j}^{\*(t)} \big/ \mathbf{X}\_{j}^{\*(t)\top} \mathbf{X}\_{j}^{\*(t)}.$$

• *Optimal scaling step*: Estimate the optimally scaled **X**<sup>∗(*t*+1)</sup><sub>*j*</sub> for *j* ∈ J<sub>M</sub> by

$$\mathbf{X}\_{j}^{\*(t+1)} = \mathbf{G}\_{j} (\mathbf{G}\_{j}^{\top} \mathbf{G}\_{j})^{-1} \mathbf{G}\_{j}^{\top} \mathbf{Z}^{(t+1)}$$

and for *j* ∈ J<sub>S</sub> by

$$\mathbf{X}\_{j}^{\*(t+1)} = \mathbf{G}\_{j} (\mathbf{G}\_{j}^{\top} \mathbf{G}\_{j})^{-1} \mathbf{G}\_{j}^{\top} \mathbf{Z}^{(t+1)} \mathbf{A}\_{j}^{(t+1)} \big/ \mathbf{A}\_{j}^{(t+1)\top} \mathbf{A}\_{j}^{(t+1)}$$

under measurement restrictions on each of the variables.
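To make the two steps concrete, the following is a minimal Python/NumPy sketch of the PRINCALS iteration described above, under simplifying assumptions: all variables are integer-coded categoricals, single variables are treated as nominal only (so the ordinal and numerical measurement restrictions of the optimal scaling step are omitted), and *p* in the **Z** update is taken to be the number of variables. The function and variable names (`princals`, `indicator`, `multiple`) and the convergence tolerance are illustrative, not part of the original text.

```python
import numpy as np

def indicator(codes, n_cats):
    """Indicator (dummy) matrix G_j for one integer-coded variable."""
    G = np.zeros((codes.shape[0], n_cats))
    G[np.arange(codes.shape[0]), codes] = 1.0
    return G

def orthonormalize(Z):
    """Columnwise center Z and rescale so that Z'Z = n I, i.e. restriction (9)."""
    n = Z.shape[0]
    Q, _ = np.linalg.qr(Z - Z.mean(axis=0))
    return np.sqrt(n) * Q

def princals(data, multiple, p=2, max_iter=500, tol=1e-8, seed=0):
    """data: list of 1-D integer-coded variables; multiple[j] is True for j in J_M."""
    rng = np.random.default_rng(seed)
    n, m = data[0].shape[0], len(data)
    G = [indicator(x, x.max() + 1) for x in data]
    # Projector G_j (G_j'G_j)^{-1} G_j' used in the optimal scaling step.
    P = [Gj @ np.linalg.pinv(Gj.T @ Gj) @ Gj.T for Gj in G]

    Z = orthonormalize(rng.standard_normal((n, p)))            # random start under (9)
    Xs = [P[j] @ Z if multiple[j]
          else (data[j] - data[j].mean())[:, None].astype(float)
          for j in range(m)]
    A = [None if multiple[j] else Z.T @ Xs[j] for j in range(m)]

    theta_old = np.inf
    for _ in range(max_iter):
        # Model parameter estimation step: average the (weighted) quantifications.
        Z = sum(Xs[j] if multiple[j] else Xs[j] @ A[j].T for j in range(m)) / m
        Z = orthonormalize(Z)
        for j in range(m):
            if not multiple[j]:
                A[j] = Z.T @ Xs[j] / (Xs[j].T @ Xs[j]).item()
        # Optimal scaling step (nominal quantifications only in this sketch).
        for j in range(m):
            Xs[j] = P[j] @ Z if multiple[j] else P[j] @ Z @ A[j] / (A[j].T @ A[j]).item()
        # Loss theta* summed over variables, used here only to monitor convergence.
        theta = sum(np.sum((Z - (Xs[j] if multiple[j] else Xs[j] @ A[j].T)) ** 2)
                    for j in range(m))
        if abs(theta_old - theta) < tol:
            break
        theta_old = theta
    return Z, A, Xs

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = [rng.integers(0, 3, 50), rng.integers(0, 4, 50), rng.integers(0, 2, 50)]
    Z, A, Xs = princals(data, multiple=[True, False, False], p=2)
    print(Z.shape)   # (50, 2)
```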

#### **9. Appendix B: The v***ε* **algorithm**

Let **Y**<sup>(*t*)</sup> denote a vector of dimensionality *d* that converges to a vector **Y**<sup>(∞)</sup> as *t* → ∞. Let the inverse [**Y**]<sup>−1</sup> of a vector **Y** be defined by

$$[\mathbf{Y}]^{-1} = \frac{\mathbf{Y}}{\|\mathbf{Y}\|^2},$$

where ||**Y**|| is the Euclidean norm of **Y**.

In general, the v*ε* algorithm for a sequence {**Y**<sup>(*t*)</sup>}<sub>*t*≥0</sub> starts with

$$\varepsilon^{(t,-1)} = 0, \qquad \varepsilon^{(t,0)} = \mathbf{Y}^{(t)},$$

and then generates a vector *ε*<sup>(*t*,*k*+1)</sup> by

$$\varepsilon^{(t,k+1)} = \varepsilon^{(t+1,k-1)} + \left[\varepsilon^{(t+1,k)} - \varepsilon^{(t,k)}\right]^{-1}, \qquad k = 0, 1, 2, \ldots \tag{10}$$

For practical implementation, we apply the v*ε* algorithm for *k* = 1 to accelerate the convergence of {**Y**<sup>(*t*)</sup>}<sub>*t*≥0</sub>. From Equation (10), we have

$$\begin{aligned} \varepsilon^{(t,2)} &= \varepsilon^{(t+1,0)} + \left[\varepsilon^{(t+1,1)} - \varepsilon^{(t,1)}\right]^{-1} \quad \text{for } k = 1, \\\varepsilon^{(t,1)} &= \varepsilon^{(t+1,-1)} + \left[\varepsilon^{(t+1,0)} - \varepsilon^{(t,0)}\right]^{-1} = \left[\varepsilon^{(t+1,0)} - \varepsilon^{(t,0)}\right]^{-1} \quad \text{for } k = 0. \end{aligned}$$

Then the vector *ε*<sup>(*t*,2)</sup> becomes

$$\varepsilon^{(t,2)} = \varepsilon^{(t+1,0)} + \left[ \left[ \varepsilon^{(t,0)} - \varepsilon^{(t+1,0)} \right]^{-1} + \left[ \varepsilon^{(t+2,0)} - \varepsilon^{(t+1,0)} \right]^{-1} \right]^{-1} $$
 
$$= \mathbf{Y}^{(t+1)} + \left[ \left[ \mathbf{Y}^{(t)} - \mathbf{Y}^{(t+1)} \right]^{-1} + \left[ \mathbf{Y}^{(t+2)} - \mathbf{Y}^{(t+1)} \right]^{-1} \right]^{-1} .$$
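As a concrete illustration of the *k* = 1 formula above, the following minimal Python/NumPy sketch computes the accelerated vector *ε*<sup>(*t*,2)</sup> from three successive iterates; the function name `veps_accelerate` and the demo fixed-point iteration are illustrative assumptions and are not taken from VASpca.

```python
import numpy as np

def vec_inverse(y):
    """[Y]^{-1} = Y / ||Y||^2, the vector inverse used by the v-epsilon algorithm."""
    return y / np.dot(y, y)

def veps_accelerate(y0, y1, y2):
    """eps(t,2) = Y(t+1) + [ [Y(t) - Y(t+1)]^{-1} + [Y(t+2) - Y(t+1)]^{-1} ]^{-1}."""
    return y1 + vec_inverse(vec_inverse(y0 - y1) + vec_inverse(y2 - y1))

if __name__ == "__main__":
    # Demo on a linearly convergent fixed-point iteration Y(t+1) = B Y(t) + c.
    rng = np.random.default_rng(0)
    B = 0.9 * np.eye(3) + 0.05 * rng.standard_normal((3, 3))
    c = np.ones(3)
    fixed_point = np.linalg.solve(np.eye(3) - B, c)

    y, seq = np.zeros(3), [np.zeros(3)]
    for _ in range(20):
        y = B @ y + c
        seq.append(y)

    plain_err = np.linalg.norm(seq[-1] - fixed_point)
    accel_err = np.linalg.norm(veps_accelerate(seq[-3], seq[-2], seq[-1]) - fixed_point)
    print(f"plain iterate error:       {plain_err:.3e}")
    print(f"v-epsilon accelerated err: {accel_err:.3e}")
```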

#### **10. Appendix C: VASpca**

VASpca is available at http://mo161.soci.ous.ac.jp/vaspca/indexE.html

#### **11. References**


Al-Kandari, N.M. and Jolliffe, I.T. (2001). Variable selection and interpretation of covariance principal components. *Communications in Statistics. Simulation and Computation*, 30, 339-354.

Al-Kandari, N.M. and Jolliffe, I.T. (2005). Variable selection and interpretation in correlation principal components. *Environmetrics*, 16, 659-672.

Brezinski, C. and Zaglia, M. (1991). *Extrapolation methods: theory and practice*. Elsevier Science Ltd. North-Holland, Amsterdam.

Cadima, J., Cerdeira, J.O. and Manuel, M. (2004). Computational aspects of algorithms for variable selection in the context of principal components. *Computational Statistics and Data Analysis*, 47, 225-236.

Gifi, A. (1990). *Nonlinear multivariate analysis*. John Wiley & Sons, Ltd., Chichester.

Iizuka, M., Mori, Y., Tarumi, T. and Tanaka, Y. (2003). Computer intensive trials to determine the number of variables in PCA. *Journal of the Japanese Society of Computational Statistics*, 15, 337-345.

Jolliffe, I.T. (1972). Discarding variables in a principal component analysis. I. Artificial data. *Applied Statistics*, 21, 160-173.

Kiers, H.A.L. (2002). Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems. *Computational Statistics and Data Analysis*, 41, 157-170.

Krijnen, W.P. (2006). Convergence of the sequence of parameters generated by alternating least squares algorithms. *Computational Statistics and Data Analysis*, 51, 481-489.

Krzanowski, W.J. (1987). Selection of variables to preserve multivariate data structure using principal components. *Applied Statistics*, 36, 22-33.

Kuroda, M. and Sakakihara, M. (2006). Accelerating the convergence of the EM algorithm using the vector epsilon algorithm. *Computational Statistics and Data Analysis*, 51, 1549-1561.

Kuroda, M., Mori, Y., Iizuka, M. and Sakakihara, M. (2011). Acceleration of the alternating least squares algorithm for principal components analysis. *Computational Statistics and Data Analysis*, 55, 143-153.

McCabe, G.P. (1984). Principal variables. *Technometrics*, 26, 137-144.

Michailidis, G. and de Leeuw, J. (1998). The Gifi system of descriptive multivariate analysis. *Statistical Science*, 13, 307-336.

Mori, Y., Tanaka, Y. and Tarumi, T. (1997). Principal component analysis based on a subset of variables for qualitative data. *Data Science, Classification, and Related Methods* (Proceedings of IFCS-96), 547-554, Springer-Verlag.

Mori, Y., Tarumi, T. and Tanaka, Y. (1998). Principal component analysis based on a subset of variables - Numerical investigation on variable selection procedures -. *Bulletin of the Computational Statistics Society of Japan*, 11, 1-12 (in Japanese).

Mori, Y., Iizuka, M., Tanaka, Y. and Tarumi, T. (2006). Variable Selection in Principal Component Analysis. Härdle, W., Mori, Y. and Vieu, P. (eds), *Statistical Methods for Biostatistics and Related Fields*, 265-283, Springer.

R Development Core Team (2008). *R: A language and environment for statistical computing*. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Robert, P. and Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: the RV-coefficient. *Applied Statistics*, 25, 257-265.

Sano, K., Manaka, S., Kitamura, K., Kagawa, M., Takeuchi, K., Ogashiwa, M., Kameyama, M., Tohgi, H. and Yamada, H. (1977). Statistical studies on evaluation of mild disturbance of consciousness - Abstraction of characteristic clinical pictures by cross-sectional investigation. *Sinkei Kenkyu no Shinpo*, 21, 1052-1065 (in Japanese).

Tanaka, Y. and Mori, Y. (1997). Principal component analysis based on a subset of variables: Variable selection and sensitivity analysis. *The American Journal of Mathematical and Management Sciences*, 17, 61-89.
**8**

**The Maximum Non-Linear Feature Selection of Kernel Based on Object Appearance**

Mauridhi Hery Purnomo<sup>1</sup>, Diah P. Wulandari<sup>1</sup>, I. Ketut Eddy Purnama<sup>1</sup> and Arif Muntasa<sup>2</sup>

*<sup>1</sup>Electrical Engineering Department – Industrial Engineering Faculty, Institut Teknologi Sepuluh Nopember, Surabaya*
*<sup>2</sup>Informatics Engineering Department – Engineering Faculty, Universitas Trunojoyo Madura*
*Indonesia*

**1. Introduction**

Principal component analysis (PCA) is a linear feature extraction method, also known as the Karhunen-Loève method. PCA was first applied to face recognition by Turk and Pentland in 1991, where it became known as the eigenface method [Turk, 1991]. However, PCA has some weaknesses. First, it cannot capture even the simplest invariance of a face image when this information is not provided in the training data [Arif et al., 2008b]. Second, the extracted features describe only the global structure of the image [Arif, 2008]. PCA is very simple and overcomes the curse of dimensionality problem, and it has been extended by many researchers into related face recognition methods such as Linear Discriminant Analysis (LDA) [Yambor, 2000; A.M. Martinez, 2003; J.H.P.N. Belhumeur, 1998], Locality Preserving Projection, known as Laplacianfaces [Cai, 2005; Cai et al., 2006; Kokiopoulou, 2004; X. He et al., 2005], Independent Component Analysis, Kernel Principal Component Analysis [Scholkopf et al., 1998; Schölkopf, 1999], Kernel Linear Discriminant Analysis (KLDA) [Mika, 1999], and maximum feature value selection of a nonlinear function based on kernel PCA [Arif et al., 2008b]. PCA is a dimensionality reduction method based on object appearance: it projects an original *n*-dimensional (row × column) image onto *k* eigenfaces, where *k* << *n*. Although PCA has been developed into these methods, in some cases PCA can outperform LDA, LPP and ICA when the sample size is small.

This chapter explains the theory of a modified PCA derived from principal component analysis. First, PCA transforms the input space into a feature space by using three non-linear functions, followed by selection of the maximum value of kernel PCA; this feature space is called the kernel of PCA [Arif et al., 2008b]. The transformation functions must satisfy the Mercer kernel condition and generate a positive semi-definite matrix. Kernel PCA has been implemented for face image recognition [Arif et al., 2008b] and has been compared with methods such as Principal Component Analysis, Linear Discriminant Analysis, and Locality Preserving Projection. Finally, the maximum value selection has been enhanced
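To make the kernel feature-space idea concrete, the sketch below shows a generic kernel PCA in Python/NumPy with a Gaussian (RBF) kernel; the kernel choice, the parameter `gamma`, and the function names are illustrative assumptions and do not reproduce the chapter's three specific non-linear functions or its maximum-value selection step.

```python
import numpy as np

def rbf_kernel(X, gamma=1e-3):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1e-3):
    """Project the rows of X onto the leading kernel principal components."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one      # center in feature space
    eigval, eigvec = np.linalg.eigh(Kc)             # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_components]   # pick the largest ones
    alphas = eigvec[:, idx] / np.sqrt(np.maximum(eigval[idx], 1e-12))
    return Kc @ alphas                              # projections of the training data

if __name__ == "__main__":
    # Toy "images": 20 flattened 8x8 patterns.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 64))
    print(kernel_pca(X, n_components=3).shape)      # (20, 3)
```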

