**5. Comparison with conventional methods**

This section compares SubKPCA with related conventional methods.

#### **5.1 Improved KPCA**

Improved KPCA (IKPCA) (Xu et al., 2007) directly approximates $u_i \simeq T\tilde{v}_i$ in Eq. (7). From $SS^* u_i = \lambda_i u_i$, the approximated eigenvalue problem is

$$SS^* T\tilde{v}_i = \lambda_i T\tilde{v}_i. \tag{41}$$

By multiplying $T^*$ from the left side and using $T^* S S^* T = K_{xy}^\top K_{xy}$ and $T^* T = K_y$, one obtains the approximated generalized EVD, $K_{xy}^\top K_{xy}\tilde{v}_i = \lambda_i K_y \tilde{v}_i$. The solution $\tilde{v}_i$ is then substituted into the relation $\tilde{u}_i = T\tilde{v}_i$; hence, the transform of an input vector $x$ is

$$\mathcal{U}^*_{\mathrm{IKPCA}}\Phi(x) = \operatorname{diag}\left(\left[\frac{1}{\sqrt{\kappa_1}}, \dots, \frac{1}{\sqrt{\kappa_r}}\right]\right)\mathbf{Z}^\top \mathbf{k}_{\mathbf{y}}(x), \tag{42}$$

where $\kappa_i$ is the $i$th largest eigenvalue of (21) and $\mathbf{k}_{\mathbf{y}}(x)$ denotes the vector of kernel values between $x$ and the subset samples.

This approximation is not guaranteed to be a good approximation of $u_i$, and in our experiments in the next section, IKPCA showed worse performance than SubKPCA. As far as feature extraction is concerned, each dimension of the feature vector is multiplied by $\frac{1}{\sqrt{\kappa_i}}$ compared with SubKPCA. If the classifier is invariant to such linear transforms, the classification accuracy of the feature vectors may be the same as with SubKPCA. Indeed, (Xu et al., 2007) uses IKPCA only for feature extraction in a classification problem, and there IKPCA shows good performance.
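To make the procedure concrete, here is a minimal sketch (ours, not the authors' implementation) of the IKPCA transform in Eq. (42): it solves the generalized EVD $K_{xy}^\top K_{xy}\mathbf{z} = \kappa K_y \mathbf{z}$ with SciPy and applies the $1/\sqrt{\kappa_i}$ scaling. The Gaussian kernel and all identifiers (`gauss_kernel`, `ikpca_features`, `sigma`) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def gauss_kernel(A, B, sigma=1.0):
    """Gram matrix of a Gaussian kernel (illustrative choice of kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))

def ikpca_features(X, Y, x, r, sigma=1.0):
    """IKPCA transform of Eq. (42) for one input x (hypothetical sketch).

    X: (n, d) training samples; Y: (m, d) subset samples; r: output rank.
    """
    Kxy = gauss_kernel(X, Y, sigma)         # (Kxy)_ij = k(x_i, y_j)
    Ky = gauss_kernel(Y, Y, sigma)          # subset Gram matrix K_y
    # Generalized EVD of Eq. (21): K_xy^T K_xy z = kappa K_y z
    kappa, Z = eigh(Kxy.T @ Kxy, Ky + 1e-10 * np.eye(len(Y)))
    order = np.argsort(kappa)[::-1][:r]     # keep the r largest eigenvalues
    kappa, Z = kappa[order], Z[:, order]
    k_y = gauss_kernel(x[None, :], Y, sigma).ravel()  # kernel vector for x
    return (Z.T @ k_y) / np.sqrt(kappa)     # diag(1/sqrt(kappa)) Z^T k_y(x)
```

By the remark above, dropping the final `1/np.sqrt(kappa)` factor yields features that differ from IKPCA's only by this per-dimension scaling.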

#### **5.2 Incomplete Cholesky decomposition**

Incomplete Cholesky decomposition (ICD) (Bach & Jordan, 2002) approximates the Gram matrix as $K \simeq \mathbf{G}\mathbf{G}^\top$, where $\mathbf{G} \in \mathbb{R}^{n \times m}$ is a matrix whose upper-triangular part is zero, and $m$ is a parameter that specifies the trade-off between approximation accuracy and computational complexity. Instead of performing the EVD of $K$, the eigenvectors of $K$ are obtained from the EVD of $\mathbf{G}^\top\mathbf{G} \in \mathbb{R}^{m \times m}$ by using the relation of Eq. (7) approximately. Like the Nyström approximation, ICD reduces the computational complexity of the construction stage but not of the evaluation stage, and all training samples have to be stored for the evaluation. In the next section, our experimental results indicate that ICD is slower than SubKPCA for very large datasets, in which $n$ is more than several thousand.
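The chapter gives no code, but a pivoted incomplete Cholesky factorization can be sketched as follows; `icd`, the pivot rule, and the tolerance are our illustrative choices. It returns $\mathbf{G}$ with $K \simeq \mathbf{G}\mathbf{G}^\top$, after which the small EVD of $\mathbf{G}^\top\mathbf{G}$ recovers approximate eigenpairs of $K$.

```python
import numpy as np

def icd(K, m, tol=1e-8):
    """Pivoted incomplete Cholesky (sketch): K ~= G @ G.T, G is n x (<= m)."""
    n = K.shape[0]
    G = np.zeros((n, m))
    d = np.diag(K).astype(float).copy()   # residual diagonal of K
    perm = np.arange(n)                   # records the pivoting order
    for j in range(m):
        p = j + int(np.argmax(d[j:]))     # pivot: largest residual diagonal
        perm[[j, p]] = perm[[p, j]]
        d[[j, p]] = d[[p, j]]
        G[[j, p], :] = G[[p, j], :]
        if d[j] <= tol:                   # K is numerically low rank: stop
            G = G[:, :j]
            break
        G[j, j] = np.sqrt(d[j])
        rest = np.arange(j + 1, n)
        G[rest, j] = (K[perm[rest], perm[j]] - G[rest, :j] @ G[j, :j]) / G[j, j]
        d[rest] -= G[rest, j] ** 2
    return G[np.argsort(perm)]            # undo pivoting so rows match K

# From G^T G = W diag(s) W^T (an m x m EVD), the leading eigenpairs of K
# are approximated by eigenvalues s and eigenvectors U = G @ W / sqrt(s).
```

The closing comment is where the small $m \times m$ EVD stands in for the EVD of the full $n \times n$ Gram matrix, as the appeal to Eq. (7) above describes.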

ICD can also be applied to SubKPCA. In Eq. (21), $K_{xy}^\top K_{xy}$ is approximated by

$$K_{xy}^\top K_{xy} \simeq \mathbf{G}\mathbf{G}^\top. \tag{45}$$

The approximated $\mathbf{z}$ is then obtained from the EVD of $\mathbf{G}^\top K_y \mathbf{G}$.
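The excerpt does not spell out this last reduction, so the following is only one consistent reading (an assumption on our part): restrict $\mathbf{z}$ to the range of $\mathbf{G}$, substitute $\mathbf{z} = \mathbf{G}\mathbf{w}$ into the generalized EVD of Eq. (21), and left-multiply by $\mathbf{G}^\top$, which leaves a small generalized EVD involving $\mathbf{G}^\top K_y \mathbf{G}$.

```python
import numpy as np
from scipy.linalg import eigh

def subkpca_icd(G, Ky, r):
    """Approximate solutions of Eq. (21) after the ICD of Eq. (45).

    Our reading (an assumption): with z = G w and K_xy^T K_xy ~= G G^T,
    left-multiplying by G^T gives (G^T G)^2 w = kappa (G^T K_y G) w.
    """
    GtG = G.T @ G
    A = GtG @ GtG                         # (G^T G)^2
    B = G.T @ Ky @ G                      # G^T K_y G
    kappa, W = eigh(A, B + 1e-10 * np.eye(B.shape[0]))
    order = np.argsort(kappa)[::-1][:r]   # r largest eigenvalues
    return kappa[order], G @ W[:, order]  # back-substitute: z = G w
```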

**6. Numerical examples**

This section presents numerical examples and numerical comparisons with the other methods.

#### **6.1 Methods and evaluation criteria**

First, the methods to be compared and the evaluation criteria are described. The following methods are compared; the abbreviations in brackets are used in the figures and tables.

1. SubKPCA [SubKp]
2. Full KPCA [FKp]: standard KPCA using all of the training samples.
3. Reduced KPCA [RKp]: standard KPCA using a subset of the training samples.
4. Improved KPCA (Xu et al., 2007) [IKp]
5. Sparse KPCA (Tipping, 2001) [SpKp]
6. Nyström approximation (Williams & Seeger, 2001) [Nys]
7. ICD (Bach & Jordan, 2002) [ICD]
8. Kernel Hebbian algorithm with stochastic meta-descent (Günter et al., 2007) [KHA-SMD]

For the evaluation criterion, the empirical error, i.e., $J_1$, is used:

$$E_{\mathrm{emp}}(X) = J_1(X) = \frac{1}{n}\sum_{i=1}^{n}\left\|\Phi(x_i) - X\Phi(x_i)\right\|^2, \tag{46}$$

where $X$ is replaced by each operator. Note that full KPCA gives the minimum value of $E_{\mathrm{emp}}(X)$ under the rank constraint. Since $E_{\mathrm{emp}}(X)$ depends on the problem, the error normalized by that of full KPCA, $E_{\mathrm{emp}}(X)/E_{\mathrm{emp}}(P_{\mathrm{FKp}})$, where $P_{\mathrm{FKp}}$ is the projector of full KPCA, is also used. The validation error $E_{\mathrm{val}}$, which uses validation samples instead of the training samples in the empirical error, is also used.
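Since all of the compared operators are kernel expansions, Eq. (46) can be evaluated through Gram matrices alone, without forming $\Phi$ explicitly. A minimal sketch under the assumption (ours) that the operator is parametrized as $X = \Phi_Y A \Phi_Y^*$ for an $m \times m$ coefficient matrix $A$ over the subset $Y$:

```python
import numpy as np

def empirical_error(kxx_diag, Kxy, Ky, A):
    """E_emp of Eq. (46) for X = Phi_Y A Phi_Y^* (our parametrization).

    kxx_diag: k(x_i, x_i) for the n training samples
    Kxy:      n x m matrix, (Kxy)_ij = k(x_i, y_j)
    Ky:       m x m subset Gram matrix
    """
    # ||Phi(x_i) - X Phi(x_i)||^2
    #   = k(x_i, x_i) - 2 k_i^T A k_i + k_i^T A^T K_y A k_i,  k_i = Kxy[i]
    cross = np.einsum('ij,jk,ik->i', Kxy, A, Kxy)    # k_i^T A k_i
    M = A.T @ Ky @ A
    norm2 = np.einsum('ij,jk,ik->i', Kxy, M, Kxy)    # k_i^T A^T K_y A k_i
    return np.mean(kxx_diag - 2.0 * cross + norm2)
```

The normalized criterion is this value divided by the one obtained with the full-KPCA projector, and $E_{\mathrm{val}}$ is obtained by computing the same quantities on validation samples instead of the training samples.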
