#### **5.2 Sparse KPCA**

Two methods for obtaining a sparse solution to KPCA have been proposed (Smola et al., 1999; Tipping, 2001). Both approaches focus on reducing the computational complexity of the evaluation stage and do not consider that of the construction stage. In addition, the degree of sparsity cannot be tuned directly for these sparse KPCAs, whereas the size of the subset *m* can be tuned for SubKPCA.

As mentioned in Section 4.1.2, the method of Tipping (2001) is based on a backward search; it therefore requires computing the kernel Gram matrix over all training samples, as well as its inverse. These procedures have high computational complexity, especially when *n* is large.

Smola et al. (1999) utilize $\ell^1$-norm regularization to make the solution sparse. The principal components are represented by linear combinations of the mapped samples, $\mathbf{u}_i = \sum_{j=1}^{n} \alpha_i^j \Phi(\mathbf{x}_j)$.

The coefficients $\alpha_i^j$ have many zero entries due to the $\ell^1$-norm regularization. However, since $\alpha_i^j$ has two indices, even if each principal component $\mathbf{u}_i$ is represented by only a few samples, the union of the samples used over all $i$ may not be small, because different components may select different samples; every sample in that union must be stored for the evaluation stage. A small numerical illustration follows.
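
The sketch below (synthetic coefficients, not the actual algorithm of Smola et al.; all sizes are illustrative) shows how per-component sparsity can still force a large fraction of the training set to be retained:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 1000, 20   # n training samples, r principal components

# Hypothetical sparse coefficient matrix: alpha[j, i] is the weight of
# sample x_j in component u_i. Each column keeps only 10 nonzero entries,
# mimicking the effect of l1-norm regularization.
alpha = np.zeros((n, r))
for i in range(r):
    support = rng.choice(n, size=10, replace=False)
    alpha[support, i] = rng.standard_normal(10)

per_component = (alpha != 0).sum(axis=0)               # 10 samples each
retained = np.count_nonzero((alpha != 0).any(axis=1))  # union over all i

print(per_component)  # [10 10 ... 10]: each u_i is sparse
print(retained)       # up to 200 of the 1000 samples must still be stored
```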

#### **5.3 Nyström approximation**

The Nyström approximation is a method for approximating an EVD, and it has been applied to KPCA (Williams & Seeger, 2001). Let $\mathbf{u}_i$ and $\mathbf{v}_i$ be the $i$th eigenvectors of $\mathbf{K}_x$ and $\mathbf{K}_y$, respectively. The Nyström method approximates $\mathbf{u}_i$ by

$$
\tilde{\mathbf{u}}_i = \sqrt{\frac{m}{n}} \frac{1}{\lambda_i} \mathbf{K}_{xy} \mathbf{v}_i, \tag{43}
$$

where $\lambda_i$ is the $i$th eigenvalue of $\mathbf{K}_y$. Since the eigenvectors of $\mathbf{K}_x$ are approximated by those of $\mathbf{K}_y$, the computational complexity of the construction stage is reduced, but that of the evaluation stage is not. In our experiments, SubKPCA shows better performance than the Nyström approximation.
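
A minimal NumPy sketch of Eq. (43), assuming an RBF kernel and a uniformly sampled subset (both are illustrative choices, not prescribed here):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row-sample matrices A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, m = 500, 50
X = rng.standard_normal((n, 3))
Y = X[rng.choice(n, size=m, replace=False)]  # subset of the training set

K_y  = rbf(Y, Y)   # m x m Gram matrix of the subset
K_xy = rbf(X, Y)   # n x m cross Gram matrix

lam, V = np.linalg.eigh(K_y)    # eigenpairs of K_y (ascending order)
lam, V = lam[::-1], V[:, ::-1]  # sort descending

# Eq. (43): approximate eigenvectors of the full n x n Gram matrix.
# Columns with tiny lambda_i are numerically unreliable.
U_tilde = np.sqrt(m / n) * (K_xy @ V) / lam

# Sanity check against an exact eigenvector of K_x (construction stage only)
U = np.linalg.eigh(rbf(X, X))[1][:, ::-1]
cos = abs(U[:, 0] @ U_tilde[:, 0]) / np.linalg.norm(U_tilde[:, 0])
print(cos)  # close to 1: the leading directions agree up to sign
```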

#### **5.4 Iterative KPCA**

Several iterative approaches to KPCA have been proposed (Ding et al., 2010; Günter et al., 2007; Kim et al., 2005). They update the transform matrix $\mathbf{\Lambda}^{-1/2}\mathbf{V}_{\mathrm{KPCA}}^{\top}$ in Eq. (14) for incoming samples.

Iterative approaches are sometimes used to reduce computational complexity. Even if the optimization has not converged to the optimum, an early stopping point may provide a good approximation of the optimal solution. However, Kim et al. (2005) and Günter et al. (2007) do not compare the computational complexity of their methods with that of standard KPCA. In the next section, run-time comparisons show that iterative KPCAs are not faster than batch approaches.
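
For concreteness, the following is a minimal sketch of a kernel Hebbian-style update in the spirit of Kim et al. (2005); centering, the learning-rate schedule, and all parameter values are simplifications of our own, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, eta = 300, 3, 0.05   # samples, components, fixed learning rate
X = rng.standard_normal((n, 4))
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))  # n x n Gram

# Expansion coefficients: row i of A represents w_i = sum_j A[i, j] Phi(x_j)
A = 0.1 * rng.standard_normal((r, n))

for epoch in range(50):
    for t in range(n):
        y = A @ K[:, t]  # projections of Phi(x_t) onto current components
        # Hebbian (GHA) update expressed in coefficient space:
        #   A <- A + eta * (y e_t^T - LT(y y^T) A),
        # where LT keeps the lower triangle including the diagonal.
        delta = -eta * np.tril(np.outer(y, y)) @ A
        delta[:, t] += eta * y
        A += delta

# After enough passes, the rows of A approximate the leading (uncentered)
# KPCA coefficient vectors, up to sign and normalization.
```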

#### **5.5 Incomplete Cholesky decomposition**

ICD can also be used to reduce the computational complexity of KPCA. ICD approximates the kernel Gram matrix $\mathbf{K}$ by

$$\mathbf{K} \simeq \mathbf{G}\mathbf{G}^{\top}, \tag{44}$$

where $\mathbf{G} \in \mathbb{R}^{n \times m}$ is a matrix whose upper triangular part is zero, and *m* is a parameter that controls the trade-off between approximation accuracy and computational complexity. Instead of performing the EVD of $\mathbf{K}$, the eigenvectors of $\mathbf{K}$ are obtained approximately from the EVD of $\mathbf{G}^{\top}\mathbf{G} \in \mathbb{R}^{m \times m}$ using the relation in Eq. (7). As with the Nyström approximation, ICD reduces the computational complexity of the construction stage but not of the evaluation stage, and all training samples have to be stored for evaluation.
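
A minimal sketch, again assuming an RBF kernel (the `icd` helper and all sizes are illustrative): pivoted ICD yields $\mathbf{G}$, and approximate eigenvectors of $\mathbf{K}$ are recovered from the EVD of the small matrix $\mathbf{G}^{\top}\mathbf{G}$, using the standard relation between the eigenvectors of $\mathbf{G}\mathbf{G}^{\top}$ and $\mathbf{G}^{\top}\mathbf{G}$ (cf. Eq. (7)):

```python
import numpy as np

def icd(K, m, tol=1e-10):
    """Pivoted incomplete Cholesky: returns G with K ~= G @ G.T."""
    n = K.shape[0]
    G = np.zeros((n, m))
    d = np.diag(K).astype(float).copy()  # residual diagonal of K - G G^T
    for j in range(m):
        p = int(np.argmax(d))
        if d[p] <= tol:                  # residual negligible: stop early
            return G[:, :j]
        G[:, j] = (K[:, p] - G[:, :j] @ G[p, :j]) / np.sqrt(d[p])
        d -= G[:, j] ** 2
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))

G = icd(K, m=40)
lam, W = np.linalg.eigh(G.T @ G)         # EVD of the small m x m matrix
lam, W = lam[::-1], W[:, ::-1]           # sort descending
# If G^T G = W diag(lam) W^T, then G W diag(lam)^{-1/2} holds the
# eigenvectors of G G^T ~= K. Columns with tiny lam are unreliable.
U = G @ W / np.sqrt(lam)

print(np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))  # relative error
```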

In the next section, our experimental results indicate that ICD is slower than SubKPCA for very large datasets, i.e., when *n* is more than several thousand.

ICD can also be applied to SubKPCA. In Eq. (21), $\mathbf{K}_{xy}^{\top}\mathbf{K}_{xy}$ is approximated by

$$\mathbf{K}_{xy}^{\top}\mathbf{K}_{xy} \simeq \mathbf{G}\mathbf{G}^{\top}. \tag{45}$$

Then the approximate $\mathbf{z}$ is obtained from the EVD of $\mathbf{G}^{\top}\mathbf{K}_y\mathbf{G}$.
