**2. Basics. Computing principal components - the discrete case in R<sup>d</sup>**

The central idea and motivation of PCA is to reduce the dimensionality of a point set by identifying *the most significant directions (principal components)*. Let *P* = {*p*<sub>1</sub>, *p*<sub>2</sub>, ..., *p<sub>n</sub>*} be a set of vectors (points) in **R**<sup>*d*</sup>, and let *μ* = (*μ*<sub>1</sub>, *μ*<sub>2</sub>, ..., *μ<sub>d</sub>*) ∈ **R**<sup>*d*</sup> be the center of gravity of *P*. For 1 ≤ *k* ≤ *d*, we use *p*<sub>*i*,*k*</sub> to denote the *k*-th coordinate of the vector *p<sub>i</sub>*. Given two vectors *u* and *v*, we use ⟨*u*, *v*⟩ to denote their inner product. For any unit vector *v* ∈ **R**<sup>*d*</sup>, the *variance of P in direction v* is

$$\text{var}(P, \vec{v}) = \frac{1}{n} \sum\_{i=1}^{n} \langle p\_i - \vec{\mu} \, , \, \vec{v} \rangle^2. \tag{1}$$
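
For concreteness, the following is a minimal Python sketch of Eq. (1), assuming the points of *P* are stored as the rows of a NumPy array; the function name `variance_in_direction` is ours.

```python
import numpy as np

def variance_in_direction(P, v):
    """Variance of the point set P in the direction of the unit vector v,
    as in Eq. (1). P is an (n, d) array with one point per row."""
    mu = P.mean(axis=0)      # center of gravity of P
    proj = (P - mu) @ v      # inner products <p_i - mu, v>
    return np.mean(proj ** 2)
```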

The most significant direction corresponds to the unit vector *v*<sub>1</sub> such that var(*P*, *v*<sub>1</sub>) is maximum. In general, after identifying the *j* most significant directions *v*<sub>1</sub>, ..., *v<sub>j</sub>*, the (*j* + 1)-th most significant direction corresponds to the unit vector *v*<sub>*j*+1</sub> such that var(*P*, *v*<sub>*j*+1</sub>) is maximum among all unit vectors perpendicular to *v*<sub>1</sub>, *v*<sub>2</sub>, ..., *v<sub>j</sub>*.
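
To make the definition concrete, here is a small brute-force illustration in **R**<sup>2</sup> (the data, grid size, and names are ours, chosen for illustration only): the most significant direction is found by scanning unit vectors *v* = (cos *t*, sin *t*) over a grid of angles and keeping the one that maximizes var(*P*, *v*); in the plane, the second direction is then simply the perpendicular one.

```python
import numpy as np

# Brute-force search for the most significant direction in R^2.
rng = np.random.default_rng(2)
P = rng.normal(size=(300, 2)) * np.array([3.0, 1.0])   # anisotropic point cloud
mu = P.mean(axis=0)
angles = np.linspace(0.0, np.pi, 1000)                 # v and -v give equal variance
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
variances = np.mean(((P - mu) @ dirs.T) ** 2, axis=0)  # var(P, v) for each direction
v1 = dirs[np.argmax(variances)]                        # approximate first direction
v2 = np.array([-v1[1], v1[0]])                         # perpendicular to v1
```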

It can be verified that for any unit vector *v* ∈ **R**<sup>*d*</sup>,

$$\text{var}(P, \vec{v}) = \langle \Sigma \,\vec{v}, \vec{v} \rangle,\tag{2}$$

where Σ is the *covariance matrix* of *P*: a symmetric *d* × *d* matrix whose (*i*, *j*)-th component *σ<sub>ij</sub>*, 1 ≤ *i*, *j* ≤ *d*, is defined as

$$
\sigma\_{ij} = \frac{1}{n} \sum\_{k=1}^{n} (p\_{k,i} - \mu\_i)(p\_{k,j} - \mu\_j). \tag{3}
$$
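
As a quick sanity check of identity (2), the following sketch builds Σ from Eq. (3) and compares ⟨Σ*v*, *v*⟩ with the direct evaluation of Eq. (1) for a random unit vector (the helper name `covariance_matrix` and the test data are ours):

```python
import numpy as np

def covariance_matrix(P):
    """Covariance matrix of Eq. (3) for an (n, d) array of points P."""
    n = P.shape[0]
    C = P - P.mean(axis=0)   # centered points
    return (C.T @ C) / n     # sigma_ij = (1/n) sum_k (p_{k,i}-mu_i)(p_{k,j}-mu_j)

rng = np.random.default_rng(1)
P = rng.normal(size=(200, 4))
v = rng.normal(size=4)
v /= np.linalg.norm(v)                 # random unit direction
Sigma = covariance_matrix(P)
mu = P.mean(axis=0)
assert np.isclose(np.mean(((P - mu) @ v) ** 2), v @ Sigma @ v)  # Eq. (1) == Eq. (2)
```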

The procedure of finding the most significant directions, in the sense mentioned above, can be formulated as an eigenvalue problem. If *λ*<sub>1</sub> ≥ *λ*<sub>2</sub> ≥ ··· ≥ *λ<sub>d</sub>* are the eigenvalues of Σ, then the unit eigenvector *v<sub>j</sub>* for *λ<sub>j</sub>* is the *j*-th most significant direction. Since the matrix Σ is symmetric positive semidefinite, its eigenvectors can be chosen pairwise orthogonal, all *λ<sub>j</sub>* are non-negative, and *λ<sub>j</sub>* = var(*P*, *v<sub>j</sub>*).
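
In code, the whole procedure reduces to a single symmetric eigendecomposition. The sketch below (our naming; NumPy assumed) returns the eigenvalues in decreasing order together with the matching unit eigenvectors, so that the *j*-th column is the *j*-th most significant direction and *λ<sub>j</sub>* = var(*P*, *v<sub>j</sub>*):

```python
import numpy as np

def principal_components(P):
    """Eigenvalues lambda_1 >= ... >= lambda_d of the covariance matrix of P,
    with matching unit eigenvectors as columns (the principal directions,
    in decreasing order of significance)."""
    n = P.shape[0]
    C = P - P.mean(axis=0)
    Sigma = (C.T @ C) / n
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]         # re-sort into decreasing order
    return eigvals[order], eigvecs[:, order]
```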

When *d* is not very large, the eigenvalues can be computed in *O*(*d*<sup>3</sup>) time, for example with the *Jacobi* or the *QR method* (Press et al. (1995)). Since assembling the covariance matrix takes *O*(*nd*<sup>2</sup>) time, the principal components of *n* points in **R**<sup>*d*</sup> can be computed in *O*(*nd*<sup>2</sup> + *d*<sup>3</sup>) time, which is *O*(*n* + *d*<sup>3</sup>) for fixed *d*. The additive factor of *O*(*d*<sup>3</sup>) will be omitted throughout the paper, since we assume that *d* is fixed. For very large *d*, computing the eigenvalues is non-trivial. In practice, the methods mentioned above converge rapidly; in theory, however, it is unclear how to bound their running time combinatorially, or how to compute the eigenvalues in decreasing order. Cheng & Y. Wang (2008) present a modification of the *Power method* (Parlett (1998)) that gives a guaranteed approximation of the eigenvalues with high probability.
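
For intuition, here is a sketch of the plain textbook power method, not the modified variant of Cheng & Y. Wang (2008): it repeatedly applies Σ to a unit vector, which converges to the eigenvector of *λ*<sub>1</sub> whenever *λ*<sub>1</sub> > *λ*<sub>2</sub> and the start vector is not orthogonal to that eigenvector. The iteration count and seed are arbitrary illustration values.

```python
import numpy as np

def power_method(Sigma, iters=1000, seed=0):
    """Plain power iteration: approximate the largest eigenvalue of the
    symmetric positive semidefinite matrix Sigma and a unit eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=Sigma.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = Sigma @ v
        v = w / np.linalg.norm(w)   # renormalize at every step
    return v @ Sigma @ v, v         # Rayleigh quotient approximates lambda_1
```

Subsequent eigenpairs can then be obtained by deflation, e.g. by replacing Σ with Σ − *λ*<sub>1</sub>*v*<sub>1</sub>*v*<sub>1</sub><sup>T</sup>, or by restricting the iteration to the orthogonal complement of the directions already found.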
