3. Population principal components

Principal components are a particular case of linear combinations of $p$ r.v.s, $X\_1, \dots, X\_p$. These linear combinations represent, geometrically, a new coordinate system obtained by rotating the original reference system that has $X\_1, \dots, X\_p$ as coordinate axes. The new axes represent the directions of maximum variability and provide a simple description of the structure of the covariance.

Principal components depend only on the variance/covariance matrix $\Sigma$ (or on the correlation matrix $\rho$) of $X\_1, \dots, X\_p$, and it is not necessary to assume that the r.v.s follow an approximately normal distribution. In the case of a multivariate normal distribution, the components can be interpreted in terms of ellipsoids of constant density, if we consider the distance defined by the $\Sigma$ matrix, and inferences can be made from the population components.

Let $\mathbf{X} = (X\_1, \dots, X\_p)^t$ be a p-dimensional random vector with covariance matrix $\Sigma$ and eigenvalues $\lambda\_1 \ge \lambda\_2 \ge \dots \ge \lambda\_p$. Let us consider the following $p$ linear combinations:

$$\begin{aligned} Y\_1 &= l\_1^t \mathbf{X} = l\_{11} \mathbf{X}\_1 + \dots + l\_{p1} \mathbf{X}\_p\\ &\vdots\\ Y\_p &= l\_p^t \mathbf{X} = l\_{1p} \mathbf{X}\_1 + \dots + l\_{pp} \mathbf{X}\_p \end{aligned} \tag{5}$$


These new r.v.s verify the following equalities:

$$\begin{aligned} V[Y\_i] &= l\_i^t \Sigma l\_i & \quad & i = 1, \ldots, p\\ \text{Cov}[Y\_i, Y\_j] &= l\_i^t \Sigma l\_j & \quad & i, j = 1, \ldots, p \quad i \neq j \end{aligned} \tag{6}$$

Principal components are those linear combinations that, being uncorrelated among them, have the greatest possible variance. Thus, the first principal component is the linear combination with the greatest variance, that is, the one for which $V[Y\_1] = l\_1^t \Sigma l\_1$ is maximum. Since multiplying $l\_1$ by a constant increases this variance, we restrict our attention to vectors of norm one, with which the aforementioned indeterminacy disappears. The second principal component is the linear combination that maximizes the variance, is uncorrelated with the first one, and has a coefficient vector of norm one.
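This maximizing property can be checked numerically. The following is a minimal sketch (assuming NumPy; the covariance matrix is the one used later in Example 3.1): no random unit-norm coefficient vector attains a variance larger than the largest eigenvalue, which is reached by the corresponding eigenvector.

```python
# A minimal numerical sketch (assumes NumPy); Sigma is borrowed from Example 3.1.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 8.0, -3.0],
                  [0.0, -3.0, 2.0]])

def var(l):
    """Variance of the linear combination l^t X, i.e., l^t Sigma l."""
    return l @ Sigma @ l

vals, vecs = np.linalg.eigh(Sigma)      # eigenvalues in ascending order
e1, lam1 = vecs[:, -1], vals[-1]        # leading eigenpair

# Random unit-norm coefficient vectors never beat the leading eigenvector.
trials = rng.normal(size=(10_000, 3))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
print(max(var(l) for l in trials) <= lam1 + 1e-9)   # True
print(np.isclose(var(e1), lam1))                    # True
```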

Proposition 3.1: Let $\Sigma$ be the covariance matrix of the random vector $\mathbf{X} = (X\_1, \dots, X\_p)^t$. Let us assume that $\Sigma$ has $p$ pairs of eigenvalues and eigenvectors, $(\lambda\_1, \mathbf{e}\_1), \dots, (\lambda\_p, \mathbf{e}\_p)$, with $\lambda\_1 \ge \lambda\_2 \ge \dots \ge \lambda\_p$. Then, the ith principal component is given by

$$Y\_i = \mathbf{e}\_i^t \mathbf{X} = e\_{1i}X\_1 + \dots + e\_{pi}X\_p \quad i = 1, \dots, p \tag{7}$$

In addition, with this choice it is verified that:

1. $V[Y\_i] = \mathbf{e}\_i^t \Sigma \mathbf{e}\_i = \lambda\_i$, $i = 1, \dots, p$.

2. $Cov[Y\_i, Y\_j] = 0$, $i, j = 1, \dots, p$, $i \neq j$.

3. If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.
$$\mathbf{4.} \quad \sigma\_{11} + \dots + \sigma\_{pp} = \sum\_{i=1}^{p} V[\mathbf{X}\_i] = \lambda\_1 + \dots + \lambda\_p = \sum\_{j=1}^{p} V\left[\mathbf{Y}\_j\right].$$

Remark 3.1: The demonstration of these results uses expressions for the maxima of quadratic forms over vectors of fixed norm, $\max\_{l \neq 0} \frac{l^t \Sigma l}{l^t l} = \lambda\_1$, versions of this maximum subject to orthogonality conditions (via the method of Lagrange multipliers), and properties of the trace of a matrix (if $\Sigma = P \Lambda P^t$, then $tr(\Sigma) = tr(P \Lambda P^t) = tr(\Lambda)$).

Due to the previous result, principal components are uncorrelated among them, with variances equal to the eigenvalues of $\Sigma$, and the proportion of the population variance due to the ith principal component is given by $\frac{\lambda\_i}{\lambda\_1 + \dots + \lambda\_p}$.

If a high percentage of the population variance, for example 90%, of a p-dimensional r.v. with large $p$ can be attributed to, say, the first five principal components, then we can replace all the r.v.s by those five components without a great loss of information.
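As an illustration, the following sketch (NumPy assumed; the diagonal covariance matrix is a hypothetical stand-in) computes the proportion of variance carried by each component and the smallest number of components needed to retain at least 90% of the total variance.

```python
# Hedged sketch: a hypothetical covariance matrix, used only to show the bookkeeping.
import numpy as np

Sigma = np.diag([9.0, 5.0, 3.0, 2.0, 1.0])               # stand-in, p = 5
lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]           # lambda_1 >= ... >= lambda_p

proportion = lam / lam.sum()          # lambda_i / (lambda_1 + ... + lambda_p)
cumulative = np.cumsum(proportion)
k = int(np.searchsorted(cumulative, 0.90)) + 1           # components for >= 90%
print(proportion)                     # [0.45 0.25 0.15 0.1  0.05]
print(k)                              # 4
```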

Each component of the coefficient vector $\mathbf{e}\_i^t = (e\_{1i}, \dots, e\_{pi})$, namely $e\_{ki}$, also deserves our attention, since it is a measure of the relationship between the r.v.s $X\_k$ and $Y\_i$.

Proposition 3.2: If $Y\_1 = \mathbf{e}\_1^t \mathbf{X}, \dots, Y\_p = \mathbf{e}\_p^t \mathbf{X}$ are the principal components obtained from the covariance matrix $\Sigma$, with pairs of eigenvalues and eigenvectors $(\lambda\_1, \mathbf{e}\_1), \dots, (\lambda\_p, \mathbf{e}\_p)$, then the linear correlation coefficients between the variables $X\_k$ and the components $Y\_i$ are given by

$$\rho\_{X\_k, Y\_i} = \frac{e\_{ki}\sqrt{\lambda\_i}}{\sqrt{\sigma\_{kk}}} \quad i, k = 1, \ldots, p \tag{8}$$

Therefore, $e\_{ki}$ is proportional to the correlation coefficient between $X\_k$ and $Y\_i$.
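Equation (8) can be verified numerically. In the sketch below (NumPy assumed; $\Sigma$ is the matrix of Example 3.1 below), the correlations computed from Eq. (8) agree with those obtained directly from $Cov(X\_k, Y\_i) = (\Sigma \mathbf{e}\_i)\_k = \lambda\_i e\_{ki}$.

```python
# Numerical check of Eq. (8) (a sketch; Sigma is the matrix of Example 3.1).
import numpy as np

Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 8.0, -3.0],
                  [0.0, -3.0, 2.0]])
vals, vecs = np.linalg.eigh(Sigma)
order = np.argsort(vals)[::-1]               # sort as lambda_1 >= lambda_2 >= ...
lam, E = vals[order], vecs[:, order]         # E[:, i] is e_i

# Eq. (8): rho_{X_k, Y_i} = e_{ki} sqrt(lambda_i) / sqrt(sigma_kk)
rho = E * np.sqrt(lam) / np.sqrt(np.diag(Sigma))[:, None]

# Cross-check: Cov(X_k, Y_i) = (Sigma e_i)_k = lambda_i e_{ki}, so the same
# correlations follow from Cov(X, Y) / (sd(X_k) sd(Y_i)).
cov_XY = Sigma @ E
rho_check = cov_XY / (np.sqrt(np.diag(Sigma))[:, None] * np.sqrt(lam))
print(np.allclose(rho, rho_check))            # True
print(rho.round(3))                           # matches Eq. (13) up to sign
```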

In the particular case that $\mathbf{X}$ has a normal p-dimensional distribution, $N\_p(\mu, \Sigma)$, the density of $\mathbf{X}$ is constant on the ellipsoids with center at $\mu$ given by $(\mathbf{X} - \mu)^t \Sigma^{-1} (\mathbf{X} - \mu) = c^2$, which have axes $\pm c\sqrt{\lambda\_i}\, \mathbf{e}\_i$, $i = 1, \dots, p$, where $(\lambda\_i, \mathbf{e}\_i)$ are the pairs of eigenvalues and eigenvectors of $\Sigma$. If the covariance matrix $\Sigma$ is decomposed as $\Sigma = P \Lambda P^t$, where $P$ is orthogonal and $\Lambda$ diagonal, it can be shown that $\Sigma^{-1} = P \Lambda^{-1} P^t = \sum\_{i=1}^{p} \frac{1}{\lambda\_i} \mathbf{e}\_i \mathbf{e}\_i^t$. Also, if it can be assumed that $\mu = 0$, to simplify the expressions, then

$$\mathbf{c}^2 = \mathbf{x}^t \boldsymbol{\Sigma}^{-1} \mathbf{x} = \frac{1}{\lambda\_1} \left( \mathbf{e}\_1^t \mathbf{x} \right)^2 + \frac{1}{\lambda\_2} \left( \mathbf{e}\_2^t \mathbf{x} \right)^2 + \dots + \frac{1}{\lambda\_p} \left( \mathbf{e}\_p^t \mathbf{x} \right)^2 \tag{9}$$

If the principal components $y\_1 = \mathbf{e}\_1^t \mathbf{x}, \dots, y\_p = \mathbf{e}\_p^t \mathbf{x}$ are considered, the equation of the constant density ellipsoid is given by

$$c^2 = \frac{1}{\lambda\_1}y\_1^2 + \frac{1}{\lambda\_2}y\_2^2 + \dots + \frac{1}{\lambda\_p}y\_p^2 \tag{10}$$

Therefore, the axes of the ellipsoid have the directions of the principal components.
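A quick numerical check of Eqs. (9) and (10) follows (a sketch assuming NumPy, with $\Sigma$ again borrowed from Example 3.1): for an arbitrary point $\mathbf{x}$, the quadratic form $\mathbf{x}^t \Sigma^{-1} \mathbf{x}$ coincides with the weighted sum of squared component scores.

```python
# Sketch verifying Eq. (9): for any x, x^t Sigma^{-1} x equals the weighted
# sum of squared component scores (Sigma again borrowed from Example 3.1).
import numpy as np

Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 8.0, -3.0],
                  [0.0, -3.0, 2.0]])
vals, vecs = np.linalg.eigh(Sigma)

x = np.array([1.0, -2.0, 0.5])
c2_direct = x @ np.linalg.inv(Sigma) @ x
y = vecs.T @ x                                # y_i = e_i^t x
c2_components = np.sum(y**2 / vals)           # Eq. (10)
print(np.isclose(c2_direct, c2_components))   # True
```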

Example 3.1: Let $X\_1$, $X\_2$, $X\_3$ be three one-dimensional r.v.s and $\mathbf{X} = [X\_1, X\_2, X\_3]^t$, with covariance matrix

$$
\Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 8 & -3 \\ 0 & -3 & 2 \end{bmatrix} \tag{11}
$$

It can be verified that the pairs of eigenvalues and eigenvectors are $(\lambda\_1 = 9.243, \mathbf{e}\_1^t = [0 \;\; 0.924 \;\; {-0.383}])$, $(\lambda\_2 = 2, \mathbf{e}\_2^t = [1 \;\; 0 \;\; 0])$, and $(\lambda\_3 = 0.757, \mathbf{e}\_3^t = [0 \;\; 0.383 \;\; 0.924])$. Therefore, the principal components are the following:

$$\begin{aligned} Y\_1 &= \mathbf{e}\_1^t \mathbf{X} = 0.924 \mathbf{X}\_2 - 0.383 \mathbf{X}\_3 \\ Y\_2 &= \mathbf{e}\_2^t \mathbf{X} = \mathbf{X}\_1 \\ Y\_3 &= \mathbf{e}\_3^t \mathbf{X} = 0.383 \mathbf{X}\_2 + 0.924 \mathbf{X}\_3 \end{aligned} \tag{12}$$

The norm of all the eigenvectors is equal to 1 and, in addition, the variable $X\_1$ is the second principal component, because $X\_1$ is uncorrelated with the other two variables.

The results of Proposition 3.1 can be verified for these data; for example, $V[Y\_1] = 9.243$ and $Cov[Y\_1, Y\_2] = 0$. Also, $\sum\_{i=1}^{3} V[X\_i] = 2 + 8 + 2 = 12 = 9.243 + 2 + 0.757 = \sum\_{j=1}^{3} V[Y\_j]$. Thus, the proportion of the total variance explained by the first component is $\lambda\_1 / 12 = 77\%$, and the one explained by the first two is $(\lambda\_1 + \lambda\_2)/12 = 93.69\%$, so the components $Y\_1$ and $Y\_2$ can replace the original variables with a small loss of information.
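The figures quoted in this example can be reproduced with a few lines (a sketch assuming NumPy; eigenvector signs may differ, which does not affect the components):

```python
# A short sketch reproducing the figures quoted in Example 3.1.
import numpy as np

Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 8.0, -3.0],
                  [0.0, -3.0, 2.0]])
vals, vecs = np.linalg.eigh(Sigma)
order = np.argsort(vals)[::-1]
lam, E = vals[order], vecs[:, order]

print(lam.round(3))                  # [9.243 2.    0.757]
print(E.round(3))                    # columns are e_1, e_2, e_3 (up to sign)
print(round(lam[0] / lam.sum(), 4))              # 0.7702 -> 77%
print(round((lam[0] + lam[1]) / lam.sum(), 4))   # 0.9369 -> 93.69%
```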

The correlation coefficients between the principal components and the variables are the following:

$$\begin{aligned} \rho\_{X\_1, Y\_1} &= 0 & \rho\_{X\_2, Y\_1} &= 0.993 & \rho\_{X\_3, Y\_1} &= -0.823\\ \rho\_{X\_1, Y\_2} &= 1 & \rho\_{X\_2, Y\_2} &= 0 & \rho\_{X\_3, Y\_2} &= 0\\ \rho\_{X\_1, Y\_3} &= 0 & \rho\_{X\_2, Y\_3} &= 0.118 & \rho\_{X\_3, Y\_3} &= 0.568 \end{aligned} \tag{13}$$

In view of these values, it can be concluded that $X\_2$ and $X\_3$ individually are practically equally important with respect to the first principal component, although this is not the case with respect to the third component. If, in addition, it is assumed that the distribution of $\mathbf{X}$ is normal, $N\_3(\mu, \Sigma)$, with a null mean vector, ellipsoids of constant density $\mathbf{x}^t \Sigma^{-1} \mathbf{x} = c^2$ can be considered. An ellipsoid of constant statistical distance and its projections are shown in Figure 5.

The ellipsoid with $c^2 = 8$ is represented in Figure 5(a), together with its axes and the projections of the ellipsoid on planes parallel to the coordinate axes. These projections are the red, green, and blue ellipses reproduced in Figure 5(b). This figure also shows, in black, the ellipse obtained by projecting the ellipsoid onto the plane determined by the first two principal components. The equation of this ellipse is $\frac{y\_1^2}{a^2} + \frac{y\_2^2}{b^2} = 8$, where $a = c\sqrt{\eta\_1}$ and $b = c\sqrt{\eta\_2}$, with $\eta\_1$ and $\eta\_2$ being the two smallest eigenvalues of $\Sigma^{-1}$, and its axes are determined by $Y\_1$ and $Y\_2$. As can be seen, the diameters of the ellipse determined by the first two components are larger than the others. Therefore, the area enclosed by this ellipse is the largest of all, indicating that it is the one that gathers the greatest variability.

#### 3.1. Principal components with respect to standardized variables



The principal components of the normalized variables $Z\_1 = \frac{X\_1 - \mu\_1}{\sqrt{\sigma\_{11}}}, \dots, Z\_p = \frac{X\_p - \mu\_p}{\sqrt{\sigma\_{pp}}}$ can also be considered, which in matrix notation is $\mathbf{Z} = V(\mathbf{X} - \mu)$, where $V$ is the diagonal matrix whose elements are $\frac{1}{\sqrt{\sigma\_{11}}}, \dots, \frac{1}{\sqrt{\sigma\_{pp}}}$. It is easily verified that the r.v. $\mathbf{Z}$ verifies $E[\mathbf{Z}] = 0$ and $Cov[\mathbf{Z}] = V \Sigma V = \rho$, where $\rho$ is the correlation matrix of $\mathbf{X}$.

Principal components of $\mathbf{Z}$ are obtained from the eigenvalues and eigenvectors of the correlation matrix, $\rho$, of $\mathbf{X}$. Furthermore, with some simplification, the previous results can be applied, since the variance of each $Z\_i$ is equal to 1.

Let $W\_1, \dots, W\_p$ be the principal components of $\mathbf{Z}$ and $(v\_i, \mathbf{u}\_i)$, $i = 1, \dots, p$, the pairs of eigenvalues and eigenvectors of $\rho$; new notation is needed since these do not have to coincide with the eigenpairs of $\Sigma$.

Figure 5. Ellipsoid of constant statistical distance and projections. (a) Ellipsoid of constant density and projections on the coordinate planes. (b) Projections on the coordinate planes and the base plane $\{Y\_1, Y\_2\}$.

Proposition 3.3: Let $\mathbf{Z} = (Z\_1, \dots, Z\_p)^t$ be a random vector with covariance matrix $\rho$. Let $(v\_1, \mathbf{u}\_1), \dots, (v\_p, \mathbf{u}\_p)$ be the pairs of eigenvalues and eigenvectors of $\rho$, with $v\_1 \ge \dots \ge v\_p$. Then, the ith principal component is given by $W\_i = \mathbf{u}\_i^t V (\mathbf{X} - \mu)$, $i = 1, \dots, p$. In addition, with this choice it is verified that:


1. $V[W\_i] = \mathbf{u}\_i^t \rho \mathbf{u}\_i = v\_i$, $i = 1, \dots, p$.

2. $Cov[W\_i, W\_j] = 0$, $i, j = 1, \dots, p$, $i \neq j$.

3. If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.

$$\mathbf{4.} \quad \sum\_{i=1}^{p} V[W\_i] = v\_1 + \dots + v\_p = \sum\_{j=1}^{p} V\left[Z\_j\right] = p.$$

5. The linear correlation coefficients between the variables $Z\_k$ and the principal components $W\_i$ are $\rho\_{Z\_k, W\_i} = u\_{ki}\sqrt{v\_i}$, $i, k = 1, \dots, p$.

These results are a consequence of those obtained in Proposition 3.1 and Proposition 3.2 applied to $\mathbf{Z}$ and $\rho$ instead of $\mathbf{X}$ and $\Sigma$.

The total population variance of the normalized variables is the sum of the elements of the diagonal of $\rho$, that is, $p$. Therefore, the proportion of the total variability explained by the ith principal component is $\frac{v\_i}{p}$, $i = 1, \dots, p$.
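The following sketch (NumPy assumed; the covariance matrix anticipates Example 3.2 below) verifies items 4 and 5 numerically: the eigenvalues of $\rho$ add up to $p$, and the proportions $v\_i / p$ and the correlations $u\_{ki}\sqrt{v\_i}$ follow directly.

```python
# Sketch of Proposition 3.3's bookkeeping (assumes NumPy); Sigma anticipates Example 3.2.
import numpy as np

Sigma = np.array([[1.0, 2.0],
                  [2.0, 100.0]])
V = np.diag(1.0 / np.sqrt(np.diag(Sigma)))     # diagonal matrix of 1/sqrt(sigma_kk)
rho = V @ Sigma @ V                             # Cov[Z] = V Sigma V

v, U = np.linalg.eigh(rho)
v, U = v[::-1], U[:, ::-1]                      # order as v_1 >= ... >= v_p

p = Sigma.shape[0]
print(np.isclose(v.sum(), p))                   # item 4: total variance equals p
print(v / p)                                    # proportions v_i / p: [0.6 0.4]
print(np.abs(U * np.sqrt(v)))                   # item 5: |rho_{Z_k, W_i}| = |u_ki| sqrt(v_i)
```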

Example 3.2: Let $X\_1$ and $X\_2$ be two one-dimensional r.v.s and $\mathbf{X} = [X\_1, X\_2]^t$, with covariance matrix, $\Sigma$, and correlation matrix, $\rho$, given by

$$\Sigma = \begin{bmatrix} 1 & 2 \\ 2 & 100 \end{bmatrix} \qquad \rho = \begin{bmatrix} 1 & 0.2 \\ 0.2 & 1 \end{bmatrix} \tag{14}$$

It can be verified that the pairs of eigenvalues and eigenvectors for $\Sigma$ are $(\lambda\_1 = 100.04, \mathbf{e}\_1^t = [-0.02 \;\; {-0.999}])$ and $(\lambda\_2 = 0.96, \mathbf{e}\_2^t = [-0.999 \;\; 0.02])$. Therefore, the principal components are the following:

$$\begin{aligned} Y\_1 &= \mathbf{e}\_1^t \mathbf{X} = -0.02 \mathbf{X}\_1 - 0.999 \mathbf{X}\_2 \\ Y\_2 &= \mathbf{e}\_2^t \mathbf{X} = -0.999 \mathbf{X}\_1 + 0.02 \mathbf{X}\_2 \end{aligned} \tag{15}$$

Furthermore, the eigenvalues and eigenvectors of $\rho$ are $(v\_1 = 1.2, \mathbf{u}\_1^t = [0.707 \;\; 0.707])$ and $(v\_2 = 0.8, \mathbf{u}\_2^t = [-0.707 \;\; 0.707])$; hence, the principal components of the normalized variables are the following:

$$\begin{aligned} W\_1 &= \mathbf{u}\_1^t \mathbf{Z} = 0.707 Z\_1 + 0.707 Z\_2 = 0.707 \left( X\_1 - \mu\_1 \right) + 0.0707 \left( X\_2 - \mu\_2 \right) \\ W\_2 &= \mathbf{u}\_2^t \mathbf{Z} = -0.707 Z\_1 + 0.707 Z\_2 = -0.707 \left( X\_1 - \mu\_1 \right) + 0.0707 \left( X\_2 - \mu\_2 \right) \end{aligned} \tag{16}$$

Because the variance of $X\_2$ is much greater than that of $X\_1$, the first principal component for $\Sigma$ is determined by $X\_2$, and the proportion of variability explained by that first component is $\frac{\lambda\_1}{\lambda\_1 + \lambda\_2} = 0.99$.

When considering the normalized variables, each variable also contributes to the components determined by $\rho$, and the correlations between the normalized variables and their first component are $\rho\_{Z\_1, W\_1} = u\_{11}\sqrt{v\_1} = 0.707\sqrt{1.2} = 0.774$ and $\rho\_{Z\_2, W\_1} = u\_{21}\sqrt{v\_1} = 0.707\sqrt{1.2} = 0.774$. The proportion of the total variability explained by the first component is $\frac{v\_1}{p} = 0.6$.

Therefore, the importance of the first component is strongly affected by normalization. In fact, the weights, in terms of $X\_i$, are 0.707 and 0.0707 for $\rho$, as opposed to $-0.02$ and $-0.999$ for $\Sigma$.
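The contrast between the two analyses of this example can be reproduced as follows (a sketch assuming NumPy; eigenvector signs may differ from the text):

```python
# Sketch contrasting the two analyses of Example 3.2 (assumes NumPy): the
# components of Sigma are dominated by X2, those of rho weight both variables.
import numpy as np

Sigma = np.array([[1.0, 2.0],
                  [2.0, 100.0]])
rho = np.array([[1.0, 0.2],
                [0.2, 1.0]])

lam, E = np.linalg.eigh(Sigma)                 # ascending order
v, U = np.linalg.eigh(rho)

print(E[:, -1].round(3))                       # ~[0.02, 0.999] up to sign
print(round(lam[-1] / lam.sum(), 2))           # 0.99 of the total variance
print(U[:, -1].round(3))                       # ~[0.707, 0.707]
print(round(v[-1] / v.sum(), 2))               # 0.6 of the standardized variance
```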

Remark 3.2: The above example shows that the principal components deduced from the original variables are, in general, different from those derived from the normalized variables. So, normalization has important consequences.

When the units in which the different one-dimensional random variables are measured are very different, and one of the variances strongly dominates the others, the first principal component with respect to the original variables will be determined by the variable whose variance is dominant. On the other hand, if the variables are normalized, their relationship with the first components will be more balanced.

Principal components can be expressed in particular ways if the covariance matrix, or the correlation matrix, has special structures, such as diagonal ones, or structures of the form $\Sigma = \sigma^2 A$.
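For instance, in the diagonal case the components reduce to the original variables, as the following sketch shows (NumPy assumed; the diagonal matrix is a hypothetical example):

```python
# Hedged sketch of the diagonal special case (assumes NumPy; matrix is hypothetical).
import numpy as np

Sigma = np.diag([5.0, 3.0, 1.0])
vals, vecs = np.linalg.eigh(Sigma)
print(vals[::-1])                 # [5. 3. 1.] -- the original variances
print(np.abs(vecs[:, ::-1]))      # identity matrix: Y_i = X_i up to order and sign
```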
