Proposition 3.3: Let $Z = [Z_1, \dots, Z_p]^t$ be a random vector with covariance matrix $\rho$. Let $(v_1, u_1), \dots, (v_p, u_p)$ be the pairs of eigenvalues and eigenvectors of $\rho$, with $v_1 \ge \dots \ge v_p$. Then, the ith principal component is given by $W_i = u_i^t Z = u_i^t V^{-1/2}(X - \mu)$, $i = 1, \dots, p$, where $V^{1/2}$ is the diagonal matrix of standard deviations of $X$. In addition, with this choice it is verified that:

1. $V[W_i] = v_i$, $i = 1, \dots, p$.
2. $\mathrm{Cov}(W_i, W_j) = 0$, $i, j = 1, \dots, p$, $i \ne j$.
3. If any of the eigenvalues are equal, the choice of the corresponding eigenvectors as vectors of coefficients is not unique.
4. $\sum_{i=1}^{p} V[W_i] = v_1 + \dots + v_p = \sum_{j=1}^{p} V(Z_j) = p$.
5. The linear correlation coefficients between the variables $Z_k$ and the principal components $W_i$ are $r_{Z_k, W_i} = u_{ki}\sqrt{v_i}$, $i, k = 1, \dots, p$.

These results are a consequence of those obtained in Proposition 3.1 and Proposition 3.2, applied to $Z$ and $\rho$ instead of $X$ and $\Sigma$.

The total population variance of the normalized variables is the sum of the elements of the diagonal of $\rho$, that is, $p$. Therefore, the proportion of the total variability explained by the ith principal component is $v_i / p$, $i = 1, \dots, p$.

Example 3.2: Let $X_1$ and $X_2$ be two unidimensional r.v.s and $X = [X_1, X_2]^t$ with covariance matrix, $\Sigma$, and correlation matrix, $\rho$, given by

$$\Sigma = \begin{bmatrix} 1 & 2 \\ 2 & 100 \end{bmatrix} \qquad (14)$$

$$\rho = \begin{bmatrix} 1 & 0.2 \\ 0.2 & 1 \end{bmatrix} \qquad (15)$$

It can be verified that the pairs of eigenvalues and eigenvectors of $\Sigma$ are $\lambda_1 = 100.04$, $e_1^t = [-0.02 \;\; -0.999]$ and $\lambda_2 = 0.96$, $e_2^t = [-0.999 \;\; 0.02]$. Therefore, the principal components are the following:

$$Y_1 = e_1^t X = -0.02X_1 - 0.999X_2$$
$$Y_2 = e_2^t X = -0.999X_1 + 0.02X_2$$

Furthermore, the eigenvalues and eigenvectors of $\rho$ are $v_1 = 1.2$, $u_1^t = [0.707 \;\; 0.707]$ and $v_2 = 0.8$, $u_2^t = [-0.707 \;\; 0.707]$; hence, the principal components of the normalized variables are the following:

$$W_1 = u_1^t Z = 0.707Z_1 + 0.707Z_2$$
$$W_2 = u_2^t Z = -0.707Z_1 + 0.707Z_2$$
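The numbers quoted in Example 3.2 can be checked numerically. The sketch below assumes the covariance matrix in Eq. (14) is the one implied by the reported eigenvalues (100.04 and 0.96) and the correlation 0.2; up to the sign of the eigenvectors, the output reproduces the values above.

```python
# Minimal numerical check of Example 3.2 (covariance matrix assumed to be
# [[1, 2], [2, 100]], which the stated eigenvalues and correlation imply).
import numpy as np

Sigma = np.array([[1.0, 2.0], [2.0, 100.0]])    # covariance matrix, Eq. (14)
rho = np.array([[1.0, 0.2], [0.2, 1.0]])        # correlation matrix, Eq. (15)

# eigh returns eigenvalues in ascending order; reverse to get lambda_1 >= lambda_2
lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]
print(lam)        # approx [100.04, 0.96]
print(E[:, 0])    # approx +/-[0.02, 0.999]  ->  Y1 = -0.02 X1 - 0.999 X2 up to sign

v, U = np.linalg.eigh(rho)
v, U = v[::-1], U[:, ::-1]
print(v)          # [1.2, 0.8]
print(U[:, 0])    # approx +/-[0.707, 0.707]
```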

4. Sample principal components

Once we have the theoretical framework, we can now address the problem of summarizing the variation of n measurements made on p variables.

Let $x_1, \dots, x_n$ be a sample of a p-dimensional r.v. $X$ with mean vector $\mu$ and covariance matrix $\Sigma$. These data have a vector of sample means $\bar{x}$, sample covariance matrix $S$, and sample correlation matrix $R$.

This section is aimed at constructing uncorrelated linear combinations of the measured characteristics that account for the greatest possible amount of the variability contained in the sample. These linear combinations are called principal sample components.

Given the n values of any linear combination $l_1^t x_j = l_{11}x_{1j} + \dots + l_{p1}x_{pj}$, $j = 1, \dots, n$, its sample mean is $l_1^t \bar{x}$ and its sample variance is $l_1^t S l_1$. If we consider two linear combinations, $l_1^t x_j$ and $l_2^t x_j$, their sample covariance is $l_1^t S l_2$.

The first principal component will be the linear combination, $l_1^t x_j$, which maximizes the sample variance, subject to the condition $l_1^t l_1 = 1$. The second component will be the linear combination, $l_2^t x_j$, which maximizes the sample variance, subject to the conditions that $l_2^t l_2 = 1$ and that the sample covariance of the pairs $(l_1^t x_j, l_2^t x_j)$ is equal to zero. This procedure is continued until the p principal components are completed.
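As in the population case, this successive maximization problem is solved by the unit-norm eigenvectors of $S$: taking $l_i = \hat{e}_i$, the eigenvector associated with the ith largest eigenvalue, gives the ith sample principal component. The following sketch illustrates this on made-up data; the matrix `X` and its dimensions are hypothetical.

```python
# Sketch: sample principal components obtained from the eigen-decomposition of
# the sample covariance matrix S (the solution of the variance-maximization
# problem above). The data matrix X is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.2, 0.1, 0.3]])   # n = 200 observations, p = 3

S = np.cov(X, rowvar=False)                 # sample covariance matrix (divisor n - 1)

lam, E = np.linalg.eigh(S)                  # eigenvalues in ascending order
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]            # lambda_1 >= ... >= lambda_p, unit-norm columns

Y = X @ E                                   # i-th score of observation j: y_ij = e_i^t x_j
print(np.var(Y, axis=0, ddof=1))            # approx lam: score variances equal the eigenvalues
print(np.round(np.corrcoef(Y, rowvar=False), 6))   # approx identity: scores are uncorrelated
```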


Proposition 4.1: Let $S = (s_{ik})$ be the p by p matrix of sample covariances, whose pairs of eigenvalues and eigenvectors are $(\hat{\lambda}_1, \hat{e}_1), \dots, (\hat{\lambda}_p, \hat{e}_p)$, with $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_p \ge 0$. Let $x$ be an observation of the p-dimensional random variable $X$; then:

1. The ith principal component is given by $\hat{y}_i = \hat{e}_i^t x = \hat{e}_{1i}x_1 + \dots + \hat{e}_{pi}x_p$, $i = 1, \dots, p$.
2. The sample variance of $\hat{y}_k$ is $\hat{\lambda}_k$, $k = 1, \dots, p$.
3. The sample covariance of $(\hat{y}_i, \hat{y}_k)$, $i \ne k$, is equal to 0.
4. The total sample variance is $\sum_{i=1}^{p} s_{ii} = \hat{\lambda}_1 + \dots + \hat{\lambda}_p$.
5. The sample correlation coefficients between $x_k$ and $\hat{y}_i$ are $r_{x_k, \hat{y}_i} = \hat{e}_{ki}\sqrt{\hat{\lambda}_i} / \sqrt{s_{kk}}$, $i, k = 1, \dots, p$.


In the case that the random variables have a normal distribution, the principal components can be obtained from the maximum likelihood estimate $\hat{\Sigma} = S_n$ (the sample covariance matrix with divisor $n$), and, in this case, the sample principal components can be considered as maximum likelihood estimates of the population principal components. Although the eigenvalues of $S$ and $\hat{\Sigma}$ are different, they are proportional, with constant of proportionality $(n-1)/n$, so the proportion of variability they explain is the same. The sample correlation matrix is the same whether it is computed from $S$ or from $\hat{\Sigma}$. For now we do not treat the particular case of normally distributed variables, so as not to have to include hypotheses that should be verified for the data under study.

Sometimes, the observations $x_j$ are centered by subtracting the mean $\bar{x}$. This operation does not affect the covariance matrix and produces principal components of the form $\hat{y}_i = \hat{e}_i^t(x - \bar{x})$; in this case the sample mean of $\hat{y}_i$ is zero for every component, while the sample variances remain $\hat{\lambda}_1, \dots, \hat{\lambda}_p$.

When trying to interpret the principal components, the correlation coefficients $r_{x_k, \hat{y}_i}$ are more reliable guides than the coefficients $\hat{e}_{ki}$, since they avoid interpretive problems caused by the different scales in which the variables are measured.
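A quick way to see this in practice is to compute both quantities. The sketch below, on hypothetical data, checks that the closed form of item 5 in Proposition 4.1 matches the correlations computed directly from the scores.

```python
# Sketch: correlations between the original variables and the components,
# r_{x_k, y_i} = e_ki * sqrt(lambda_i) / sqrt(s_kk), compared with correlations
# computed directly. The data are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]

Y = (X - X.mean(axis=0)) @ E                 # centered principal component scores

# closed form from Proposition 4.1, item 5
r_formula = E * np.sqrt(lam)[None, :] / np.sqrt(np.diag(S))[:, None]

# direct sample correlations between x_k and y_i
r_direct = np.array([[np.corrcoef(X[:, k], Y[:, i])[0, 1] for i in range(4)]
                     for k in range(4)])

print(np.allclose(r_formula, r_direct))      # True
```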

#### 4.1. Interpretations of the principal sample components

Principal sample components have several interpretations. If the distribution of $X$ is close to $N_p(\mu, \Sigma)$, then the components $\hat{y}_i = \hat{e}_i^t(x - \bar{x})$ are realizations of the population principal components $Y_i = e_i^t(X - \mu)$, which have distribution $N_p(0, \Lambda)$, where $\Lambda$ is the diagonal matrix whose elements are the eigenvalues of the sample covariance matrix, ordered from largest to smallest. Keeping in mind the hypothesis of normality, the contours of constant density, $E_p = \{x \in \Re^p \mid (x - \bar{x})^t S^{-1}(x - \bar{x}) = c^2\}$, can be estimated and inferences can be made from them.
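The quadratic form that defines $E_p$ is easy to evaluate for every observation. The sketch below uses simulated data; the threshold 5.99 is the 0.95 quantile of a chi-square with $p = 2$ degrees of freedom, the reference value under normality.

```python
# Sketch: quadratic forms (x - x_bar)^t S^{-1} (x - x_bar), whose level sets are
# the constant-density contours E_p. Data are simulated for the illustration.
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 2.0], [2.0, 100.0]], size=1000)

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_inv = np.linalg.inv(S)

diffs = X - x_bar
d2 = np.einsum('ij,jk,ik->i', diffs, S_inv, diffs)   # one quadratic form per observation

c2 = 5.99                          # 0.95 quantile of chi-square with 2 degrees of freedom
print(np.mean(d2 <= c2))           # close to 0.95 for approximately normal data
```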


Even when it is not possible to assume normality of the data, geometrically the data are n points in $\Re^p$, and the principal components represent an orthogonal transformation whose coordinate axes are the axes of the ellipsoid $E_p$, with lengths proportional to $\sqrt{\hat{\lambda}_i}$, where the $\hat{\lambda}_i$ are the eigenvalues of $S$. Since all eigenvectors have been chosen so that their norm is equal to 1, the absolute value of the ith component, $|\hat{y}_i| = |\hat{e}_i^t(x - \bar{x})|$, is the length of the projection of the vector $(x - \bar{x})$ on the vector $\hat{e}_i$. Therefore, the principal components can be seen as a translation of the origin to the point $\bar{x}$ followed by a rotation of the axes until they pass through the directions of greatest variability.

When there is a high positive correlation between all the variables and a principal component has all its coordinates of the same sign, this component can be interpreted as a weighted average of all the variables, or as the size of the index formed by that component. Components whose coordinates have different signs set one subset of variables against another, contrasting the weighted average of one group of variables with that of another.

The interpretation of the results is simplified by treating the small coefficients as zero and rounding the rest, so that the component can be expressed as sums, differences, or quotients of variables.

The interpretation of the principal components can be facilitated by graphical representations in two dimensions. A usual graph takes two components as coordinate axes and projects all the points onto those axes. These representations also help to test hypotheses of normality and to detect anomalous observations. If there is an observation that is atypical in the first variable, the variability in that first variable will grow and the covariance with the other variables will decrease in absolute value. Consequently, the first component will be strongly influenced by the first variable, distorting the analysis.
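A minimal version of such a graph is sketched below on hypothetical data, with one deliberately atypical observation planted so that it stands out in the projection.

```python
# Sketch: projecting the observations on the first two principal components and
# plotting them, a common graph for spotting anomalous observations.
# The data and the planted outlier are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
X[0] += 15 * rng.normal(size=5)             # one deliberately atypical observation

S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]

Y = (X - X.mean(axis=0)) @ E                # principal component scores

plt.scatter(Y[:, 0], Y[:, 1], s=10)
plt.xlabel('first principal component')
plt.ylabel('second principal component')
plt.show()
```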

Sometimes, it is necessary to verify that the first components are approximately normal, although it is not reasonable to expect this result from a linear combination of variables that do not have to be normal.

The last components can help detect suspicious observations. Each observation $x_j$ can be expressed as a linear combination of the eigenvectors of $S$, $x_j = \hat{y}_{1j}\hat{e}_1 + \dots + \hat{y}_{pj}\hat{e}_p$, so the difference between the first $q$ terms, $\hat{y}_{1j}\hat{e}_1 + \dots + \hat{y}_{qj}\hat{e}_q$, and the observation $x_j$ is $\hat{y}_{q+1,j}\hat{e}_{q+1} + \dots + \hat{y}_{pj}\hat{e}_p$, which is a vector with squared norm $\hat{y}_{q+1,j}^2 + \dots + \hat{y}_{pj}^2$; we will be suspicious of observations that make a large contribution to this squared norm.
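A sketch of this check follows. The data, the planted outlier, and the choice $q = 2$ are all made up for the illustration.

```python
# Sketch: flag suspicious rows by the squared norm of the part of each
# observation carried by the last p - q components.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))   # data close to a 2-dim subspace
X += 0.05 * rng.normal(size=X.shape)                      # small noise
X[10] += np.array([0.0, 0.0, 0.0, 3.0, 0.0, 0.0])         # observation 10 leaves that subspace

S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]

q = 2
Y = X @ E                                   # expansion coefficients y_ij = e_i^t x_j
resid2 = np.sum(Y[:, q:] ** 2, axis=1)      # squared norm of the discarded part

print(np.argsort(resid2)[::-1][:3])         # most suspicious rows; index 10 should stand out
```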

An especially small value of the last eigenvalue of the covariance matrix, or of the correlation matrix, can indicate a linear dependence between the variables that has not been taken into account. In this case, some variable is redundant and should be removed from the analysis. If we have four variables and the fourth is the sum of the other three, then the last eigenvalue will be zero up to rounding errors, and we should suspect such a dependence. In general, eigenvalues close to zero should not be ignored: the eigenvectors associated with them can reveal linear dependencies in the data, which cause distortions in the interpretations, calculations, and subsequent analysis.
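The four-variable example mentioned above is easy to reproduce; the data below are simulated only to show the effect.

```python
# Sketch: four variables where the fourth is the sum of the other three.
# The smallest eigenvalue of S is numerically zero and its eigenvector
# exposes the linear dependence.
import numpy as np

rng = np.random.default_rng(5)
X3 = rng.normal(size=(100, 3))
X = np.column_stack([X3, X3.sum(axis=1)])   # x4 = x1 + x2 + x3

S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)                  # ascending order: lam[0] is the smallest

print(lam[0])                               # ~0, up to rounding error
print(E[:, 0])                              # proportional to [1, 1, 1, -1]: x1 + x2 + x3 - x4 = 0
```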


#### 4.2. Standardized sample principal components

In general, principal components are not invariant under changes of scale in the original variables, as was mentioned when referring to the normalized population principal components. Normalizing, or standardizing, the variables consists of performing the transformation $z_j = D(x_j - \bar{x}) = \left[\frac{x_{1j} - \bar{x}_1}{\sqrt{s_{11}}}, \dots, \frac{x_{pj} - \bar{x}_p}{\sqrt{s_{pp}}}\right]^t$, $j = 1, \dots, n$, where $D$ is the diagonal matrix with entries $1/\sqrt{s_{ii}}$. If $Z$ is the p by n matrix whose columns are the $z_j$, it can be shown that its vector of sample means is the null vector and that its sample covariance matrix (which is also its correlation matrix) is the sample correlation matrix, $R$, of the original variables.

Remark 4.1: Applying the fact that the principal components of the normalized variables are obtained exactly as for the original sample observations but substituting the matrix $R$ for $S$, we can establish the following: if $z_1, \dots, z_n$ are the normalized observations, with covariance matrix $R = (r_{ik})$, where $r_{ik}$ is the sample correlation coefficient between the variables $x_i$ and $x_k$, and if the pairs of eigenvalues and eigenvectors of $R$ are $(\hat{v}_1, \hat{u}_1), \dots, (\hat{v}_p, \hat{u}_p)$, with $\hat{v}_1 \ge \dots \ge \hat{v}_p \ge 0$, then the ith standardized sample principal component is $\hat{w}_i = \hat{u}_i^t z$, and the results of Proposition 4.1 hold with $S$, $\hat{\lambda}_i$, and $\hat{e}_i$ replaced by $R$, $\hat{v}_i$, and $\hat{u}_i$; in particular, the total sample variance of the normalized observations is $p$.
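The sketch below illustrates the remark on hypothetical data whose variables are deliberately given very different scales, so that working with $R$ rather than $S$ matters.

```python
# Sketch: standardized sample principal components, i.e. the eigen-decomposition
# of the sample correlation matrix R instead of S. The data are hypothetical.
import numpy as np

rng = np.random.default_rng(6)
raw = rng.normal(size=(300, 3)) @ np.array([[1.0, 0.6, 0.3],
                                            [0.0, 1.0, 0.5],
                                            [0.0, 0.0, 1.0]])
X = raw * np.array([1.0, 10.0, 100.0])        # correlated variables on very different scales

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # normalized observations z_j

R = np.corrcoef(X, rowvar=False)              # sample correlation matrix of the original data
v, U = np.linalg.eigh(R)
v, U = v[::-1], U[:, ::-1]                    # v_1 >= ... >= v_p

W = Z @ U                                     # standardized sample principal components
print(np.allclose(np.cov(Z, rowvar=False), R))     # covariance of the z_j is R
print(np.var(W, axis=0, ddof=1), v)                # score variances equal the v_i
print(v.sum())                                     # total variance = p
```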


#### 4.3. Criteria for reducing the dimension

The eigenvalues and eigenvectors of the covariance matrix, or of the correlation matrix, are the essence of principal component analysis, since the eigenvectors indicate the directions of maximum variability and the eigenvalues give the corresponding variances. If a few eigenvalues are much larger than the rest, most of the variance can be explained with fewer than p variables.

In practice, decisions about the number of components to be considered must be made in terms of the pairs of eigenvalues and eigenvectors of the covariance matrix, or correlation matrix, and different rules have been suggested:

a. When plotting the graph $(i, \hat{\lambda}_i)$, it has been verified empirically that the first values decrease along a roughly linear, quite steep slope and that, from a certain eigenvalue on, this decrease stabilizes; that is, there is a point from which the eigenvalues are all very similar. The criterion consists of keeping the components that come before that point and excluding those associated with the small, approximately equal eigenvalues.

b. Select components until a preset proportion of the variance (e.g., 80%) is attained. This rule should be applied with care, since components that reflect nuances relevant to the interpretation of the analysis could be excluded.

c. A rule that does not have great theoretical support, and must be applied carefully so as not to discard components that are valid for the analysis, but which has given good empirical results, is to retain those components with variances, $\hat{\lambda}_i$, above a certain threshold. If the working matrix is the correlation matrix, in which case the average value of the eigenvalues is one, the criterion is to keep the components associated with eigenvalues greater than unity and discard the rest.

5. Application to image compression

We are going to illustrate the use of principal components to compress images. To this end, the image of Lena was considered. This photograph has been used by engineers, researchers, and students for experiments related to image processing.

#### 5.1. Black and white photography

Figure 6. Black and white photograph of Lena.

The black and white photograph shown in Figure 6 was considered. First, the image in .jpg format was converted into the numerical matrix Image of dimension 512 by 512 (i.e., $2^9 \times 2^9$). Second, to obtain the observation vectors, the matrix was divided into blocks $A_{ij}$ of dimension $2^3 \times 2^3$, with which 4096 blocks were obtained, each of them treated as a vector of observations.
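One plausible reading of this scheme is sketched below: split the image into $2^3 \times 2^3$ blocks, treat each block as a 64-dimensional observation, keep the first $q$ principal components (here $q$ is chosen with the preset-variance rule from criterion b, e.g. 80%), and rebuild the image from them. The Lena photograph is not reproduced here, so a synthetic image stands in; the variable names, the synthetic image, and the 80% threshold are assumptions made only for the example.

```python
# Sketch: PCA-based compression of a 512x512 grayscale image using 8x8 blocks
# as observation vectors. A synthetic image replaces the Lena photograph.
import numpy as np

side, b = 512, 8                                     # image side and block side (2^3)
yy, xx = np.mgrid[0:side, 0:side]
img = (np.sin(xx / 20.0) + np.cos(yy / 35.0)
       + 0.1 * np.random.default_rng(7).normal(size=(side, side)))

# blocks A_ij as observation vectors: 4096 observations of dimension 64
blocks = img.reshape(side // b, b, side // b, b).swapaxes(1, 2).reshape(-1, b * b)

x_bar = blocks.mean(axis=0)
S = np.cov(blocks, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]

explained = np.cumsum(lam) / lam.sum()
q = int(np.searchsorted(explained, 0.80) + 1)        # smallest q reaching 80% of the variance
print(q, explained[q - 1])

scores = (blocks - x_bar) @ E[:, :q]                 # q numbers stored per block
approx = scores @ E[:, :q].T + x_bar                 # reconstructed blocks

img_hat = approx.reshape(side // b, side // b, b, b).swapaxes(1, 2).reshape(side, side)
print(np.sqrt(np.mean((img - img_hat) ** 2)))        # reconstruction error (RMSE)
```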

