**5. Criteria for determining the number of meaningful components to retain**

In principal component analysis the number of components extracted is equal to the number of variables being analyzed (under the general condition *n* ≥ *p*). This means that an analysis of our 5 variables would actually result in 5 components, not two. However, since PCA aims at reducing dimensionality, only the first few components will be important enough to be retained for interpretation and used to present the data. It is therefore reasonable to ask how many independent components are necessary to best describe the data.

Eigenvalues provide a quantitative assessment of how well a component represents the data. The higher the eigenvalue of a component, the more representative it is of the data. Eigenvalues are therefore used to determine the meaningfulness of components. Table 3 provides the eigenvalues from the PCA applied to our dataset. In the column headed "Eigenvalue", the eigenvalue for each component is presented. Each row in the table presents information about one of the 5 components: row "1" provides information about the first component (PCA1) extracted, row "2" provides information about the second component (PCA2) extracted, and so forth. Eigenvalues are ranked from the highest to the lowest.
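As an illustration of where such eigenvalues come from, the following is a minimal sketch, assuming the PCA is carried out on the correlation matrix of the variables. The data are synthetic (random numbers with some correlation injected) and are not the dataset of this chapter; the names `X`, `R` and `eigenvalues` are chosen here for illustration only.

```python
import numpy as np

# Synthetic stand-in data: n = 50 observations on p = 5 variables.
# These numbers are NOT the chapter's dataset; they only illustrate the mechanics.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]  # make variable 2 correlate with variable 1
X[:, 3] += X[:, 2]  # make variable 4 correlate with variable 3

# Correlation matrix of the variables (columns of X).
R = np.corrcoef(X, rowvar=False)

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending
# order, so the result is reversed to rank them from highest to lowest.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

print("eigenvalues:", np.round(eigenvalues, 3))
print("sum of eigenvalues:", round(float(eigenvalues.sum()), 6))
```

With 5 variables the sketch yields 5 eigenvalues, one per component, and their sum equals the number of variables because each standardized variable contributes one unit of variance.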


Table 3. **Eigenvalues from PCA**

It can be seen that the eigenvalue for component 1 is 2.653, while the eigenvalue for component 2 is 1.98. This means that the first component accounts for 2.653 units of total variance while the second component accounts for 1.98 units. The third component accounts for about 0.27 units of variance. Note that the sum of the eigenvalues is 5, which is also the number of variables. How do we determine how many components are worth interpreting?

Several criteria have been proposed for determining how many meaningful components should be retained for interpretation. This section will describe three criteria: the Kaiser eigenvalue-one criterion, the Cattell scree test, and the cumulative percent of variance accounted for.

... eigenvalue due to sampling. Lambert, Wildt and Durand (1990) proposed a bootstrapped version of the Kaiser approach to determine the interpretability of eigenvalues.

Table 3 shows that the first component has an eigenvalue substantially greater than 1. It therefore explains more variance than a single variable, in fact 2.653 times as much. The second component displays an eigenvalue of 1.98, which is substantially greater than 1, and the third component displays an eigenvalue of 0.269, which is clearly lower than 1. Applying the Kaiser criterion therefore leads us unambiguously to retain the first two principal components.

**5.2 Cattell scree test**

The scree test is another device for determining the appropriate number of components to retain. First, it graphs the eigenvalues against the component number. As eigenvalues are constrained to decrease monotonically from the first principal component to the last, the scree plot shows the decreasing rate at which variance is explained by additional principal components. To choose the number of meaningful components, we next look at the scree plot and stop at the point where it begins to level off (Cattell, 1966; Horn, 1965). The components that appear *before* the "break" are assumed to be meaningful and are retained for interpretation; those appearing *after* the break are assumed to be unimportant and are not retained. Between the components before and after the break lies a scree.

The scree plot of eigenvalues derived from Table 3 is displayed in Figure 1. The component numbers are listed on the horizontal axis, while eigenvalues are listed on the vertical axis. The figure shows a relatively large break between components 2 and 3, after which each successive component accounts for smaller and smaller amounts of the total variance. This agrees with the preceding conclusion that two principal components provide a reasonable summary of the data, accounting for about 93% of the total variance.

Sometimes a scree plot will display a pattern such that it is difficult to determine exactly where a break exists. When this happens, the use of the scree plot must be supplemented with additional criteria, such as the Kaiser method or the cumulative percent of variance accounted for criterion.

**5.3 Cumulative percent of total variance accounted for**

When determining the number of meaningful components, remember that the subspace of components retained must account for a reasonable amount of variance in the data. It is common to express the eigenvalues as a percentage of the total. The fraction of an eigenvalue out of the sum of all eigenvalues represents the amount of variance accounted for by the corresponding principal component. The cumulative percent of variance explained by the first *q* components is calculated with the formula:

$$
r_q = 100 \times \frac{\sum_{j=1}^{q} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \tag{27}
$$

where $\lambda_j$ denotes the eigenvalue of the *j*-th component and *p* is the number of variables.
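The three criteria named in this section can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `kaiser_count`, `largest_drop_position` and `cumulative_percent` are hypothetical, and the "largest drop" rule is only a crude numerical stand-in for the visual inspection of a scree plot. Only the three eigenvalues quoted in the text are used; the eigenvalues of components 4 and 5 are not quoted, so the total variance is taken as 5, the stated sum of all eigenvalues.

```python
# Eigenvalues quoted in the text for components 1 to 3. The values for
# components 4 and 5 are not quoted; since all five sum to 5, those two
# together amount to only 0.098 and cannot change the conclusions below.
known_eigenvalues = [2.653, 1.98, 0.269]
total_variance = 5.0  # sum of all eigenvalues = number of variables


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def largest_drop_position(eigenvalues):
    """Number of components before the largest drop between neighbours.

    Assumption: the largest drop approximates the scree "break".
    """
    drops = [eigenvalues[i] - eigenvalues[i + 1]
             for i in range(len(eigenvalues) - 1)]
    return drops.index(max(drops)) + 1


def cumulative_percent(eigenvalues, total):
    """Cumulative percent of total variance for the first q components."""
    running = 0.0
    percents = []
    for ev in eigenvalues:
        running += ev
        percents.append(100.0 * running / total)
    return percents


n_kaiser = kaiser_count(known_eigenvalues)
n_scree = largest_drop_position(known_eigenvalues)
cum = cumulative_percent(known_eigenvalues, total_variance)

print("Kaiser criterion retains:", n_kaiser)
print("Components before the largest drop:", n_scree)
print("Cumulative percent:", [round(c, 2) for c in cum])
```

Under these assumptions both counting rules point to two components, and the cumulative percent for the first two components is about 92.7%, in line with the figure of about 93% given in the text.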
