## **5. Discussion**

The focus here has been on determining the number of dimensions needed to represent a complex of variables adequately. The algebraic solution devolves upon the analysis of properties of the covariance matrix of the variables, especially through its eigensystem.
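
As a concrete illustration, the eigensystem can be extracted numerically from a sample covariance matrix. The following is a minimal NumPy sketch, with simulated data standing in for an actual complex of variables:

```python
import numpy as np

# Simulated stand-in data: n = 100 observations on p = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

S = np.cov(X, rowvar=False)            # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigh exploits the symmetry of S
order = np.argsort(eigvals)[::-1]      # reorder so eigenvalues decrease
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```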

### **5.1 Regression on principal components**

Next, we consider applying principal component analysis in the context of *multiple regression.* In this context, there is, of course, a response variable $Y$ and explanatory variables $X_1, X_2, \ldots, X_p$. One may transform the $X$s to their principal components, as this may aid in the interpretation of the results of the regression. In addition, the number of significant regression coefficients may be decreased. In such *regression on principal components* (see, e.g., [10]), however, one should not necessarily eliminate the principal components with small eigenvalues, as they may still be strongly related to the response variable.
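
A minimal sketch of such a regression on principal components follows; `pc_regression` is a hypothetical helper chosen for illustration, not a standard library routine. It centers the $X$s, projects them onto the eigenvectors of their covariance matrix, and regresses $Y$ on the resulting scores by ordinary least squares:

```python
import numpy as np

def pc_regression(X, y):
    """Regress y on all p principal components of X (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                    # center the explanatory variables
    eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    V = V[:, np.argsort(eigvals)[::-1]]        # columns ordered by decreasing eigenvalue
    Z = Xc @ V                                 # principal component scores
    D = np.column_stack([np.ones(len(y)), Z])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta, Z
```

Because the columns of $Z$ are mutually orthogonal and centered, each coefficient can be assessed without the collinearity that may afflict the original $X$s.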

The value of the Bayesian information criterion for Model *k* is

$$\text{BIC}_{k} = -2LL_{k} + m_{k} \ln n,\tag{44}$$

for alternative models indexed by $k = 1, 2, \ldots, K$, where $LL_k$ is the maximum log likelihood for Model $k$, that is, $LL_k = \max \ln L_k$, and $m_k$ is the number of independent parameters in Model $k$. For linear regression models with Gaussian-distributed errors, $-2LL_k = \text{const.} + n \ln \text{MSE}_k$, where the constant, $n \ln(2\pi) + n$, is the same for every model and so can be ignored in comparisons, and BIC therefore takes the form

$$\text{BIC}_{k} = n \ln \text{MSE}_{k} + m_{k} \ln n,\tag{45}$$

where $\text{MSE}_k$ is the maximum likelihood estimate of the error variance of Model $k$, that is, the mean squared error of the residuals computed with divisor $n$.
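
Equation (45) translates directly into code. The following minimal sketch (the function name `bic` is ours, chosen for illustration) computes it from a model's fitted values:

```python
import numpy as np

def bic(y, fitted, m):
    """BIC of a Gaussian linear model, in the form of Eq. (45).

    m is the number of independent parameters; the additive constant
    common to all models has been dropped, so only differences in BIC
    across models are meaningful.
    """
    n = len(y)
    mse = np.mean((y - fitted) ** 2)   # MLE of the error variance: divisor n
    return n * np.log(mse) + m * np.log(n)
```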

The total number of subsets of $p$ things is $2^p$. Therefore, with $p$ explanatory variables there are $2^p$ alternative models, or "subset regressions" (including the model with no explanatory variables, for which the fitted value of $Y$ is simply $\bar{y}$). For example, if there are three $X$s, the eight subsets are $X_1$ alone, $X_2$ alone, $X_3$ alone, $(X_1, X_2)$, $(X_1, X_3)$, $(X_2, X_3)$, $(X_1, X_2, X_3)$, and the empty set. It would usually seem expedient to evaluate all $2^p$ regression models, that is, regressions on all $2^p$ subsets of principal components, using adjusted $R^2$, AIC, and/or BIC, rather than reducing the number of models considered by regressing on only a few principal components. That is, in the context of regression on principal components it is probably wise *not* to reduce the number of principal components, for, as stated above, it is conceivable that some principal components with small eigenvalues may nevertheless be important in explaining and predicting the response variable; a sketch of this exhaustive search appears below.
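
The following is a minimal sketch of the all-subsets search, assuming the matrix `Z` of principal component scores from the earlier sketch and scoring each fit by BIC as in Eq. (45); `best_subset_by_bic` is a hypothetical helper:

```python
from itertools import chain, combinations

import numpy as np

def best_subset_by_bic(Z, y):
    """Fit all 2**p regressions on subsets of the columns of Z; return the
    (BIC, subset) pair with the smallest BIC. The empty subset is the
    intercept-only model, whose fitted value is simply y-bar."""
    n, p = Z.shape
    subsets = chain.from_iterable(combinations(range(p), k) for k in range(p + 1))
    best = None
    for S in subsets:
        D = np.column_stack([np.ones(n)] + [Z[:, j] for j in S])  # intercept + chosen PCs
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        mse = np.mean((y - D @ beta) ** 2)                  # divisor n, per Eq. (45)
        score = n * np.log(mse) + (len(S) + 1) * np.log(n)  # m = |S| + 1 for the intercept
        if best is None or score < best[0]:
            best = (score, S)
    return best
```

For moderate $p$ the $2^p$ fits remain cheap, since each is an ordinary least squares problem on at most $p + 1$ columns.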
