#### **2.2 Modified PCA**

M.PCA of Tanaka and Mori [1] derives PCs that are computed using only a selected subset of variables but that represent all of the variables, including those not selected. This means that M.PCA naturally includes a variable selection procedure in its estimation process. Although there are several variable selection methods in PCA, we use M.PCA because a subset of variables selected by M.PCA can represent all the variables very well, and because it is easy to incorporate the quantification method of Section 2.1 into M.PCA, as described in Section 2.3.

Suppose we obtain an *n* × *p* data matrix **Y** that consists of numerical variables or optimally quantified variables. Let **Y** be decomposed into an *n* × *q* submatrix **Y**<sub>1</sub> and an *n* × (*p* − *q*) submatrix **Y**<sub>2</sub> (1 ≤ *q* ≤ *p*). **Y** is represented by *r* PCs, each of which is a linear combination of the submatrix **Y**<sub>1</sub>, that is, **Z** = **Y**<sub>1</sub>**A**, where *r* is the number of PCs (1 ≤ *r* ≤ *q*). To derive **A** = (**a**<sub>1</sub>, …, **a**<sub>*r*</sub>), the following Criterion 1 based on Rao [7] and Criterion 2 based on Robert and Escoufier [8] can be used:

**(Criterion 1)** The prediction efficiency for **Y** is maximized using a linear predictor in terms of **Z**.

**(Criterion 2)** The closeness of configurations between **Y** and **Z** is maximized using the *RV*-coefficient.

We denote the covariance matrix of **Y** = (**Y**<sub>1</sub>, **Y**<sub>2</sub>) as

$$\mathbf{S} = \begin{pmatrix} \mathbf{S}\_{11} & \mathbf{S}\_{12} \\ \mathbf{S}\_{21} & \mathbf{S}\_{22} \end{pmatrix},$$

where the subscript *i* of **S** corresponds to **Y**<sub>*i*</sub>. The maximization criteria for the above Criterion 1 and Criterion 2 are given by the proportion *P*

$$P = \sum\_{j=1}^{r} \lambda\_j / \text{tr} \,(\mathbf{S}),\tag{7}$$

and the *RV*-coefficient

$$RV = \left\{\sum\_{j=1}^{r} \lambda\_j^2 / \text{tr}\left(\mathbf{S}^2\right)\right\}^{1/2},\tag{8}$$

respectively, where *λ*<sub>*j*</sub> is the *j*-th largest eigenvalue of the eigenvalue problem (EVP)

$$\left[\left(\mathbf{S}\_{11}^2 + \mathbf{S}\_{12}\mathbf{S}\_{21}\right) - \lambda \mathbf{S}\_{11}\right] \mathbf{a} = \mathbf{0}.\tag{9}$$

The solution is obtained as a matrix **A** whose columns are the eigenvectors associated with the *r* largest eigenvalues of EVP (9), and the **Y**<sub>1</sub> that provides the largest value of *P* or *RV* is the best subset of *q* variables among all possible subsets of size *q*. Thus, to obtain a reasonable subset of variables of size *q* in PCA, we apply M.PCA to the data and find the subset **Y**<sub>1</sub> of size *q* that has the largest *P* or *RV*. The selected subset **Y**<sub>1</sub> is reasonable in the sense of PCA because it retains information not only on the selected variables **Y**<sub>1</sub> but also on the deleted ones **Y**<sub>2</sub>.
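As an illustration, the computation above can be sketched in Python. The sketch below is not from the source (the function names `mpca_criteria` and `best_subset` are our own): it solves EVP (9) as a generalized symmetric eigenvalue problem, evaluates *P* (Eq. 7) and *RV* (Eq. 8) for one candidate subset, and searches all subsets of size *q* exhaustively.

```python
import numpy as np
from itertools import combinations
from scipy.linalg import eigh


def mpca_criteria(S, idx, r):
    """Evaluate P (Eq. 7) and RV (Eq. 8) for the candidate subset `idx`,
    using the r largest eigenvalues of the generalized EVP (9):
    (S11^2 + S12 S21) a = lambda * S11 a."""
    rest = [j for j in range(S.shape[0]) if j not in idx]
    S11 = S[np.ix_(idx, idx)]
    S12 = S[np.ix_(idx, rest)]
    # Both matrices are symmetric (S21 = S12^T), so eigh solves the EVP.
    lam = eigh(S11 @ S11 + S12 @ S12.T, S11, eigvals_only=True)
    lam = np.sort(lam)[::-1][:r]                      # r largest eigenvalues
    P = lam.sum() / np.trace(S)                       # Eq. (7)
    RV = np.sqrt((lam**2).sum() / np.trace(S @ S))    # Eq. (8)
    return P, RV


def best_subset(S, q, r, criterion="RV"):
    """Exhaustive search over all pCq subsets of size q; returns the
    subset of variable indices with the largest criterion value."""
    k = 0 if criterion == "P" else 1
    return max(combinations(range(S.shape[0]), q),
               key=lambda idx: mpca_criteria(S, list(idx), r)[k])
```

When *q* = *p*, the block **S**<sub>12</sub> is empty and EVP (9) reduces to the ordinary PCA eigenproblem **Sa** = *λ***a**, so *P* coincides with the usual proportion of variance explained.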

#### **2.3 Modified PCA for mixed measurement level data**

M.PCA is a good method to find a reasonable subset of numerical variables, as described in the previous section. To select variables from mixed measurement level data using a criterion in M.PCA, the qualitative/categorical variables in the data should be quantified in an appropriate manner. Based on the original idea in ref. [9], and considering PRINCIPALS in Section 2.1 and M.PCA in Section 2.2, it is easy to incorporate the quantification (PRINCIPALS) into M.PCA: we can formulate M.PCA for qualitative data simply by replacing the EVP (2) in the *Model estimation step* of PRINCIPALS with the EVP (9) to obtain the model parameters **A** and **Z** for M.PCA. Thus, M.PCA and optimal scaling are alternately executed until

$$\theta^{\ast} = \text{tr}\left(\mathbf{Y}^{\ast} - \hat{\mathbf{Y}}\right)^{\top}\left(\mathbf{Y}^{\ast} - \hat{\mathbf{Y}}\right) = \text{tr}\left(\mathbf{Y}^{\ast} - \mathbf{Z}\mathbf{A}^{\top}\right)^{\top}\left(\mathbf{Y}^{\ast} - \mathbf{Z}\mathbf{A}^{\top}\right)$$

is minimized. We call this procedure nonlinear M.PCA (NL.M.PCA).

Here, we rewrite the ALS algorithm of PRINCIPALS as follows. For given initial data **Y**<sup>∗(0)</sup> = (**Y**<sub>1</sub><sup>∗(0)</sup>, **Y**<sub>2</sub><sup>∗(0)</sup>) from the original data **Y**, the following two steps are iterated until convergence:

• *Model estimation step*: From **Y**<sup>∗(*t*)</sup> = (**Y**<sub>1</sub><sup>∗(*t*)</sup>, **Y**<sub>2</sub><sup>∗(*t*)</sup>), obtain **A**<sup>(*t*)</sup> by solving the EVP (9).

Compute **Z**<sup>(*t*)</sup> from **Z**<sup>(*t*)</sup> = **Y**<sub>1</sub><sup>∗(*t*)</sup>**A**<sup>(*t*)</sup>. Update **Ŷ**<sup>(*t*+1)</sup> = **Z**<sup>(*t*)</sup>**A**<sup>(*t*)⊤</sup>.

• *Optimal scaling step*: Obtain **Y**<sup>∗(*t*+1)</sup> for fixed **Ŷ**<sup>(*t*+1)</sup> by separately estimating **y**<sub>*j*</sub><sup>∗</sup> (= **G**<sub>*j*</sub>**q**<sub>*j*</sub>) for each variable *j* under the measurement restrictions. Re-compute **y**<sub>*j*</sub><sup>∗(*t*+1)</sup> by an additional transformation to keep the monotonicity restriction for ordinal variables; skip this computation for numerical variables.
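For a single nominal variable, the optimal scaling update has a simple closed form: the category quantifications **q**<sub>*j*</sub> are the category means of the model prediction for that column, and the rescored column is then standardized. A minimal sketch in Python (our own illustration, not code from the source; the additional monotonicity transformation for ordinal variables is omitted):

```python
import numpy as np


def optimal_scale_nominal(G, yhat):
    """One optimal scaling update for a nominal variable.

    G    : (n x K) indicator (dummy) matrix G_j over the K categories.
    yhat : (n,) model prediction for this column from Yhat = Z A^T.

    The least-squares quantification q_j = (G'G)^{-1} G' yhat is the
    vector of category means of yhat; the rescored column y* = G q_j
    is then standardized to mean 0 and variance 1, as in PRINCIPALS.
    """
    q = np.linalg.solve(G.T @ G, G.T @ yhat)  # category means of yhat
    y = G @ q
    y = y - y.mean()
    return y / y.std()
```

Because every observation in the same category receives the same score, the quantified column is constant within categories, and its ordering follows the ordering of the category means of the current model prediction.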

**Y**<sup>∗</sup> = (**Y**<sub>1</sub><sup>∗</sup>, **Y**<sub>2</sub><sup>∗</sup>) obtained after convergence is an optimally scaled (quantified) matrix of **Y**; **Y**<sub>1</sub>, corresponding to **Y**<sub>1</sub><sup>∗</sup>, is the subset to be selected, and **Y**<sub>2</sub>, corresponding to **Y**<sub>2</sub><sup>∗</sup>, is the one to be deleted.

The NL.M.PCA procedure for fixed *q* is as described above, but since variable selection performs the M.PCA computation for *q* = *p*, …, *r*, and up to <sub>*p*</sub>C<sub>*q*</sub> times for each *q*, to find the best **Y**<sub>1</sub>, there are three possible *type*s of selection according to where the quantification is implemented in the computation flow (see Fig. 4.1 in [2]).

The first type (*Type 1*) performs the quantification only once, at the beginning: nonlinear PCA is applied to the data **Y** to obtain the quantified data **Y**<sup>∗</sup>, and ordinary M.PCA selection is applied to **Y**<sup>∗</sup>. No further quantification is carried out in the selection stage. The second type (*Type 2*) carries out the quantification every time after the best subset of size *q* is found in the selection stage. That is, the quantified (**Y**<sub>1</sub><sup>∗</sup>, **Y**<sub>2</sub><sup>∗</sup>) based on the best subset of size *q* found in the previous selection is used to find the best subset of size *q* − 1 or *q* + 1 in the next selection. The third type (*Type 3*) carries out the quantification for every temporary (**Y**<sub>1</sub>, **Y**<sub>2</sub>) in the selection stage; that is, NL.M.PCA is performed whenever a temporary (**Y**<sub>1</sub>, **Y**<sub>2</sub>) is given, to compute its criterion value.

A reasonable subset of size *q* is given as the **Y**<sub>1</sub> corresponding to the best subset **Y**<sub>1</sub><sup>∗</sup> finally found at *q* when the selection procedure is terminated.
