**2. Modified PCA for mixed measurement level data**

### **2.1 Quantification of qualitative data**

Because we wish to consider a variable selection problem in PCA, we need a quantification method that is suitable in the context of PCA. One of the best-established methods is the optimal scaling used in nonlinear PCA. Nonlinear PCA is a method for dealing with qualitative data: it estimates the parameters of PCA and quantifies the qualitative variables simultaneously, alternating between estimation and quantification. PRINCIPALS of Young et al. [4] and PRINCALS of Gifi [5] are algorithms for nonlinear PCA. Here we use PRINCIPALS.

PRINCIPALS is an algorithm based on the alternating least squares (ALS) method, as follows. Let **Y** = (**y**<sub>1</sub> **y**<sub>2</sub> … **y**<sub>*p*</sub>) be a data matrix of *n* objects by *p* categorical variables, and let **y**<sub>*j*</sub> of **Y** be a qualitative vector with *K<sub>j</sub>* categories labeled 1, … , *K<sub>j</sub>*. PRINCIPALS minimizes the loss function

$$\sigma_L(\mathbf{Z}, \mathbf{A}, \mathbf{Y}^*) = \mathrm{tr}\left(\mathbf{Y}^* - \hat{\mathbf{Y}}\right)^\top \left(\mathbf{Y}^* - \hat{\mathbf{Y}}\right) = \mathrm{tr}\left(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top\right)^\top \left(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top\right), \tag{1}$$

where **Y**<sup>∗</sup> is the optimally scaled matrix obtained from **Y**, **Z** is an *n* × *r* matrix of *n* component scores on *r* (1 ≤ *r* ≤ *p*) components, and **A** = (**a**<sub>1</sub> **a**<sub>2</sub> … **a**<sub>*r*</sub>) is a *p* × *r* weight matrix that gives the coefficients of the linear combinations. PRINCIPALS alternates between two estimations: the model parameters **Z** and **A** for ordinary PCA, and the data parameter, namely the optimally scaled data **Y**<sup>∗</sup>.
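As a concrete illustration, the loss (1) is simply the squared Frobenius norm of the residual between the scaled data and its rank-*r* reconstruction. The following sketch evaluates it with NumPy; all variable names are illustrative placeholders, not part of PRINCIPALS itself.

```python
import numpy as np

# Evaluate sigma_L(Z, A, Y*) = tr[(Y* - Z A^T)^T (Y* - Z A^T)] from Eq. (1).
# Y_star, Z, A hold arbitrary placeholder values for illustration.
rng = np.random.default_rng(0)
n, p, r = 10, 4, 2                       # objects, variables, components
Y_star = rng.standard_normal((n, p))     # optimally scaled data
Z = rng.standard_normal((n, r))          # component scores
A = rng.standard_normal((p, r))          # component weights

residual = Y_star - Z @ A.T
loss = np.trace(residual.T @ residual)   # Eq. (1)
same = np.sum(residual ** 2)             # equivalent squared Frobenius norm
assert np.isclose(loss, same)
```

The trace form and the elementwise sum of squares coincide, which is why the ALS steps below can minimize (1) by ordinary least squares fits.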

In the computation of PRINCIPALS, **Y**<sup>∗</sup> is standardized for each variable so as to satisfy the restrictions **Y**<sup>∗⊤</sup>**1**<sub>*n*</sub> = **0**<sub>*p*</sub> and diag[**Y**<sup>∗⊤</sup>**Y**<sup>∗</sup>/*n*] = **I**<sub>*p*</sub>. We denote the value of *θ* estimated at the *t*-th iteration by *θ*<sup>(*t*)</sup>. Given initial data **Y**<sup>∗(0)</sup> (the observed data **Y** may be used as **Y**<sup>∗(0)</sup> after the above standardization), PRINCIPALS iterates the following two steps:

• *Model estimation step*: By solving the eigenvalue problem (EVP) of the covariance matrix of **Y**<sup>∗(*t*)</sup> (= **S**)

$$[\mathbf{S} - \lambda \mathbf{I}]\mathbf{a} = \mathbf{0},\tag{2}$$

where *λ* is an eigenvalue, obtain **A**<sup>(*t*+1)</sup> and compute **Z**<sup>(*t*+1)</sup> = **Y**<sup>∗(*t*)</sup>**A**<sup>(*t*+1)</sup>. Update **Ŷ**<sup>(*t*+1)</sup> = **Z**<sup>(*t*+1)</sup>**A**<sup>(*t*+1)⊤</sup>.
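The model estimation step can be sketched in NumPy: solve the EVP of the covariance matrix of the standardized **Y**<sup>∗</sup>, take the eigenvectors of the *r* largest eigenvalues as the weights, and form the scores and the reconstruction. Function and variable names here are illustrative, not from any particular package.

```python
import numpy as np

# Model estimation step (Eq. (2)): eigendecompose S = Y*^T Y* / n, keep the
# top-r eigenvectors as A, then Z = Y* A and the reconstruction Yhat = Z A^T.
def model_estimation(Y_star, r):
    n = Y_star.shape[0]
    S = Y_star.T @ Y_star / n            # covariance matrix of standardized Y*
    eigval, eigvec = np.linalg.eigh(S)   # eigenvalues in ascending order
    A = eigvec[:, ::-1][:, :r]           # top-r eigenvectors as columns
    Z = Y_star @ A                       # component scores
    Y_hat = Z @ A.T                      # rank-r least squares reconstruction
    return A, Z, Y_hat

rng = np.random.default_rng(0)
Ys = rng.standard_normal((20, 5))
Ys = (Ys - Ys.mean(axis=0)) / Ys.std(axis=0)   # enforce the standardization
A, Z, Y_hat = model_estimation(Ys, r=2)
```

Since `np.linalg.eigh` returns orthonormal eigenvectors, **A**<sup>⊤</sup>**A** = **I**<sub>*r*</sub> holds automatically.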

*Variable Selection in Nonlinear Principal Component Analysis DOI: http://dx.doi.org/10.5772/intechopen.103758*

• *Optimal scaling step*: Obtain **Y**<sup>∗(*t*+1)</sup> such that

$$\mathbf{Y}^{*(t+1)} = \arg\min_{\mathbf{Y}^*} \mathrm{tr}\left(\mathbf{Y}^* - \hat{\mathbf{Y}}^{(t+1)}\right)^\top \left(\mathbf{Y}^* - \hat{\mathbf{Y}}^{(t+1)}\right) \tag{3}$$

for fixed **Ŷ**<sup>(*t*+1)</sup> by separately estimating **y**<sup>∗</sup><sub>*j*</sub> for each variable *j* under the measurement restrictions on each of the variables. That is, compute **q**<sub>*j*</sub><sup>(*t*+1)</sup> for nominal variables as

$$\mathbf{q}_j^{(t+1)} = \left(\mathbf{G}_j^\top \mathbf{G}_j\right)^{-1} \mathbf{G}_j^\top \hat{\mathbf{y}}_j^{(t+1)}, \tag{4}$$

where **q**<sub>*j*</sub> is a *K<sub>j</sub>* × 1 category score vector for **y**<sup>∗</sup><sub>*j*</sub> and **G**<sub>*j*</sub> is an *n* × *K<sub>j</sub>* indicator matrix

$$\mathbf{G}_j = \left(g_{jik}\right) = \begin{pmatrix} g_{j11} & \cdots & g_{j1K_j} \\ \vdots & \ddots & \vdots \\ g_{jn1} & \cdots & g_{jnK_j} \end{pmatrix} = \left(\mathbf{g}_{j1} \cdots \mathbf{g}_{jK_j}\right), \tag{5}$$

where

$$g_{jik} = \begin{cases} 1 & \text{if object } i \text{ belongs to category } k, \\ 0 & \text{if object } i \text{ belongs to some other category } k' \,(\neq k), \end{cases} \tag{6}$$

and then the optimally scaled vector **y**<sup>∗</sup><sub>*j*</sub> is obtained by **y**<sup>∗</sup><sub>*j*</sub> = **G**<sub>*j*</sub>**q**<sub>*j*</sub>.
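For one nominal variable, this amounts to averaging the current model column within each category. The sketch below builds the indicator matrix of Eqs. (5)–(6) and applies Eq. (4); categories are coded 0, …, *K* − 1 for convenience, and all names are illustrative.

```python
import numpy as np

# Eqs. (4)-(6) for a single nominal variable: q_j = (G_j^T G_j)^{-1} G_j^T yhat_j
# reduces to the within-category means of yhat_j, and y*_j = G_j q_j.
def indicator(y, K):
    G = np.zeros((len(y), K))
    G[np.arange(len(y)), y] = 1.0    # g_ik = 1 iff object i is in category k
    return G

y_cat = np.array([0, 1, 2, 1, 0, 2])                # observed categories
y_hat = np.array([0.1, 0.9, 2.2, 1.1, -0.1, 1.8])   # current model column
G = indicator(y_cat, 3)
q = np.linalg.solve(G.T @ G, G.T @ y_hat)   # Eq. (4): within-category means
y_star = G @ q                              # optimally scaled vector
# q equals the category means: [0.0, 1.0, 2.0]
```

Note that **G**<sub>*j*</sub><sup>⊤</sup>**G**<sub>*j*</sub> is diagonal with the category frequencies, which is why Eq. (4) yields simple category means.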

Re-compute **q**<sub>*j*</sub><sup>(*t*+1)</sup> for ordinal variables using monotone regression [6]. For nominal and ordinal variables, update **y**<sup>∗(*t*+1)</sup><sub>*j*</sub> = **G**<sub>*j*</sub>**q**<sub>*j*</sub><sup>(*t*+1)</sup> and standardize **y**<sup>∗(*t*+1)</sup><sub>*j*</sub>. For numerical variables, standardize the observed vector **y**<sub>*j*</sub> and set **y**<sup>∗(*t*+1)</sup><sub>*j*</sub> = **y**<sub>*j*</sub>.
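The monotone regression for ordinal variables can be carried out with the pool-adjacent-violators algorithm (PAVA): adjacent category scores that violate the order restriction *q*<sub>1</sub> ≤ … ≤ *q*<sub>*K*</sub> are pooled into weighted means. The following is an illustrative implementation, not the authors' code.

```python
import numpy as np

# Pool-adjacent-violators sketch of the monotone regression used for ordinal
# variables: enforce q_1 <= ... <= q_K, weighting categories by frequency.
def monotone_regression(q, w):
    vals, wts, sizes = [], [], []
    for v, wt in zip(q, w):
        vals.append(float(v)); wts.append(float(wt)); sizes.append(1)
        # pool adjacent blocks while the order restriction is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            pooled = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / (wts[-2] + wts[-1])
            vals[-2:] = [pooled]
            wts[-2:] = [wts[-2] + wts[-1]]
            sizes[-2:] = [sizes[-2] + sizes[-1]]
    fitted = []
    for v, s in zip(vals, sizes):
        fitted.extend([v] * s)    # expand pooled values back to categories
    return np.array(fitted)

# e.g. scores (1, 3, 2) with equal weights become (1, 2.5, 2.5)
```

The pooled result is the weighted least squares projection of the unrestricted scores onto the monotone cone, so it is the minimizer of (3) under the ordinal restriction.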

These two steps are alternated until convergence; the **y**<sup>∗</sup><sub>*j*</sub> obtained at convergence are the quantified variables, while **A** and **Z** are the solutions of PCA for the qualitative data.
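The whole alternation can be sketched compactly for purely nominal variables. This is a simplified illustration under the assumptions that categories are coded 0, …, *K<sub>j</sub>* − 1 and every category is observed; it is not the authors' implementation.

```python
import numpy as np

# PRINCIPALS-style ALS sketch for nominal data: alternate the model estimation
# step (PCA on Y*) and the optimal scaling step (category means, Eq. (4)),
# re-standardizing Y* each pass, until the loss (1) stabilizes.
def principals_nominal(Y, r, n_iter=100, tol=1e-8):
    n, p = Y.shape
    Gs = [np.eye(Y[:, j].max() + 1)[Y[:, j]] for j in range(p)]  # indicators
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)   # initial standardized codes
    prev = np.inf
    for _ in range(n_iter):
        # model estimation step: EVP of the covariance matrix of Y*
        _, vec = np.linalg.eigh(Ys.T @ Ys / n)
        A = vec[:, ::-1][:, :r]
        Z = Ys @ A
        Y_hat = Z @ A.T
        # optimal scaling step: category means per variable, then re-standardize
        cols = [G @ np.linalg.solve(G.T @ G, G.T @ Y_hat[:, j])
                for j, G in enumerate(Gs)]
        Ys = np.column_stack(cols)
        Ys = (Ys - Ys.mean(axis=0)) / Ys.std(axis=0)
        loss = np.sum((Ys - Y_hat) ** 2)        # loss (1) at the current fit
        if abs(prev - loss) < tol:
            break
        prev = loss
    return Ys, Z, A

Y = np.array([[0, 1], [1, 0], [2, 1], [1, 2], [0, 2], [2, 0]])
Ys, Z, A = principals_nominal(Y, r=1)
```

Because each step minimizes (1) with the other parameter block held fixed, the loss is non-increasing over iterations, which is the usual monotone convergence argument for ALS algorithms.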
