#### **1. Introduction**

Principal components analysis (PCA) is a popular descriptive multivariate method for handling quantitative data. To apply PCA to a mixture of quantitative and qualitative data, the qualitative data must first be quantified to obtain optimal scaling data on which ordinary PCA can be used. The extended PCA that includes such quantification is called *nonlinear PCA*; see Gifi [Gifi, 1990]. The existing algorithms for nonlinear PCA are PRINCIPALS of Young et al. [Young et al., 1978] and PRINCALS of Gifi [Gifi, 1990], both of which utilize the alternating least squares (ALS) algorithm. The algorithm alternates between quantification of the qualitative data and computation of ordinary PCA of the optimal scaling data.

In applications of nonlinear PCA to very large data sets and variable selection problems, many iterations and much computation time may be required for the ALS algorithm to converge, because its speed of convergence is linear. Kuroda et al. [Kuroda et al., 2011] proposed an acceleration algorithm for speeding up the convergence of the ALS algorithm using the vector *ε* (v*ε*) algorithm of Wynn [Wynn, 1962]. During the iterations of the v*ε* accelerated ALS algorithm, the v*ε* algorithm generates an accelerated sequence of the optimal scaling data estimated by the ALS algorithm; this accelerated sequence converges faster than the original sequence of the estimated optimal scaling data. In this paper, we use PRINCIPALS as the ALS algorithm for nonlinear PCA and provide the v*ε* acceleration for PRINCIPALS (v*ε*-PRINCIPALS). The computation steps of PRINCALS are given in Appendix A. As shown in Kuroda et al. [Kuroda et al., 2011], the v*ε* acceleration is also applicable to PRINCALS.

The paper is organized as follows. We briefly describe nonlinear PCA of a mixture of quantitative and qualitative data in Section 2, and describe PRINCIPALS for finding the least squares estimates of the model and optimal scaling parameters in Section 3. Section 4 presents the procedure of v*ε*-PRINCIPALS, which adds the v*ε* algorithm to PRINCIPALS for speeding up convergence, and demonstrates the performance of the v*ε* acceleration using numerical experiments. In Section 5, we apply v*ε*-PRINCIPALS to variable selection in nonlinear PCA. We utilize the modified PCA (M.PCA) approach of Tanaka and Mori [Tanaka and Mori, 1997] for variable selection problems and give the variable selection procedures in M.PCA of qualitative data. Numerical experiments examine the performance and properties of v*ε*-PRINCIPALS. In Section 6, we present our concluding remarks.


#### **2. Nonlinear principal components analysis**

PCA linearly transforms an original set of variables into a substantially smaller set of uncorrelated variables that contains much of the information in the original data set. The original data matrix is then replaced by an estimate constructed by forming the product of matrices of component scores and eigenvectors.

Let **X** = (**X**1 **X**2 ··· **X***p*) be an *n* × *p* matrix of *n* observations on *p* variables that has been columnwise standardized. In PCA, we postulate that **X** is approximated by the following bilinear form:

$$
\hat{\mathbf{X}} = \mathbf{Z} \mathbf{A}^{\top},\tag{1}
$$

where **Z** = (**Z**1 **Z**2 ··· **Z***r*) is an *n* × *r* matrix of *n* component scores on *r* (1 ≤ *r* ≤ *p*) components, and **A** = (**A**1 **A**2 ··· **A***r*) is a *p* × *r* matrix consisting of the eigenvectors of **X**⊤**X**/*n*, with **A**⊤**A** = **I***r*. Then we determine the model parameters **Z** and **A** such that

$$\boldsymbol{\theta} = \text{tr}(\mathbf{X} - \hat{\mathbf{X}})^\top (\mathbf{X} - \hat{\mathbf{X}}) = \text{tr}(\mathbf{X} - \mathbf{Z}\mathbf{A}^\top)^\top (\mathbf{X} - \mathbf{Z}\mathbf{A}^\top) \tag{2}$$

is minimized for the prescribed *r* components.
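
For concreteness, the following is a minimal NumPy sketch of this bilinear decomposition; it assumes a columnwise standardized input matrix, and the function name is our illustrative choice.

```python
import numpy as np

def pca_bilinear(X, r):
    """Rank-r PCA approximation X_hat = Z A' of an n x p columnwise
    standardized matrix X, built from the eigenvectors of X'X / n."""
    n, p = X.shape
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)  # eigenvalues ascending
    A = eigvecs[:, np.argsort(eigvals)[::-1][:r]]   # p x r, A'A = I_r
    Z = X @ A                                       # n x r component scores
    X_hat = Z @ A.T                                 # rank-r reconstruction
    return Z, A, X_hat
```

The criterion (2) for the returned triple is then `np.sum((X - X_hat) ** 2)`, which is non-increasing as *r* grows.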

Ordinary PCA assumes that all variables are measured with interval and ratio scales and can be applied only to quantitative data. When the observed data are a mixture of quantitative and qualitative data, ordinary PCA cannot be directly applied to such data. In such situations, optimal scaling is used to quantify the observed qualitative data and then ordinary PCA can be applied.

To quantify **X***j* of qualitative variable *j* with *Kj* categories, the vector is coded by using an *n* × *Kj* indicator matrix **G***j* with entries *g*(*j*)*ik* = 1 if object *i* belongs to category *k*, and *g*(*j*)*ik*′ = 0 if object *i* belongs to some other category *k*′ (≠ *k*), for *i* = 1, . . . , *n* and *k* = 1, . . . , *Kj*. Then the optimally scaled vector **X**∗*j* of **X***j* is given by **X**∗*j* = **G***j**αj*, where *αj* is a *Kj* × 1 score vector for the categories of **X***j*. Let **X**∗ = (**X**∗1 **X**∗2 ··· **X**∗*p*) be an *n* × *p* matrix of optimally scaled observations satisfying the restrictions

$$\mathbf{X}^{\*\top}\mathbf{1}\_{n} = \mathbf{0}\_{p} \qquad \text{and} \qquad \mathrm{diag}\left[\frac{\mathbf{X}^{\*\top}\mathbf{X}^{\*}}{n}\right] = \mathbf{I}\_{p}, \tag{3}$$

where **1***n* and **0***p* are vectors of ones and zeros of length *n* and *p* respectively. In the presence of nominal and/or ordinal variables, the optimization criterion (2) is replaced by

$$\boldsymbol{\theta}^{\*} = \text{tr}(\mathbf{X}^{\*} - \hat{\mathbf{X}})^{\top}(\mathbf{X}^{\*} - \hat{\mathbf{X}}) = \text{tr}(\mathbf{X}^{\*} - \mathbf{Z}\mathbf{A}^{\top})^{\top}(\mathbf{X}^{\*} - \mathbf{Z}\mathbf{A}^{\top}).\tag{4}$$

In nonlinear PCA, we determine the optimal scaling parameter **X**∗, in addition to estimating **Z** and **A**.
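
As a concrete illustration of this quantification, here is a minimal NumPy sketch that codes one qualitative variable by its indicator matrix and computes **X**∗*j* = **G***j**αj*; the helper names, the example score vector, and the per-column rescaling used to meet restriction (3) are our illustrative choices.

```python
import numpy as np

def indicator_matrix(labels, K):
    """n x K_j indicator matrix G_j: entry g_(j)ik = 1 iff object i
    belongs to category k; `labels` holds categories 0, ..., K-1."""
    G = np.zeros((len(labels), K))
    G[np.arange(len(labels)), labels] = 1.0
    return G

def optimally_scale(G, alpha):
    """Quantified column X*_j = G_j alpha_j, centered and scaled so that
    X*_j' 1_n = 0 and X*_j' X*_j / n = 1, as restriction (3) requires.
    Assumes the quantification is not constant over objects."""
    x = G @ alpha
    x = x - x.mean()                                # centering
    return x * np.sqrt(len(x)) / np.linalg.norm(x)  # normalizing

# Example: one variable with K_j = 3 categories observed on n = 5 objects.
G = indicator_matrix(np.array([0, 2, 1, 0, 2]), K=3)
x_star = optimally_scale(G, alpha=np.array([1.0, 2.0, 5.0]))
```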

#### **3. Alternating least squares algorithm for nonlinear principal components analysis**

A possible computational algorithm for estimating simultaneously **Z**, **A** and **X**∗ is the ALS algorithm. The algorithm involves dividing an entire set of parameters of a model into the model parameters and the optimal scaling parameters, and finds the least squares estimates for these parameters. The model parameters are used to compute the predictive values of the model. The optimal scaling parameters are obtained by solving the least squares regression problem for the predictive values. Krijnen [Krijnen, 2006] gave sufficient conditions for convergence of the ALS algorithm and discussed convergence properties in its application to several statistical models. Kiers [Kiers, 2002] described setting up the ALS and iterative majorization algorithms for solving various matrix optimization problems.

#### **PRINCIPALS**


PRINCIPALS, proposed by Young et al. [Young et al., 1978], utilizes the ALS algorithm for nonlinear PCA of a mixture of quantitative and qualitative data. PRINCIPALS alternates between ordinary PCA and optimal scaling, and minimizes *θ*∗ defined by Equation (4) under the restriction (3). Here *θ*∗ is minimized with respect to the model parameters **Z** and **A** and the optimal scaling parameter **X**∗ by updating each of the parameters in turn, keeping the others fixed.

For the initialization of PRINCIPALS, we determine initial data **X**∗(0). The observed data **X** may be used as **X**∗(0) after it is standardized to satisfy the restriction (3). For given initial data **X**∗(0) with the restriction (3), PRINCIPALS iterates the following two steps:

• *Model parameter estimation step*: Obtain **A**(*t*) by solving

$$
\left[\frac{\mathbf{X}^{\*(t)\top}\mathbf{X}^{\*(t)}}{n}\right]\mathbf{A}=\mathbf{A}\mathbf{D}\_{r},\tag{5}
$$

where **A**⊤**A** = **I***r* and **D***r* is an *r* × *r* diagonal matrix of eigenvalues, and the superscript (*t*) indicates the *t*-th iteration. Compute **Z**(*t*) from **Z**(*t*) = **X**∗(*t*)**A**(*t*).

• *Optimal scaling step*: Calculate **X**ˆ(*t*+1) = **Z**(*t*)**A**(*t*)⊤ from Equation (1). Find **X**∗(*t*+1) such that

$$\mathbf{X}^{\*(t+1)} = \arg\min\_{\mathbf{X}^\*} \text{tr}(\mathbf{X}^\* - \hat{\mathbf{X}}^{(t+1)})^\top (\mathbf{X}^\* - \hat{\mathbf{X}}^{(t+1)})$$

for fixed **X**ˆ (*t*+1) under measurement restrictions on each of the variables. Scale **X**∗(*t*+1) by columnwise centering and normalizing.
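
Combining the two steps gives the following compact sketch of the PRINCIPALS iteration, written for purely nominal variables, for which the least squares solution of the optimal scaling step is the vector of category means of **X**ˆ(*t*+1), followed by columnwise centering and normalizing. The initial scores, the stopping rule on *θ*∗, and the assumption that every category is observed at least once are our illustrative choices, not part of the original formulation.

```python
import numpy as np

def standardize(x):
    """Center a column and scale it so that x'x / n = 1 (restriction (3))."""
    x = x - x.mean()
    return x * np.sqrt(len(x)) / np.linalg.norm(x)

def principals(G_list, r, max_iter=1000, tol=1e-8):
    """PRINCIPALS sketch for p nominal variables, each coded by an
    n x K_j indicator matrix in G_list; assumes every category occurs."""
    n = G_list[0].shape[0]
    # X*(0): arbitrary initial scores 0, 1, ..., K_j - 1, standardized.
    X = np.column_stack([standardize(G @ np.arange(G.shape[1], dtype=float))
                         for G in G_list])
    theta_old = np.inf
    for _ in range(max_iter):
        # Model parameter estimation step: eigenproblem (5) for A, then Z.
        eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
        A = eigvecs[:, np.argsort(eigvals)[::-1][:r]]   # A'A = I_r
        Z = X @ A                                       # Z(t) = X*(t) A(t)
        X_hat = Z @ A.T                                 # X_hat(t+1) = Z(t) A(t)'
        # Optimal scaling step: for a nominal variable, the least squares
        # quantification of X_hat_j is its vector of category means,
        # then each column is centered and normalized.
        for j, G in enumerate(G_list):
            alpha = (G.T @ X_hat[:, j]) / G.sum(axis=0)  # category means
            X[:, j] = standardize(G @ alpha)
        theta = np.sum((X - X_hat) ** 2)                 # criterion (4)
        if theta_old - theta < tol:                      # ALS decreases theta
            break
        theta_old = theta
    return Z, A, X
```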

#### **4. The v*ε* acceleration of the ALS algorithm**

We briefly introduce the v*ε* algorithm of Wynn [Wynn, 1962] used in the acceleration of the ALS algorithm. The v*ε* algorithm is utilized to speed up the convergence of a slowly convergent vector sequence and is very effective for linearly converging sequences. Kuroda and Sakakihara [Kuroda and Sakakihara, 2006] proposed the *ε*-accelerated EM algorithm that speeds up the convergence of the EM sequence via the v*ε* algorithm and demonstrated that its speed of convergence is significantly faster than that of the EM algorithm. Wang et al. [Wang et al., 2008] studied the convergence properties of the *ε*-accelerated EM algorithm.

Let {**Y**(*t*)}*t*≥0 = {**Y**(0), **Y**(1), **Y**(2), . . . } be a linearly convergent sequence generated by an iterative computational procedure and let {**Y**˙ (*t*)}*t*≥0 = {**Y**˙ (0), **Y**˙ (1), **Y**˙ (2), . . . } be the accelerated sequence of {**Y**(*t*)}*t*≥0.
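
Since the chapter's statement of the update is truncated in this extract, the following sketch should be read as our rendering of Wynn's vector *ε* update using the Samelson inverse, applied to three successive iterates; treat the exact form as an assumption rather than the authors' formulation.

```python
import numpy as np

def samelson_inverse(v):
    """Samelson inverse v / ||v||^2 of a vector, the inverse used in the
    vector epsilon algorithm."""
    return v / (v @ v)

def v_epsilon_accelerate(y_prev, y_curr, y_next):
    """One vepsilon step: from three successive iterates Y(t-1), Y(t),
    Y(t+1) of a linearly convergent vector sequence, produce the
    accelerated point Ydot(t-1)."""
    return y_curr + samelson_inverse(samelson_inverse(y_prev - y_curr)
                                     + samelson_inverse(y_next - y_curr))
```

In v*ε*-PRINCIPALS, the iterates **Y**(*t*) would be the vectorized optimal scaling data **X**∗(*t*) produced by PRINCIPALS, so that the accelerated sequence {**Y**˙ (*t*)}*t*≥0 is generated alongside the original one.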
