**Stage B:** *Variable selection stage (Backward)*

	- **B-1** Remove one variable from among the *q* variables in **X***V*<sup>1</sup>, form a temporary subset of size *q* − 1, and compute *P* for that subset. Repeat this for each variable in **X***V*<sup>1</sup>, obtaining *q* values of *P*. Find the best subset of size *q* − 1, namely the one that provides the largest *P* among these *q* values, and remove the corresponding variable from the current **X***V*<sup>1</sup>. Set *q* := *q* − 1.
	- **B-2** If *P* or *q* is larger than its preassigned value, go to **B-1**; otherwise stop.
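Steps **B-1** and **B-2** can be sketched as a simple loop. In this sketch, `proportion_P` stands in for a black-box evaluation of *P* (in the text, a full M.PCA fit to the subset); the function name, threshold names, and the toy criterion are illustrative assumptions, not part of the original procedure.

```python
# Sketch of Backward elimination (steps B-1 and B-2). `proportion_P` is a
# placeholder for fitting M.PCA to a subset and returning P; the threshold
# names and default values below are illustrative assumptions.

def backward_elimination(variables, proportion_P, P_threshold=0.8, q_threshold=3):
    subset = list(variables)                      # current X_V1 (q variables)
    P = proportion_P(subset)
    while len(subset) > 1:
        # B-1: evaluate every subset of size q - 1 and keep the best one
        best = max(([v for v in subset if v != removed] for removed in subset),
                   key=proportion_P)
        subset, P = best, proportion_P(best)
        # B-2: go back to B-1 while P or q is larger than its preassigned value
        if not (P > P_threshold or len(subset) > q_threshold):
            break
    return subset, P

# Toy criterion: P is the summed "importance" of the kept variables.
weights = [0.05, 0.10, 0.15, 0.20, 0.50]
toy_P = lambda s: sum(weights[v] for v in s)
selected, P = backward_elimination(range(5), toy_P)
```

With the toy criterion the loop drops the least important variables first and stops once *P* and *q* both fall to their preassigned values, mirroring the B-2 rule.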

#### **[Forward selection]**

**Stage A:** *Initial fixed-variables stage*


**Stage B:** *Variable selection stage (Forward)*


In Backward elimination, to find the best subset of *q* − 1 variables, we perform M.PCA for each of the *q* possible subsets of size *q* − 1 drawn from the *q* variables selected at the previous step. The total number of M.PCA estimations from *q* = *p* − 1 down to *q* = *r* is therefore large: *p* + (*p* − 1) + ··· + (*r* + 1) = (*p* − *r*)(*p* + *r* + 1)/2. In Forward selection, the total number of M.PCA estimations from *q* = *r* up to *q* = *p* − 1 is <sub>*p*</sub>C<sub>*r*</sub> + (*p* − *r*) + (*p* − *r* − 1) + ··· + 2 = <sub>*p*</sub>C<sub>*r*</sub> + (*p* − *r* − 1)(*p* − *r* + 2)/2.
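The closed forms above can be checked by direct counting, one M.PCA estimation per candidate subset (the function names below are ours):

```python
from math import comb

# Count the M.PCA estimations in each procedure for p variables and r
# principal components, one estimation per candidate subset evaluated.

def backward_count(p, r):
    # the step from q + 1 to q variables evaluates q + 1 candidate subsets,
    # for q = p - 1 down to r:  p + (p - 1) + ... + (r + 1)
    return sum(q + 1 for q in range(r, p))

def forward_count(p, r):
    # comb(p, r) estimations choose the initial r-subset; the step from q to
    # q + 1 then evaluates p - q candidate subsets, for q = r, ..., p - 2
    return comb(p, r) + sum(p - q for q in range(r, p - 1))
```

For *p* = 10 and *r* = 3, as in the simulated data below, this gives 49 estimations for Backward elimination and 147 for Forward selection, in agreement with the two closed forms.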

#### **Numerical experiments 3: Variable selection in M.PCA for simulated data**

We apply PRINCIPALS and v*ε*-PRINCIPALS to variable selection in M.PCA of qualitative data, using simulated data consisting of 100 observations on 10 variables, each with 3 levels.

Table 3 shows the number of iterations and the CPU time taken by the two algorithms for finding a subset of *q* variables based on 3 (= *r*) principal components. The values in the second to fifth columns of the table indicate that PRINCIPALS requires a very large number of iterations and a long computation time to converge, while v*ε*-PRINCIPALS converges considerably faster.

(a) Backward elimination

| *q* | PRINCIPALS Iteration | PRINCIPALS CPU time | v*ε*-PRINCIPALS Iteration | v*ε*-PRINCIPALS CPU time | Speed-up Iteration | Speed-up CPU time | *P* |
|---|---|---|---|---|---|---|---|
| 23 | 36 | 1.39 | 10 | 0.65 | 3.60 | 2.13 | 0.694 |
| 22 | 819 | 32.42 | 231 | 15.40 | 3.55 | 2.11 | 0.694 |
| 21 | 779 | 30.79 | 221 | 14.70 | 3.52 | 2.10 | 0.693 |
| 20 | 744 | 29.37 | 212 | 14.05 | 3.51 | 2.09 | 0.693 |
| 19 | 725 | 28.43 | 203 | 13.41 | 3.57 | 2.12 | 0.692 |
| 18 | 705 | 27.45 | 195 | 12.77 | 3.62 | 2.15 | 0.692 |
| 17 | 690 | 26.67 | 189 | 12.25 | 3.65 | 2.18 | 0.691 |
| 16 | 671 | 25.73 | 180 | 11.61 | 3.73 | 2.22 | 0.690 |
| 15 | 633 | 24.26 | 169 | 10.85 | 3.75 | 2.24 | 0.689 |
| 14 | 565 | 21.79 | 153 | 10.02 | 3.69 | 2.17 | 0.688 |
| 13 | 540 | 20.69 | 147 | 9.48 | 3.67 | 2.18 | 0.687 |
| 12 | 498 | 19.09 | 132 | 8.64 | 3.77 | 2.21 | 0.686 |
| 11 | 451 | 17.34 | 121 | 7.95 | 3.73 | 2.18 | 0.684 |
| 10 | 427 | 16.29 | 117 | 7.46 | 3.65 | 2.18 | 0.682 |
| 9 | 459 | 16.99 | 115 | 7.05 | 3.99 | 2.41 | 0.679 |
| 8 | 419 | 15.43 | 106 | 6.42 | 3.95 | 2.40 | 0.676 |
| 7 | 382 | 14.02 | 100 | 5.89 | 3.82 | 2.38 | 0.673 |
| 6 | 375 | 13.51 | 96 | 5.41 | 3.91 | 2.50 | 0.669 |
| 5 | 355 | 12.58 | 95 | 5.05 | 3.74 | 2.49 | 0.661 |
| 4 | 480 | 16.11 | 117 | 5.33 | 4.10 | 3.02 | 0.648 |
| 3 | 2,793 | 86.55 | 1,354 | 43.48 | 2.06 | 1.99 | 0.620 |
| 2 | 35 | 1.92 | 10 | 1.34 | 3.50 | 1.43 | 0.581 |
| Total | 13,581 | 498.82 | 4,273 | 229.20 | 3.18 | 2.18 | |

(b) Forward selection

| *q* | PRINCIPALS Iteration | PRINCIPALS CPU time | v*ε*-PRINCIPALS Iteration | v*ε*-PRINCIPALS CPU time | Speed-up Iteration | Speed-up CPU time | *P* |
|---|---|---|---|---|---|---|---|
| 2 | 3,442 | 176.76 | 1,026 | 119.07 | 3.35 | 1.48 | 0.597 |
| 3 | 5,389 | 170.82 | 1,189 | 44.28 | 4.53 | 3.86 | 0.633 |
| 4 | 1,804 | 60.96 | 429 | 20.27 | 4.21 | 3.01 | 0.650 |
| 5 | 1,406 | 48.53 | 349 | 17.41 | 4.03 | 2.79 | 0.662 |
| 6 | 1,243 | 43.25 | 305 | 15.75 | 4.08 | 2.75 | 0.668 |
| 7 | 1,114 | 39.03 | 278 | 14.61 | 4.01 | 2.67 | 0.674 |
| 8 | 871 | 31.35 | 221 | 12.39 | 3.94 | 2.53 | 0.677 |
| 9 | 789 | 28.57 | 202 | 11.52 | 3.91 | 2.48 | 0.680 |
| 10 | 724 | 26.32 | 187 | 10.74 | 3.87 | 2.45 | 0.683 |
| 11 | 647 | 23.69 | 156 | 9.39 | 4.15 | 2.52 | 0.685 |
| 12 | 578 | 21.30 | 142 | 8.60 | 4.07 | 2.48 | 0.687 |
| 13 | 492 | 18.39 | 125 | 7.76 | 3.94 | 2.37 | 0.688 |
| 14 | 432 | 16.23 | 110 | 6.94 | 3.93 | 2.34 | 0.689 |
| 15 | 365 | 13.91 | 95 | 6.13 | 3.84 | 2.27 | 0.690 |
| 16 | 306 | 11.80 | 80 | 5.30 | 3.83 | 2.22 | 0.691 |
| 17 | 267 | 10.32 | 71 | 4.66 | 3.76 | 2.21 | 0.691 |
| 18 | 226 | 8.77 | 60 | 3.96 | 3.77 | 2.21 | 0.692 |
| 19 | 193 | 7.48 | 51 | 3.39 | 3.78 | 2.21 | 0.692 |
| 20 | 152 | 5.91 | 40 | 2.65 | 3.80 | 2.23 | 0.693 |
| 21 | 108 | 4.26 | 30 | 2.00 | 3.60 | 2.13 | 0.693 |
| 22 | 72 | 2.85 | 20 | 1.33 | 3.60 | 2.14 | 0.694 |
| 23 | 36 | 1.39 | 10 | 0.66 | 3.60 | 2.11 | 0.694 |
| Total | 20,656 | 771.88 | 5,176 | 328.81 | 3.99 | 2.35 | |

Table 4. The numbers of iterations and CPU times of PRINCIPALS and v*ε*-PRINCIPALS, their speed-ups and *P* in application to variable selection for finding a subset of *q* variables using MDOC.

We can see from the sixth and seventh columns of the table that v*ε*-PRINCIPALS requires 3 to 5 times fewer iterations and 2 to 5 times less CPU time than PRINCIPALS. In particular, the v*ε* acceleration effectively speeds up the convergence of {**X**∗(*t*)}*t*≥<sup>0</sup> when the number of iterations of PRINCIPALS is large.



The last row in Table 3 shows the total number of iterations and the total CPU time for selecting the 8 subsets for *q* = 3, . . . , 10. When searching for the best subset for each *q*, PRINCIPALS requires 64,491 iterations in Backward elimination and 178,249 iterations in Forward selection, while v*ε*-PRINCIPALS finds the subsets after 17,530 and 32,405 iterations, respectively. The corresponding computation times of v*ε*-PRINCIPALS are reduced to only 28% (= 1/3.52) and 19% (= 1/5.16) of those of ordinary PRINCIPALS. The iteration and CPU time speed-ups given in the sixth and seventh columns of the table demonstrate that the v*ε* acceleration works well to speed up the convergence of {**X**∗(*t*)}*t*≥<sup>0</sup> and consequently greatly reduces the computation time of variable selection.
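The extrapolation behind the v*ε* acceleration is itself compact. A minimal sketch, assuming the update takes the usual vector ε form built on the Samelson inverse *v*<sup>−1</sup> = *v*/‖*v*‖<sup>2</sup> (the function names and test sequence are ours):

```python
import numpy as np

# Sketch of the vector epsilon (v-epsilon) extrapolation, assumed here in the
# form  y_hat = y_t + [ (y_{t-1} - y_t)^{-1} + (y_{t+1} - y_t)^{-1} ]^{-1},
# where v^{-1} = v / ||v||^2 is the Samelson inverse of a vector.

def samelson_inverse(v):
    return v / np.dot(v, v)

def v_epsilon(y_prev, y_curr, y_next):
    """Estimate the limit of {y_t} from three consecutive iterates."""
    return y_curr + samelson_inverse(
        samelson_inverse(y_prev - y_curr) + samelson_inverse(y_next - y_curr))

# For an exactly geometric vector sequence y_t = y_star + c * rho**t,
# the extrapolation recovers the limit y_star from any three terms.
y_star = np.array([1.0, -2.0, 0.5])
c = np.array([0.3, 0.7, -0.4])
rho = 0.9
y0, y1, y2 = (y_star + c * rho**t for t in range(3))
accelerated = v_epsilon(y0, y1, y2)
```

For sequences that converge only approximately linearly, as the PRINCIPALS iterates do, the extrapolated point is not exact but is much closer to the limit than the raw iterates, which is the source of the speed-ups reported above.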

#### **Numerical experiments 4: Variable selection in M.PCA for real data**

We apply variable selection in M.PCA of qualitative data to the mild disturbance of consciousness (MDOC) data of Sano et al. [Sano et al., 1977]. MDOC is a data matrix of 87 individuals on 23 variables, each with 4 levels. In this variable selection problem, we select a suitable subset based on 2 (= *r*) principal components.

Table 4 summarizes the results of variable selection using the Backward elimination and Forward selection procedures for finding a subset of *q* variables.

(a) Backward elimination

| *q* | PRINCIPALS Iteration | PRINCIPALS CPU time | v*ε*-PRINCIPALS Iteration | v*ε*-PRINCIPALS CPU time | Speed-up Iteration | Speed-up CPU time |
|---|---|---|---|---|---|---|
| 10 | 141 | 1.70 | 48 | 0.68 | 2.94 | 2.49 |
| 9 | 1,363 | 17.40 | 438 | 6.64 | 3.11 | 2.62 |
| 8 | 1,620 | 20.19 | 400 | 5.98 | 4.05 | 3.37 |
| 7 | 1,348 | 16.81 | 309 | 4.80 | 4.36 | 3.50 |
| 6 | 4,542 | 53.72 | 869 | 11.26 | 5.23 | 4.77 |
| 5 | 13,735 | 159.72 | 2,949 | 35.70 | 4.66 | 4.47 |
| 4 | 41,759 | 482.59 | 12,521 | 148.13 | 3.34 | 3.26 |
| 3 | 124 | 1.98 | 44 | 1.06 | 2.82 | 1.86 |
| Total | 64,491 | 752.40 | 17,530 | 213.57 | 3.68 | 3.52 |

(b) Forward selection

| *q* | PRINCIPALS Iteration | PRINCIPALS CPU time | v*ε*-PRINCIPALS Iteration | v*ε*-PRINCIPALS CPU time | Speed-up Iteration | Speed-up CPU time |
|---|---|---|---|---|---|---|
| 3 | 4,382 | 67.11 | 1,442 | 33.54 | 3.04 | 2.00 |
| 4 | 154,743 | 1,786.70 | 26,091 | 308.33 | 5.93 | 5.79 |
| 5 | 13,123 | 152.72 | 3,198 | 38.61 | 4.10 | 3.96 |
| 6 | 3,989 | 47.02 | 1,143 | 14.24 | 3.49 | 3.30 |
| 7 | 1,264 | 15.27 | 300 | 4.14 | 4.21 | 3.69 |
| 8 | 340 | 4.38 | 108 | 1.70 | 3.15 | 2.58 |
| 9 | 267 | 3.42 | 75 | 1.17 | 3.56 | 2.93 |
| 10 | 141 | 1.73 | 48 | 0.68 | 2.94 | 2.54 |
| Total | 178,249 | 2,078.33 | 32,405 | 402.40 | 5.50 | 5.16 |

Table 3. The numbers of iterations and CPU times of PRINCIPALS and v*ε*-PRINCIPALS and their speed-ups in application to variable selection for finding a subset of *q* variables using simulated data.


We see from the last row of the table that the iteration speed-ups are 3.18 in Backward elimination and 3.99 in Forward selection, and thus v*ε*-PRINCIPALS greatly accelerates the convergence of {**X**∗(*t*)}*t*≥<sup>0</sup>. The CPU time speed-ups are 2.18 in Backward elimination and 2.35 in Forward selection, and are not as large as the iteration speed-ups: the computation time per iteration of v*ε*-PRINCIPALS is greater than that of PRINCIPALS due to the computation of the *Acceleration step*. Therefore, when the number of iterations is small, the CPU time of v*ε*-PRINCIPALS is almost the same as, or may even be longer than, that of PRINCIPALS. For example, in Forward selection for *q* = 2, PRINCIPALS converges in most cases after fewer than 15 iterations, and the CPU time speed-up is only 1.48.

The proportion *P* in the eighth column of the table indicates the variation explained by the first 2 principal components for the selected *q* variables. Iizuka et al. [Iizuka et al., 2003] selected the subset of 6 variables found by either procedure as the best subset, since *P* changes only slightly until *q* = 6 in Backward elimination and after *q* = 6 in Forward selection.

**7. Acknowledgment**

The authors would like to thank the editor and two referees, whose valuable comments and kind suggestions led to an improvement of this paper. This research is supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (C), No. 20500263.

**8. Appendix A: PRINCALS**

PRINCALS by Gifi [Gifi, 1990] can handle multiple nominal variables in addition to the single nominal, ordinal and numerical variables accepted in PRINCIPALS. We denote the set of multiple variables by J*<sup>M</sup>* and the set of single variables with single nominal and ordinal scales and numerical measurements by J*<sup>S</sup>*. For **X** consisting of a mixture of multiple and single variables, the algorithm alternates between estimation of **Z**, **A** and **X**∗ subject to minimizing

*θ*<sup>∗</sup> = tr(**Z** − **X**∗**A**)′(**Z** − **X**∗**A**)

under the restriction

**Z**′**1**<sub>*n*</sub> = **0**<sub>*r*</sub> and **Z**′**Z** = *n***I**<sub>*r*</sub>. (9)

For the initialization of PRINCALS, we determine initial data **Z**(0), **A**(0) and **X**∗(0). The values of **Z**(0) are initialized with random numbers under the restriction (9). For *j* ∈ J*<sup>M</sup>*, the initial value of **X**∗<sub>*j*</sub> is obtained by **X**∗(0)<sub>*j*</sub> = **G**<sub>*j*</sub>(**G**′<sub>*j*</sub>**G**<sub>*j*</sub>)<sup>−1</sup>**G**′<sub>*j*</sub>**Z**(0). For *j* ∈ J*<sup>S</sup>*, **X**∗(0)<sub>*j*</sub> is defined as the first *K<sub>j</sub>* successive integers under the normalization restriction, and the initial value of **A**<sub>*j*</sub> is calculated as the vector **A**(0)<sub>*j*</sub> = **Z**(0)′**X**∗(0)<sub>*j*</sub>. Given these initial values, PRINCALS as provided in Michailidis and de Leeuw [Michailidis and de Leeuw, 1998] iterates the following two steps:

- *Model parameter estimation step*: Calculate **Z**(*t*+1) by

  **Z**(*t*+1) = *p*<sup>−1</sup>(∑<sub>*j*∈J<sub>M</sub></sub> **X**∗(*t*)<sub>*j*</sub> + ∑<sub>*j*∈J<sub>S</sub></sub> **X**∗(*t*)<sub>*j*</sub>**A**(*t*)<sub>*j*</sub>).

  Columnwise center and orthonormalize **Z**(*t*+1). Estimate **A**(*t*+1)<sub>*j*</sub> for the single variable *j* by

  **A**(*t*+1)<sub>*j*</sub> = **Z**(*t*+1)′**X**∗(*t*)<sub>*j*</sub>/**X**∗(*t*)′<sub>*j*</sub>**X**∗(*t*)<sub>*j*</sub>.

- *Optimal scaling step*: Estimate the optimally scaled vector for *j* ∈ J*<sup>M</sup>* by

  **X**∗(*t*+1)<sub>*j*</sub> = **G**<sub>*j*</sub>(**G**′<sub>*j*</sub>**G**<sub>*j*</sub>)<sup>−1</sup>**G**′<sub>*j*</sub>**Z**(*t*+1)

  and for *j* ∈ J*<sup>S</sup>* by

  **X**∗(*t*+1)<sub>*j*</sub> = **G**<sub>*j*</sub>(**G**′<sub>*j*</sub>**G**<sub>*j*</sub>)<sup>−1</sup>**G**′<sub>*j*</sub>**Z**(*t*+1)**A**(*t*+1)<sub>*j*</sub>/**A**(*t*+1)′<sub>*j*</sub>**A**(*t*+1)<sub>*j*</sub>

  under measurement restrictions on each of the variables.

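The PRINCALS iteration can be sketched for the simplest case, where every variable is multiple nominal (all *j* ∈ J*<sup>M</sup>*), so that the optimal scaling is the projection **G**<sub>*j*</sub>(**G**′<sub>*j*</sub>**G**<sub>*j*</sub>)<sup>−1</sup>**G**′<sub>*j*</sub>**Z**. The helper names and the random test data are ours; single variables, convergence checks, and measurement restrictions are omitted.

```python
import numpy as np

# Minimal sketch of a PRINCALS-style alternating least squares cycle for
# multiple nominal variables only, under the restriction Z'1_n = 0_r and
# Z'Z = n I_r.  Helper names (orthonormalize, indicator, ...) are ours.

def orthonormalize(Z):
    """Columnwise center Z, then rescale so that Z'Z = n * I."""
    n = Z.shape[0]
    Z = Z - Z.mean(axis=0)                    # Z'1_n = 0_r
    Q, _ = np.linalg.qr(Z)                    # orthonormal columns
    return np.sqrt(n) * Q                     # Z'Z = n I_r

def indicator(x):
    """n x K_j indicator (dummy) matrix G_j for one categorical variable."""
    cats = np.unique(x)
    return (x[:, None] == cats[None, :]).astype(float)

def als_iteration(Gs, Z):
    """One cycle: optimal scaling step, then model parameter estimation."""
    # Optimal scaling: X*_j = G_j (G_j'G_j)^{-1} G_j' Z  (projection onto G_j)
    Xstar = [G @ np.linalg.solve(G.T @ G, G.T @ Z) for G in Gs]
    # Object scores: Z = p^{-1} * sum_j X*_j, then restore restriction (9)
    return orthonormalize(sum(Xstar) / len(Gs))

rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(100, 10))     # 100 observations, 10 variables
Gs = [indicator(data[:, j]) for j in range(data.shape[1])]
Z = orthonormalize(rng.standard_normal((100, 2)))   # r = 2 dimensions
for _ in range(50):
    Z = als_iteration(Gs, Z)
```

Each cycle is monotone non-increasing in the loss *θ*<sup>∗</sup> restricted to this special case, which is the usual argument for ALS convergence; a practical implementation would also stop when successive object scores change by less than a tolerance.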