4. Head pose estimation via manifold learning

Manifold learning methods can successfully model head pose variations in both yaw and pitch, as discussed in the previous sections. However, several difficulties remain. Noise, for example identity and illumination variation, degrades the performance of these methods on head pose estimation. Moreover, they do not specify how the low-dimensional representation of an unseen head pose image is obtained (except LPP), nor how the pose is estimated from that representation. In this section, more sophisticated methods based on the original or extended manifold learning algorithms are introduced to solve these problems.

### 4.1. PCA-based head pose estimation

In Ref. [20], PCA is shown to be robust to identity variation. Another important conclusion is that a pose difference of 10° is the lower bound at which two poses remain discriminative. For a data set constructed following this finding, PCA produces promising head pose estimation results.
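To make the pipeline behind this result concrete, the following is a minimal NumPy sketch of plain PCA on flattened face images (the function names and toy dimensions are our own illustration, not taken from Ref. [20]): fit the top eigenvectors of the data covariance, then project an image into the low-dimensional space.

```python
import numpy as np

def pca_fit(X, d):
    """Plain PCA. X: (M, D) matrix with one flattened face image per row."""
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / len(X)                      # D x D covariance matrix
    w, V = np.linalg.eigh(C)                    # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:d]               # indices of the d largest eigenvalues
    return mu, V[:, top]                        # mean and projection matrix V (D x d)

def pca_project(x, mu, V):
    # Low-dimensional feature of one image: y = V^T (x - mu)
    return V.T @ (x - mu)
```

A pose classifier or regressor would then be trained on the projected features `y`.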

A kernel machine-based method using kernel PCA (KPCA) and a kernel support vector classifier (KSVC) is proposed in Ref. [21]. KPCA is an extension of classical PCA. Let $\varphi(\mathbf{x}): \mathbb{R}^D \rightarrow \mathbb{R}^{D'}$ be a kernel map from the original space to a higher-dimensional one, in which nonlinearly separable data become linearly separable. Correspondingly, the covariance matrix $\mathbf{C}$ is replaced by $\mathbf{C} = \mathbf{\Phi}\mathbf{\Phi}^T$, where $\mathbf{\Phi}$ is the set of kernel-mapped data points. The projection matrix $\mathbf{V}$ can be similarly obtained through the eigenvector decomposition of $\mathbf{C}$. After the feature dimensionality reduction, a multiclass KSVC is trained to estimate the view of the head. Given a testing image $\mathbf{x}_{ts}$, it is first mapped by the kernel, and the low-dimensional features are then obtained with the projection matrix learned by KPCA, $\mathbf{y}_{ts} = \mathbf{V}^T \varphi(\mathbf{x}_{ts})$, which are fed into the KSVC to predict the head pose. This method is shown to outperform its linear counterpart, i.e., PCA + SVC.
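The projection $\mathbf{y}_{ts} = \mathbf{V}^T \varphi(\mathbf{x}_{ts})$ never evaluates $\varphi$ explicitly; everything is computed through the kernel matrix. Below is a minimal NumPy sketch of KPCA with kernel centering (the RBF kernel and all names here are illustrative assumptions; Ref. [21] may use a different kernel):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel values between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kpca_fit(X, d, gamma=0.5):
    """Kernel PCA: eigendecompose the centered kernel matrix of X (M, D)."""
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, i.e., center the data in feature space.
    Kc = K - K.mean(0) - K.mean(1)[:, None] + K.mean()
    w, V = np.linalg.eigh(Kc)                       # ascending eigenvalues
    top = np.argsort(w)[::-1][:d]
    # Scale eigenvectors so the feature-space eigenvectors have unit norm.
    alphas = V[:, top] / np.sqrt(np.maximum(w[top], 1e-12))
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma}

def kpca_transform(model, x_ts):
    """y_ts = V^T phi(x_ts), computed via the kernel trick."""
    K, X, g = model["K"], model["X"], model["gamma"]
    k = rbf_kernel(x_ts[None, :], X, g)[0]          # kernel row vs. training set
    kc = k - K.mean(0) - k.mean() + K.mean()        # center consistently
    return kc @ model["alphas"]
```

The resulting features `kpca_transform(model, x_ts)` would then be passed to a multiclass classifier (the KSVC in Ref. [21]) to predict the view.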

### 4.2. View representation by Isomap


How Isomap reduces the dimensionality of the original data has been introduced in the previous section. The remaining problem is how to connect the head pose to the features. A pose parameter map $\mathbf{F}$ is proposed in Ref. [22] to build such a connection:

$$
\Theta = \mathbf{F} \mathbf{Y} \tag{12}
$$

where $\Theta = (\theta_1, \theta_2, \dots, \theta_M)$ denotes the pose angles of the training data and $\mathbf{Y}$ is the low-dimensional representation of the data obtained from Isomap. The matrix $\mathbf{F}$ can be seen as a set of linear transformations that map the features to the corresponding pose angles. At training time, the head poses $\Theta$ are given as annotations and the low-dimensional features $\mathbf{Y}$ are learned by manifold learning; $\mathbf{F}$ is then obtained using the singular value decomposition (SVD) of $\mathbf{Y}^T$:

$$\mathbf{F}^T = \mathbf{P}\_\mathbf{Y} \mathbf{W}\_\mathbf{Y}^{-1} \mathbf{U}\_\mathbf{Y}^T \Theta^T \tag{13}$$

where $\mathbf{P_Y}$, $\mathbf{W_Y}$, and $\mathbf{U_Y}$ are given by the SVD of $\mathbf{Y}^T$.
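Equation (13) is the SVD-based pseudo-inverse solution of the least-squares problem $\Theta = \mathbf{F}\mathbf{Y}$, and NumPy's `pinv` computes exactly that pseudo-inverse via SVD. A small sketch (the column-per-sample shape convention here is our assumption):

```python
import numpy as np

def fit_pose_map(Theta, Y):
    """Solve Theta = F Y for F in the least-squares sense.
    Theta: (p, M) pose angles of M training images; Y: (d, M) Isomap embedding.
    np.linalg.pinv realizes the SVD-based pseudo-inverse of Eq. (13)."""
    return Theta @ np.linalg.pinv(Y)
```

When $\mathbf{Y}$ has full row rank, this recovers $\mathbf{F}$ exactly for noise-free annotations.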

Given a testing image $\mathbf{x}_{ts}$, the goal now is to obtain its low-dimensional feature according to the embedding $\mathbf{Y}$. The first step is to construct a vector of squared geodesic distances from the testing image to all $M$ training images, $\mathbf{d}_{ts,M} = (d^2_{ts,1}, d^2_{ts,2}, \dots, d^2_{ts,M})^T$. Then $\mathbf{d}_{ts} = \mathrm{diag}(\mathbf{Y}^T\mathbf{Y}) - \mathbf{d}_{ts,M}$. Next, the low-dimensional representation of the testing image is obtained by

$$\mathbf{y}\_{ts} = \frac{1}{2} \left( (\mathbf{Y}^T \mathbf{Y})^{-1} \mathbf{Y}^T \right)^T \mathbf{d}\_{ts} \tag{14}$$

Finally, the estimated pose of the testing image is computed from

$$\boldsymbol{\theta}\_{\rm ts} = \mathbf{F} \mathbf{y}\_{\rm ts} \tag{15}$$
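Putting Eqs. (14) and (15) together, the test-time computation can be sketched as follows (the column-per-sample convention and the assumption of a centered embedding are ours; `pinv` plays the role of the $(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T$ factor in Eq. (14)):

```python
import numpy as np

def isomap_pose(F, Y, d_sq):
    """Estimate the pose of a test image from geodesic distances (Eqs. 14-15).
    F: (p, d) pose parameter map, Y: (d, M) training embedding (centered),
    d_sq: (M,) squared geodesic distances from the test image to training images."""
    d_ts = np.diag(Y.T @ Y) - d_sq              # d_ts = diag(Y^T Y) - d_{ts,M}
    y_ts = 0.5 * np.linalg.pinv(Y.T) @ d_ts     # Eq. (14) via the pseudo-inverse
    theta_ts = F @ y_ts                         # Eq. (15)
    return theta_ts, y_ts
```

As a sanity check, if the test image coincides with a training image and the geodesic distances match the embedded Euclidean distances, `y_ts` recovers that training point's embedding exactly.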

The key insight of this method is the mapping of testing data into the subspace learned by nonlinear manifold learning. The LLE and LE algorithms can also be generalized with the same idea.
