## 3.4. Laplacian eigenmaps (LE)

Like Isomap, LE represents the data as a graph. The difference is that LE constructs a weighted graph (rather than a distance graph) for the data, which is then represented by its Laplacian [12].

The first step of LE is to construct an adjacency graph whose vertices are the data points and whose edges connect neighboring points. A pair of points $x_i$ and $x_j$ are ϵ-neighbors and are connected by an edge if $\|x_i - x_j\| \le \epsilon$. The other criterion for connecting a pair of points is to check whether they are among each other's K nearest neighbors. The second step is to choose appropriate weights for the graph. There are two options: the heat kernel defines the weight as $w_{ij} = e^{-\|x_i - x_j\|^2 / t}$ if the points $x_i$ and $x_j$ are connected, and zero otherwise; the other option simply sets $w_{ij} = 1$ for connected edges and zero otherwise.
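As a concrete sketch of these first two steps, the snippet below builds a K-nearest-neighbor graph and weights its edges with the heat kernel using NumPy/SciPy; the function name `heat_kernel_weights` and the parameters `n_neighbors` and `t` are illustrative choices, not prescribed by the original description.

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_kernel_weights(X, n_neighbors=8, t=1.0):
    """Steps 1-2 of LE: neighborhood graph plus heat-kernel edge weights.

    X is an (n_samples, n_features) array; n_neighbors plays the role of K
    and t is the heat-kernel width.
    """
    D = cdist(X, X)                          # pairwise Euclidean distances
    n = X.shape[0]
    W = np.zeros((n, n))
    # column 0 of the argsort is each point itself (distance 0), so skip it
    knn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        for j in knn[i]:
            w = np.exp(-D[i, j] ** 2 / t)    # heat-kernel weight
            W[i, j] = W[j, i] = w            # symmetrize the adjacency
    return W
```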

The third step is to minimize an objective function

$$\min_{\mathbf{y}:\,\mathbf{y}^T\mathbf{A}\mathbf{y}=1} \sum_{i,j} (y_i - y_j)^2 w_{ij} \tag{9}$$

where the diagonal matrix $\mathbf{A} = \mathrm{diag}\{a_{ii}\}$ is computed from the column sums of $\mathbf{W}$: $a_{ii} = \sum_j w_{ji}$. From the definition of the objective function, the goal of LE is to preserve the weighted neighborhood relations of the original data in the mapped data: if a pair of points is close in the original space, the corresponding points should also be kept close in the embedding, because the objective heavily penalizes mapping strongly connected (large-weight) points far apart. Minimizing Eq. (9) under this constraint leads to the generalized eigenvalue problem

$$\mathbf{L}y = \lambda \mathbf{A}y \tag{10}$$

where $\mathbf{L} = \mathbf{A} - \mathbf{W}$ is the Laplacian matrix of the weighted graph; the reduction follows from the identity $\sum_{i,j}(y_i - y_j)^2 w_{ij} = 2\,\mathbf{y}^T\mathbf{L}\mathbf{y}$. Eq. (10) is a generalized eigenvector decomposition, and the matrix $\mathbf{Y}$ whose columns are the bottom $d$ eigenvectors of Eq. (10), excluding the trivial constant eigenvector with eigenvalue 0, is the $d$-dimensional representation of the data. Compared with Isomap, LE is less sensitive to outliers and noise due to its locality-preserving property: the weights of non-edges are set to zero, which diminishes the short-circuiting problem.
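The decomposition step admits an equally short sketch; `laplacian_eigenmaps` below is an illustrative name, and the code assumes the weight matrix yields a connected graph so that only one eigenvalue is zero.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, d=2):
    """Step 3 of LE: solve L y = lambda A y for the bottom-d embedding."""
    A = np.diag(W.sum(axis=0))   # diagonal degree matrix, a_ii = sum_j w_ji
    L = A - W                    # graph Laplacian
    # generalized eigenproblem; eigh returns eigenvalues in ascending order
    _, vecs = eigh(L, A)
    # drop the trivial constant eigenvector (eigenvalue 0) and keep the
    # next d bottom eigenvectors as the columns of Y
    return vecs[:, 1:d + 1]
```

Combined with the previous sketch, `Y = laplacian_eigenmaps(heat_kernel_weights(X))` would give the d-dimensional coordinates of the rows of X.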

Figure 8 shows the low-dimensional representation of the head poses obtained by Isomap. An interesting "bowl"-shaped embedding surface is obtained for the head pose images.

Figure 8. Visualization of the low-dimensional features obtained by Isomap. (a) The variation in yaw with a pitch of −30° is found along the edge of the shape. (b) Another variation in yaw with a pitch of 0° is found along the geodesic path in the middle of the shape. Interestingly, the frontal face is located approximately at the center.

As shown in Figure 9, LE generates an embedding surface with the shape of a parabola, which is similar to the results obtained by LLE, but the latter produces a smoother and more symmetric surface. The variation in yaw from left to right appears symmetrically, and the frontal face is located approximately at the bottom vertex.

Figure 9. Visualization of the low-dimensional features obtained by LE.

## 3.5. Locality preserving projections (LPP)

The previously introduced algorithms do not clarify how an unseen data point is projected to the low-dimensional space. To solve this problem, LPP reformulates LE by representing the dimensionality reduction as a linear projection from the original to the low-dimensional space. The first two steps of LPP are exactly the same as in LE: construct the adjacency graph and compute the weights for each connection. The most significant difference is that LPP represents the dimensionality reduction from the original to the low-dimensional space as a projection $y = \mathbf{V}^T x$. The problem is thus converted into finding a projection matrix instead of directly computing the low-dimensional features. The generalized eigenvector decomposition defined in Eq. (10) is then reformulated as follows:

$$\mathbf{X}\mathbf{L}\mathbf{X}^T\boldsymbol{\upsilon} = \lambda\mathbf{X}\mathbf{A}\mathbf{X}^T\boldsymbol{\upsilon} \tag{11}$$

The bottom $d$ eigenvectors of Eq. (11) form the projection matrix $\mathbf{V}_{D \times d} = \{v_i\}$. Any data point from the original space can be dimensionally reduced through $y = \mathbf{V}^T x$.
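A minimal sketch of this reformulation, assuming X stores the samples as columns (a D × n matrix) and reusing the weight matrix W built as in LE; `lpp` is an illustrative name, and the code assumes $\mathbf{X}\mathbf{A}\mathbf{X}^T$ is positive definite (in practice a PCA step is often applied first to guarantee this).

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, d=2):
    """X is (D, n) with samples as columns; W is the (n, n) weight matrix."""
    A = np.diag(W.sum(axis=0))   # degree matrix, a_ii = sum_j w_ji
    L = A - W                    # graph Laplacian
    # generalized eigenproblem of Eq. (11): X L X^T v = lambda X A X^T v
    _, vecs = eigh(X @ L @ X.T, X @ A @ X.T)
    return vecs[:, :d]           # bottom-d eigenvectors form the columns of V

# unlike LE, the learned map extends to unseen samples: y_new = V.T @ x_new
```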

More advanced nonlinear manifold learning algorithms have been developed [13, 19], but the core of this section is the main idea of how to derive the low-dimensional representation of the head poses. Details of the advanced versions of the manifold learning algorithms can be explored in the original references.
