**Two-Dimensional Principal Component Analysis and Its Extensions**

Parinya Sanguansat

*Faculty of Engineering and Technology, Panyapiwat Institute of Management Thailand*

#### **1. Introduction**

Normally in Principal Component Analysis (PCA) (Sirovich & Kirby, 1987; Turk & Pentland, 1991), the 2D image matrices are first transformed into 1D image vectors by vectorization. The vectorization of a matrix is the column vector obtained by stacking the columns of the matrix on top of one another. The covariance or scatter matrix is then formulated from these image vectors. The covariance matrix is well estimated only if the number of available training samples is not far smaller than the dimension of this matrix. In practice, it is very hard to collect that many samples. Consequently, in 1D subspace analysis, the covariance matrix is normally poorly estimated and not of full rank.

Two-Dimensional Principal Component Analysis (2DPCA) was proposed by Yang et al. (2004) for face recognition and representation. Evidently, the experimental results in Kong et al. (2005); Yang & Yang (2002); Yang et al. (2004); Zhang & Zhou (2005) have shown the improvement of 2DPCA over PCA on several face databases. Unlike PCA, the image covariance matrix is computed directly from the image matrices, so the spatial structure information can be preserved. This yields a covariance matrix whose dimension equals only the width of the face image, which is far smaller than the size of the covariance matrix in PCA. Therefore, the image covariance matrix can be better estimated and will usually be of full rank. That means the curse of dimensionality and the Small Sample Size (SSS) problem can be avoided.

In this chapter, the details of 2DPCA's extensions are presented as follows: the bilateral projection scheme, the kernel version, the supervised framework, the variation of image alignment, and the random approaches.

For the first extension, many techniques have been proposed within the bilateral projection scheme, such as *(2D)<sup>2</sup>PCA* (Zhang & Zhou, 2005), *Bilateral 2DPCA (B2DPCA)* (Kong et al., 2005), *Generalized Low-Rank Approximations of Matrices (GLRAM)* (Liu & Chen, 2006; Liu et al., 2010; Ye, 2004), *Bi-Directional PCA (BDPCA)* (Zuo et al., 2005) and *Coupled Subspace Analysis (CSA)* (Xu et al., 2004). The left and right projections are determined by solving two eigenvalue problems per iteration: one corresponds to the column direction and the other to the row direction of the image. In this way, the image is not only considered in both directions, but the feature matrix is also smaller than in the original 2DPCA.

Following the success of the kernel method in kernel PCA (KPCA), a kernel-based 2DPCA was proposed as *Kernel 2DPCA (K2DPCA)* in Kong et al. (2005). This means that a nonlinear mapping can be utilized to improve the feature extraction of 2DPCA.


Since 2DPCA is an unsupervised projection method, class information is ignored. To embed this information into feature extraction, Linear Discriminant Analysis (LDA) was applied in Yang et al. (2004). Moreover, 2DLDA was proposed and then applied together with 2DPCA in Sanguansat et al. (2006b). Another method, proposed in Sanguansat et al. (2006a), is based on class-specific subspaces, in which each subspace is constructed only from the training samples of its own class, whereas only one subspace is considered in conventional 2DPCA. In this way, the representation can provide the minimum reconstruction error.

Because the image covariance matrix is the key of 2DPCA, and it corresponds to the alignment of pixels in the image, a different image covariance matrix will capture different information. Alternative versions of the image covariance matrix can be produced by rearranging the pixels. The diagonal-alignment 2DPCA and the generalized-alignment 2DPCA were proposed in Zhang et al. (2006) and Sanguansat et al. (2007a), respectively.

Finally, random-subspace-based 2DPCA was proposed by randomly selecting subsets of the eigenvectors of the image covariance matrix, as in Nguyen et al. (2007); Sanguansat et al. (2007b; n.d.), to build new projection matrices. The experimental results show that some subsets of eigenvectors perform better than others, but this cannot be predicted from their eigenvalues. However, mutual information can be used in a filter strategy for selecting these subsets, as shown in Sanguansat (2008).

#### **2. Two-dimensional principal component analysis**

Let each image be represented by an *m* by *n* matrix **A** of its pixels' gray intensities. We consider a linear projection of the form

$$\mathbf{y} = \mathbf{A}\mathbf{x},\tag{1}$$

where **x** is an *n*-dimensional projection axis and **y** is the projected feature of this image on **x**, called the *principal component vector*.

In the original algorithm of 2DPCA (Yang et al., 2004), like PCA, 2DPCA searches for the optimal projection by maximizing the total scatter of the projected data. Instead of using the same criterion as in PCA, the total scatter of the projected samples is characterized by the trace of the covariance matrix of the projected feature vectors. From this point of view, the following criterion was adopted:

$$J(\mathbf{x}) = tr(\mathbf{S}\_{\mathbf{x}}),\tag{2}$$

where

$$\mathbf{S}\_{\mathbf{x}} = E[(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})^T]. \tag{3}$$

Since the total power equals the sum of the diagonal elements, i.e., the trace, of the covariance matrix, the trace of **Sx** can be rewritten as

$$\begin{split} tr(\mathbf{S}\_{\mathbf{x}}) &= tr\{E[(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})^{T}]\} \\ &= tr\{E[(\mathbf{A} - E\mathbf{A})\mathbf{x}\mathbf{x}^{T}(\mathbf{A} - E\mathbf{A})^{T}]\} \\ &= tr\{E[\mathbf{x}^{T}(\mathbf{A} - E\mathbf{A})^{T}(\mathbf{A} - E\mathbf{A})\mathbf{x}]\} \\ &= tr\{\mathbf{x}^{T}E[(\mathbf{A} - E\mathbf{A})^{T}(\mathbf{A} - E\mathbf{A})]\mathbf{x}\} \\ &= tr\{\mathbf{x}^{T}\mathbf{G}\mathbf{x}\}. \end{split} \tag{4}$$

where

$$\mathbf{G} = E[(\mathbf{A} - E\mathbf{A})^T(\mathbf{A} - E\mathbf{A})].\tag{5}$$

This matrix **G** is called the *image covariance matrix*. Therefore, the alternative criterion can be expressed by

$$J(\mathbf{x}) = tr(\mathbf{x}^T \mathbf{G} \mathbf{x}),\tag{6}$$

where the image covariance matrix **G** is estimated from the *M* training images in a straightforward manner by

$$\mathbf{G} = \frac{1}{M} \sum\_{k=1}^{M} (\mathbf{A}\_k - \mathbf{\bar{A}})^T (\mathbf{A}\_k - \mathbf{\bar{A}}),\tag{7}$$

where **A¯** denotes the average image,


$$\bar{\mathbf{A}} = \frac{1}{M} \sum\_{k=1}^{M} \mathbf{A}\_k. \tag{8}$$

It can be shown that the vector **x** maximizing Eq. (4) is the eigenvector of **G** corresponding to its largest eigenvalue (Yang & Yang, 2002). It can be computed, for example, by eigenvalue decomposition or the Singular Value Decomposition (SVD) algorithm. However, one projection axis is usually not enough to accurately represent the data, so several eigenvectors of **G** are needed. The number of eigenvectors (*d*) can be chosen according to a predefined threshold (*θ*).

Let *λ*<sub>1</sub> ≥ *λ*<sub>2</sub> ≥ ··· ≥ *λ*<sub>*n*</sub> be the eigenvalues of **G**, sorted in non-increasing order. We select the first *d* eigenvectors such that their corresponding eigenvalues satisfy

$$\theta \le \frac{\sum\_{i=1}^{d} \lambda\_i}{\sum\_{i=1}^{n} \lambda\_i}. \tag{9}$$

For feature extraction, let **x1**,..., **xd** be the *d* selected eigenvectors of **G** associated with the largest eigenvalues. Each image **A** is projected onto this *d*-dimensional subspace according to Eq. (1). The projected image **Y** = [**y1**,..., **yd**] is then an *m* by *d* matrix given by

$$\mathbf{Y} = \mathbf{A}\mathbf{X},\tag{10}$$

where **X** = [**x1**,..., **xd**] is an *n* by *d* projection matrix.
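To make the procedure above concrete, the following NumPy sketch (an illustration added here, not the published code of the original algorithm) estimates **G** as in Eq. (7), selects *d* by the threshold rule of Eq. (9), and projects an image as in Eq. (10). The function names, the array layout, and the value of *θ* are assumptions made for the example.

```python
import numpy as np

def twodpca_fit(images, theta=0.95):
    """Estimate the image covariance matrix G (Eq. 7) from an (M, m, n) stack of
    gray-level images and return the n-by-d projection matrix X of Eq. (10),
    with d chosen by the cumulative-eigenvalue threshold of Eq. (9)."""
    M, m, n = images.shape
    A_bar = images.mean(axis=0)                 # average image, Eq. (8)
    G = np.zeros((n, n))
    for A in images:
        D = A - A_bar
        G += D.T @ D
    G /= M                                      # Eq. (7)
    eigvals, eigvecs = np.linalg.eigh(G)        # ascending order for symmetric G
    order = np.argsort(eigvals)[::-1]           # re-sort in non-increasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    d = int(np.searchsorted(ratio, theta)) + 1  # smallest d satisfying Eq. (9)
    return eigvecs[:, :d]

def twodpca_transform(A, X):
    """Eq. (10): project an m-by-n image onto the d principal axes."""
    return A @ X                                # m-by-d feature matrix Y
```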

#### **2.1 Column-based 2DPCA**

The original 2DPCA can be called row-based 2DPCA. An alternative formulation of 2DPCA uses the columns instead of the rows: column-based 2DPCA (Zhang & Zhou, 2005).

This method can be considered the same as the original 2DPCA, except that the input images are transposed beforehand. In Eq. (7), replace the image **A** with the transposed image **A**<sup>*T*</sup> and call the result the column-based image covariance matrix **H**, thus

$$\mathbf{H} = \frac{1}{M} \sum\_{k=1}^{M} (\mathbf{A}\_k^T - \bar{\mathbf{A}}^T)^T (\mathbf{A}\_k^T - \bar{\mathbf{A}}^T) \tag{11}$$

$$\mathbf{H} = \frac{1}{M} \sum\_{k=1}^{M} (\mathbf{A}\_k - \overline{\mathbf{A}})(\mathbf{A}\_k - \overline{\mathbf{A}})^T. \tag{12}$$


Similarly to Eq. (10), the column-based optimal projection matrix can be obtained by computing the eigenvectors **z** of **H** corresponding to the *q* largest eigenvalues, giving

$$\mathbf{Y}\_{col} = \mathbf{Z}^T \mathbf{A}, \tag{13}$$

where **Z** = [**z1**,..., **zq**] is an *m* by *q* column-based optimal projection matrix. The value of *q* can also be controlled by setting a threshold as in Eq. (9).
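Under the same assumptions, column-based 2DPCA is simply the earlier sketch applied to the transposed images, so the eigenvectors returned are those of **H** in Eq. (12); `twodpca_fit` refers to the illustrative helper defined in Section 2.

```python
def column_2dpca_fit(images, theta=0.95):
    """Column-based 2DPCA: row-based 2DPCA on the transposed images, so the
    returned Z collects the leading eigenvectors of H in Eq. (12)."""
    return twodpca_fit(images.transpose(0, 2, 1), theta)   # m-by-q matrix Z

def column_2dpca_transform(A, Z):
    """Eq. (13): Y_col = Z^T A, a q-by-n feature matrix."""
    return Z.T @ A
```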

#### **2.2 The relation of 2DPCA and PCA**

As noted in Kong et al. (2005), 2DPCA, performed on the 2D images, is essentially PCA performed on the rows of the images if each row is viewed as a computational unit. That means the 2DPCA of an image can be viewed as the PCA of the set of rows of that image. The relation between 2DPCA and PCA can be shown by rewriting the image covariance matrix **G** as a normal covariance matrix:

$$\begin{aligned} \mathbf{G} &= E\left[ (\mathbf{A} - \mathbf{\bar{A}})^T (\mathbf{A} - \mathbf{\bar{A}}) \right] \\ &= E\left[ \mathbf{A}^T \mathbf{A} - \mathbf{\bar{A}}^T \mathbf{A} - \mathbf{A}^T \mathbf{\bar{A}} + \mathbf{\bar{A}}^T \mathbf{\bar{A}} \right] \\ &= E\left[ \mathbf{A}^T \mathbf{A} \right] - f\left( \mathbf{\bar{A}} \right) \\ &= E\left[ \mathbf{A}^T \left( \mathbf{A}^T \right)^T \right] - f\left( \mathbf{\bar{A}} \right) . \end{aligned} \tag{14}$$

where *f* (**A¯** ) can be neglected if the data were previously centralized. Thus,

$$\begin{split} \mathbf{G} &\approx \mathbb{E}\left[\mathbf{A}^T \left(\mathbf{A}^T\right)^T\right] \\ &= \frac{1}{M} \sum\_{k=1}^M \mathbf{A}\_k^T \left(\mathbf{A}\_k^T\right)^T \\ &= m \left(\frac{1}{mM} \mathbf{C} \mathbf{C}^T\right), \end{split} \tag{15}$$

where **C** = [**A**<sub>1</sub><sup>*T*</sup> **A**<sub>2</sub><sup>*T*</sup> ··· **A**<sub>*M*</sub><sup>*T*</sup>] and the term (1/*mM*) **CC**<sup>*T*</sup> is the covariance matrix of the rows of all images.
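The identity behind Eq. (15) can be checked numerically; the short sketch below is only an illustration (random matrices stand in for face images and, as in the text, the centering term is dropped).

```python
import numpy as np

rng = np.random.default_rng(0)
M, m, n = 5, 4, 6                         # tiny illustrative sizes
images = rng.standard_normal((M, m, n))

# Image covariance matrix without centering: (1/M) * sum_k A_k^T A_k
G = sum(A.T @ A for A in images) / M

# C concatenates the transposed images, so its columns are all image rows
C = np.hstack([A.T for A in images])      # n x (m*M)
row_cov = C @ C.T / (m * M)               # covariance of the rows of all images

assert np.allclose(G, m * row_cov)        # Eq. (15): G = m * (1/(mM)) C C^T
```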

#### **3. Bilateral projection frameworks**

There are two major different techniques in this framework, i.e., non-iterative and iterative. All these methods use two projection matrices, one for the rows and one for the columns. The former computes these projections separately, while the latter computes them simultaneously via an iterative process.

#### **3.1 Non-iterative method**

The non-iterative bilateral projection scheme was applied to 2DPCA via left and right multiplying projection matrices (Xu et al., 2006; Zhang & Zhou, 2005; Zuo et al., 2005) as follows:

$$\mathbf{B} = \mathbf{Z}^T \mathbf{A} \mathbf{X},\tag{16}$$

where **B** is the feature matrix extracted from image **A** and **Z** is a left multiplying projection matrix. Similar to the right multiplying projection matrix **X** in Section 2, the matrix **Z** is an *m* by *q* projection matrix obtained by choosing the eigenvectors of the image covariance matrix **H** corresponding to the *q* largest eigenvalues. Therefore, the dimension of the feature matrix decreases from *m* × *n* to *q* × *d* (*q* < *m* and *d* < *n*). In this way, the computation time is also reduced. Moreover, the recognition accuracy of B2DPCA is often better than that of 2DPCA, as shown by the experimental results in Liu & Chen (2006); Zhang & Zhou (2005); Zuo et al. (2005).
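A minimal sketch of this non-iterative scheme, reusing the illustrative helpers defined earlier (`twodpca_fit` and `column_2dpca_fit`): **X** is taken from **G**, **Z** from **H**, and Eq. (16) then requires a single bilateral projection.

```python
def b2dpca_fit(images, theta=0.95):
    """Non-iterative B2DPCA: the right projection X (eigenvectors of G) and the
    left projection Z (eigenvectors of H) are computed independently."""
    X = twodpca_fit(images, theta)          # n-by-d
    Z = column_2dpca_fit(images, theta)     # m-by-q
    return Z, X

def b2dpca_transform(A, Z, X):
    """Eq. (16): B = Z^T A X, a q-by-d feature matrix."""
    return Z.T @ A @ X
```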

#### **3.2 Iterative method**


The bilateral projection scheme of 2DPCA with an iterative algorithm was proposed in Kong et al. (2005); Liu et al. (2010); Xu et al. (2004); Ye (2004). Let **Z** ∈ **R**<sup>*m*×*q*</sup> and **X** ∈ **R**<sup>*n*×*d*</sup> be the left and right multiplying projection matrices, respectively. For an *m* × *n* image **A**<sub>*k*</sub> and its *q* × *d* projected image **B**<sub>*k*</sub>, the bilateral projection is formulated as follows:

$$\mathbf{B}\_{k} = \mathbf{Z}^{T} \mathbf{A}\_{k} \mathbf{X} \tag{17}$$

where **B***<sup>k</sup>* is the extracted feature matrix for image **A***k*.

The optimal projection matrices **Z** and **X** in Eq. (17) can be computed by solving the following minimization criterion, so that the reconstructed image **ZB**<sub>*k*</sub>**X**<sup>*T*</sup> gives the best approximation of **A**<sub>*k*</sub>:

$$J(\mathbf{Z}, \mathbf{X}) = \min \sum\_{k=1}^{M} \left\| \mathbf{A}\_k - \mathbf{Z} \mathbf{B}\_k \mathbf{X}^T \right\|\_{F}^{2}, \tag{18}$$

where *M* is the number of data samples and ‖ · ‖<sub>*F*</sub> is the Frobenius norm of a matrix.

The detailed iterative scheme designed to compute the optimal projection matrices **Z** and **X** is listed in Table 1. The obtained solutions are only locally optimal because they depend on the initialization **Z**<sub>0</sub>. In Kong et al. (2005), **Z**<sub>0</sub> is set to the *m* × *m* identity matrix **I**<sub>*m*</sub>, while in Ye (2004) it is set to the *m* × *q* matrix formed by stacking the *q* × *q* identity matrix **I**<sub>*q*</sub> on top of a zero block.
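For concreteness, the two initializations mentioned above can be written as follows; the values of *m* and *q* are illustrative only.

```python
import numpy as np

m, q = 112, 10                                  # illustrative image height and rank
Z0_identity = np.eye(m)                         # Kong et al. (2005): the m-by-m identity
Z0_stacked = np.vstack([np.eye(q),              # Ye (2004): I_q stacked on a zero block,
                        np.zeros((m - q, q))])  # giving an m-by-q initialization
```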

Alternatively, since the criterion in Eq. (18) is biquadratic and has no closed-form solution, an iterative procedure to obtain a locally optimal solution was proposed in Xu et al. (2004). For **X** ∈ **R**<sup>*n*×*d*</sup>, the criterion in Eq. (18) can be rewritten as

$$J(\mathbf{X}) = \min \sum\_{k=1}^{M} \left\| \mathbf{A}\_k - \mathbf{A}\_k^{\mathbf{Z}} \mathbf{X} \mathbf{X}^T \right\|\_{F}^2, \tag{19}$$

where **A**<sup>**Z**</sup><sub>*k*</sub> = **ZZ**<sup>*T*</sup>**A**<sub>*k*</sub>. The solution of Eq. (19) is given by the eigenvectors of the image covariance matrix

$$\mathbf{G} = \frac{1}{M} \sum\_{k=1}^{M} (\mathbf{A}\_k^{\mathbf{Z}} - \bar{\mathbf{A}}^{\mathbf{Z}})^T (\mathbf{A}\_k^{\mathbf{Z}} - \bar{\mathbf{A}}^{\mathbf{Z}}). \tag{20}$$

Similarly, for **<sup>Z</sup>** <sup>∈</sup> **<sup>R</sup>***m*×*q*, the criterion in Eq. (18) is changed to

$$J(\mathbf{Z}) = \min \sum\_{k=1}^{M} \left\| \mathbf{A}\_k - \mathbf{Z} \mathbf{Z}^T \mathbf{A}\_k^{\mathbf{X}} \right\|\_{F}^2, \tag{21}$$

where **A**<sup>**X**</sup><sub>*k*</sub> = **A**<sub>*k*</sub>**XX**<sup>*T*</sup>. Again, the solution of Eq. (21) is given by the eigenvectors of the image covariance matrix

$$\mathbf{H} = \frac{1}{M} \sum\_{k=1}^{M} (\mathbf{A}\_k^{\mathbf{X}} - \bar{\mathbf{A}}^{\mathbf{X}})(\mathbf{A}\_k^{\mathbf{X}} - \bar{\mathbf{A}}^{\mathbf{X}})^T. \tag{22}$$

By iteratively optimizing the objective function with respect to **Z** and **X**, respectively, a local optimum of the solution can be obtained. The whole procedure, namely Coupled Subspace Analysis (CSA) (Xu et al., 2004), is shown in Table 2.

$$\begin{array}{ll}
S1: & \text{Initialize } \mathbf{Z} = \mathbf{Z}\_0 \text{ and } i = 1\\
S2: & \text{While not convergent}\\
S3: & \quad \text{Compute } \mathbf{G} = \frac{1}{M}\sum\_{k=1}^{M}(\mathbf{A}\_k - \bar{\mathbf{A}})^T \mathbf{Z}\_{i-1}\mathbf{Z}\_{i-1}^T(\mathbf{A}\_k - \bar{\mathbf{A}})\\
S4: & \quad \text{Compute the } d \text{ eigenvectors } \{\mathbf{e}\_j^{\mathbf{X}}\}\_{j=1}^{d} \text{ of } \mathbf{G} \text{ corresponding to the largest } d \text{ eigenvalues}\\
S5: & \quad \mathbf{X}\_i = [\mathbf{e}\_1^{\mathbf{X}}, \ldots, \mathbf{e}\_d^{\mathbf{X}}]\\
S6: & \quad \text{Compute } \mathbf{H} = \frac{1}{M}\sum\_{k=1}^{M}(\mathbf{A}\_k - \bar{\mathbf{A}})\mathbf{X}\_i\mathbf{X}\_i^T(\mathbf{A}\_k - \bar{\mathbf{A}})^T\\
S7: & \quad \text{Compute the } q \text{ eigenvectors } \{\mathbf{e}\_j^{\mathbf{Z}}\}\_{j=1}^{q} \text{ of } \mathbf{H} \text{ corresponding to the largest } q \text{ eigenvalues}\\
S8: & \quad \mathbf{Z}\_i = [\mathbf{e}\_1^{\mathbf{Z}}, \ldots, \mathbf{e}\_q^{\mathbf{Z}}]\\
S9: & \quad i = i + 1\\
S10: & \text{End While}\\
S11: & \mathbf{Z} = \mathbf{Z}\_{i-1}\\
S12: & \mathbf{X} = \mathbf{X}\_{i-1}\\
S13: & \text{Feature extraction: } \mathbf{B}\_k = \mathbf{Z}^T\mathbf{A}\_k\mathbf{X}
\end{array}$$

Table 1. The Bilateral Projection Scheme of 2DPCA with Iterative Algorithm.

$$\begin{array}{ll}
S1: & \text{Initialize } \mathbf{Z} = \mathbf{I}\_m\\
S2: & \text{For } i = 1, 2, \ldots, T\_{\max}\\
S3: & \quad \text{Compute } \mathbf{A}\_k^{\mathbf{Z}} = \mathbf{Z}\_{i-1}\mathbf{Z}\_{i-1}^T\mathbf{A}\_k\\
S4: & \quad \text{Compute } \mathbf{G} = \frac{1}{M}\sum\_{k=1}^{M}(\mathbf{A}\_k^{\mathbf{Z}} - \bar{\mathbf{A}}^{\mathbf{Z}})^T(\mathbf{A}\_k^{\mathbf{Z}} - \bar{\mathbf{A}}^{\mathbf{Z}})\\
S5: & \quad \text{Compute the } d \text{ eigenvectors } \{\mathbf{e}\_j^{\mathbf{X}}\}\_{j=1}^{d} \text{ of } \mathbf{G} \text{ corresponding to the largest } d \text{ eigenvalues}\\
S6: & \quad \mathbf{X}\_i = [\mathbf{e}\_1^{\mathbf{X}}, \ldots, \mathbf{e}\_d^{\mathbf{X}}]\\
S7: & \quad \text{Compute } \mathbf{A}\_k^{\mathbf{X}} = \mathbf{A}\_k\mathbf{X}\_i\mathbf{X}\_i^T\\
S8: & \quad \text{Compute } \mathbf{H} = \frac{1}{M}\sum\_{k=1}^{M}(\mathbf{A}\_k^{\mathbf{X}} - \bar{\mathbf{A}}^{\mathbf{X}})(\mathbf{A}\_k^{\mathbf{X}} - \bar{\mathbf{A}}^{\mathbf{X}})^T\\
S9: & \quad \text{Compute the } q \text{ eigenvectors } \{\mathbf{e}\_j^{\mathbf{Z}}\}\_{j=1}^{q} \text{ of } \mathbf{H} \text{ corresponding to the largest } q \text{ eigenvalues}\\
S10: & \quad \mathbf{Z}\_i = [\mathbf{e}\_1^{\mathbf{Z}}, \ldots, \mathbf{e}\_q^{\mathbf{Z}}]\\
S11: & \quad \text{If } i > 2 \text{ and } \|\mathbf{Z}\_i - \mathbf{Z}\_{i-1}\|\_F < m\varepsilon \text{ and } \|\mathbf{X}\_i - \mathbf{X}\_{i-1}\|\_F < n\varepsilon\\
S12: & \quad \text{Then go to } S15\\
S13: & \quad \text{Else go to } S3\\
S14: & \text{End For}\\
S15: & \mathbf{Z} = \mathbf{Z}\_i\\
S16: & \mathbf{X} = \mathbf{X}\_i\\
S17: & \text{Feature extraction: } \mathbf{B}\_k = \mathbf{Z}^T\mathbf{A}\_k\mathbf{X}
\end{array}$$

Table 2. Coupled Subspace Analysis Algorithm.
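The alternating procedure of Tables 1 and 2 can be sketched as follows; this is an illustrative reading of the scheme (initialization, loop bound and stopping tolerance are assumptions), not the authors' reference implementation.

```python
import numpy as np

def csa_fit(images, q, d, t_max=20, eps=1e-4):
    """Iterative bilateral 2DPCA / CSA sketch: alternately refresh X from G and
    Z from H (cf. Eqs. (19)-(22)) until both projections stop changing."""
    M, m, n = images.shape
    A_bar = images.mean(axis=0)
    Z = np.eye(m)                                  # Z_0 = I_m (Kong et al., 2005)
    X_prev = Z_prev = None
    for _ in range(t_max):
        # Right projection: top-d eigenvectors of G formed with Z Z^T (A_k - A_bar)
        G = sum((A - A_bar).T @ (Z @ Z.T) @ (A - A_bar) for A in images) / M
        X = np.linalg.eigh(G)[1][:, ::-1][:, :d]
        # Left projection: top-q eigenvectors of H formed with (A_k - A_bar) X X^T
        H = sum((A - A_bar) @ (X @ X.T) @ (A - A_bar).T for A in images) / M
        Z = np.linalg.eigh(H)[1][:, ::-1][:, :q]
        if (X_prev is not None
                and np.linalg.norm(Z - Z_prev) < m * eps
                and np.linalg.norm(X - X_prev) < n * eps):
            break
        X_prev, Z_prev = X, Z
    return Z, X                                    # feature extraction: B_k = Z^T A_k X
```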

#### **4. Kernel based frameworks**

From Section 2.2, 2DPCA, which is performed on the 2D images, is basically PCA performed on the rows of the images if each row is viewed as a computational unit.

Similar to 2DPCA, the kernel-based 2DPCA (K2DPCA) can be processed by traditional kernel PCA (KPCA) in the same manner. Let **a**<sup>*i*</sup><sub>*k*</sub> be the *i*-th row of the *k*-th image; thus the *k*-th image can be rewritten as

$$\mathbf{A}\_k = \left[ \left( \mathbf{a}\_k^1 \right)^T \left( \mathbf{a}\_k^2 \right)^T \cdots \left( \mathbf{a}\_k^m \right)^T \right]^T. \tag{23}$$

From Eq. (15), the matrix **C** can be constructed by concatenating all rows of all training images together. Let *ϕ* : **R**<sup>*n*</sup> → **R**<sup>*n*′</sup>, *n* < *n*′, be the mapping function that maps the row vectors into a feature space of higher dimension in which the classes can be linearly separated.



The elements of the kernel matrix **K** can therefore be computed as

$$\mathbf{K} = \begin{bmatrix}
\varphi(\mathbf{a}\_1^1)\varphi(\mathbf{a}\_1^1)^T & \cdots & \varphi(\mathbf{a}\_1^1)\varphi(\mathbf{a}\_1^m)^T & \cdots & \varphi(\mathbf{a}\_1^1)\varphi(\mathbf{a}\_M^1)^T & \cdots & \varphi(\mathbf{a}\_1^1)\varphi(\mathbf{a}\_M^m)^T \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
\varphi(\mathbf{a}\_1^m)\varphi(\mathbf{a}\_1^1)^T & \cdots & \varphi(\mathbf{a}\_1^m)\varphi(\mathbf{a}\_1^m)^T & \cdots & \varphi(\mathbf{a}\_1^m)\varphi(\mathbf{a}\_M^1)^T & \cdots & \varphi(\mathbf{a}\_1^m)\varphi(\mathbf{a}\_M^m)^T \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
\varphi(\mathbf{a}\_M^1)\varphi(\mathbf{a}\_1^1)^T & \cdots & \varphi(\mathbf{a}\_M^1)\varphi(\mathbf{a}\_1^m)^T & \cdots & \varphi(\mathbf{a}\_M^1)\varphi(\mathbf{a}\_M^1)^T & \cdots & \varphi(\mathbf{a}\_M^1)\varphi(\mathbf{a}\_M^m)^T \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
\varphi(\mathbf{a}\_M^m)\varphi(\mathbf{a}\_1^1)^T & \cdots & \varphi(\mathbf{a}\_M^m)\varphi(\mathbf{a}\_1^m)^T & \cdots & \varphi(\mathbf{a}\_M^m)\varphi(\mathbf{a}\_M^1)^T & \cdots & \varphi(\mathbf{a}\_M^m)\varphi(\mathbf{a}\_M^m)^T
\end{bmatrix} \tag{24}$$

which is an *mM*-by-*mM* matrix. Unfortunately, there is a critical implementation problem concerning the dimension of this kernel matrix. The kernel matrix is an *M* × *M* matrix in KPCA, where *M* is the number of training samples, while it is an *mM* × *mM* matrix in K2DPCA, where *m* is the number of rows of each image. Thus, the K2DPCA kernel matrix is *m*<sup>2</sup> times larger than the KPCA kernel matrix. For example, if the training set has 200 images with dimensions of 100 × 100, then the dimension of the kernel matrix is 20000 × 20000, which is very large to fit in memory. After that, the projection can be formed from the eigenvectors of this kernel matrix in the same way as in traditional KPCA.
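To make the size issue concrete, the sketch below (an illustration; a Gaussian kernel is assumed, and the usual kernel-matrix centering of KPCA is omitted) builds the *mM* × *mM* K2DPCA kernel matrix from the stacked image rows.

```python
import numpy as np

def k2dpca_kernel_matrix(images, gamma=1e-3):
    """Build the mM-by-mM kernel matrix of Eq. (24) over all image rows, here
    with a Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) as an example."""
    M, m, n = images.shape
    rows = images.reshape(M * m, n)        # rows of image 1, then image 2, ...
    sq = np.sum(rows ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * rows @ rows.T))
    return K                               # shape (m*M, m*M)

# For 200 images of size 100 x 100 this matrix is 20000 x 20000, about 3.2 GB in
# float64, which is exactly the memory bottleneck discussed above.
```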

#### **5. Supervised frameworks**

Since 2DPCA is an unsupervised technique, class information is neglected. This section presents two methods that can be used to embed class information into 2DPCA.


Firstly, Linear Discriminant Analysis (LDA) is implemented in the 2D framework. Secondly, 2DPCA is performed for each class in a class-specific subspace.

#### **5.1 Two-dimensional linear discriminant analysis of principal component vectors**

The PCA criterion chooses the subspace as a function of the data distribution, while Linear Discriminant Analysis (LDA) chooses the subspace which yields maximal inter-class distance while, at the same time, keeping the intra-class distance small. In general, LDA extracts features which are better suited for the classification task. However, when the available number of training samples is small compared to the feature dimension, the covariance matrix estimated from these features will be singular and therefore cannot be inverted. This is called the singularity problem or Small Sample Size (SSS) problem (Fukunaga, 1990).

Various solutions have been proposed for solving the SSS problem within the LDA framework (Belhumeur et al., 1997; Chen et al., 2000; Huang et al., 2002; Lu et al., 2003; Zhao, Chellappa & Krishnaswamy, 1998; Zhao, Chellappa & Nandhakumar, 1998). Among these LDA extensions, Fisherface (Belhumeur et al., 1997) and the discriminant analysis of principal components framework (Zhao, Chellappa & Krishnaswamy, 1998; Zhao, Chellappa & Nandhakumar, 1998) demonstrate a significant improvement when applying LDA over principal components from the PCA-based subspace, since PCA and LDA can overcome each other's drawbacks. PCA is constructed around the criterion of preserving the data distribution. Hence, it is suited for representation and reconstruction from the projected feature. However, in classification tasks, PCA only normalizes the input data according to their variance. This is not efficient, since the between-class relationship is neglected. In general, the discriminant power depends on both the within-class and the between-class relationships. LDA considers these relationships via the analysis of the within- and between-class scatter matrices. Taking this information into account, LDA allows further improvement, especially when there are prominent variations in lighting condition and expression. Nevertheless, in all of the above techniques the spatial structure information is still not employed.

Two-Dimensional Linear Discriminant Analysis (2DLDA) was proposed in Ye et al. (2005) for overcoming the SSS problem in classical LDA by working with images in matrix representation, as in 2DPCA. In particular, a bilateral projection scheme was applied there via left and right multiplying projection matrices. In this way, the eigenvalue problem is solved twice per iteration: one problem corresponds to the column direction and the other to the row direction of the image.

Because 2DPCA, like PCA, is more suitable for face representation than for face recognition, LDA is still necessary for better performance in the recognition task. Unfortunately, the linear transformation of 2DPCA reduces the input image to vectors with the same dimension as the number of rows, i.e., the height, of the input image. Thus, the SSS problem may still occur when LDA is performed directly after 2DPCA. To overcome this problem, a simplified version of 2DLDA using only the unilateral projection scheme, based on the 2DPCA concept, is applied (Sanguansat et al., 2006b;c). Applying 2DLDA to 2DPCA not only solves the SSS problem and the curse of dimensionality dilemma but also allows us to work directly on the image matrix in all projections. Hence, spatial structure information is maintained and the size of all scatter matrices cannot be greater than the width of the face image. Furthermore, computing


with this dimension, the face image does not need to be resized, since all information is still preserved.

#### **5.2 Two-dimensional linear discriminant analysis (2DLDA)**

Let **z** be an *n* dimensional vector. A matrix **A** is projected onto this vector via a transformation similar to Eq. (1):

$$\mathbf{v} = \mathbf{A}\mathbf{z}.\tag{25}$$

This projection yields an *m* dimensional feature vector.

2DLDA searches for the projection axis **z** that maximizes Fisher's discriminant criterion (Belhumeur et al., 1997; Fukunaga, 1990):

$$J(\mathbf{z}) = \frac{\text{tr}\left(\mathbf{S}\_b\right)}{\text{tr}\left(\mathbf{S}\_w\right)}\tag{26}$$

where **S***<sup>w</sup>* is the *within-class scatter matrix* and **S***<sup>b</sup>* is the *between-class scatter matrix*. In particular, the within-class scatter matrix describes how data are scattered around the means of their respective class, and is given by

$$\mathbf{S}\_{w} = \sum\_{i=1}^{K} \Pr(\omega\_{i}) E\left[ (\mathbf{Hz}) (\mathbf{Hz})^{T} | \omega = \omega\_{i} \right],\tag{27}$$

where *K* is the number of classes, *Pr*(*ω<sub>i</sub>*) is the prior probability of each class, and **H** = **A** − *E***A**. The between-class scatter matrix describes how the different classes, represented by their expected values, are scattered around the mixture mean:

$$\mathbf{S}\_b = \sum\_{i=1}^{K} \Pr(\omega\_i) \mathbf{E}\left[ (\mathbf{Fz})(\mathbf{Fz})^T \right],\tag{28}$$

where **F** = *E*[**A**|*ω* = *ωi*] − *E*[**A**].

With the linearity properties of both the trace function and the expectation, *J*(**z**) may be rewritten as

$$\begin{aligned}
J(\mathbf{z}) &= \frac{tr\left(\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left[(\mathbf{F}\mathbf{z})(\mathbf{F}\mathbf{z})^T\right]\right)}{tr\left(\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left[(\mathbf{H}\mathbf{z})(\mathbf{H}\mathbf{z})^T \,|\, \omega = \omega\_i\right]\right)} \\
&= \frac{\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(tr\left[(\mathbf{F}\mathbf{z})(\mathbf{F}\mathbf{z})^T\right]\right)}{\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(tr\left[(\mathbf{H}\mathbf{z})(\mathbf{H}\mathbf{z})^T \,|\, \omega = \omega\_i\right]\right)} \\
&= \frac{\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(tr\left[(\mathbf{F}\mathbf{z})^T(\mathbf{F}\mathbf{z})\right]\right)}{\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(tr\left[(\mathbf{H}\mathbf{z})^T(\mathbf{H}\mathbf{z}) \,|\, \omega = \omega\_i\right]\right)} \\
&= \frac{tr\left(\mathbf{z}^T\left[\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(\mathbf{F}^T\mathbf{F}\right)\right]\mathbf{z}\right)}{tr\left(\mathbf{z}^T\left[\sum\_{i=1}^{K} Pr(\omega\_i)\, E\left(\mathbf{H}^T\mathbf{H} \,|\, \omega = \omega\_i\right)\right]\mathbf{z}\right)} \\
&= \frac{tr(\mathbf{z}^T \tilde{\mathbf{S}}\_b \mathbf{z})}{tr(\mathbf{z}^T \tilde{\mathbf{S}}\_w \mathbf{z})}.
\end{aligned} \tag{29}$$

The 2DLDA optimal projection matrix **Z** can be obtained by solving the eigenvalue problem in Eq. (32). Finally, the composite linear transformation matrix, **L=XZ**, is used to map the face

Two-Dimensional Principal Component Analysis and Its Extensions 11

The matrix **D** is 2DPCA+2DLDA feature matrix of image **A** with dimension *m* by *q*. However, the number of 2DLDA feature vectors *q* cannot exceed the number of principal component vectors *d*. In general case (*q* < *d*), the dimension of **D** is less than **Y** in Section 2. Thus,

2DPCA is a unsupervised technique that is no information of class labels are considered. Therefore, the directions that maximize the scatter of the data from all training samples might not be as adequate to discriminate between classes. In recognition task, a projection that emphasize the discrimination between classes is more important. The extension of Eigenface, PCA-based, was proposed by using alternative way to represent by projecting to Class-Specific Subspace (CSS) (Shan et al., 2003). In conventional PCA method, the images are analyzed on the features extracted in a low-dimensional space learned from all training samples from all classes. While each subspaces of CSS learned from training samples from one class. In this way, the CSS representation can provide a minimum reconstruction error. The reconstruction error is used to classify the input data via the Distance From CSS (DFCSS). Less DFCSS means more probability that the input data belongs to the corresponding class. This extension was based on Sanguansat et al. (2006a). Let **G***<sup>k</sup>* be the image covariance matrix

where **A¯** *<sup>k</sup>* is the average image of class *ωk*. The *kth* projection matrix **X***<sup>k</sup>* is a *n* by *dk* projection matrix which composed by the eigenvectors of **G***<sup>k</sup>* corresponding to the *dk* largest eigenvalues.

2DPCA+2DLDA can reduce the classification time compared to 2DPCA.

**5.4 Class-specific subspace-based two-dimensional principal component analysis**

**D** = **AL**. (37)

(**A***<sup>c</sup>* <sup>−</sup> **A¯** *<sup>k</sup>*)*T*(**A***<sup>c</sup>* <sup>−</sup> **A¯** *<sup>k</sup>*), (38)

*<sup>k</sup>* <sup>=</sup> {**X***k*, **A¯** *<sup>k</sup>*, *dk*} (39)

**U***<sup>k</sup>* = **W***k***X***k*, (40)

*<sup>k</sup>* . (41)

. (42)

*<sup>k</sup>* can be evaluates by

− **s**(*m*,*n*) 

image space into the classification space by,

of the *kth* CSS. Then **G***<sup>k</sup>* can be evaluated by

The *kth* CSS of 2DPCA was represented as a 3-tuple:

where **<sup>W</sup>***<sup>k</sup>* <sup>=</sup> **<sup>S</sup>** <sup>−</sup> **A¯** *<sup>k</sup>*. Then the reconstruct image **<sup>W</sup>***<sup>r</sup>*

If *ε<sup>t</sup>* = min 1≤*k*≤*K*

Therefore, the DFCSS is defined by reconstruction error as follows

*<sup>k</sup>*, **S**) =

(*εk*) then the input sample **S** is belong to class *ωt*.

*<sup>ε</sup>k*(**W***<sup>r</sup>*

**<sup>G</sup>***<sup>k</sup>* <sup>=</sup> <sup>1</sup>

*<sup>M</sup>* ∑ **A***c*∈*ω<sup>k</sup>*

�2*DPCA*

Let **S** be a input sample and **U***<sup>k</sup>* be a feature matrix which projected to the *kth* CSS, by

**W***<sup>r</sup>*

*nrow* ∑ *m*=1

*<sup>k</sup>* <sup>=</sup> **<sup>U</sup>***k***X***<sup>T</sup>*

*ncol* ∑ *n*=1 **w***r* (*m*,*n*)*<sup>k</sup>*

Furthermore, **S**˜ *<sup>b</sup>* and **S**˜ *<sup>w</sup>* can be evaluated as follows:

$$\tilde{\mathbf{S}}\_b = \sum\_{i=1}^{K} \frac{n\_i}{K} (\tilde{\mathbf{A}}\_i - \tilde{\mathbf{A}})^T (\tilde{\mathbf{A}}\_i - \tilde{\mathbf{A}}) \tag{30}$$

$$\tilde{\mathbf{S}}\_{w} = \sum\_{i=1}^{K} \frac{n\_i}{K} \sum\_{\mathbf{A}\_k \in \omega\_l} (\mathbf{A}\_k - \mathbf{A}\_i)^T (\mathbf{A}\_k - \mathbf{A}\_i)\_{\prime} \tag{31}$$

where *ni* and **A¯** *<sup>i</sup>* are the number of elements and the expected value of class *ω<sup>i</sup>* respectively. **A¯** denotes the overall mean.

Then the optimal projection vector can be found by solving the following generalized eigenvalue problem:

$$
\tilde{\mathbf{S}}\_{b}\mathbf{z} = \lambda \tilde{\mathbf{S}}\_{w}\mathbf{z}.\tag{32}
$$

Again the SVD algorithm can be applied to solve this eigenvalue problem on the matrix **S**˜ <sup>−</sup><sup>1</sup> *<sup>w</sup>* **<sup>S</sup>**˜ *<sup>b</sup>*. Note that, in this size of scatter matrices involved in eigenvalue decomposition process is also become *n* by *n*. Thus, with the limited the training set, this decomposition is more reliably than the eigenvalue decomposition based on the classical covariance matrix.

The number of projection vectors is then selected by the same procedure as in Eq. (9). Let **Z** = [**z**1,..., **z***q*] be the projection matrix composed of *q* largest eigenvectors for 2DLDA. Given a *m* by *n* matrix **A**, its projection onto the principal subspace spanned by **z***<sup>i</sup>* is then given by

$$\mathbf{V} = \mathbf{A}\mathbf{Z}.\tag{33}$$

The result of this projection **V** is another matrix of size *m* by *q*. Like 2DPCA, this procedure takes a matrix as input and outputs another matrix. These two techniques can be further combined, their combination is explained in the next section.

#### **5.3 2DPCA+2DLDA**

In this section, we apply 2DLDA within a well-known framework for face recognition, LDA on PCA-based features (Zhao, Chellappa & Krishnaswamy, 1998). This framework consists of a 2DPCA step followed by a 2DLDA step, namely 2DPCA+2DLDA. From Section 2, we obtain a linear transformation matrix **X** onto which each input face image **A** is projected. At the 2DPCA step, a feature matrix **Y** is obtained. The matrix **Y** is then used as the input for the 2DLDA step. Thus, the evaluation of the within- and between-class scatter matrices in this step changes slightly: in Eqs. (30) and (31), the image matrix **A** is replaced by the 2DPCA feature matrix **Y** as follows

$$\tilde{\mathbf{S}}\_b^Y = \sum\_{i=1}^K \frac{n\_i}{K} (\bar{\mathbf{Y}}\_i - \bar{\mathbf{Y}})^T (\bar{\mathbf{Y}}\_i - \bar{\mathbf{Y}}) \tag{34}$$

$$\tilde{\mathbf{S}}\_{w}^{Y} = \sum\_{i=1}^{K} \frac{n\_{i}}{K} \sum\_{\mathbf{Y}\_{k} \in \omega\_{i}} (\mathbf{Y}\_{k} - \bar{\mathbf{Y}}\_{i})^{T} (\mathbf{Y}\_{k} - \bar{\mathbf{Y}}\_{i}) \tag{35}$$

where **Y***<sup>k</sup>* is the feature matrix of the *k*-th image matrix **A***k*, **Y**¯ *<sup>i</sup>* is the average of the **Y***<sup>k</sup>* that belong to class *ω<sup>i</sup>*, and **Y**¯ denotes the overall mean of **Y**,

$$\bar{\mathbf{Y}} = \frac{1}{M} \sum\_{k=1}^{M} \mathbf{Y}\_k. \tag{36}$$

The 2DLDA optimal projection matrix **Z** can be obtained by solving the eigenvalue problem in Eq. (32). Finally, the composite linear transformation matrix, **L=XZ**, is used to map the face image space into the classification space by,

$$\mathbf{D} = \mathbf{A}\mathbf{L}.\tag{37}$$

The matrix **D** is the 2DPCA+2DLDA feature matrix of image **A**, with dimension *m* by *q*. However, the number of 2DLDA feature vectors *q* cannot exceed the number of principal component vectors *d*. In the general case (*q* < *d*), the dimension of **D** is smaller than that of **Y** in Section 2. Thus, 2DPCA+2DLDA can reduce the classification time compared to 2DPCA.
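The combined pipeline can be sketched in a few lines, reusing the hypothetical `twod_lda` routine above and assuming an *n*-by-*d* 2DPCA projection matrix `X` has already been obtained as in Section 2:

```python
import numpy as np

def twod_pca_plus_lda(images, labels, X, q):
    """2DPCA+2DLDA sketch: run 2DLDA on the 2DPCA feature matrices Y = A X
    (Eqs. (34)-(36)) and return the composite transformation L = X Z of Eq. (37)."""
    Y = images @ X              # (M, m, d) feature matrices from the 2DPCA step
    Z = twod_lda(Y, labels, q)  # 2DLDA on the features; q must not exceed d
    return X @ Z                # composite n-by-q transformation L

# Classification-space feature of a new m-by-n image A, Eq. (37):  D = A @ L
```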

#### **5.4 Class-specific subspace-based two-dimensional principal component analysis**

2DPCA is an unsupervised technique: no class-label information is considered. Therefore, the directions that maximize the scatter of all training samples might not be adequate to discriminate between classes. In a recognition task, a projection that emphasizes the discrimination between classes is more important. An extension of the PCA-based Eigenface method was proposed that represents each image by projecting it onto a Class-Specific Subspace (CSS) (Shan et al., 2003). In the conventional PCA method, the images are analyzed with features extracted in a low-dimensional space learned from the training samples of all classes, whereas each CSS is learned from the training samples of a single class. In this way, the CSS representation can provide a minimum reconstruction error. The reconstruction error is used to classify the input data via the Distance From CSS (DFCSS): the smaller the DFCSS, the higher the probability that the input data belongs to the corresponding class.

This extension was based on Sanguansat et al. (2006a). Let **G***<sup>k</sup>* be the image covariance matrix of the *kth* CSS. Then **G***<sup>k</sup>* can be evaluated by

$$\mathbf{G}\_{k} = \frac{1}{M} \sum\_{\mathbf{A}\_{c} \in \omega\_{k}} (\mathbf{A}\_{c} - \bar{\mathbf{A}}\_{k})^{T} (\mathbf{A}\_{c} - \bar{\mathbf{A}}\_{k}), \tag{38}$$

where **A¯** *<sup>k</sup>* is the average image of class *ωk*. The *k*th projection matrix **X***<sup>k</sup>* is an *n* by *dk* matrix composed of the eigenvectors of **G***<sup>k</sup>* corresponding to the *dk* largest eigenvalues. The *k*th CSS of 2DPCA is represented as a 3-tuple:

$$
\mathfrak{R}\_k^{2DPCA} = \{\mathbf{X}\_k, \bar{\mathbf{A}}\_k, d\_k\}. \tag{39}
$$

Let **S** be an input sample and **U***<sup>k</sup>* be its feature matrix in the *k*th CSS, obtained by

$$\mathbf{U}\_{k} = \mathbf{W}\_{k}\mathbf{X}\_{k}, \tag{40}$$

where $\mathbf{W}_k = \mathbf{S} - \bar{\mathbf{A}}_k$. Then the reconstructed image $\mathbf{W}_k^r$ can be evaluated by

$$\mathbf{W}\_k^r = \mathbf{U}\_k \mathbf{X}\_k^T. \tag{41}$$

Therefore, the DFCSS is defined by the reconstruction error as follows:

$$\varepsilon\_k(\mathbf{W}\_{k}^r, \mathbf{S}) = \sum\_{m=1}^{n\_{\text{row}}} \sum\_{n=1}^{n\_{\text{col}}} \left| \mathbf{w}\_{(m,n)\_k}^r - \mathbf{s}\_{(m,n)} \right|. \tag{42}$$

If $\varepsilon_t = \min_{1 \leq k \leq K} \varepsilon_k$, then the input sample **S** is assigned to class *ωt*.
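A compact sketch of this CSS-based classifier is given below (all names hypothetical). It builds the per-class 3-tuple of Eq. (39) from stacked training images and classifies a probe image by the reconstruction error of Eq. (42), measured against the normalized image as described for Fig. 1:

```python
import numpy as np

def build_css(images, labels, d_k):
    """One 2DPCA class-specific subspace per class: (X_k, mean_k), Eqs. (38)-(39)."""
    css = {}
    M = len(images)
    for c in np.unique(labels):
        Ac = images[labels == c]
        mean_k = Ac.mean(axis=0)                                 # class mean image
        centered = Ac - mean_k
        Gk = np.einsum('kij,kil->jl', centered, centered) / M    # class image covariance, Eq. (38)
        vals, vecs = np.linalg.eigh(Gk)
        css[c] = (vecs[:, ::-1][:, :d_k], mean_k)                # d_k leading eigenvectors
    return css

def classify_dfcss(S, css):
    """Assign a probe image S to the class whose CSS reconstructs it best, Eqs. (40)-(42)."""
    errors = {}
    for c, (Xk, mean_k) in css.items():
        Wk = S - mean_k                         # normalize by the class mean
        Uk = Wk @ Xk                            # projection onto the k-th CSS, Eq. (40)
        Wk_rec = Uk @ Xk.T                      # reconstruction, Eq. (41)
        errors[c] = np.abs(Wk_rec - Wk).sum()   # DFCSS of Eq. (42) on the normalized image
    return min(errors, key=errors.get)          # class with minimum reconstruction error
```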

Fig. 1. CSS-based 2DPCA diagram.

For illustration, we assume that there are 4 classes, as shown in Fig. 1. The input image is first normalized with the average image of each of the 4 classes and then projected onto the 2DPCA subspace of each class. After that, the image is reconstructed by the projection matrix (**X**) of each class. The DFCSS is used to measure the similarity between the reconstructed image and the normalized original image in each CSS. In Fig. 1, the DFCSS of the first class is minimum, so we decide that this input image belongs to the first class.

In Diagonal PCA (DiaPCA) (Zhang et al., 2006), the training images are first transformed into diagonal face images, which may reflect useful block or structure information for recognition in the original images. The sample diagonal face images of the Yale database are displayed in Fig. 4. Experimental results on a subset of the FERET database (Zhang et al., 2006) show that DiaPCA is more accurate than both PCA and 2DPCA. Furthermore, it is shown that the accuracy can be further improved by combining DiaPCA and 2DPCA.

#### **6.2 Image cross-covariance analysis**

In PCA, the covariance matrix provides a measure of the strength of the correlation of all pixel pairs. Because of the limited number of training samples, this covariance cannot be well estimated. The performance of 2DPCA is better than that of PCA, even though not all of the correlation information of pixel pairs is employed for estimating the image covariance matrix. Nevertheless, the disregarded information may possibly include useful information. Sanguansat et al. (2007a) proposed a framework, called Image Cross-Covariance Analysis (ICCA), for investigating the information neglected by the original 2DPCA technique. To achieve this, the *image cross-covariance matrix* is defined over two variables: the first is the original image and the second is a shifted version of the former. By the proposed shifting algorithm, many image cross-covariance matrices are formulated to cover all of the information. The Singular Value Decomposition (SVD) is applied to the image cross-covariance matrix to obtain the optimal projection matrices, and these matrices can be considered as orthogonally rotated projection matrices of traditional 2DPCA. ICCA thus differs from the original 2DPCA in that its transformations are generalized transformations of the original 2DPCA.

First of all, the relationship between 2DPCA's image covariance matrix **G**, in Eq. (5), and PCA's covariance matrix **C** can be considered as

$$\mathbf{G}(i,j) = \sum\_{k=1}^{m} \mathbf{C}\big(m(i-1)+k,\; m(j-1)+k\big), \tag{43}$$

where **G**(*i*, *j*) and **C**(*i*, *j*) are the elements in the *i*th row and *j*th column of matrix **G** and matrix **C**, respectively, and *m* is the height of the image.

For illustration, let the dimensions of all training images be 3 by 3. Then the covariance matrix of these images is a 9 by 9 matrix, while the image covariance matrix is only 3 by 3, as shown in Fig. 5.

From Eq. (43), each element of **G** is the sum of all same-label elements in **C**, for example:

$$\begin{aligned}\mathbf{G}(1,1) &= \mathbf{C}(1,1) + \mathbf{C}(2,2) + \mathbf{C}(3,3),\\ \mathbf{G}(1,2) &= \mathbf{C}(1,4) + \mathbf{C}(2,5) + \mathbf{C}(3,6),\\ \mathbf{G}(1,3) &= \mathbf{C}(1,7) + \mathbf{C}(2,8) + \mathbf{C}(3,9).\end{aligned} \tag{44}$$

It should be noted that the total powers of the image covariance matrix **G** and the traditional covariance matrix **C** are identical,

$$\mathrm{tr}(\mathbf{G}) = \mathrm{tr}(\mathbf{C}). \tag{45}$$
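The index correspondence of Eq. (43) and the trace identity of Eq. (45) are easy to check numerically. Below is a minimal sketch, assuming column-wise vectorization of each image for the classical covariance matrix (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
M, m, n = 50, 3, 3                                   # 50 random 3-by-3 "images"
centered = rng.standard_normal((M, m, n))
centered -= centered.mean(axis=0)                    # remove the mean image

# Classical covariance C of the column-stacked image vectors (9 by 9).
vecs = np.stack([A.flatten(order='F') for A in centered])
C = vecs.T @ vecs / M

# Image covariance matrix G of 2DPCA (3 by 3), Eq. (5).
G = np.einsum('kij,kil->jl', centered, centered) / M

# Eq. (43): G(i, j) is the sum of the entries C(m(i-1)+k, m(j-1)+k), k = 1..m.
G_from_C = np.array([[sum(C[m * i + r, m * j + r] for r in range(m))
                      for j in range(n)] for i in range(n)])
assert np.allclose(G, G_from_C)

# Eq. (45): the total powers (traces) are identical.
assert np.isclose(np.trace(G), np.trace(C))
```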

From this point of view, Eq. (43) shows that the image covariance matrix collects only 1/*m* of the classification information contained in the traditional covariance matrix. However, the other (*m* − 1)/*m* of the elements of the covariance matrix are still not
