Extended Binary Gradient Pattern (eBGP): A Micro- and Macrostructure-Based Binary Gradient Pattern for Face Recognition in Video Surveillance Area

*Nuzrul Fahmi Nordin, Samsul Setumin, Abduljalil Radman and Shahrel Azmin Suandi*

## **Abstract**

An excellent face recognition system for surveillance cameras requires a remarkable and robust face descriptor. The binary gradient pattern (BGP) descriptor is one of the ideal descriptors for facial feature extraction. However, exploiting local features merely from a smaller region, or microstructure, does not capture a complete facial feature. In this paper, an extended binary gradient pattern (eBGP) is proposed to capture both micro- and macrostructure information of a local region to boost the descriptor's performance and discriminative power. Two topologies, the patch-based and circular-based topologies, are incorporated with the eBGP to test its robustness against illumination, image quality, and uncontrolled capture conditions using the SCface database. Experimental results show that the fusion of micro- and macrostructure information significantly boosts the descriptor's performance. They also illustrate that the proposed eBGP descriptor outperforms the conventional BGP on both the patch-based and circular-based topologies. Furthermore, fusing information from two different image types, the orientational image gradient magnitude (OIGM) image and the grayscale image, attained better performance than using the OIGM image only. The overall results indicate that the proposed eBGP descriptor improves recognition performance with respect to the baseline BGP descriptor.

**Keywords:** surveillance system, face recognition, binary gradient pattern (BGP), facial feature extraction, patch-based topology, circular-based topology

## **1. Introduction**

Face recognition is one of the biometric verification methods that offers a wide range of applications such as law enforcement, forensics, biometric authentication, surveillance, and health monitoring [1]. Face recognition has also been used to authenticate payments in mobile wallets, and social media companies like Facebook use face recognition algorithms for image tagging [2]. One of the advantages of face recognition is that it is contactless between the subject and the camera. Given the advantages offered by face recognition and the advancement in computing power, significant research and many methods have been proposed over the years in the face recognition domain. A robust facial recognition system must be able to work under various real-life or unconstrained conditions, such as, but not limited to, pose, lighting, image or camera quality, occlusion, rotation, and translation. The system must also perform extremely well in domains where only limited samples are available. In surveillance monitoring applications, a typical approach is to sample faces appearing in videos and then match them with facial models generated from high-quality target face images [3, 4].

Feature extraction is the process of capturing features of interest from the face and representing them in the form of a feature vector. The extraction is usually done by a face descriptor, which must be able to cope with multiple variations such as illumination, occlusion, facial expression, and image quality [4]. Indeed, a collection of face descriptors has been proposed over the years, such as the scale-invariant feature transform (SIFT) [5], speeded up robust features (SURF) [6], the local binary pattern (LBP) [7], and the histogram of oriented gradients (HOG) [8]. In terms of facial feature representation, there are two types of representations that many descriptors have evolved around: global and local feature representations. Global feature extraction methods like principal component analysis [9], linear discriminant analysis [10], and independent component analysis [11] preserve the statistical information of the face by turning each face image into a high-dimensional feature vector. Local feature extraction, in contrast, splits the input image into smaller patches and extracts micro textural details from each patch before fusing these features back to form the global shape information. Local feature extraction has been shown to be resilient to multiple variations by enforcing spatial locality at both the pixel and patch levels; for instance, local feature descriptors are robust to local deformations caused by expression and occlusion. LBP [7] is an example of a feature extraction method that works on this principle; it achieves reasonably good performance but is heuristic in nature. Recently, LBP has drawn great attention as a face descriptor due to its reputation as a powerful texture descriptor [9]. LBP extracts the local spatial structure of an image by thresholding the neighborhood of each pixel against the intensity of the center pixel. The result of this operation is a local binary pattern, and the distribution of these binary patterns over the whole image forms the LBP histogram, or feature vector. Neighborhood pixels are sampled on a circle, and any neighbor that does not fall exactly on the center of a pixel has its intensity computed by interpolation [7]. LBP has some shortcomings: it produces a long histogram and is therefore memory-consuming [12], it is very sensitive to image rotation and noise [13], and it captures only the microstructure while ignoring the macrostructure of the texture, missing extra discriminative power [14]. Several variants of LBP have been proposed in the literature, for example, rotation-invariant LBP [13], the median robust extended local binary pattern (MRELBP) [15], and the binary gradient pattern (BGP) [14]. This paper touches on a number of relevant existing LBP-based descriptors.

The rest of this paper is organized as follows. In Section 2, two state-of-the-art descriptors (the LBP [7] and its variant, the BGP [14]) are briefly reviewed, since the proposed extended BGP (eBGP) is embedded into these two descriptors. Section 3 describes the proposed eBGP descriptor. The evaluation results are analyzed and discussed in Section 4. Finally, conclusions are drawn in Section 5.



*DOI: http://dx.doi.org/10.5772/intechopen.86473*


## **2. From local binary pattern (LBP) to binary gradient pattern (BGP)**

LBP [7] is one of various texture descriptors and is known for being computationally efficient [16]. It extracts the local spatial structure of an image by thresholding the intensity of the center pixel against its *P* neighborhood pixels within a radius *R*. The result of this operation is a local binary pattern, and the distribution of these binary patterns over the whole image forms the LBP histogram, or feature vector. The original LBP works on a 3 × 3 square neighborhood and considers only the sign information to form the LBP pattern. Neighborhood pixels are sampled on a circle, and any neighbor that does not fall exactly on the center of a pixel has its intensity computed by interpolation [7]. **Figure 1(a)** illustrates the LBP neighborhood around the center pixel with *R* = 1. Assuming all the pixels hold the values in **Figure 1(b)**, thresholding all eight neighborhood pixels against the center pixel using Eq. (1) produces the result in **Figure 1(c)**. This binary string is then multiplied with the weights, and the sum of these values corresponds to the LBP label for that particular pixel. The distribution of LBP labels across the entire image is represented in a histogram as a feature vector:

$$LBP_{R,P}(c) = \sum_{i=0}^{P-1} s\left(g_i - g_c\right)2^i,\qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{1}$$

where *gi* and *gc* are the gray values of the neighbors and the center pixel, respectively, *P* is the number of neighbors, and *R* is the radius of the neighborhood. LBP offers a few advantages in terms of low computational complexity, illumination invariance, and ease of implementation, but it also has significant disadvantages. In the LBP implementation, each operator with a particular (*P*,*R*) produces a different histogram length. For instance, in an (8,1) neighborhood, LBP generates 2*<sup>P</sup>* = 256 (*P* = 8) histogram bins, while a (16,2) neighborhood produces 2*<sup>16</sup>* = 65,536 histogram bins. This is a significant drawback, as LBP produces a long histogram and is therefore memory-consuming. LBP is also intolerant to image rotation and highly sensitive to noise, where noise on the center pixel dominates the local characteristic [12]. Furthermore, LBP captures only the microstructure and ignores the macrostructure of the texture, missing extra discriminative power.
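As a concrete sketch of Eq. (1), the NumPy implementation below computes LBP labels and the 256-bin histogram for an (8,1) neighborhood on the square grid. The neighbor ordering (and hence which bit receives which weight 2<sup>*i*</sup>) is an illustrative choice; a faithful implementation would sample the circular neighborhood with interpolation as in [7].

```python
import numpy as np

def lbp_8_1(img: np.ndarray) -> np.ndarray:
    """LBP labels for P = 8, R = 1 on a grayscale image (Eq. (1)).

    Neighbors are read from the 3 x 3 square around each interior
    pixel; the assignment of weights 2^i to neighbors is an
    illustrative choice, since Eq. (1) does not fix the ordering.
    """
    img = img.astype(np.int32)
    h, w = img.shape
    # Offsets (dy, dx) of the eight neighbors, clockwise from top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    labels = np.zeros_like(center)
    for i, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        labels += ((neighbor - center) >= 0).astype(np.int32) << i  # s(g_i - g_c) * 2^i
    return labels

def lbp_histogram(labels: np.ndarray) -> np.ndarray:
    """Distribution of LBP labels over the image: the 2^8 = 256-bin feature vector."""
    return np.bincount(labels.ravel(), minlength=256)
```

On a constant image every neighbor satisfies *s*(*gi* − *gc*) = 1, so every interior pixel receives label 255, illustrating why LBP is illumination-robust only up to monotonic gray-level changes.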

The success of LBP has continued since then: a variety of LBP-based descriptors have been proposed to overcome its shortcomings with respect to noise, illumination, color, and temporal information. Huang and Yin [14] proposed an improved version of LBP, called the binary gradient pattern (BGP), by introducing structural patterns and applying image gradient orientation (IGO) in multiple directions rather than only in the X and Y directions, as done conventionally. The implementation of IGO in multiple directions helps to improve the discriminative power of the descriptor. **Figure 2** shows how BGP encodes a binary string

**Figure 1.** *LBP neighborhood and thresholding.*


**Figure 2.** *Basic BGP operator with eight neighbors.*

from a region of interest (ROI). Given a set of grayscale intensity values of 9 pixels as in **Figure 2(a)**, BGP computes binary correlations between symmetric neighbors of the central pixel along multiple directions *k*. Since the number of neighbors is always twice the number of directions *k*, at the (8,1) spatial resolution there are four different thresholding directions, denoted G1, G2, G3, and G4, as shown in **Figure 2(b)**. The principal binary bit *Bi*<sup>+</sup> is computed for each direction using Eq. (2), and its associated binary bit *Bi*<sup>−</sup> from Eq. (3), where *Gi*<sup>+</sup> and *Gi*<sup>−</sup> are the intensity values of the two symmetric pixels. The resulting principal binary numbers and their associated bits are shown in **Figure 2(c)**:

$$B_i^{+} = \begin{cases} 1, & \text{if } G_i^{+} - G_i^{-} \ge 0 \\ 0, & \text{if } G_i^{+} - G_i^{-} < 0 \end{cases} \tag{2}$$

$$B_i^{-} = 1 - B_i^{+},\quad i = 1, 2, \dots, k \tag{3}$$

$$L = \sum_{i=1}^{k} 2^{i-1} B_i^{+} \tag{4}$$
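The encoding in Eqs. (2)–(4) can be sketched as follows for a single 3 × 3 block. The pairing of the four directions G1–G4 with specific symmetric neighbor positions is our assumption read off **Figure 2**, not a specification from [14]; the encoding principle is unchanged if the pairing differs.

```python
import numpy as np

def bgp_label(block: np.ndarray) -> int:
    """BGP label of a 3 x 3 grayscale block (Eqs. (2)-(4)).

    The assignment of G1..G4 to neighbor pairs below is an
    illustrative assumption based on Figure 2.
    """
    b = block.astype(np.int32)
    # (G_i^+, G_i^-): opposite neighbors of the center pixel b[1, 1].
    pairs = [
        (b[0, 1], b[2, 1]),  # G1: vertical
        (b[0, 2], b[2, 0]),  # G2: one diagonal
        (b[1, 2], b[1, 0]),  # G3: horizontal
        (b[2, 2], b[0, 0]),  # G4: other diagonal
    ]
    label = 0
    for i, (g_plus, g_minus) in enumerate(pairs):
        b_plus = 1 if g_plus - g_minus >= 0 else 0  # Eq. (2)
        # Eq. (3): b_minus = 1 - b_plus is complementary, so it is not stored.
        label += (1 << i) * b_plus                  # Eq. (4): L = sum 2^(i-1) B_i^+
    return label
```

Only the four principal bits are kept, so the label always lies in the range 0 to 2<sup>*k*</sup> − 1 = 15 for *k* = 4 directions.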


The binary string for the ROI is constructed from the four principal binary numbers, which here is 0111, and the label *L* is computed from Eq. (4). Because the principal and associated binary numbers are always complementary, only a single bit is required to describe each direction, allowing a more compact representation of the BGP label that considers the principal binary numbers only. The total number of BGP labels *NL* is determined by the number of principal binary bits, which equals the number of directions *k*; at any spatial resolution, *NL* = 2<sup>*k*</sup>. Using **Figure 2(b)** as an example, features extracted from four directions at the (8,1) spatial resolution produce 2<sup>4</sup> = 16 different labels (i.e., from 0000 to 1111, or from 0 to 15). A structural pattern is a binary string with continuous "1"s, indicating a stable local change in texture, and essentially describes the orientation of the local edge texture. A nonstructural pattern, on the other hand, is a binary string with discontinuous "1"s, which reflects arbitrary changes of local texture that likely indicate noise or outliers. In a statistical experiment conducted by Huang and Yin [14] on 2600 face images, 95% of the patterns in a typical BGP face had continuous "1"s.

The number of structural labels *Nsp* at any spatial resolution equals the number of neighbors *P*. With eight neighbors, there are 16 different labels, of which eight are structural labels and the remainder are nonstructural. For example, 0000, 0001, 0011, 0111, 1000, 1100, 1110, and 1111 are the structural patterns in BGP<sub>8,1</sub>, and each structural pattern's location map is illustrated in **Figure 3**. In the BGP implementation, nonstructural patterns are discarded and not given a label, in contrast to the nonuniform patterns in the LBP implementation. The location map of nonstructural patterns in **Figure 3** shows that they contain less meaningful information and are often caused by noise and outliers.
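Reading off the eight structural patterns listed above, a structural label is exactly a 4-bit string with at most one 0↔1 transition. The predicate below encodes this interpretation of the list; it is a minimal sketch, not the chapter's own code.

```python
def is_structural(label: int, k: int = 4) -> bool:
    """True if a k-bit BGP label is a structural pattern.

    Figure 3 lists 0000, 0001, 0011, 0111, 1000, 1100, 1110, and
    1111 as structural for k = 4; these are exactly the bit strings
    with at most one 0<->1 transition, which is what is tested here.
    """
    bits = format(label, f"0{k}b")
    transitions = sum(bits[i] != bits[i + 1] for i in range(k - 1))
    return transitions <= 1

# For k = 4 there are 2^4 = 16 labels, of which N_sp = 8 are structural.
structural_labels = [l for l in range(16) if is_structural(l)]
```

Enumerating all 16 labels recovers the eight structural patterns of BGP<sub>8,1</sub>; the remaining eight (e.g., 0101, 1010) are the nonstructural patterns that BGP discards.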


**Figure 3.**


*Face and location maps of eight structural patterns (SP00-SP15) and nonstructural pattern.*

To further enhance the discriminative power and robustness of BGP, Huang and Yin [14] introduced another descriptor, abbreviated BGPM, by applying BGP on the orientational image gradient magnitude (OIGM). The use of the image gradient magnitude (IGM) enhances the strength of edge information, which effectively allows BGPM to gain greater discriminant ability with only a small increase in complexity. The overall process of the BGPM descriptor is depicted in **Figure 4**.
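A hedged sketch of this front end: for each of the four BGP(8,1) directions, take the absolute difference of the two symmetric neighbors of every interior pixel as a per-direction gradient magnitude map, on which BGP would then be applied. The exact OIGM construction in [14] may use different filters or orientations; this only illustrates the idea of feeding gradient magnitude images, rather than the grayscale image, into BGP.

```python
import numpy as np

def oigm_maps(img: np.ndarray) -> list:
    """Per-direction gradient magnitude maps as a BGPM-style front end.

    One magnitude map per symmetric direction pair; an illustrative
    simplification of Figure 4, not the definition from [14].
    """
    img = img.astype(np.float64)
    h, w = img.shape
    # Symmetric offset pairs: vertical, two diagonals, horizontal.
    pairs = [((-1, 0), (1, 0)), ((-1, 1), (1, -1)),
             ((0, 1), (0, -1)), ((1, 1), (-1, -1))]
    maps = []
    for (dy1, dx1), (dy2, dx2) in pairs:
        a = img[1 + dy1:h - 1 + dy1, 1 + dx1:w - 1 + dx1]
        b = img[1 + dy2:h - 1 + dy2, 1 + dx2:w - 1 + dx2]
        maps.append(np.abs(a - b))  # BGP labels are then computed on each map
    return maps
```

Because each map measures local contrast rather than absolute intensity, a constant offset in illumination leaves the maps unchanged, which is consistent with the illumination robustness claimed for BGPM.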

Based on a series of results obtained on multiple databases, such as Extended Yale B [17], AR [18], CMU Multi-PIE [19], FERET [20], and LFW [21], against a wide range of descriptors, BGPM proved to be the best descriptor on each database. The BGPM descriptor achieves invariance against illumination changes and local distortions while reducing the vector dimensionality. BGP's compact representation makes it extremely fast, using far fewer pattern labels than LBP at any spatial resolution. For instance, at the (8,1) spatial resolution, the BGP histogram needs only 9 bins (8 for structural patterns and 1 for nonstructural patterns), in contrast to the uniform LBP, which requires 59 bins. BGP and BGPM have been demonstrated to possess strong spatial locality and orientation properties, which lead to effective discrimination.

Although BGP has been shown to be efficient in processing time and to achieve outstanding results on several databases, it has never been tested on a proper surveillance database like [22], which consists of low-resolution, non-frontal face images taken by cameras of different quality. Like most other local-based descriptors, BGP exploits information from the microstructure only; however, exploiting facial features from the macrostructure to complement the microstructure features results in a more complete image representation [23, 24], especially for surveillance applications where noise, occlusion, and head position might impact the descriptor's performance. In this paper, information from both micro- and macrostructures is captured and integrated into the BGP descriptor to boost its performance for video surveillance applications. The new proposed descriptor is termed extended BGP (eBGP).

## **3. Extended binary gradient pattern (eBGP)**

The eBGP extends the BGP descriptor by exploiting macrostructure information from a topology with a larger spatial resolution. Many different types of macrostructure topologies have been proposed for other LBP variants [25]. In this paper, the patch-based topology with eight neighborhood patches and the circular-based topology are combined with the proposed eBGP descriptor. Both topologies have been implemented in [24, 26], each with its own pros and cons. Regardless of the topology, the microstructure information is always extracted using the same approach as in BGP. Herein, the eBGP is explained with a focus on extracting features from the macrostructure based on the patch-based topology with eight neighborhood patches and the circular-based topology.

## **3.1 Patch-based topology**

The patch-based topology is inspired by the multi-scale block local binary pattern (MBLBP) [24]. In this topology, the macrostructure is made up of nine patches of pixels, as in **Figure 5**. All patches have the same size, while the center patch represents the ROI microstructure. Thereby, a default BGP operator is applied


to the center patch in order to extract the microstructure information, whereas the macrostructure information is extracted from the eight neighborhood patches. Accordingly, multiple sizes of patches could be selected from this topology, and the size of the structure is determined by the spatial resolution of the center patch.

**Figure 5.** *Topology for macrostructure information extraction. (a) Patch of 5 × 5 pixels for* R *= 2. (b) Patch of 3 × 3 pixels for* R *= 1.*

For instance, when exploiting microstructure information at the (8,1) spatial resolution, the size of the center patch is 3 × 3 pixels, as illustrated in **Figure 5(b)**. In this implementation, all patches have the same size and do not overlap each other; the macrostructure is therefore formed from nine patches of 3 × 3 pixels. **Figure 5(a)** depicts the macrostructure topology formed from nine patches of 5 × 5 pixels when microstructure information is exploited at the (16,2) spatial resolution. For comparison purposes, this research evaluates the two structures illustrated in **Figure 5(a)** and **(b)**, to match the BGP results obtained at the (8,1) and (16,2) spatial resolutions. Using **Figure 5(a)** as an example, each neighborhood patch contains 25 pixels, each with its own grayscale value. Unlike the center patch, no feature is extracted from an individual neighborhood patch. Instead, each neighborhood patch is represented by a single intensity value, which is used for thresholding. In this topology, the patch mean and median are applied to represent the patch intensity. The patch mean (*G*) of a neighborhood patch (*P*), accounted from the 25 pixels in a single 5 × 5 patch, is computed as follows:

$$G_P = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{5}$$

where *xi* is the intensity value of each pixel and *n* is the number of pixels in the patch *P*.
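The macrostructure step can be sketched as follows, assuming a 3 × 3 grid of non-overlapping patches: each neighborhood patch is collapsed to one representative intensity (the mean of Eq. (5), or the median), and the BGP thresholding of Eqs. (2)–(4) is applied to the nine representative values. The direction pairing is again an illustrative assumption, as in the microstructure case.

```python
import numpy as np

def macro_bgp_label(region: np.ndarray, patch: int = 3,
                    use_median: bool = True) -> int:
    """Macrostructure BGP label from a 3 x 3 grid of patches.

    `region` is a (3*patch) x (3*patch) block of pixels. Each of the
    nine non-overlapping patches is reduced to one representative
    intensity, and BGP thresholding (Eqs. (2)-(4)) runs on the
    3 x 3 grid of representatives. The direction pairing below is
    an illustrative assumption.
    """
    s = patch
    reduce = np.median if use_median else np.mean
    # Eq. (5)-style reduction: one representative value G_P per patch.
    g = np.array([[reduce(region[r * s:(r + 1) * s, c * s:(c + 1) * s])
                   for c in range(3)] for r in range(3)])
    pairs = [(g[0, 1], g[2, 1]), (g[0, 2], g[2, 0]),
             (g[1, 2], g[1, 0]), (g[2, 2], g[0, 0])]
    label = 0
    for i, (g_plus, g_minus) in enumerate(pairs):
        label += (1 << i) * (1 if g_plus - g_minus >= 0 else 0)
    return label
```

Swapping `use_median` toggles between the two patch representations evaluated in this research; the rest of the pipeline is identical to the microstructure BGP operator.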

On the other hand, the patch median is computed by finding the middle value of the ordered pixel values. Additional experiments are conducted in this research to find the best representation for the patch-based topology. As an example, feature extraction from the macrostructure is illustrated in **Figure 6**. **Figure 6(a)** shows the patch-based topology with a patch size of 3 × 3 pixels and its intensity values. In each patch, the median of all pixels within the patch is calculated, and this median now represents the image intensity of the patch, as shown in **Figure 6(b)**. The following steps are similar to what has been explained for BGP. Thresholding each patch against its symmetric neighbors in four directions using Eqs. (2) and (3) generates four pairs of binary numbers, as shown in **Figure 6(c)**. Once all the principal bits are computed, the




**Figure 4.**

*Framework of BGPM descriptor [14].*

**3.1 Patch-based topology**

*Topology for macrostructure information extraction. (a) Patch of 5 × 5 pixels for* R *= 2. (b) Patch of 3 × 3 pixels for* R *= 1.*

to the center patch in order to extract the microstructure information, whereas the macrostructure information is extracted from the eight neighborhood patches. Accordingly, multiple sizes of patches could be selected from this topology, and the size of the structure is determined by the spatial resolution of the center patch.

For instance, when exploiting microstructure information from (8,1) spatial resolution, the size of the center patch will be 3 × 3 pixels as illustrated in **Figure 5(b)**. In this implementation, all patches have the same size and do not overlap each other; therefore the macrostructure is formed from nine patches of 3 × 3 pixels. **Figure 5(a)** depicts the macrostructure topology formed from 9 patches of 5 × 5 pixels when microstructure information is exploited from (16,2) spatial resolution. For comparison purposes, this research will evaluate two structures as illustrated in **Figure 5(a)** and **(b)**, to match BGP results exploited from (8,1) and (16,2) spatial resolution. Using **Figure 5(a)** as an example, each neighborhood patch contains 25 pixels with each pixel having its own grayscale value. Unlike the center patch, no feature is extracted from the individual neighborhood patch. Instead, each neighborhood patch is represented by a single intensity value which will be used for thresholding. In this topology, the patch's mean and median will be applied to represent the patch intensity. The patch's mean (G) of a neighborhood patch (P), accounted from 25 pixels in a single 5 × 5 patch, is computed as follows:

$$G\_P = \frac{1}{n} \sum\_{i=1}^{n} \mathcal{X}\_i \tag{5}$$

where *x_i* is the intensity value of the *i*th pixel and *n* is the number of pixels in the patch *P*.
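A minimal NumPy sketch of how the nine patch representatives could be computed, using Eq. (5) (mean) or the median alternative; the 9 × 9 region and its values are illustrative only:

```python
import numpy as np

def patch_representatives(region, patch=3, use_median=False):
    """Split a (3*patch x 3*patch) region into a 3 x 3 grid of patches and
    return the mean (Eq. 5) or median intensity of each patch."""
    reps = np.empty((3, 3))
    for r in range(3):
        for c in range(3):
            block = region[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            reps[r, c] = np.median(block) if use_median else block.mean()
    return reps

# Illustrative 9 x 9 region: nine non-overlapping 3 x 3 patches (the R = 1 case).
region = np.arange(81).reshape(9, 9).astype(float)
means = patch_representatives(region)                      # G_P for each patch
medians = patch_representatives(region, use_median=True)   # median alternative
```

Each of the nine values then stands in for its whole patch during the thresholding described next.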

On the other hand, the patch median is computed by taking the middle value of the ordered pixel values. Additional experiments are conducted in this research to find the best representation for the patch-based topology. As an example, feature extraction from the macrostructure is illustrated in **Figure 6**. **Figure 6(a)** shows the patch-based topology with patches of 3 × 3 pixels and their intensity values. In each patch, the median is calculated from all pixels within the patch, and this median then represents the image intensity of the patch, as shown in **Figure 6(b)**. The following steps are similar to those explained for BGP. By thresholding each patch against its symmetric neighbors in four directions using Eqs. (2) and (3), four pairs of binary numbers are generated, as shown in **Figure 6(c)**. Once all the principal bits are computed, the label can be calculated using Eq. (4). In general, the flow of macrostructure extraction is like that of the microstructure except for the representative value used during thresholding: the microstructure information is extracted from neighborhood pixels, while the macrostructure information is extracted from neighborhood patches.

**Figure 6.**

*Feature extraction from the macrostructure using the median as the patch intensity.*

**Figure 7.**

*Patch-based feature extraction flow. The center patch is represented by the orange box and the neighborhood patches by the purple boxes.*
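Assuming that the thresholding of Eqs. (2) and (3) compares each of the four symmetric neighbor pairs, as in BGP (an assumption, since those equations are defined earlier in the chapter), the macrostructure bits and label could be sketched as:

```python
import numpy as np

def bgp_bits(neighbors):
    """Given 8 neighborhood values ordered circularly, threshold each of the
    four symmetric pairs (i, i+4): the principal bit is 1 when the first value
    is >= the opposite one; the associated bit is its complement."""
    neighbors = np.asarray(neighbors)
    principal = (neighbors[:4] >= neighbors[4:]).astype(int)
    associated = 1 - principal
    return principal, associated

def label(principal):
    """Eq. (4) analogue: binary-weighted sum of the principal bits."""
    return int(sum(b << i for i, b in enumerate(principal)))

# Eight patch medians surrounding the center patch (illustrative values).
medians = [52, 60, 48, 71, 50, 58, 49, 80]
p, a = bgp_bits(medians)   # p = [1, 1, 0, 0]
print(label(p))            # 1*1 + 1*2 = 3
```

The same routine applied to the eight pixel neighbors of the center patch yields the microstructure label.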

Since there are only eight neighborhood patches, regardless of the structure's size, the generated histogram vector representing the macrostructure information is bounded by a maximum of 16 bins. Observing only the structural pattern further reduces the dimensionality of the macrostructure information to eight bins. The total length of the histogram vector (*Ht*) is computed as follows:

$$H\_t = \sum\_{k=1}^{N} \left( P\_R + 8 \right)\_k \tag{6}$$



where *N* is the number of blocks, *PR* is the number of neighborhood pixels used for extracting the microstructure information at the center patch, and 8 is the length of the histogram vector extracted from the macrostructure. Using **Figure 6(b)** as an example, at each *k*th block the length of the histogram vector is 16, where 8 bins come from the microstructure and the other 8 from the macrostructure.

Subsequently, information fusion between the micro- and macrostructures is conducted by concatenating the feature vectors of the microstructure and the macrostructure, as illustrated in **Figure 7**. At this point, both feature vectors contribute with the same weight. **Figure 8** demonstrates an example of a face image represented using the patch-based topology. It illustrates that eBGP on the patch-based topology is capable of capturing the micro textural details, while the macrostructure provides complementary information to these small details. Moreover, the macrostructure information contains fewer details and may reduce the noise or outliers embedded in the image.
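The equal-weight fusion step and the total length of Eq. (6) can be illustrated with a short sketch; the block count N = 16 follows the experimental settings described later in the chapter, and the random histograms are placeholders:

```python
import numpy as np

N, P_R = 16, 8          # blocks per image, microstructure bins per block

rng = np.random.default_rng(0)
micro = rng.random((N, P_R))   # per-block microstructure histograms
macro = rng.random((N, 8))     # per-block macrostructure histograms (8 bins)

# Equal-weight fusion: concatenate micro and macro histograms block by block.
fused = np.concatenate([micro, macro], axis=1).ravel()

# Eq. (6): H_t = sum over the N blocks of (P_R + 8).
H_t = N * (P_R + 8)
assert fused.size == H_t       # 16 * (8 + 8) = 256
```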


**Figure 8.**


*Sample image with 5 × 5 pixel patch-based structure: (a) the original image, (b) the image extracted using the microstructure, (c) the image extracted using the macrostructure based on the local median, and (d) the image extracted from the macrostructure using the local mean.*

#### **3.2 Circular-based topology**

The circular-based topology borrows the basic implementation of LBP, which defines a neighborhood as a set of pixels on a circular ring. In this topology, two levels of information are extracted from the neighborhood at two different spatial resolutions. The first level is the microstructure information, which is extracted from a set of pixels on a circular ring of radius *R*1. Meanwhile, the macrostructure information is extracted from neighborhood pixels that lie on a circular ring of radius *R*2. The same BGP operator is used to extract information at the two spatial resolutions, with the smaller spatial resolution representing the microstructure and the larger one representing the macrostructure. A visual illustration of the circular-based topology is presented in **Figure 9**. Circular rings with *R*1 and *R*2 represent the two different spatial resolutions (*P*;*R*). Assuming *R*1 is 1, running the BGP descriptor on the (8,1) neighborhood extracts the microstructure information of the ROI. In this implementation, *R*2 is always larger than *R*1, and thus *R*2 must be set to a number greater than 1.
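One possible way to sample the two rings is sketched below; since the text states that no interpolation is used, off-grid coordinates are rounded to the nearest pixel (this exact sampling convention is an assumption):

```python
import numpy as np

def ring_offsets(P, R):
    """Integer (row, col) offsets of P neighbors on a ring of radius R,
    rounded to the nearest pixel instead of interpolated."""
    angles = 2 * np.pi * np.arange(P) / P
    rows = np.rint(-R * np.sin(angles)).astype(int)
    cols = np.rint(R * np.cos(angles)).astype(int)
    return list(zip(rows, cols))

micro_ring = ring_offsets(8, 1)    # (8,1): microstructure neighbors
macro_ring = ring_offsets(24, 3)   # (24,3): macrostructure neighbors
print(len(micro_ring), len(macro_ring))  # 8 24
```

Adding these offsets to a center pixel gives the neighbor coordinates at which the BGP operator samples the image.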

**Figure 10(a)** shows a sample of image intensities that fall on the circular rings *R*1 and *R*2 with spatial resolutions (8,1) and (24,3), respectively. In this example, the microstructure information is extracted from 8 pixels, while the macrostructure information is extracted from 24 pixels, as shown in **Figure 10(b)**. Using the same method as in BGP, the principal and associated bits are calculated using Eqs. (2) and (3) by thresholding symmetric neighbors in multiple directions. The computed binary pairs are shown in **Figure 10(c)**, with 4 and 12 principal bits generated from the 8 and 24 neighbors, respectively. Finally, the labels for both micro- and macrostructures are computed using Eq. (4).

In the BGP scheme, the length of the histogram vector is equal to the number of neighbors at any spatial resolution. Similar to the patch-based topology, the generated histogram vectors embedding the micro- and macrostructure information are concatenated to form the final feature representation of each ROI. The total length of the histogram vector in this scheme can be computed using:

$$H\_t = \sum\_{k=1}^{N} \left( P\_1 + P\_2 \right)\_k \tag{7}$$

where *N* is the number of blocks and *P*1 and *P*2 are the numbers of neighborhood pixels on the circular rings of radius *R*1 and *R*2, respectively. For instance, if *R*1 = 2 and *R*2 = 4, features are exploited from 16 and 32 neighborhood pixels, respectively. Thus, the combination of the two spatial resolutions produces a histogram vector of length 48 at each *k*th block. Based on this observation, *R*2 is set to at most 5 to limit the feature dimensionality of the macrostructure to 40, because a larger spatial resolution only increases the feature vector dimensionality. In contrast, *R*1 is limited to 4, because a larger spatial resolution would prevent the BGP operator from capturing the micro edge and micro texture features, which are mostly exploited from a smaller region.

**Figure 9.**

*Circular-based topology.*

**Figure 10.**

*The microstructure information is devised from 8 pixels on the smaller ring, while the macrostructure information is devised from 24 pixels on the larger ring without any interpolation.*

**Figure 11.**

*Circular-based feature extraction flow with* R*1 = 1 and* R*2 = 3.*
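Eq. (7) can be checked numerically; the sketch assumes P = 8R neighbors per ring, consistent with the (8,1), (16,2), and (24,3) resolutions used in the text:

```python
def circular_hist_length(N, R1, R2):
    """Eq. (7): total histogram length, assuming P = 8R neighbors per ring."""
    P1, P2 = 8 * R1, 8 * R2
    return N * (P1 + P2)

print(circular_hist_length(N=1, R1=2, R2=4))   # 48 per block, as in the text
print(circular_hist_length(N=16, R1=1, R2=3))  # 16 * (8 + 24) = 512
```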

**Figure 11** illustrates the general flow of feature extraction in the circular-based topology. Overall, this topology employs the BGP operator at two different spatial resolutions, where the smaller resolution captures the microstructure information and the larger resolution captures the macrostructure information. In this research, no interpolation is applied to neighboring pixels where the circle does not fall exactly on the center of a pixel. **Figure 12** presents a sample image extracted at the two spatial resolutions *R*1 = 2 and *R*2 = 5.

**Figure 12.**

*Sample image with* R*1 = 2 and* R*2 = 5 circular-based topology: (a) the original image, (b) the image extracted from the microstructure (*R*1 = 2), and (c) the image extracted from the macrostructure (*R*2 = 5).*

Similar to the patch-based topology, BGP captures the micro-oriented edges from the small structure while capturing less detailed information at the much larger spatial resolution. The combination of these two types of information complements each other in providing a complete face representation.


#### **4. Results, discussion, and analysis**

To emulate a real-world video surveillance system, the effectiveness of the proposed eBGP descriptor was evaluated using the Surveillance Camera Face (SCface) database [22], which consists of low-resolution, non-frontal face images taken by cameras of different quality. A series of experiments was designed to test all proposed topologies and structures on the SCface database. The performance of the proposed eBGP descriptor was evaluated against illumination, image quality, a single sample per person, and real-world capture conditions.

The SCface database is one of the most challenging databases for face recognition, as its images were taken in an uncontrolled indoor environment. It consists of 4160 images of 130 subjects. All images were taken at three distinct distances from the camera, with the cameras installed 2.25 m above the floor. At distance 1, the subject was 4.20 m away from the camera, whereas for distances 2 and 3, the subjects were at 2.60 and 1.00 m, respectively. Outdoor light, which came through a window on one side, was the only source of illumination. The images were captured by five commercial surveillance video cameras of different quality and two infrared night-vision cameras under uncontrolled lighting, so as to mimic real-world conditions. Furthermore, a full frontal mug shot of each subject was captured using a high-quality photo camera under capture conditions exactly as would be expected in law enforcement. The high-quality photo camera for capturing visible-light mug shots was installed in the same way as the infrared camera but in a separate room with standard indoor lighting, and it was equipped with an adequate flash. In our experiments, the high-quality mug shot of each person was used as the training gallery, while the remaining images from the five surveillance cameras and three distances were used as test images, as depicted in **Figure 13**. Since the focus of this research is on images in the visible spectrum with a single sample per person, especially for a real-world surveillance system, the images taken by the IR night-vision cameras and the mug shot rotations were not used. As preprocessing steps, all images in the SCface database were aligned based on the provided eye coordinates so that the eye line lies on a straight line. The images were then scaled and cropped to 64 × 64 pixels, as implemented in [22].

**Figure 13.**

*Sample images from the SCface database at distance 3: (a) the high-quality mug shot, (b–f) images taken by five different surveillance cameras, and (g and h) images taken by IR night-vision cameras.*
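The eye-based alignment and cropping step could be sketched as below; the canonical output eye positions and the nearest-neighbor warp are illustrative assumptions, not the exact procedure of [22]:

```python
import numpy as np

def align_crop(img, eye_l, eye_r, out_size=64, out_l=(20, 24), out_r=(44, 24)):
    """Rotate/scale the face so that the eyes land on fixed output positions,
    then crop to out_size x out_size using a nearest-neighbor warp.
    Eye coordinates are (col, row); the canonical output eye positions
    out_l/out_r are assumptions for illustration."""
    eye_l, eye_r = np.asarray(eye_l, float), np.asarray(eye_r, float)
    dx, dy = eye_r - eye_l
    theta = np.arctan2(dy, dx)                       # input eye-line angle
    s = np.hypot(dx, dy) / (out_r[0] - out_l[0])     # input/output scale
    rot = s * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
    out = np.zeros((out_size, out_size), dtype=img.dtype)
    for r in range(out_size):
        for c in range(out_size):
            # Map each output pixel back into the input image.
            v = rot @ (np.array([c, r], float) - out_l) + eye_l
            ci, ri = int(round(v[0])), int(round(v[1]))
            if 0 <= ri < img.shape[0] and 0 <= ci < img.shape[1]:
                out[r, c] = img[ri, ci]
    return out

# Synthetic example: eyes already horizontal, 24 px apart.
img = np.arange(100 * 100).reshape(100, 100)
aligned = align_crop(img, eye_l=(38, 50), eye_r=(62, 50))
```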

The performance of the proposed eBGP descriptor was evaluated using the histogram intersection, where the histogram intersection computes the similarity between two discretized probability distributions or histogram vectors. Given *H<sup>T</sup>* is the histogram vector of a training image reference and *H<sup>P</sup>* is the histogram vector of a probe image, each one containing *n* bins, the intersection between them is defined as follows:

$$H^T \cap H^P = \sum\_{j=1}^n \min\left(H\_j^T, H\_j^P\right) \tag{8}$$

where *H<sup>T</sup>* and *H<sup>P</sup>* are generated from the distributions of labels computed by the eBGP operator, and the *min* function takes two values and returns the smaller one. The histogram pair that yields the highest intersection value according to Eq. (8) is considered a match, and the probe is assigned the corresponding label. By comparing this label against the ground truth label, the recognition rate is determined by counting the occurrences of correct labels over the number of test images. The recognition rate is computed as follows:

$$\text{Recognition rate (\%)} = \frac{N\_L}{N} \times 100\% \tag{9}$$

where *N_L* is the total number of test images that are correctly matched and *N* is the total number of test images.

It is vital to stress that the classifier plays a decisive role in achieving a better recognition rate. In this research, the experiments were designed to focus on the recognition rate improvement due to macrostructure information fusion. Hence, the recognition rates of the proposed eBGP descriptor and its baseline BGP descriptor were computed and compared to verify the improvement. For comparative analysis, the results of the BGP descriptor on the SCface database were produced by running the BGP code requested from [14], so that the results can be analyzed without any concern about their validity. Since Huang and Yin [14] did not use the SCface database in their work, the BGP code was adapted to work with it.

**Table 1.**

*Recognition rate (%) of the proposed eBGP descriptor on the SCface dataset using the patch-based topology.*

| Distance | Descriptor | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|---|
| 1 | BGPM(8;1) | 3.08 | 0.77 | 3.08 | 3.08 | 5.38 | 3.08 |
| 1 | BGPM(16;2) | 6.15 | 4.62 | 4.62 | 3.85 | 5.38 | 4.92 |
| 1 | eBGPM(8;1) | 4.62 | 1.54 | 4.62 | 3.85 | 6.15 | 4.16 |
| 1 | eBGPM(16;2) | 3.85 | 7.69 | 5.38 | 5.38 | 8.46 | 6.15 |
| 2 | BGPM(8;1) | 16.15 | 12.31 | 6.92 | 11.54 | 13.85 | 12.15 |
| 2 | BGPM(16;2) | 23.85 | 13.85 | 7.69 | 12.31 | 13.08 | 14.16 |
| 2 | eBGPM(8;1) | 20.77 | 13.85 | 10.77 | 16.92 | 16.15 | 15.69 |
| 2 | eBGPM(16;2) | 23.08 | 17.69 | 13.85 | 16.92 | 16.15 | 17.54 |
| 3 | BGPM(8;1) | 15.38 | 19.23 | 10.00 | 16.92 | 11.54 | 14.61 |
| 3 | BGPM(16;2) | 18.46 | 20.00 | 16.15 | 14.62 | 11.54 | 16.15 |
| 3 | eBGPM(8;1) | 19.23 | 17.69 | 11.54 | 17.69 | 13.08 | 15.85 |
| 3 | eBGPM(16;2) | 16.15 | 16.15 | 15.38 | 16.15 | 17.69 | 16.30 |
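The matching rule of Eq. (8) and the recognition rate of Eq. (9) can be sketched as follows; the toy gallery and probe histograms are illustrative only:

```python
import numpy as np

def hist_intersection(h_t, h_p):
    """Eq. (8): sum of bin-wise minima between two histogram vectors."""
    return np.minimum(h_t, h_p).sum()

def recognize(gallery, probes):
    """Assign each probe the label of the gallery histogram with the highest
    intersection score; gallery maps label -> histogram."""
    labels = list(gallery)
    G = np.stack([gallery[l] for l in labels])
    preds = []
    for p in probes:
        scores = np.minimum(G, p).sum(axis=1)   # Eq. (8) against all gallery
        preds.append(labels[int(np.argmax(scores))])
    return preds

# Toy example with three subjects and unit-sum histograms.
gallery = {"A": np.array([.6, .2, .2]), "B": np.array([.1, .8, .1]),
           "C": np.array([.3, .3, .4])}
probes = [np.array([.5, .3, .2]), np.array([.2, .7, .1])]
truth = ["A", "B"]
preds = recognize(gallery, probes)
rate = 100.0 * sum(p == t for p, t in zip(preds, truth)) / len(truth)  # Eq. (9)
```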

#### **4.1 Experiment settings and preprocessing**


As a preprocessing step, each image is first transformed into OIGM images using the same method used by the BGP descriptor. OIGM images are then divided into *N* numbers of non-overlapped blocks before applying eBGP descriptor, where *N* is set to 16 in this research.

#### **4.2 Results of patch-based topology**

For better presentation, several notations are used to describe the experimental setup and implementation. BGPM(*P*;*R*) denotes the BGP descriptor at spatial resolution (*P*;*R*), while eBGPM(*P*;*R*) denotes the proposed eBGP descriptor with macrostructure information based on the patch-based topology. In this experiment, the patch-based topology uses the patch median as the default scheme for thresholding between patches.

**Table 1** shows the performance of the proposed descriptor on the SCface database, where eBGPM(16;2) and eBGPM(8;1) represent the extended BGPM (eBGPM) with the structures of **Figure 5(a)** and **Figure 5(b)**, respectively. The results of BGPM(16;2) and BGPM(8;1) represent the baseline descriptor. As mentioned earlier in this section, the images of the SCface database were captured by five cameras at three different distances. **Table 1** shows the recognition rate for each set and the average recognition rate over all cameras. The recognition rate for each set was calculated based on Eqs. (8) and (9).


**Table 1.**

*Recognition rate (%) of the proposed eBGP descriptor on the SCface dataset using the patch-based topology.*

From **Table 1**, it can be seen that none of the descriptors achieved a recognition rate higher than 35% over all cameras and distances. In particular, the images of distance 1 recorded the lowest recognition rate, with an average of 4.58%, while the images of distances 2 and 3 achieved better recognition rates, with averages of 14.89 and 15.73%, respectively. **Table 1** also shows that eBGPM(8;1) slightly boosted the performance compared with BGPM(8;1) for all distances; its largest margin over BGPM(8;1) occurred at distance 2, with an average improvement of 3.54%. On the contrary, eBGPM(16;2) shows mixed results with respect to its baseline BGPM(16;2): a performance drop can be observed in the camera 1 gallery results, where distances 1, 2, and 3 all show lower recognition rates compared with the baseline descriptor. Similar to eBGPM(8;1), eBGPM(16;2) presented its highest recognition rate on the distance 2 gallery images compared to those from distances 1 and 3. This is because the gallery images of distance 1, which were acquired at a distance of 4.20 m, are low in resolution and small in size. Moreover, the process of scaling and cropping the images to 64 × 64 leads to a loss of quality and of some dominant features. On the other hand, the images of distance 3 are higher in quality and detail. However, as the subjects are closer to the camera, which is installed 2.25 m above the floor, in the most natural head position the upper half of the subject's face dominates the captured images, as depicted in **Figure 14**. **Figure 14** also demonstrates that the images of distance 2 are slightly better in quality than those of the other two distances, although they still suffer from head position. This explains the superiority of the descriptors at this distance.

Due to these discouraging results by both the proposed eBGP descriptor and its baseline BGP, extra experiments were conducted on the SCface database. Since **Table 1** illustrated that the recognition rate improves as the spatial resolution increases, the BGPM descriptor was first extended to a larger spatial resolution of (24,3). Even though the recognition rate increased by including the macrostructure in eBGP, the overall recognition rate is still too low for realistic applications. This might be because the structural pattern and OIGM image were extracted from low-resolution and deformed images (after scaling and cropping have been done). Hence, two additional descriptors were designed to investigate the effectiveness of structural patterns and the OIGM image when exploiting the macrostructure information from low-resolution images. These descriptors still use BGPM to exploit information from the microstructure, but they extract the macrostructure information in a different way.

#### **Figure 14.**

*Samples of the SCface database: (a) training image mug shot and (b–d) test images captured by camera 2 at distances 1, 2, and 3, respectively. The upper row shows the original images, while the lower row shows the images after alignment, scaling, and cropping to 64 × 64.*

The first additional descriptor, denoted as Type IP in **Table 2**, is equivalent to the eBGPM(16;2) descriptor with one exception: the structural pattern concepts are ignored, and all labels produced by the (16,2) spatial resolution are assumed to hold some unique features. In this setup, information from all 16 labels is used to populate the histogram vector. This descriptor is designed to investigate whether any features are discarded by the structural patterns when dealing with low-quality images. The second descriptor, denoted as Type IIP in **Table 2**, is designed to extract information from both OIGM and grayscale intensity images: the microstructure information is extracted from the OIGM image and the macrostructure information from the grayscale image. The Type IIP descriptor is similar to the other proposed descriptors in that the local microstructure information is extracted from the central patch of the ROI using BGPM(16;2). However, instead of using the BGP operator to assemble the histogram vector from the macrostructure, a standard LBP<sub>8,1</sub><sup>u2</sup> operator is employed to extract the macrostructure information. The median of each of the eight neighborhood patches is thresholded against the median of the center patch to produce a string of eight binaries, or a label. The LBP<sub>8,1</sub> operator generates 256 labels, but only the 58 uniform patterns are kept for histogram fusion and the remaining ones are discarded. Histograms from both domains are concatenated and given equal weights.

#### **Table 2.**

*Recognition rate (%) of BGPM(24;3), Type IP, and Type IIP descriptors on the SCface database.*

| Distance | Descriptor | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|---|
| 1 | BGPM(24;3) | 5.38 | 2.31 | 4.62 | 4.62 | 5.38 | 4.62 |
| 1 | Type IP | 3.85 | 6.92 | 4.62 | 6.92 | 3.85 | 6.00 |
| 1 | Type IIP | 10.77 | 6.92 | 6.92 | 5.38 | 10.77 | 8.31 |
| 2 | BGPM(24;3) | 21.54 | 16.15 | 13.08 | 16.15 | 15.38 | 16.46 |
| 2 | Type IP | 23.85 | 20.00 | 13.85 | 19.23 | 15.38 | 18.46 |
| 2 | Type IIP | 34.62 | 25.38 | 20.00 | 25.38 | 21.54 | 25.38 |
| 3 | BGPM(24;3) | 20.00 | 18.46 | 14.62 | 16.15 | 11.54 | 16.15 |
| 3 | Type IP | 16.92 | 16.92 | 14.62 | 16.92 | 16.15 | 16.31 |
| 3 | Type IIP | 22.31 | 23.08 | 15.38 | 23.85 | 16.92 | 20.31 |

Results in **Table 2** show that the Type IIP descriptor achieved a better recognition rate than the rest of the descriptors. The results also illustrate that Type IIP achieved better performance on images of distance 2 than on those from distances 1 and 3. Furthermore, it is notable that employing BGPM(24;3) at a larger spatial resolution did not improve the recognition rate as much as Type IIP did.
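The macrostructure labeling used by Type IIP can be sketched as follows (a minimal illustration with names of our own; the threshold direction ≥ follows the common LBP convention, which the text does not spell out):

```python
def is_uniform(pattern):
    """An 8-bit pattern is 'uniform' if its circular bit string has at most
    two 0/1 transitions."""
    bits = [(pattern >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def macro_label(center_median, neighbor_medians):
    """Threshold the eight neighbor-patch medians against the center-patch
    median, packing the comparison results into one 8-bit label."""
    return sum(1 << i for i, m in enumerate(neighbor_medians) if m >= center_median)

# Of the 256 possible labels, exactly 58 are uniform; only these populate
# the macrostructure histogram, and the rest are discarded.
UNIFORM_LABELS = [p for p in range(256) if is_uniform(p)]
```

Keeping only the 58 uniform patterns shortens the macrostructure histogram considerably, which keeps the equal-weight concatenation with the microstructure histogram from being dominated by rare, noisy labels.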

**4.3 Results of circular-based topology**



As described in Section 3.2, the macrostructure information is exploited from the outer circle, which always has a larger spatial resolution (*P*;*R*2) than (*P*;*R*1). In other words, more points are used for thresholding when extracting the macrostructure information. For presentation purposes, the notations S(*P*,*R*)*<sup>i</sup>* and S(*P*,*R*)*<sup>o</sup>* are used to represent the spatial resolutions of the inner circle and the outer circle, respectively. In the circular-based topology, two types of descriptors are designed to evaluate the performance of this topology. The Type Ic descriptor is similar to what has been discussed in Section 3.2. Learning from the results obtained with the patch-based topology, the Type IIc descriptor is designed to explore a fusion of textures extracted from the grayscale image and the OIGM image: it extracts the local microstructure information from the OIGM image and the macrostructure information from the grayscale image. The histograms generated from these two types of images are concatenated and given equal weights. In this topology, multiple combinations of spatial resolutions of the inner and outer circles are tested. By limiting *R*2 to 5, there are 10 combinations of descriptors at different spatial resolutions. Overall, 20 different combinations of descriptors were put to the test.
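The combination count can be verified in a few lines (an illustration; the variable names are ours):

```python
from itertools import combinations

# Spatial resolutions (P, R) considered in the chapter, with R limited to 5.
RESOLUTIONS = [(8, 1), (16, 2), (24, 3), (32, 4), (40, 5)]

# The outer circle must have a larger spatial resolution than the inner one,
# giving C(5, 2) = 10 inner/outer pairs; with Types Ic and IIc, 20 descriptors.
PAIRS = list(combinations(RESOLUTIONS, 2))
DESCRIPTORS = [(inner, outer, t) for inner, outer in PAIRS for t in ("Ic", "IIc")]
```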

Performance of the Type Ic and Type IIc descriptors on the SCface dataset at distance 1, distance 2, and distance 3 is presented in **Tables 3**, **4**, and **5**, respectively. Similar to the results obtained by the patch-based topology, the average recognition rate of the images that belong to distance 1 from all cameras is the lowest compared to those from distances 2 and 3, as shown in **Table 3**. One noteworthy observation is that most Type IIc descriptors at any spatial resolution achieved a better recognition rate than the Type Ic descriptors. Taking a closer look at the descriptors' performance in **Table 5**, the Type IIc descriptor with spatial resolution S(16,2)*<sup>i</sup>* and S(24,3)*<sup>o</sup>* recorded the best results for all cameras on the test gallery of distance 3. On the other hand, for the distance 2 test gallery, the Type IIc descriptor with spatial resolution S(24,3)*<sup>i</sup>* and S(32,4)*<sup>o</sup>* achieved the best result against the other combinations.

#### **Table 3.**

*Circular-based topology on the SCface dataset at distance 1.*

| S(*P*,*R*)*<sup>i</sup>* | S(*P*,*R*)*<sup>o</sup>* | Type | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|---|---|
| (8,1) | (16,2) | Ic | 5.38 | 3.85 | 3.85 | 3.08 | 4.62 | 4.12 |
| (8,1) | (16,2) | IIc | 6.92 | 6.92 | 6.92 | 6.15 | 7.69 | 6.92 |
| (8,1) | (24,3) | Ic | 5.38 | 4.62 | 4.62 | 3.08 | 5.38 | 4.62 |
| (8,1) | (24,3) | IIc | 7.69 | 5.38 | 6.92 | 7.69 | 6.92 | 6.92 |
| (8,1) | (32,4) | Ic | 5.38 | 6.15 | 5.38 | 6.15 | 6.15 | 5.84 |
| (8,1) | (32,4) | IIc | 6.92 | 6.15 | 6.92 | 8.46 | 6.15 | 6.92 |
| (8,1) | (40,5) | Ic | 5.38 | 7.69 | 4.62 | 6.15 | 6.15 | 6.00 |
| (8,1) | (40,5) | IIc | 9.23 | 7.69 | 6.92 | 7.69 | 6.15 | 7.54 |
| (16,2) | (24,3) | Ic | 5.38 | 4.62 | 6.15 | 3.85 | 6.15 | 5.23 |
| (16,2) | (24,3) | IIc | 6.92 | 6.92 | 5.38 | 6.15 | 7.69 | 6.61 |
| (16,2) | (32,4) | Ic | 6.15 | 6.92 | 7.69 | 4.62 | 6.15 | 6.31 |
| (16,2) | (32,4) | IIc | 10.00 | 6.92 | 3.85 | 7.69 | 7.69 | 7.23 |
| (16,2) | (40,5) | Ic | 5.38 | 6.92 | 3.85 | 6.15 | 6.15 | 5.69 |
| (16,2) | (40,5) | IIc | 8.46 | 7.69 | 6.15 | 8.46 | 6.92 | 7.54 |
| (24,3) | (32,4) | Ic | 5.38 | 3.85 | 6.92 | 3.85 | 6.15 | 5.23 |
| (24,3) | (32,4) | IIc | 12.31 | 7.69 | 7.69 | 9.23 | 6.92 | 8.77 |
| (24,3) | (40,5) | Ic | 6.15 | 6.15 | 5.38 | 2.31 | 6.92 | 5.38 |
| (24,3) | (40,5) | IIc | 10.77 | 8.46 | 9.23 | 9.23 | 4.62 | 8.46 |
| (32,4) | (40,5) | Ic | 4.62 | 5.38 | 6.15 | 2.31 | 7.69 | 5.23 |
| (32,4) | (40,5) | IIc | 9.23 | 7.69 | 10.00 | 10.77 | 5.38 | 8.61 |
| Baseline | BGPM(8;1) | — | 3.08 | 0.77 | 3.08 | 3.08 | 5.38 | 3.08 |
| Baseline | BGPM(16;2) | — | 6.15 | 4.62 | 4.62 | 3.85 | 5.38 | 4.92 |

#### **Table 4.**

*Circular-based topology on the SCface dataset at distance 2.*

| S(*P*,*R*)*<sup>i</sup>* | S(*P*,*R*)*<sup>o</sup>* | Type | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|---|---|
| (8,1) | (16,2) | Ic | 20.77 | 12.31 | 10.00 | 11.54 | 14.62 | 13.85 |
| (8,1) | (16,2) | IIc | 25.38 | 19.23 | 15.38 | 17.69 | 14.62 | 18.46 |
| (8,1) | (24,3) | Ic | 24.62 | 15.38 | 11.54 | 15.38 | 16.92 | 16.77 |
| (8,1) | (24,3) | IIc | 25.38 | 21.54 | 16.15 | 19.23 | 14.62 | 19.38 |
| (8,1) | (32,4) | Ic | 26.92 | 17.69 | 15.38 | 17.69 | 13.85 | 18.31 |
| (8,1) | (32,4) | IIc | 23.85 | 19.23 | 16.92 | 18.46 | 15.38 | 18.77 |
| (8,1) | (40,5) | Ic | 29.23 | 19.23 | 13.08 | 19.23 | 13.85 | 18.92 |
| (8,1) | (40,5) | IIc | 23.08 | 19.23 | 15.38 | 17.69 | 16.92 | 18.46 |
| (16,2) | (24,3) | Ic | 26.15 | 16.15 | 11.54 | 13.08 | 15.38 | 16.46 |
| (16,2) | (24,3) | IIc | 25.38 | 22.31 | 16.15 | 21.54 | 19.23 | 20.92 |
| (16,2) | (32,4) | Ic | 25.38 | 18.46 | 13.85 | 13.85 | 13.85 | 17.08 |
| (16,2) | (32,4) | IIc | 24.62 | 21.54 | 17.69 | 21.54 | 20.00 | 21.08 |
| (16,2) | (40,5) | Ic | 25.38 | 20.00 | 13.08 | 20.77 | 15.38 | 18.92 |
| (16,2) | (40,5) | IIc | 24.62 | 20.77 | 16.92 | 20.00 | 17.69 | 20.00 |
| (24,3) | (32,4) | Ic | 20.77 | 18.46 | 12.31 | 13.85 | 14.62 | 16.00 |
| (24,3) | (32,4) | IIc | 28.46 | 24.62 | 16.92 | 20.77 | 20.77 | 22.31 |
| (24,3) | (40,5) | Ic | 22.31 | 17.69 | 14.62 | 16.15 | 14.62 | 17.08 |
| (24,3) | (40,5) | IIc | 28.46 | 23.85 | 15.38 | 16.15 | 16.92 | 20.15 |
| (32,4) | (40,5) | Ic | 22.31 | 16.92 | 13.85 | 17.69 | 13.85 | 16.92 |
| (32,4) | (40,5) | IIc | 25.38 | 25.38 | 16.92 | 20.00 | 16.15 | 20.77 |
| Baseline | BGPM(8;1) | — | 16.15 | 12.31 | 6.92 | 11.54 | 13.85 | 12.15 |
| Baseline | BGPM(16;2) | — | 23.85 | 13.85 | 7.69 | 12.31 | 13.08 | 14.16 |
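As a quick sanity check, the "Average" column in Tables 3–5 is the simple mean over the five cameras; for example, for the distance 2 row of Type IIc with S(24,3)*<sup>i</sup>* and S(32,4)*<sup>o</sup>*:

```python
# Per-camera recognition rates (%) for Type IIc with S(24,3)_i, S(32,4)_o at distance 2.
cameras = [28.46, 24.62, 16.92, 20.77, 20.77]

# The reported average is the arithmetic mean over the five cameras.
average = round(sum(cameras) / len(cameras), 2)
print(average)  # prints "22.31"
```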

For further evaluation, **Table 6** compares the proposed eBGP descriptors with state-of-the-art methods, namely PCA [27], SIFT and sparse representation-based classification (SRC) [28], and edge-preserving super-resolution (SR) [29], on the SCface database at distance 2. All descriptors were evaluated under the same test conditions, where only one mug shot image per subject is used for training, while the remaining low-resolution images from all cameras are used as probe images. The results show that the proposed descriptors based on eBGP achieved the highest recognition rates among all descriptors, especially eBGPM(16;2) (Type IIP), which has the best recognition rate over all camera images. Exploiting information from the macrostructure raised the BGPM results from fifth highest to first. This indicates the importance of the macrostructure information in shaping a complete face representation in the single-reference face recognition problem.


#### **Table 5.**

*Circular-based topology on the SCface dataset at distance 3.*

| S(*P*,*R*)*<sup>i</sup>* | S(*P*,*R*)*<sup>o</sup>* | Type | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|---|---|
| (8,1) | (16,2) | Ic | 20.77 | 21.54 | 13.85 | 15.38 | 13.85 | 17.08 |
| (8,1) | (16,2) | IIc | 25.38 | 26.15 | 20.00 | 23.85 | 13.85 | 21.85 |
| (8,1) | (24,3) | Ic | 23.08 | 20.77 | 13.08 | 20.00 | 11.54 | 17.69 |
| (8,1) | (24,3) | IIc | 23.08 | 24.62 | 20.00 | 23.85 | 16.92 | 21.69 |
| (8,1) | (32,4) | Ic | 20.00 | 21.54 | 14.62 | 17.69 | 11.54 | 17.08 |
| (8,1) | (32,4) | IIc | 20.77 | 24.62 | 17.69 | 21.54 | 14.62 | 19.85 |
| (8,1) | (40,5) | Ic | 19.23 | 17.69 | 15.38 | 18.46 | 10.77 | 16.31 |
| (8,1) | (40,5) | IIc | 23.85 | 23.85 | 15.38 | 20.77 | 13.85 | 19.54 |
| (16,2) | (24,3) | Ic | 20.77 | 20.77 | 13.08 | 17.69 | 13.08 | 17.08 |
| (16,2) | (24,3) | IIc | 26.15 | 25.38 | 20.77 | 24.62 | 19.23 | 23.23 |
| (16,2) | (32,4) | Ic | 20.77 | 18.46 | 16.15 | 19.23 | 10.00 | 16.92 |
| (16,2) | (32,4) | IIc | 24.62 | 22.31 | 16.15 | 22.31 | 16.92 | 20.46 |
| (16,2) | (40,5) | Ic | 19.23 | 19.23 | 15.38 | 18.46 | 12.31 | 16.92 |
| (16,2) | (40,5) | IIc | 26.15 | 21.54 | 16.15 | 22.31 | 11.54 | 19.54 |
| (24,3) | (32,4) | Ic | 17.69 | 16.15 | 13.85 | 17.69 | 9.23 | 14.92 |
| (24,3) | (32,4) | IIc | 23.08 | 20.77 | 19.23 | 21.54 | 15.38 | 20.00 |
| (24,3) | (40,5) | Ic | 20.00 | 16.15 | 13.85 | 19.23 | 10.77 | 16.00 |
| (24,3) | (40,5) | IIc | 23.85 | 21.54 | 16.92 | 18.46 | 16.15 | 19.38 |
| (32,4) | (40,5) | Ic | 16.15 | 15.38 | 13.08 | 18.46 | 10.00 | 14.61 |
| (32,4) | (40,5) | IIc | 20.77 | 20.77 | 16.92 | 21.54 | 10.77 | 18.15 |
| Baseline | BGPM(8;1) | — | 15.38 | 19.23 | 10.00 | 16.92 | 11.54 | 14.61 |
| Baseline | BGPM(16;2) | — | 18.46 | 20.00 | 16.15 | 14.62 | 11.54 | 16.15 |

#### **Table 6.**

*Comparison of recognition rate (%) of the proposed eBGP descriptor with state-of-the-art descriptors on the SCface database at distance 2.*

| Descriptor | Camera 1 | Camera 2 | Camera 3 | Camera 4 | Camera 5 | Average |
|---|---|---|---|---|---|---|
| PCA [27] | 7.70 | 7.70 | 3.90 | 3.90 | 7.70 | 6.18 |
| SIFT [28] | 13.08 | 12.31 | 8.46 | 15.38 | 9.23 | 11.69 |
| BGPM(16;2) | 23.85 | 13.85 | 7.69 | 12.31 | 13.08 | 14.16 |
| SRC [28] | 29.23 | 16.15 | 12.31 | 25.38 | 13.08 | 19.23 |
| Edge-preserving SR [29] | 26.92 | 21.54 | 15.38 | 24.61 | 15.38 | 20.77 |
| eBGPM(24;3)(32;4) (circular) | 28.46 | 24.62 | 16.92 | 20.77 | 20.77 | 22.31 |
| eBGPM(16;2) (Type IIP) | 34.62 | 25.38 | 20.00 | 25.38 | 21.54 | 25.38 |

#### **5. Conclusion**

In this paper, an extended BGP (eBGP) descriptor, which incorporates macrostructure information into the BGP descriptor, has been proposed to improve the overall descriptor performance in the single-reference face recognition problem.

Results obtained from a series of experiments on the SCface database showed that a fusion of information extracted from micro- and macrostructures is capable of boosting the performance of the BGP descriptor. The proposed eBGP descriptor was tested with the patch-based and circular-based topologies; overall, the circular-based topology outperformed the patch-based topology in terms of recognition rate. In the patch-based topology, the 5 × 5 structure recorded a larger gain in recognition rate than the 3 × 3 structure, while in the circular-based topology, larger spatial resolutions yielded larger gains in recognition performance. Moreover, a fusion of micro- and macrostructure information extracted from the OIGM and grayscale images, respectively, raised the recognition rate further. In fact, the Type IIc setup always showed a better performance boost than Type Ic. With regard to the thresholding implementation, it is worth mentioning that the local mean is on par with the local median for the descriptor and does not offer an additional boost in the patch-based topology.

## **Acknowledgements**

The authors gratefully acknowledge Universiti Sains Malaysia for funding this work through the Universiti Sains Malaysia Research University Grant (RUI) no. 1001/PELECT/8014056.

## **Author details**

Nuzrul Fahmi Nordin<sup>1</sup>, Samsul Setumin<sup>1,2</sup>, Abduljalil Radman<sup>1,3</sup> and Shahrel Azmin Suandi<sup>1</sup>\*

1 Intelligent Biometric Group, School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Nibong Tebal, Malaysia

2 Faculty of Electrical Engineering, Universiti Teknologi MARA Pulau Pinang, Permatang Pauh, Malaysia

3 Faculty of Engineering and Information Technology, Taiz University, Taiz, Yemen

\*Address all correspondence to: shahrel@usm.my

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*DOI: http://dx.doi.org/10.5772/intechopen.86473*

## **References**

[1] Radman A, Suandi SA. Robust face pseudo-sketch synthesis and recognition using morphological-arithmetic operations and HOG-PCA. Multimedia Tools and Applications. 2018;**77**(19):25311-25332

[2] Matta F, Dugelay J-L. Person recognition using facial video information: A state of the art. Journal of Visual Languages and Computing. 2009;**20**(3):180-187

[3] De-la-Torre M, Granger E, Radtke PVW, Sabourin R, Gorodnichy DO. Partially-supervised learning from facial trajectories for face recognition in video surveillance. Information Fusion. 2015;**24**:31-53

[4] Zakaria Z, Suandi SA, Mohamad-Saleh J. Hierarchical skin-AdaBoostneural network (H-SKANN) for multi-face detection. Applied Soft Computing. 2018;**68**:172-190

[5] Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. 20-27 September; Kerkyra, Greece; 1999. pp. 1150-1157

[6] Bay H, Tuytelaars T, Van Gool L. Surf: Speeded up robust features. In: European Conference on Computer Vision. 7-13 May; Graz, Austria; 2006. pp. 404-417

[7] Ahonen T, Hadid A, Pietikainen M. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;**28**(12):2037-2041

[8] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: International Conference on Computer Vision and Pattern Recognition. 20-25 June; San Diego, CA; 2005. pp. 886-893

[9] Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and Intelligent Laboratory Systems. 1987;**2**(1-3):37-52

[10] Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997;**19**(7):711-720

[11] Bartlett MS, Movellan JR, Sejnowski TJ. Face recognition by independent component analysis. IEEE Transactions on Neural Networks. 2002;**13**(6):1450-1464

[12] Ren J, Jiang X, Yuan J. Noise-resistant local binary pattern with an embedded error-correction mechanism. IEEE Transactions on Image Processing. 2013;**22**(10):4049-4060

[13] Ojala T, Pietikäinen M, Mäenpää T. Gray scale and rotation invariant texture classification with local binary patterns. In: European Conference on Computer Vision. June 26-July 1; Dublin, Ireland; 2000. pp. 404-420

[14] Huang W, Yin H. Robust face recognition with structural binary gradient patterns. Pattern Recognition. 2017;**68**:126-140

[15] Liu L, Lao S, Fieguth PW, Guo Y, Wang X, Pietikäinen M. Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing. 2016;**25**(3):1368-1381

[16] Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition. 1996;**29**(1):51-59

[17] Lee K-C, Ho J, Kriegman DJ. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;**27**(5):684-698

[18] Martinez AM. The AR face database. CVC Technical Report No. 24; 1998

[19] Gross R, Matthews I, Cohn J, Kanade T, Baker S. Multi-PIE. Image and Vision Computing. 2010;**28**(5):807-813

[20] Phillips PJ, Rizvi SA, Rauss PJ. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;**22**(10):1090-1104

[21] Huang GB, Ramesh M, Berg T, Learned-Miller E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In: European Conference on Computer Vision Workshop on Faces in Real-Life Images. October 2008. pp. 1-11

[22] Grgic M, Delac K, Grgic S. SCface—Surveillance cameras face database. Multimedia Tools and Applications. 2011;**51**(3):863-879

[23] Liu L, Fieguth P, Zhao G, Pietikäinen M, Hu D. Extended local binary patterns for face recognition. Information Sciences. 2016;**358**:56-72

[24] Liao S, Zhu X, Lei Z, Zhang L, Li SZ. Learning multi-scale block local binary patterns for face recognition. In: International Conference on Biometrics. 27-29 August 2007; Seoul, Korea. pp. 828-837

[25] Liu L, Fieguth P, Guo Y, Wang X, Pietikäinen M. Local binary features for texture classification: Taxonomy and experimental study. Pattern Recognition. 2017;**62**:135-160

[26] Liu L, Zhao L, Long Y, Kuang G, Fieguth P. Extended local binary patterns for texture classification. Image and Vision Computing. 2012;**30**(2):86-99

[27] Martínez AM, Kak AC. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;**23**(2):228-233

[28] Hu X, Peng S, Wang L, Yang Z, Li Z. Surveillance video face recognition with single sample per person based on 3D modeling and blurring. Neurocomputing. 2017;**235**:46-58

[29] Mandal S, Thavalengal S, Sao AK. Explicit and implicit employment of edge-related information in super-resolving distant faces for recognition. Pattern Analysis and Applications. 2016;**19**(3):867-884

[24] Liao S, Zhu X, Lei Z, Zhang L, Li SZ. Learning multi-scale block local binary patterns for face recognition. In: International Conference on Biometrics. 27-29 August 2007; Seoul, Korea;. pp. 828-837

[25] Liu L, Fieguth P, Guo Y, Wang X, Pietikäinen M. Local binary features for texture classification: Taxonomy and experimental study. Pattern Recognition. 2017;**62**:135-160

[26] Liu L, Zhao L, Long Y, Kuang G, Fieguth P. Extended local binary patterns for texture classification.

Image and Vision Computing. 2012;**30**(2):86-99

[27] Martínez AM, Kak AC. Pca versus lda. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;**23**(2):228-233

[28] Hu X, Peng S, Wang L, Yang Z, Li Z. Surveillance video face recognition with single sample per person based on 3D modeling and blurring. Neurocomputing. 2017;**235**:46-58

[29] Mandal S, Thavalengal S, Sao AK. Explicit and implicit employment of edge-related information in superresolving distant faces for recognition. Pattern Analysis and Applications. 2016;**19**(3):867-884

**156**

2005. pp. 886-893

*Visual Object Tracking with Deep Neural Networks*

[9] Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and Intelligent

[10] Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs. fisherfaces:

Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997;**19**(7):711-720

[11] Bartlett MS, Movellan JR, Sejnowski TJ. Face recognition by independent component analysis. IEEE Transactions on Neural Networks.

[12] Ren J, Jiang X, Yuan J. Noiseresistant local binary pattern with an embedded error-correction mechanism. IEEE Transactions on Image Processing.

[13] Ojala T, Pietikäinen M, Mäenpää T. Gray scale and rotation invariant texture classification with local binary patterns. In: European Conference on Computer Vision. June 26-July 1; Dublin, Ireland;

[14] Huang W, Yin H. Robust face recognition with structural binary gradient patterns. Pattern Recognition.

[15] Liu L, Lao S, Fieguth PW, Guo Y, Wang X, Pietikäinen M. Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing.

[16] Ojala T, Pietikäinen M, Harwood D.

A comparative study of texture measures with classification based on featured distributions. Pattern Recognition. 1996;**29**(1):51-59

[17] Lee K-C, Ho J, Kriegman DJ. Acquiring linear subspaces for face

2002;**13**(6):1450-1464

2013;**22**(10):4049-4060

2000. pp. 404-420

2017;**68**:126-140

2016;**25**(3):1368-1381

Laboratory Systems. 1987;**2**(1-3):37-52

[1] Radman A, Suandi SA. Robust face pseudo-sketch synthesis and recognition using morphologicalarithmetic operations and HOG-PCA. Multimedia Tools and

Applications. 2018;**77**(19):25311-25332

information: A state of the art. Journal of Visual Languages and Computing.

Radtke PVW, Sabourin R, Gorodnichy DO. Partially-supervised learning from facial trajectories for face recognition in video surveillance. Information Fusion.

[4] Zakaria Z, Suandi SA, Mohamad-Saleh J. Hierarchical skin-AdaBoostneural network (H-SKANN) for multi-face detection. Applied Soft Computing. 2018;**68**:172-190

[5] Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. 20-27 September; Kerkyra,

Greece; 1999. pp. 1150-1157

[6] Bay H, Tuytelaars T, Van Gool L. Surf: Speeded up robust features. In: European Conference on Computer Vision. 7-13 May; Graz, Austria; 2006. pp. 404-417

[7] Ahonen T, Hadid A, Pietikainen M. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;**28**(12):2037-2041

[8] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: International Conference on Computer Vision and Pattern

Recognition. 20-25 June; San Diego, CA;

[2] Matta F, Dugelay J-L. Person recognition using facial video

[3] De-la-Torre M, Granger E,

2009;**20**(3):180-187

2015;**24**:31-53

**References**


## Chapter 8

## Matrix Factorization on Complex Domain for Face Recognition

*Viet-Hang Duong, Manh-Quan Bui and Jia-Ching Wang*

*DOI: http://dx.doi.org/10.5772/intechopen.85182*

### Abstract

Matrix factorization on complex domain is a natural extension of nonnegative matrix factorization, but it is still a very new trend in face recognition. In this chapter, we present two complex matrix factorization-based models for face recognition, in which the objective functions are the real-valued functions of complex variables. Our first model aims to build a learned base, which is embedded within original space. The second model finds the base whose volume is maximized. Experimental results on datasets with and without outliers show that our proposed algorithms are more effective than competitive algorithms.

Keywords: complex matrix factorization, face recognition, nonnegative matrix factorization, projected gradient descent

## 1. Introduction

Face recognition is a central issue in computer vision and pattern recognition. The variations in lighting conditions, pose and viewpoint changes, facial expressions, makeup, aging, and occlusion are challenges that significantly affect recognition accuracy. Generally, the challenges in face recognition can be classified into four main categories as follows:

Illumination variations: The face of a person can appear dramatically different when illumination changes, owing to changes in the spectrum, distribution, and intensity of the light source. In practice, many two-dimensional (2D) methods show that recognition performance degrades notably under strong illumination variation [1, 2]. The problem of lighting variation is therefore considered one of the key challenges for face recognition system designers. Several methods have been proposed to handle variable illumination, such as extracting illumination-invariant features [3–7], transforming images with variable illumination to a canonical representation [8, 9], modeling the illumination variations [10, 11], and estimating facial shapes and albedos based on 3D face models [12].

Pose/viewpoint changes: Pose or viewpoint changes often deform or self-occlude the face, which affects the recognition process [13]. Generally, viewpoint face recognition approaches are divided into two categories: viewpoint-transformed and cross-pose based [14]. Viewpoint-transformed recognition methods aim to transform the probe image to match the pose of the gallery image, whereas cross-pose-based approaches attempt to estimate the light field of the face [15, 16]. Besides, other approaches integrate 2D and 3D information [17, 18] in order to cope with pose and illumination variations.

Facial expression: Face recognition tasks are more challenging when dealing with the emotional states of a person in an image. In addition, hairstyle or facial hair such as a beard or mustache can change facial appearance. To handle the difficulties of expression, facial expression recognition (FER) systems have been designed, including static image FER [19–21] and dynamic sequence FER [22–24]. Static-based methods extract spatial information from the current single image to obtain the feature representation. In contrast, dynamic-based methods consider the temporal relation among adjacent frames in the input facial expression sequence.


Occlusion: Faces may be partially occluded by objects such as sunglasses or a scarf [62], or by the faces of other people in a group [25]. An occluded face is very difficult to observe and recognize because the visible part of the face is small. Occlusion therefore remains a hard, open problem in face recognition.

In face recognition, image representation (IR) techniques play an important role in improving recognition accuracy. An IR system commonly transforms the input signal into a new representation that reduces its dimensionality and explicates its latent structure. Over the past decades, subspace methods such as principal component analysis (PCA) [26], linear discriminant analysis (LDA) [27, 28], and nonnegative matrix factorization (NMF) [29, 30] have been successfully used for feature extraction. In particular, PCA is known as a powerful technique for dimensionality reduction and multivariate analysis. PCA seeks a linear combination of variables such that the maximum variance is extracted, projecting the data onto an orthogonal basis aligned with the directions of largest variance. In image representation, eigenfaces (PCA) yield dense representations of facial images that mainly exploit the global structure of the whole face. Likewise, LDA finds a linear transformation that maximizes discrimination between classes.

NMF is an unsupervised, data-driven approach in which all elements of the decomposed matrix and the obtained matrix factors are forced to be nonnegative. Furthermore, NMF is able to represent an object as a collection of parts; for instance, a human face can be decomposed into eyes, lips, and other elements. To make NMF algorithms more effective, several constraints have been added to the cost function, such as sparsity [31, 32], orthogonality [33], discrimination [34], graph regularization [35, 36], and a pixel dispersion penalty [37]. Additionally, choosing an appropriate distance metric for an NMF model plays an important role in enhancing the quality of the estimated linear subspace of the given data. NMF techniques commonly apply the squared Frobenius norm (Fr) or the generalized Kullback–Leibler (KL) divergence, which suit independent and identically distributed noise; in many cases, however, they produce an arbitrarily biased subspace when the data are corrupted by outliers [38]. To overcome this drawback, the L2,1 norm was proposed by Kong et al. [39] to obtain a robust NMF, in which the noise was assumed to follow the Laplacian distribution. Similarly, the earth mover's distance (EMD) and the Manhattan distance were suggested in the work of Sandler et al. [40] and Guan et al. [41], respectively. A family of cost functions parameterized by a single shape parameter beta, called the beta-divergence [42], is also commonly used in NMF approaches. Although NMFs are able to learn part-based representations and capture the Euclidean structure of high-dimensional data space, they are still limited in capturing the nonlinear sub-manifold structure behind the data.

Recently, matrix factorization techniques have been extended to complex matrix factorizations (CMFs), where the input data are complex matrices. These models have obtained promising results in facial expression recognition and data representation tasks [43–45]. The main idea of complex methods for face and facial expression recognition is that the original signal is projected onto the complex field by a mapping such that the distances of two data points in the original space and the projection space are equivalent. Particularly, by transforming the real pixel intensities to the complex domain, it can be shown that the squared Frobenius norm of the corresponding complex vectors and the cosine dissimilarity of the real-valued vectors are equivalent. As a result, the real optimization problem with cosine divergence is replaced by optimizing a complex function with the Frobenius norm. Most of the mentioned CMF models were applied to facial expression and object recognition.

In this chapter, we present two complex matrix factorization-based models for face recognition. In the following sections, we denote an M-dimensional column vector $\mathbf{y} = \left[y_1, \dots, y_M\right]^T \in \mathbb{R}_{+}^{M}$ to be an observed sample. Let $\mathbf{Y}$ be a dataset comprising N observations; $\mathbf{Y}$ is expressed in matrix form as $\mathbf{Y} = \left[\mathbf{y}_1, \dots, \mathbf{y}_N\right] \in \mathbb{R}_{+}^{M \times N}$, where $\mathbb{R}_{+}$ denotes the set of nonnegative real numbers. In the proposed models, the real data set $\mathbf{Y}$ is transformed to the complex domain, and the complex data matrix $\mathbf{Z}$ is factorized by imitating NMF frameworks. The contributions of this chapter are summarized as follows:

1. The image analysis methods on the complex domain, called structured complex matrix factorization (StCMF) and constrained complex matrix factorization (CoCMF), are proposed.

2. In the complex domain, the updating rules for StCMF and CoCMF are derived based on the gradient descent method.

3. A thorough experimental study on face recognition is conducted; the results show that the proposed StCMF and CoCMF yield better performance compared to extensions of the real NMFs.


## 2. Background


#### 2.1 Nonnegative matrix factorization

Assume that we are given an initial data matrix $\mathbf{Y} \in \mathbb{R}_{+}^{M \times N}$ and a positive integer $K \ll \min\{M, N\}$. NMF methods aim to find a basis matrix $\mathbf{U} \in \mathbb{R}_{+}^{M \times K}$ and a coding variable matrix $\mathbf{V} \in \mathbb{R}_{+}^{K \times N}$ such that $\mathbf{Y} \approx \mathbf{U}\mathbf{V}$. The standard NMF is usually formulated as an optimization:

$$\min_{\mathbf{U},\mathbf{V}} D(\mathbf{Y} \| \mathbf{U}\mathbf{V}) \quad \text{s.t. } \mathbf{U} \ge \mathbf{0},\ \mathbf{V} \ge \mathbf{0} \tag{1}$$

where $D(\mathbf{Y} \| \mathbf{U}\mathbf{V})$ is a divergence function that measures the distance between $\mathbf{Y}$ and $\mathbf{U}\mathbf{V}$.

Most NMF techniques estimate the linear subspace of the given data using the Frobenius norm (F) or the generalized Kullback–Leibler (KL) divergence, which have the following forms:

$$D\_F(\mathbf{A}||\mathbf{B}) = ||\mathbf{A} - \mathbf{B}||\_F^2 = \sum\_{i,j} \left(\mathbf{A}\_{ij} - \mathbf{B}\_{ij}\right)^2\tag{2}$$

$$D_{KL}(\mathbf{A}\|\mathbf{B}) = \lim_{\beta \to 0} D_{\beta}(\mathbf{A}\|\mathbf{B}) = \sum_{i,j} \left( \mathbf{A}_{ij} \log \frac{\mathbf{A}_{ij}}{\mathbf{B}_{ij}} - \mathbf{A}_{ij} + \mathbf{B}_{ij} \right) \tag{3}$$
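Both divergences are direct entrywise computations; a minimal NumPy sketch (function names are ours, and the `eps` guard against `log(0)` is an assumption, not part of the chapter):

```python
import numpy as np

def frobenius_div(A, B):
    """Squared Frobenius distance, Eq. (2): sum of squared entrywise differences."""
    return np.sum((A - B) ** 2)

def kl_div(A, B, eps=1e-12):
    """Generalized Kullback-Leibler divergence, Eq. (3)."""
    A = A + eps  # guard against log(0) and division by zero
    B = B + eps
    return np.sum(A * np.log(A / B) - A + B)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(frobenius_div(A, B))  # 0.0 for identical matrices
print(kl_div(A, B))         # 0.0 for identical matrices
```

Both measures vanish exactly when $\mathbf{A} = \mathbf{B}$ and are positive otherwise, which is what makes them usable as NMF cost functions.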

The problem (1) is non-convex; thus, it may have several local minima. Iterative methods are commonly used to find a solution.

Generally, there are three classes of algorithms for solving this problem: multiplicative update, gradient descent, and alternating nonnegative least squares. The most popular approach to solving (1) is the multiplicative update rules proposed by Lee and Seung [30]. For example, the iterative update rules for the Frobenius NMF cost function are given by

$$\mathbf{V}_{ij}^{(t)} \leftarrow \mathbf{V}_{ij}^{(t-1)} \frac{\left(\mathbf{U}^{(t-1)T}\mathbf{Y}\right)_{ij}}{\left(\mathbf{U}^{(t-1)T}\mathbf{U}^{(t-1)}\mathbf{V}^{(t-1)}\right)_{ij}} \tag{4}$$
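The alternating multiplicative update (4) above, together with its companion update for $\mathbf{U}$, is straightforward to implement. The following NumPy sketch is our own minimal illustration, not the chapter's implementation; the small `eps` added to the denominators to avoid division by zero is an assumption on our part.

```python
import numpy as np

def nmf_multiplicative(Y, K, n_iter=200, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates minimizing ||Y - UV||_F^2."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    # Random nonnegative initialization; updates preserve nonnegativity.
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        V *= (U.T @ Y) / (U.T @ U @ V + eps)   # update for V, cf. Eq. (4)
        U *= (Y @ V.T) / (U @ V @ V.T + eps)   # companion update for U
    return U, V

# A nonnegative data matrix factorizes with low reconstruction error.
Y = np.abs(np.random.default_rng(1).random((8, 6)))
U, V = nmf_multiplicative(Y, K=4)
err = np.linalg.norm(Y - U @ V) / np.linalg.norm(Y)
```

Because each factor is multiplied by a ratio of nonnegative terms, the constraints $\mathbf{U}, \mathbf{V} \ge \mathbf{0}$ are maintained automatically, without any projection step.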

$$\mathbf{U}_{ij}^{(t)} \leftarrow \mathbf{U}_{ij}^{(t-1)} \frac{\left(\mathbf{Y}\mathbf{V}^{(t-1)T}\right)_{ij}}{\left(\mathbf{U}^{(t-1)}\mathbf{V}^{(t-1)}\mathbf{V}^{(t-1)T}\right)_{ij}} \tag{5}$$

#### 2.2 The cosine divergence

Given two images $I_t$ and $I_s$, let their representations in lexicographic order be the M-dimensional vectors $\mathbf{y}_t, \mathbf{y}_s \in \mathbb{R}^M$. First, $\mathbf{y}_t$ and $\mathbf{y}_s$ are normalized so that $\mathbf{y}_t(c), \mathbf{y}_s(c) \in [0, 1]$, where c is the element index (the spatial location in the vector). The correlation between images $I_t$ and $I_s$, through the cosine dissimilarity between $\mathbf{y}_t$ and $\mathbf{y}_s$, is introduced by

$$D_C(\mathbf{y}_t, \mathbf{y}_s) = \sum_{c=1}^{M} \left\{ 1 - \cos\left(\alpha\pi\,\mathbf{y}_t(c) - \alpha\pi\,\mathbf{y}_s(c)\right) \right\} \tag{6}$$

One of the interesting properties of the cosine distance measure is outlier suppression, which is proved in [46]. The comparison between the Frobenius norm and the cosine divergence is shown in Figure 1. Liwicki et al. [46] show that the Frobenius distance between the original image and another image of the same subject is smaller; in contrast, the cosine-based measure yields a large distance between the original image and the image of a different person, or an occluded image.

#### Figure 1.

Sample images for making comparison between dissimilarity measures.
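The equivalence motivating the complex models, that the cosine dissimilarity in the real domain equals the squared Frobenius distance after an Euler-type complex mapping $\mathbf{y} \mapsto e^{i\alpha\pi\mathbf{y}}/\sqrt{2}$, can be checked numerically. This sketch is ours; the value of the mapping parameter $\alpha$ is arbitrary here.

```python
import numpy as np

def cosine_dissim(yt, ys, alpha=1.0):
    """Cosine dissimilarity of Eq. (6); yt, ys are normalized to [0, 1]."""
    return np.sum(1.0 - np.cos(alpha * np.pi * yt - alpha * np.pi * ys))

def h(y, alpha=1.0):
    """Euler-type complex mapping: h(y) = e^{i*alpha*pi*y} / sqrt(2)."""
    return np.exp(1j * alpha * np.pi * y) / np.sqrt(2)

yt = np.array([0.1, 0.5, 0.9])
ys = np.array([0.2, 0.4, 0.8])

# Squared Frobenius (Euclidean) distance in the complex domain:
# |h(yt)-h(ys)|^2 = (1/2)(2 - 2 cos(a*pi*(yt-ys))) per component,
# which sums to exactly the cosine dissimilarity of Eq. (6).
d_complex = np.sum(np.abs(h(yt) - h(ys)) ** 2)
d_cosine = cosine_dissim(yt, ys)
```

The two quantities agree to machine precision, which is why the real cosine-divergence problem can be replaced by a Frobenius-norm problem over complex matrices.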

#### 2.3 Euler's formula and a space transformation

Let us consider two mappings: <sup>g</sup> : <sup>R</sup><sup>M</sup> ! <sup>R</sup><sup>2</sup><sup>M</sup> such that

$$\mathbf{g}\left(\mathbf{y}\_t\right) = \frac{1}{\sqrt{N}} \left[ \cos\left(\mathbf{y}\_t\right)^T \sin\left(\mathbf{y}\_t\right)^T \right]^T; \forall \mathbf{y}\_t \in \mathbb{R}^N \tag{7}$$

$$\text{where}\\
\qquad \cos(\mathbf{y}\_t) = \left[\cos(\mathbf{y}\_t(\mathbf{1})), \cos(\mathbf{y}\_t(\mathbf{2})), \dots, \cos(\mathbf{y}\_t(\mathbf{M}))\right]^T \tag{8}$$

$$\sin\left(\mathbf{y}\_t\right) = \left[\sin\left(\mathbf{y}\_t(\mathbf{1})\right), \sin\left(\mathbf{y}\_t(\mathbf{2})\right), \dots, \sin\left(\mathbf{y}\_t(M)\right)\right]^T \tag{9}$$

#### Figure 1.

Sample images for making comparison between dissimilarity measures.

Matrix Factorization on Complex Domain for Face Recognition DOI: http://dx.doi.org/10.5772/intechopen.85182

$$\left\| \mathbf{g}(\mathbf{y}\_t) \right\| = 1 \tag{10}$$

and $h : \mathbb{R}^M \to \mathbb{C}^M$ is defined by

$$\mathbf{z}\_t = h\left(\mathbf{y}\_t\right) = \frac{1}{\sqrt{2}} e^{i\alpha\pi \mathbf{y}\_t} = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{i\alpha\pi \mathbf{y}\_t(1)} \\ \vdots \\ e^{i\alpha\pi \mathbf{y}\_t(M)} \end{bmatrix} \tag{11}$$

The nonlinear function $h$ transforms the real-valued features into a complex feature space. In other words, an $M$-dimensional complex vector space can be regarded as a $2M$-dimensional real vector space.

It is proven that the cosine dissimilarity of a pair of data points in the input real space equals the Frobenius distance of the corresponding data in the complex domain [47]. This observation is the prime motivation of StCMF and CoCMF: map the samples into the complex space with the nonlinear mapping $h$ and perform matrix factorization in this complex feature space.
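This equivalence can be checked directly: with $h$ as in Eq. (11), $\|h(\mathbf{y}_t) - h(\mathbf{y}_s)\|^2$ expands elementwise to $\frac{1}{2}\,|e^{i\alpha\pi \mathbf{y}_t(c)} - e^{i\alpha\pi \mathbf{y}_s(c)}|^2 = 1 - \cos(\alpha\pi(\mathbf{y}_t(c) - \mathbf{y}_s(c)))$, which sums to $D_C$ of Eq. (6). A small numerical verification (the vector size and the value of $\alpha$ are arbitrary choices for this demo):

```python
import numpy as np

def h(y, alpha=1.0):
    # Eq. (11): map normalized real features onto the complex unit circle.
    return np.exp(1j * alpha * np.pi * y) / np.sqrt(2.0)

def D_C(y_t, y_s, alpha=1.0):
    # Eq. (6): cosine dissimilarity in the input space.
    return float(np.sum(1.0 - np.cos(alpha * np.pi * (y_t - y_s))))

rng = np.random.default_rng(0)
y_t, y_s = rng.random(16), rng.random(16)

lhs = D_C(y_t, y_s)                          # input real space
rhs = np.linalg.norm(h(y_t) - h(y_s)) ** 2   # complex domain
# lhs and rhs agree to machine precision.
```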

#### 2.4 Wirtinger calculus

Any function of a complex variable $z$ can be written as $f(z)|_{z = x + iy} = F(x, y) = U(x, y) + iV(x, y)$, where $i^2 = -1$ and $x, y \in \mathbb{R}$. Palka et al. [48] defined complex differentiability as follows:

Definition 1. Let $A \subset \mathbb{C}$ be an open set. The function $f : A \to \mathbb{C}$ is said to be differentiable at $z_0 \in A$ if the limit $\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$ exists independently of the manner in which $z \to z_0$.

A necessary condition for $f$ to be holomorphic is that the Cauchy-Riemann equations hold, that is, $\frac{\partial U}{\partial x} = \frac{\partial V}{\partial y}$ and $\frac{\partial U}{\partial y} = -\frac{\partial V}{\partial x}$; otherwise, it is nonholomorphic. In statistical signal processing, the functions of interest are real-valued with complex arguments $z$ and hence are not analytic on the complex plane. In this case we can use Wirtinger calculus [49], which writes the expansions in conjugate coordinates by considering $f(z)$ as a bivariate function $f(z, z^*)$ and treating $z$ and $z^*$ as independent arguments.

Definition 2. The pair of partial derivative operators for a function $f(z) = f(z, z^*)$, referred to as the Wirtinger derivatives [49], is defined by

$$\frac{\partial \mathbf{f}}{\partial \mathbf{z}} = \frac{1}{2} \left( \frac{\partial \mathbf{f}}{\partial \mathbf{x}} - i \frac{\partial \mathbf{f}}{\partial \mathbf{y}} \right), \frac{\partial \mathbf{f}}{\partial \mathbf{z}^\*} = \frac{1}{2} \left( \frac{\partial \mathbf{f}}{\partial \mathbf{x}} + i \frac{\partial \mathbf{f}}{\partial \mathbf{y}} \right) \tag{12}$$

For real-valued functions of complex variables, there is one special property that is useful for the optimization theory described later.

Lemma 1. The differential $df$ of a real-valued function $f : A \to \mathbb{R}$ with complex-valued $z \in A \subset \mathbb{C}$ can be expressed as

$$df = 2\operatorname{Re}\left(\left(\frac{\partial f(z)}{\partial z^\*}\right)^{\!\*} dz\right) \tag{13}$$
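A quick numerical sanity check of the Wirtinger machinery, using $f(z) = |z|^2 = z z^*$ as an example chosen here (not from the chapter): its Wirtinger derivatives are $\partial f/\partial z = z^*$ and $\partial f/\partial z^* = z$, and since $f$ is real-valued the first-order change is $df = 2\operatorname{Re}\{(\partial f/\partial z)\,dz\}$.

```python
import numpy as np

# f(z) = |z|^2 = z z*, so ∂f/∂z = z* and ∂f/∂z* = z (Wirtinger).
z = 0.7 + 0.3j
dz = 1e-6 * (1.0 - 2.0j)          # a small complex perturbation

df_exact = abs(z + dz) ** 2 - abs(z) ** 2
df_first_order = 2.0 * np.real(np.conj(z) * dz)   # 2 Re{(∂f/∂z) dz}
# The two differ only by the second-order term |dz|^2.
```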

#### 3. Complex matrix factorization

Let the input data matrix Y = (Y1, Y2, …, YN) contain N data vectors as columns. As described in the previous sections, the elements of the real matrix Y are normalized and transformed into the complex number field to yield the complex data matrix Z. An unconstrained and a constrained optimization problem in this complex field are introduced in the following sections, respectively.

#### 3.1 Structured complex matrix factorization (StCMF)

The idea of structured complex matrix factorization (StCMF) is to build a learned basis embedded within the original space. The basis matrix in StCMF is constructed as a linear combination of the complex training examples. Given the complex data matrix $\mathbf{Z} \in \mathbb{C}^{M \times N}$, StCMF factorizes $\mathbf{Z}$ into the encoding matrix $\mathbf{V} \in \mathbb{C}^{K \times N}$ and the exemplar-embedded basis matrix $\mathbf{U} = \mathbf{Z}\mathbf{W}$, where $\mathbf{W} \in \mathbb{C}^{M \times K}$. Therefore, the objective function of the StCMF problem can be formulated as follows:

$$\min\_{\mathbf{W},\mathbf{V}} f\_{\text{SCMF}}(\mathbf{W},\mathbf{V}) = \min\_{\mathbf{W},\mathbf{V}} \frac{1}{2} \left\| \mathbf{Z} - \mathbf{Z}\mathbf{W}\mathbf{V} \right\|\_{F}^{2} \tag{14}$$

where $\left\| \cdot \right\|\_F$ denotes the Frobenius norm, $K \ll \min\{N, M\}$, and

$$\begin{aligned} \left\| \mathbf{Z} - \mathbf{Z}\mathbf{W}\mathbf{V} \right\|\_{F}^{2} &= \operatorname{Tr}\left[(\mathbf{Z} - \mathbf{Z}\mathbf{W}\mathbf{V})^{H}(\mathbf{Z} - \mathbf{Z}\mathbf{W}\mathbf{V})\right]\\ &= \operatorname{Tr}\left(\mathbf{Z}^{H}\mathbf{Z} - \mathbf{V}^{H}\mathbf{W}^{H}\mathbf{Z}^{H}\mathbf{Z} - \mathbf{Z}^{H}\mathbf{Z}\mathbf{W}\mathbf{V} + \mathbf{V}^{H}\mathbf{W}^{H}\mathbf{Z}^{H}\mathbf{Z}\mathbf{W}\mathbf{V}\right) \end{aligned}$$

#### 3.2 Constrained complex matrix factorization (CoCMF)

Consider a dataset of N complex vectors Z = [Z1, Z2, …, ZN], where each Z<sup>i</sup> represents a data instance. The proposed CoCMF model decomposes Z into a product of two matrices W and V such that each instance Z<sup>i</sup> is a convex combination of latent components W. We call V the encoding matrix and W the basis matrix. Geometrically, the data points Zi, i = 1, 2, ..., N, all lie in or on the surface of a simplicial cone SW whose vertices correspond to the columns of W:

$$\mathbf{S}\_{\mathbf{W}} = \left\{ \mathbf{z} : \mathbf{z} = \sum\_{i=1}^{K} \mathbf{W}\_{i} \mathbf{v}\_{i}; \mathbf{v}\_{i} \in \mathbb{R}\_{+} \right\} \tag{15}$$

Note that SW lies in the positive orthant and the volume of SW (Vol (SW)) is given by the following formula [48]:

$$Vol(\mathbf{S}\_{\mathbf{W}}) = \frac{|\det(\mathbf{W})|}{(K-1)!} \tag{16}$$

In [51], Zhou et al. illustrated that a small-cone constraint on the bases W imposes suitable sparseness on V. Conversely, a large-cone penalty results in sparseness on the bases of the factorization, while the reconstruction errors on both the training and test data are simultaneously decreased [50, 52]. Therefore, all observed data can be reconstructed by linearly combining the bases of a dictionary. Combining reconstruction with the goal of enlarging the volume of the simplicial base, the constrained complex matrix factorization (CoCMF) problem is formulated as follows:


$$\min\_{\mathbf{W},\mathbf{V}} f\_{\text{CoCMF}}(\mathbf{W},\mathbf{V}) = \min\_{\mathbf{W},\mathbf{V}} \frac{1}{2} \left\| \mathbf{Z} - \mathbf{W}\mathbf{V} \right\|\_F^2 - \frac{|\det(\mathbf{W})|}{(K-1)!} \tag{17}$$
 
$$\text{s.t.}\\\mathbf{W} \in \mathbb{C}^{M \times K}, \mathbf{V} \in \mathbb{R}\_+^{K \times N} \text{ and } \sum\_{i=1}^K \mathbf{V}\_{ij} = \mathbf{1} \,\forall j$$

Since $0 < \det(\mathbf{W}^T\mathbf{W}) \le 1$ holds under the assumption $\mathbf{1}^T \mathbf{W}_i = 1$, the log-determinant function is exploited in this work to simplify the model and modify the volume penalty, so Eq. (17) can be written in the following form:

$$\min\_{\mathbf{W},\mathbf{V}} f\_{\text{CoCMF}}(\mathbf{W},\mathbf{V}) = \min\_{\mathbf{W},\mathbf{V}} \frac{1}{2} \left\| \mathbf{Z} - \mathbf{W}\mathbf{V} \right\|\_F^2 - \log\det\left(\mathbf{W}^T\mathbf{W}\right) \tag{18}$$
 
$$\text{s.t. } \mathbf{W} \in \mathbb{C}^{M \times K}, \; \mathbf{V} \in \mathbb{R}\_+^{K \times N}, \; \sum\_{i=1}^K \mathbf{V}\_{ij} = 1, \text{ and } \sum\_{i=1}^K \left|\mathbf{W}\_{ij}\right| = 1 \; \forall j$$

#### 3.3 Complex matrix factorization via projected gradient descent

It can be seen that (14) and (18) are non-convex minimization problems with respect to both variables W and V, so it is impractical to obtain the globally optimal solution. These NP-hard problems can be tackled by applying block coordinate descent (BCD) with two matrix blocks [53] to obtain a local solution. Specifically, problems (14) and (18) were solved by the following scheme.

Fixing W, we solve the following one-variable optimization problems:

$$\min\_{\mathbf{V}} f\_{\text{StCMF}\_-V}(\mathbf{V}) = \min\_{\mathbf{V}} \frac{1}{2} \left\| \mathbf{Z} - \mathbf{Z} \mathbf{W} \mathbf{V} \right\|\_F^2 \tag{19}$$

$$\min\_{\mathbf{V}} f\_{\text{CoCMF}\_-V}(\mathbf{V}) = \min\_{\mathbf{V}} \frac{1}{2} \|\mathbf{Z} - \mathbf{W}\mathbf{V}\|\_F^2 \tag{20}$$
 
$$\text{s.t.} \; \mathbf{V} \in \mathbb{R}\_+^{K \times N}, \; \sum\_{i=1}^K \mathbf{V}\_{ij} = \mathbf{1} \; \forall j$$

Then, with V fixed, W is updated via the Moore-Penrose pseudoinverse [54], denoted by †: W = (Z†Z)V† for Eq. (14) and W = ZV† for Eq. (18). Taking advantage of Wirtinger calculus, the gradients are evaluated in the forms

$$\nabla\_{\mathbf{V}} f\_{\text{StCMF}\_-V}(\mathbf{V}) = -\mathbf{W}^{H}\mathbf{Z}^{H}\mathbf{Z} + \mathbf{W}^{H}\mathbf{Z}^{H}\mathbf{Z}\mathbf{W}\mathbf{V} \tag{21}$$

$$\nabla\_{\mathbf{V}} f\_{\text{CoCMF}\_-V}(\overline{\mathbf{V}}) = \mathbf{W}^{H}\mathbf{W}\,\overline{\mathbf{V}} - \mathbf{W}^{H}\mathbf{Z} \tag{22}$$

**Algorithm 1:** Complex projected gradient (CPG) with Armijo rule

Input: Z, W
Output: V

1. Initialize any feasible V0, 0 < β < 1, 0 < σ < 1
2. Iterations, for k = 1, 2, …

$$\mathbf{V}\_{k+1} = P\left[\mathbf{V}\_k - \alpha\_k \left[\nabla\_{\mathbf{V}} f(\mathbf{W}, \mathbf{V}\_k)\right]^\*\right]$$

where $P$ is the projection onto the feasible set, $\alpha\_k = \beta^{t\_k}$, and $t\_k$ is the first nonnegative integer such that

$$f(\mathbf{W}, \mathbf{V}\_{k+1}) - f(\mathbf{W}, \mathbf{V}\_k) \le 2\sigma \operatorname{Re}\left\langle \left[\nabla\_{\mathbf{V}} f(\mathbf{W}, \mathbf{V}\_k)\right]^\*, \mathbf{V}\_{k+1} - \mathbf{V}\_k \right\rangle$$
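For concreteness, here is a NumPy sketch of the V-step of Algorithm 1 for problem (20). Two choices are made explicit because the chapter leaves them open: the operator P is taken as Euclidean projection of each column onto the probability simplex, and the Armijo test is written with the real gradient of the objective in the real variable V (which absorbs the factor 2 of the complex form). Function names are illustrative.

```python
import numpy as np

def project_columns_to_simplex(V):
    """Euclidean projection of each column onto {v >= 0, sum(v) = 1}.

    One common choice for the operator P; the chapter does not fix one.
    """
    K, N = V.shape
    out = np.empty_like(V)
    for j in range(N):
        u = np.sort(V[:, j])[::-1]                 # sort descending
        css = np.cumsum(u)
        idx = np.arange(1, K + 1)
        rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1.0)
        out[:, j] = np.maximum(V[:, j] + theta, 0.0)
    return out

def cocmf_v_step(Z, W, V, beta=0.5, sigma=0.1, max_backtrack=40):
    """One projected-gradient update of V for problem (20).

    Z and W are complex; V is real, nonnegative, column-stochastic.
    """
    f = lambda V: 0.5 * np.linalg.norm(Z - W @ V) ** 2
    # Real gradient of f w.r.t. the real matrix V (cf. Eq. (22)).
    grad = np.real(W.conj().T @ (W @ V - Z))
    f_old, alpha = f(V), 1.0
    for _ in range(max_backtrack):                 # Armijo backtracking
        V_new = project_columns_to_simplex(V - alpha * grad)
        if f(V_new) - f_old <= sigma * np.sum(grad * (V_new - V)):
            return V_new
        alpha *= beta
    return V
```

Iterating this V-step, and refreshing W = ZV† between sweeps, reproduces the BCD scheme described above.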

$$\text{where, in Eq. (22), } \overline{\mathbf{V}} = \left[ \frac{\mathbf{V}\_1}{||\mathbf{V}\_1||\_1}, \frac{\mathbf{V}\_2}{||\mathbf{V}\_2||\_1}, \dots, \frac{\mathbf{V}\_N}{||\mathbf{V}\_N||\_1} \right]; \; \overline{\mathbf{V}} \ge \mathbf{0} \tag{23}$$

The projected gradient method for solving (19) and (20), using the gradients (21) and (22), is summarized in Algorithm 1.

## 4. Experiments

To investigate the recognition performance of the proposed StCMF and CoCMF methods, we have conducted extensive experiments on the ORL dataset [55] and the Georgia Tech face dataset [56] in two face recognition scenarios: holistic faces and key-point occluded faces.

First, we give a brief description of the data collections and the experimental setting. Second, the performance comparisons and corresponding results are presented.

#### 4.1 Datasets and experiment setting

The ORL dataset contains 400 grayscale images of 40 people's faces. The images were captured at different times, under different lighting conditions, with different facial expressions (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses). All the face images are manually aligned and cropped. For computational efficiency, each cropped image is resized to 28 × 23 pixels for face recognition without occlusion and 32 × 32 pixels for face recognition with occlusion. Figure 2 shows some instances of such faces from the ORL dataset.

The Georgia Tech face dataset (GT) contains images of 50 people taken during 1999 and stored in JPEG format. For each individual, there are 15 color images captured at a resolution of 640 × 480 pixels. Most of the images were taken in two different sessions to account for variations in illumination conditions, facial expression, and appearance. In our experiments, the original images are normalized, cropped and scaled to 31 × 23 pixels, and finally converted to gray-level images. Examples from the GT dataset are shown in Figure 3.

Figure 2. Sample facial images from ORL dataset [55].


Figure 3. Sample facial images from GT dataset [56].


We use the nearest neighbor (NN) classifier for all face recognition experiments, with and without occlusion. The platform was a 3.0 GHz Pentium V with 1024 MB RAM running Windows. Code was written in MATLAB.
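As a minimal illustration of the classification rule (the distance choice is an assumption here; the chapter only states that NN is used), a 1-NN classifier over encoding vectors can be sketched as:

```python
import numpy as np

def nn_classify(test_feats, train_feats, train_labels):
    """1-nearest-neighbor classification with Euclidean distance.

    Works for real or complex feature vectors; for complex encodings,
    Euclidean distance in C^K corresponds to the Frobenius-distance
    view of Section 2.3.
    """
    preds = []
    for x in test_feats:
        dists = np.linalg.norm(train_feats - x, axis=1)
        preds.append(train_labels[int(np.argmin(dists))])
    return np.array(preds)

# Toy gallery of two subjects' complex encodings and one probe.
gallery = np.exp(1j * np.array([[0.1, 0.2], [1.0, 1.2]]))
labels = np.array([0, 1])
probe = np.exp(1j * np.array([[0.12, 0.19]]))
pred = nn_classify(probe, gallery, labels)
```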

#### 4.2 Performance and comparison

#### 4.2.1 Face recognition on ORL dataset

For this case, in order to evaluate the performance of the proposed StCMF and CoCMF, we compare them with eight representative algorithms, namely, NMF [29], P-NMF [57], P-NMF (Fr) [58], P-NMF (KL) [58], OPNMF (Fr) [59], OPNMF (KL) [59], NNDSVD-NMF [60], and GPNMF [60]. Different numbers of training images, ranging from five to nine, were randomly chosen from each individual to construct the training set, and the remaining images constitute the test set used to estimate face recognition accuracy [61]. The number of basis images in all selected algorithms is K = 40, and the mean recognition rates are described in Table 1.

Table 1 shows the detailed recognition accuracies of the compared algorithms. As can be seen, our algorithms significantly outperform the other algorithms in all cases. Most algorithms achieve their best accuracy when the number of training face images per class is eight, with the exception of our proposed methods and GPNMF. Besides, the number of training images and the accuracy rate follow the same trend; that is, fewer training images lead to a lower rate of recognition.

Table 1.
Face recognition accuracy (%) on the ORL dataset with different train numbers.

| No. Trains | StCMF | CoCMF | GPNMF | NMF | PNMF | P-NMF (Fr) | P-NMF (KL) | OPNMF (Fr) | OPNMF (KL) | NNDSVD-NMF |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 90.85 | 90.30 | 86.5 | 84.5 | 82.4 | 83.7 | 85.0 | 80.0 | 79.0 | 43.0 |
| 6 | 91.75 | 92.25 | 87.5 | 84.4 | 85.81 | 85 | 84.4 | 83.0 | 82.0 | 39.3 |
| 7 | 91.17 | 94.75 | 87.5 | 83.3 | 87.33 | 85.6 | 85.9 | 84.4 | 80.0 | 36.8 |
| 8 | 93.75 | 93.88 | 88.75 | 88.75 | 88.5 | 88.8 | 88.0 | 84.3 | 83.0 | 40.8 |
| 9 | 97.50 | 95.50 | 92.5 | 85 | 90.75 | 87.25 | 87.5 | 84.0 | 83.0 | 42.3 |
| Avg. | 93.00 | 93.34 | 88.55 | 85.19 | 86.96 | 86.07 | 86.16 | 83.14 | 81.4 | 40.44 |

StCMF achieves the best performance (97.50%) when the number of training samples is largest, while CoCMF achieves the higher improvement in general.

It is observed that the selected algorithms employ different kinds of measurements, such as the Frobenius (Fr) norm and Kullback-Leibler (KL) divergence, add graph regularization, or adjust basic NMF to projective NMF. In preprocessing, a centered image-alignment technique is applied for the other methods to enhance their recognition rates, which is not used in our StCMF and CoCMF models. Nevertheless, the best recognition rate of all is obtained by our proposed CoCMF method, which has an extra regularization term.

One of the difficulties in NMF is the estimation of the number of components K. The choice of K results in a compromise between data fitting and model complexity; that is, a greater K leads to a better data approximation, but a smaller K makes the model easier to estimate, with fewer parameters to transmit. In most NMF variants, K is chosen to be larger than the estimated number of sources while satisfying the constraint $(N + M)K \ll NM$. This limitation of NMF is illustrated by the observation that, among all results, the lowest rate belongs to NNDSVD-NMF, an NMF method that utilizes SVD for initialization, which makes it significantly dependent on the number of bases K.

#### 4.2.2 Face recognition on GT dataset

Table 2 shows the recognition rates versus feature dimension for the competing methods on the GT dataset. The GT dataset contains many challenging samples that are harder to recognize; thus, the performance of all methods is lower than on the ORL dataset. On this dataset, the implementation mirrors the previous section in the choice of algorithms to compare and in randomly dividing the data into two different sets, each containing a different number of testing and training images. In our experiments, we set K = 50 and vary the number of training images over five odd values {5, 7, 9, 11, 13}. The experimental results show that as the number of training images increases, the efficiency of the recognition system also increases. We can see that the CoCMF method achieves the best performance overall and StCMF holds second place. All methods obtain their best results when 13 training samples are used (the largest number of training samples in our experiment). In this case, the highest recognition rate belongs to the StCMF method again.

ORL database. In cropped 112 92 dimension test image gallery, occlusion was simulated by using a sheltering patch with different size ranges in set {10 10, 15 15, 20 20, 25 25, 30 30} and placed at random locations before resized in

Occluded face samples from ORL dataset with patch sizes of 15 15, 20 20, 25 25, 30 30, and

15 79.58 80.21 75.16 74.32 72.55 69.16 71.25 74.18 45.16 54.46 20 72.08 73.79 64.52 65.45 62.15 67.52 71.23 65.00 41.52 25.62 25 70.00 71.17 65.54 55.18 52.38 65.54 62.19 55.00 35.54 19.83 30 52.08 61.54 54.53 45.62 43.87 48.53 55.21 45.89 28.53 13.22 35 39.17 41.00 43.25 33.63 31.06 43.25 38.79 33.39 23.25 16.13 Avg. 62.58 65.54 60.60 54.84 52.40 58.80 59.73 54.69 34.80 25.85

(Fr)

P-NMF (KL)

OPNMF (Fr)

OPNMF (KL)

NNDSVD-NMF

In this experiment, we take randomly the training images with the ratio 4:6 for training/testing and test several times on each sort of percent of randomly occluded test image. Table 3 shows the detailed recognition accuracy on all selected algorithms and our proposed methods. It can be seen that the recognition rate of all methods is increased when the size of occlusion batch is decreased. Obviously, StCMF and CoCMF outperform other tested approaches even if occlusion. This

In this paper, we have proposed a new approach to complex matrix factorization to face recognition. Preliminary experimental results show that StCMF and CoCMF achieve promising results for face recognition by utilizing the robustness of cosinebased dissimilarity and extend the main spirits of NMF from real number field to complex field which adds flexible constraints for the real-valued function of complex variables. We have also noted how strong is the proficiency of StCMF as well as CoCMF on face recognition task. Our proposed methods are simple frameworks which do not need more complicated regularizes like NMFs in the real domain. We believe that this capability of proposed methods will be stable in other application tasks. In future work, three aspects of the proposed system will be centered on. First, we add more regularized rules into objective function to a range of further application such as speech and sound processing. Second, we employ other classifiers such as complex neural network or complex SVM to treat well the complexvalued feature. Last, kernel methods will be exploited in both feature extraction and classification of StCMF and CoCMF constructed paradigm to develop the perfor-

28 21. Figure 4 shows examples of occluded ORL images.

Face recognition accuracy on the occluded ORL image with different occlusion sizes.

StCMF CoCMF GPNMF NMF PNMF P-NMF

Matrix Factorization on Complex Domain for Face Recognition

DOI: http://dx.doi.org/10.5772/intechopen.85182


StCMF achieves the best performance (97.50%) when the number of training samples is chosen largest. However, CoCMF achieves higher improvement in general.

#### 4.2.2 Face recognition on GT dataset

Table 2 shows the recognition rates of the competing methods on the GT dataset. The GT dataset contains many challenging samples that are harder to recognize, so the performance of all methods is lower than on the ORL dataset. The experiment was conducted as in the previous section, both in the choice of algorithms to compare and in randomly dividing the data into two sets containing different numbers of training and testing images. In our experiments, we set K = 50 and varied the number of training images over the five odd numbers {5, 7, 9, 11, 13}. The experimental results show that as the number of training images increases, the efficiency of the recognition system also increases. The CoCMF method achieves the best overall performance, with StCMF holding second place. All methods obtain their best results when 13 training samples are used (the largest number in our experiment); in this case, the highest recognition rate again belongs to the StCMF method.

It is observed that the selected algorithms employ different kinds of measurements, such as the Frobenius norm (Fr) and Kullback-Leibler (KL) divergence, add graph regularization, or adjust the basic NMF to projective NMF. In preprocessing, a centered image-alignment technique is applied for the other methods to enhance their recognition rates, which is not required by our StCMF and CoCMF models. Nevertheless, the best recognition rate of all is obtained by our proposed CoCMF method, which has an extra regularization term.

One of the difficulties in NMF is the estimation of the number of components K. The choice of K is a compromise between data fitting and model complexity: a greater K leads to a better data approximation, but a smaller K makes the model easier to estimate, with fewer parameters to transmit. In most NMFs, K is typically chosen to be larger than the estimated number of sources while satisfying the constraint (N + M)K ≪ NM. This limitation of NMFs is illustrated by the observation that, among all results, the lowest rate belongs to NNDSVD-NMF, an NMF method that uses SVD to obtain its initialization, which results from the significant independency of NNDSVD-NMF with respect to the number of bases K.

| No. Trains | StCMF | CoCMF | GPNMF | NMF | PNMF | P-NMF (Fr) | P-NMF (KL) | OPNMF (Fr) | OPNMF (KL) | NNDSVD-NMF |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 39.64 | 59.40 | 59.14 | 54.70 | 46.84 | 58.90 | 57.97 | 57.89 | 48.08 | 23.80 |
| 7 | 54.80 | 62.25 | 60.96 | 59.38 | 52.50 | 60.20 | 60.88 | 60.44 | 48.68 | 23.83 |
| 9 | 75.20 | 69.67 | 62.50 | 62.40 | 54.93 | 64.03 | 63.35 | 62.48 | 48.84 | 24.30 |
| 11 | 69.50 | 70.50 | 65.37 | 65.20 | 57.25 | 63.75 | 63.38 | 63.17 | 49.36 | 27.35 |
| 13 | 77.60 | 73.00 | 69.00 | 67.40 | 61.60 | 65.60 | 64.05 | 63.50 | 49.50 | 30.20 |
| Avg. | 63.35 | 66.96 | 63.39 | 61.82 | 54.63 | 62.50 | 61.93 | 61.50 | 48.90 | 25.90 |

Table 2. Face recognition accuracy (%) on the GT dataset with different train numbers.

#### 4.2.3 Face recognition on occluded ORL images

For a more convincing experimental assessment of the power of our proposed models under occlusion, we test their performance on occluded images of the ORL database. In the cropped 112 × 92 test image gallery, occlusion was simulated with a sheltering patch whose size ranges over the set {10 × 10, 15 × 15, 20 × 20, 25 × 25, 30 × 30}, placed at a random location before the image is resized to 28 × 21. Figure 4 shows examples of occluded ORL images.

Figure 4. Occluded face samples from the ORL dataset with patch sizes of 15 × 15, 20 × 20, 25 × 25, 30 × 30, and 35 × 35.

In this experiment, we randomly take the training images at a 4:6 training/testing ratio and run the test several times for each percentage of randomly occluded test images. Table 3 gives the detailed recognition accuracy of all selected algorithms and our proposed methods. The recognition rate of all methods increases as the size of the occlusion patch decreases. StCMF and CoCMF clearly outperform the other tested approaches even under occlusion, which reveals that StCMF and CoCMF are more robust to outliers than the others.

Table 3. Face recognition accuracy on the occluded ORL images with different occlusion sizes.
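The occlusion protocol above (random sheltering patch, 4:6 split, factorization features plus nearest-neighbour matching) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's real-valued NMF as a stand-in for StCMF/CoCMF, and synthetic 28 × 21 images in place of the ORL gallery.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def occlude(img, patch, rng):
    # Zero out a patch x patch square at a random location (the "sheltering patch").
    out = img.copy()
    h, w = out.shape
    y = int(rng.integers(0, h - patch + 1))
    x = int(rng.integers(0, w - patch + 1))
    out[y:y + patch, x:x + patch] = 0.0
    return out

# Synthetic stand-in for the resized 28 x 21 ORL gallery: 10 subjects, 10 images each.
H, W, SUBJECTS, PER = 28, 21, 10, 10
prototypes = rng.random((SUBJECTS, H, W))
images = np.array([p + 0.05 * rng.random((H, W)) for p in prototypes for _ in range(PER)])
labels = np.repeat(np.arange(SUBJECTS), PER)

# 4:6 training/testing split, as in the experiment.
Xtr, Xte, ytr, yte = train_test_split(
    images.reshape(len(images), -1), labels,
    train_size=0.4, stratify=labels, random_state=0)

# Occlude only the test images (a 10 x 10 patch here; the chapter sweeps several sizes).
Xte_occ = np.array([occlude(x.reshape(H, W), 10, rng).ravel() for x in Xte])

# Factorization features (K = 50 bases) + 1-NN matching in coefficient space.
model = NMF(n_components=50, init="random", max_iter=400, random_state=0)
Htr = model.fit_transform(Xtr)
Hte = model.transform(Xte_occ)
acc = KNeighborsClassifier(n_neighbors=1).fit(Htr, ytr).score(Hte, yte)
print(f"1-NN accuracy under 10 x 10 occlusion: {acc:.2f}")
```

Replacing the NMF step with a complex factorization and a cosine-based matcher would reproduce the structure of the StCMF/CoCMF experiments.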

## 5. Summary and discussion

In this paper, we have proposed a new approach to complex matrix factorization for face recognition. Preliminary experimental results show that StCMF and CoCMF achieve promising results for face recognition by exploiting the robustness of cosine-based dissimilarity and by extending the main spirit of NMF from the real number field to the complex field, which adds flexible constraints on real-valued functions of complex variables. We have also noted the strong proficiency of StCMF and CoCMF on the face recognition task. Our proposed methods are simple frameworks that do not need the more complicated regularizers of NMFs in the real domain, and we believe this capability will hold in other application tasks. In future work, we will focus on three aspects of the proposed system. First, we will add more regularization rules to the objective function to address a range of further applications, such as speech and sound processing. Second, we will employ other classifiers, such as complex neural networks or complex SVMs, to better handle the complex-valued features. Last, kernel methods will be exploited in both the feature extraction and classification stages of the StCMF and CoCMF paradigms to improve performance in nonlinear contexts.
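The cosine-based dissimilarity on complex-valued features mentioned above can be illustrated concretely. The sketch below maps real pixel intensities onto the complex unit circle (an Euler-style mapping in the spirit of [46]; the parameter `alpha` and the toy vectors are assumptions for illustration) and compares a same-subject pair against a different-subject pair.

```python
import numpy as np

def euler_map(x, alpha=1.9):
    # Map intensities in [0, 1] onto the complex unit circle; alpha is a free
    # parameter (an assumption here, following the Euler mapping idea of [46]).
    return np.exp(1j * alpha * np.pi * x) / np.sqrt(2)

def cosine_dissimilarity(z1, z2):
    # 1 - |<z1, z2>| / (||z1|| ||z2||) for complex-valued feature vectors;
    # by Cauchy-Schwarz this lies in [0, 1].
    num = np.abs(np.vdot(z1, z2))
    den = np.linalg.norm(z1) * np.linalg.norm(z2)
    return 1.0 - num / den

rng = np.random.default_rng(1)
face = rng.random(28 * 21)                                          # toy "gallery" image
same = np.clip(face + 0.02 * rng.standard_normal(face.size), 0, 1)  # same subject, mild noise
other = rng.random(28 * 21)                                         # different subject

d_same = cosine_dissimilarity(euler_map(face), euler_map(same))
d_other = cosine_dissimilarity(euler_map(face), euler_map(other))
print(f"same subject: {d_same:.3f}, different subject: {d_other:.3f}")
```

Because the mapping spreads large intensity differences around the unit circle while small perturbations stay phase-aligned, the same-subject pair yields a much smaller dissimilarity than the different-subject pair, which is the robustness property the summary refers to.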

## Acknowledgements

This research is partially supported by the Ministry of Science and Technology under Grant Number 108-2634-F-008-004 through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan.


## Author details

Viet-Hang Duong<sup>1</sup>, Manh-Quan Bui<sup>2</sup> and Jia-Ching Wang<sup>2,3</sup>\*

1 Faculty of Information Technology, Bac Lieu University, Vietnam

2 Department of Computer Science and Information Engineering, National Central University, Taiwan

3 Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan

\*Address all correspondence to: jcw@csie.ncu.edu.tw

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## References


[1] Batur AU, Hayes MH. Segmented linear subspaces for illumination robust face recognition. International Journal of Computer Vision. 2004;57(1):49-66

[2] Chen T, Yin W, Zhou X, Comaniciu D, Huang T. Total variation models for variable lighting face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28(9): 1519-1524

[3] Gao XY, Maylor KHL. Face recognition using line edge map. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(6): 764-779

[4] Guo B, Lam KM, Lin KH, Siu WC. Human face recognition based on spatially weighted Hausdorff distance. Pattern Recognition Letters. 2003;24: 499-507

[5] Adini Y, Moses Y, Ullman S. Face recognition: The problem of compensating for changes in illumination direction. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;19(7): 721-732

[6] Lee KC, Ho J, Kriegman D. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(5): 684-698

[7] Han H, Shan S, Chen X, Gao W. A comparative study on illumination preprocessing in face recognition. Pattern Recognition. 2013;46(6): 1691-1699

[8] Zhao W, Chellappa R. SFS based view synthesis for robust face recognition. In: Proc. the 4th Conference on Automatic Face and Gesture Recognition. 2000

[9] Shashua A, Tammy RR. The quotient image: Class-based rerendering and recognition with varying illuminations. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;23(2): 129-139

[10] Georghiades AS, Belhumeur PN, Kriegman DJ. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;23(2):643-660

[11] Ishiyama R, Sakamoto S. Geodesic illumination basis: compensating for illumination variations in any pose for face recognition. In: Proc. the 16th International Conference on Pattern Recognition. Vol. 4. 2002. pp. 297-301

[12] Gao W, Shan SG, Chai XJ, Fu XW. Virtual face generation for illumination and pose insensitive face recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 4. pp. 776-779

[13] Ho H, Chellappa R. Pose-invariant face recognition using Markov random fields. IEEE Transactions on Image Processing. 2013;22(4):1573-1584

[14] Blanz V, Grother P, Phillips PJ, Vetter T. Face recognition based on frontal views generated from nonfrontal images. In: IEEE Conf. Computer Vision and Pattern Recognition. 2005. pp. 454-461

[15] Gross R, Matthews I, Baker S. Appearance-based face recognition and light-fields. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004;26(4):449-465

[16] Wu G, Masia B, Jarabo A, Zhang Y, Wang L, Dai Q, et al. Light field image processing: An overview. IEEE Journal of Selected Topics in Signal Processing. Special Issue on Light Field Image Processing. 2017

[17] Malassiotis S, Strintzis M. Robust face recognition using 2D and 3D data: Pose and illumination compensation. Pattern Recognition. 2005;38(12): 2537-2548

[18] Asthana A, Marks T, Jones M, Tieu K, Rohith M. Fully automatic poseinvariant face recognition via 3D pose normalization. IEEE International Conference on Computer Vision (ICCV 2011). 2011:937-944

[19] Shan C, Gong S, McOwan PW. Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing. 2009;27(6):803-816

[20] Liu P, Han S, Meng Z, Tong Y. Facial expression recognition via a boosted deep belief network. In: Proc. the IEEE Conference on Computer Vision and Pattern Recognition. 2014. pp. 1805-1812

[21] Mollahosseini A, Chan D, Mahoor MH. Going deeper in facial expression recognition using deep neural networks. In: Proc. IEEE Winter Conference on Applications of Computer Vision. 2016. pp. 1-10

[22] Zhao G, Pietikainen M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;29(6):915-928

[23] Jung H, Lee S, Yim J, Park S, Kim J. Joint fine-tuning in deep neural networks for facial expression recognition. In: Proc. IEEE International Conference on Computer Vision (ICCV). 2015. pp. 2983-2991

[24] Zhao X, Liang X, Liu L, Li T, Han Y, Vasconcelos N, et al. Peak-piloted deep network for facial expression recognition. In: European Conference on Computer Vision. Springer; 2016. pp. 425-442


[25] Aleix MM. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(6):748-763

[26] Fukunaga K. Statistical Pattern Recognition. Academic Press; 1990

[27] Hyvarinen A, Karhunen J, Oja E. Independent Component Analysis. Wiley Interscience; 2001

[28] Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997;19(7):711-720

[29] Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401(6755):788-791

[30] Lee DD, Seung HS. Algorithms for non-negative matrix factorization. In: Proc. NIPS. 2000. pp. 556-562

[31] Hoyer P. Non-negative sparse coding. In: Proc. IEEE Neural Networks for Signal Processing. 2002. pp. 557-565

[32] Hoyer P. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research. 2004;5:1457-1469

[33] Li H, Adal T, Wang W, Emge D, Cichocki A. NMF with orthogonality constraints and its application to Raman spectroscopy. VLSI. 2007;48:83-97

[34] Guan N, Tao D, Luo Z, Yuan B. Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Transactions on Image Processing. 2011;20(7): 2030-2048


[35] Cai D, He XF, Wu X, Han JW. Nonnegative matrix factorization on manifold. In: Proc. IEEE Int'l Data Mining (ICDM '08). 2008. pp. 63-72


[36] Cai D, He XF, Wu X, Han JW, Huang TS. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(8):1548-1560

[37] Duong VH, Lee YS, Pham BT, Mathulaprangsan S, Bao PT, Wang JC. Spatial dispersion constrained nmf for monaural source separation. In: Proc. the 10th International Symposium on Chinese Spoken Language Processing (ICSLP). 2016

[38] Cichocki A, Zdunek R, Amari S. Csisz'ar's divergences for non-negative matrix factorization: Family of new algorithms. In: Proc. Int. Conf. Independent Component Analysis and Signal Separation. 2006. pp. 32-39

[39] Kong D, Ding C, Huang H. Robust nonnegative matrix factorization using L2,1 norm. In: Proc. ACM Int. Conf. Information and Knowledge Management. 2011. pp. 673-682

[40] Sandler R, Lindenbaum M. Nonnegative matrix factorization with earth mover's distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(8):1590-1602

[41] Guan N, Tao D, Luo Z, Shawe-Taylor J. MahNMF: Manhattan nonnegative matrix factorization [Online]. 2012. Available from: http://arxiv.org/ab s/1207.3438

[42] Cichocki A, Cruces S, Amari S. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy. 2011; 13(1):134-170

[43] Duong VH, Lee YS, Pham BT, Mathulaprangsan S, Bao PT, Wang JC. Complex matrix factorization for face recognition [Online]. 2016. Available from: https://arxiv.org/ftp/arxiv/pape rs/1612/1612.02513.pdf

[44] Duong VH, Lee YS, Ding JJ, Pham BT, Bui MQ, Bao PT, et al. Exemplar-embed complex matrix factorization for facial expression recognition. In: Proc. the 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). 2017

[45] Duong VH, Bui MQ, Ding JJ, Lee YS, Pham BT, Bao PT, et al. A new approach of matrix factorization on complex domain for data representation. IEICE Transactions on Information and Systems. 2017;E100-D (12):3059-3063

[46] Liwicki S, Tzimiropoulos G, Zafeiriou S, Pantic M. Euler principal component analysis. International Journal of Computer Vision. 2013;101:498-518

[47] Duong VH, Lee YS, Ding JJ, Pham BT, Bui MQ, Bao PT, et al. Projective complex matrix factorization for facial expression recognition. EURASIP Journal on Advances in Signal Processing. 2018;10

[48] Palka BP. An Introduction to complex function theory. Springer; 1991

[49] Wirtinger W. Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen. Mathematische Annalen. 1927;97:357-375

[50] Strang G. Linear Algebra and Its Applications. 4th ed. Belmont, Ca: Thomson, Brooks/Cole; 2006

[51] Zhou G, Xie S, Yang Z, Yang JM, He Z. Minimum volume constrained nonnegative matrix factorization: enhanced ability of learning parts. IEEE Transactions on Neural Networks. 2011; 22(10):1626-1637

[52] Liu T, Gong M, Tao D. Large-cone nonnegative matrix factorization. IEEE Transactions on Neural Networks and Learning Systems. 2016. DOI: 10.1109/TNNLS.2016.2574748

[53] Kim J, He Y, Park H. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Global Optimization. 2013;58:285-319

[54] Barata JCA, Hussein MS. The Moore-Penrose pseudoinverse: A tutorial review of the theory. Brazilian Journal of Physics. 2012;42:146-165

[55] The ORL Dataset of Face. Website: http://www.cl.cam.ac.uk/research/dtg/ attarchive/facedataset.html

[56] Dataset by Georgia Institute of Technology. Website: http://www.anef ian.com/research/facereco.html

[57] Lin CJ. Projected gradient methods for non-negative matrix factorization. Neural Computation. 2007;19:2756-2779

[58] Yang Z, Yuan Z, Laaksonen J. Projective non-negative matrix factorization with applications to facial image processing. International Journal of Pattern Recognition and Artificial Intelligence. 2007;21(8):1353-1362

[59] Yang Z, Oja E. Linear and nonlinear projective non-negative matrix factorization. IEEE Transactions on Neural Networks and Learning Systems. 2010;21(5):734-749

[60] Boutsidis C, Gallopoulos E. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition. 2008;41(4): 1350-1362

[61] Liu Y, Jia C, Li B, Pang S, Yu Z. Graph regularized projective nonnegative matrix factorization for face recognition. Journal of Computer Information Systems. 2013;9(5): 2047-2055

[62] Sharif M, Sajjad M, Jawad JM, Younas JM, Mudassar R. Face recognition for disguised variations using gabor feature extraction. Australian Journal of Basic and Applied Sciences. 2011;5(6):1648-1656

## Chapter 9

Granular Approach for Recognizing Surgically Altered Face Images Using Keypoint Descriptors and Artificial Neural Network

*Archana Harsing Sable and Haricharan A. Dhirbasi*

## Abstract

This chapter presents a new technique called entropy volume-based scale-invariant feature transform (EV-SIFT) for correct face recognition after cosmetic surgery. The features compared are the keypoints and the volume of the Difference of Gaussian (DoG) structure, for which the information rate is confirmed at those points. The extracted information is minimally affected by uncertain changes in the face, since entropy is a higher-order statistical feature. The extracted entropy volume-based scale-invariant feature transform features are then provided to a support vector machine for classification. The normal scale-invariant feature transform extracts keypoints based on dissimilarity, also known as image contrast, while the volume-based scale-invariant feature transform (V-SIFT) extracts keypoints based on the volume of the structure. The EV-SIFT method, however, provides both contrast and volume information. Thus, EV-SIFT achieves better performance than principal component analysis (PCA), the normal scale-invariant feature transform (SIFT), and V-SIFT-based feature extraction. Since the artificial neural network (ANN) with Levenberg-Marquardt (LM) training is well known as a powerful computational tool for accurate classification, it is further used in this technique for better classification results.

Keywords: face recognition, plastic surgery, scale-invariant feature transform (SIFT) feature, EV-SIFT feature, Levenberg-Marquardt-based neural network classifier (LM-NN)

## 1. Introduction

Human faces are multidimensional and complex visual stimuli, which contain useful information about the uniqueness of a person. Recognizing faces for security and authentication purposes has taken a new turn in the current era of computer image and vision analysis, for example, in monitoring applications, image recovery, man-machine interaction, and biometric authentication. Normally, the facial recognition system does not have the sense of touch or human interaction to

