Spatial Domain Representation for Face Recognition

Toshanlal Meenpal, Aarti Goyal and Moumita Mukherjee

## Abstract

Spatial domain representations for face recognition characterize faces by spatial features extracted directly from image pixels. This chapter provides a complete understanding of well-known and some recently explored spatial domain representations for face recognition. Over the last two decades, the scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG) and local binary patterns (LBP) have emerged as promising spatial feature extraction techniques for face recognition. SIFT and HOG are effective techniques for face recognition under varying scale, rotation, and illumination. LBP is a texture-based analysis effective for extracting the texture information of a face. Other relevant spatial domain representations are spatial pyramid learning (SPLE), local phase quantization (LPQ), and variants of LBP such as the improved local binary pattern (ILBP), compound local binary pattern (CLBP), local ternary pattern (LTP), three-patch local binary patterns (TPLBP), and four-patch local binary patterns (FPLBP). These representations build on SIFT and LBP and have shown improved results for face recognition. A detailed analysis of these methods, basic results for face recognition, and possible applications are presented in this chapter.

Keywords: spatial domain representation, face recognition, scale-invariant feature transform, histogram of oriented gradients, local binary patterns

### 1. Introduction

Face recognition is a powerful biometric system in today's highly technological world. It is widely accepted over other biometric systems such as fingerprint, iris or speech recognition for security, surveillance, and commercial applications. A face recognition system generally consists of multiple major stages: face detection, preprocessing, feature extraction and verification. A complete structure of a face recognition system is shown in Figure 1. Face detection detects a single face or a number of faces present in a given image. Viola-Jones face detection using Haar features [1], the faster R-CNN face detector [2], and face detection based on histograms of oriented gradients [3] are popular methods for detecting faces in an image. Generally, images are captured under unconstrained environments and hence need to be preprocessed before being fed to the feature extraction stage. Preprocessing mainly aims to reduce the effects of noise, differences of illumination, color intensity, background, and orientation. Correct recognition depends upon the quality of the captured image, lighting conditions etc. [4]. The recognition rate can be improved by performing pre-processing on the captured image.
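As a concrete example of the pre-processing mentioned above (and detailed in the list of techniques below), histogram equalization can be sketched with NumPy alone. This is a minimal sketch: the function name and the 8-bit grayscale assumption are ours, not from the chapter.

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization of an 8-bit grayscale image.

    Spreads the cumulative distribution of gray levels over the full
    0-255 range to compensate for illumination variation.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero CDF value
    # map each gray level through the normalized CDF
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255.0)
    return lut.astype(np.uint8)[img]

# a ramp image already uses every gray level, so equalization leaves it unchanged
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)
assert np.array_equal(hist_equalize(ramp), ramp)
```

Note that a constant image would make the denominator zero; a real pipeline would guard against that case.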

Figure 1. A complete structure of face recognition system.

Spatial Domain Representation for Face Recognition DOI: http://dx.doi.org/10.5772/intechopen.85382

Table 1. Summary of benchmark face recognition databases.

| Database | No. of individuals | Conditions | Resolution | Images |
|---|---|---|---|---|
| AT&T Database [7] | 40 | Lighting, open eyes, closed eyes, smiling, not smiling, glasses, no glasses | 92 × 112 | 400 |
| CAS-PEAL-R1 [8] | 1040 | Pose, facial expressions, accessory, illumination, background, distance, time | 360 × 480 | 30,900 |
| CMU Multi-PIE Database [9] | 68 | Pose, illumination, facial expressions | 640 × 486 | 41,368 |
| FERET [10] | 1199 | Pose, illumination, facial expressions, time | 256 × 384 | 14,051 |
| Korean Face Database (KFDB) [11] | 1000 | Pose, illumination, facial expressions | 640 × 480 | 52,000 |
| Yale Face Database B [12] | 10 | Pose, illumination | 640 × 480 | 5850 |

Various pre-processing techniques are used in image processing to improve the recognition rate, such as cropping, image resizing, histogram equalization and de-noising filtering, as described below.

1. Face detection and cropping: Face detection involves detecting the face region within the whole image. Cropping can be done based on one or more features of the image such as the eyes, lips, nose etc.

2. Image resizing: Variation in face image size, shape, pose etc. raises difficulty in designing face recognition algorithms, so it is important to resize images before feature extraction. For this, face images are cropped again to a standard size. An affine transformation can be applied to the face with a bilinear interpolation algorithm.

3. Image equalization: The illumination variation problem in the original resized image is overcome by using histogram equalization.

4. Image de-noising and filtering: Raw images contain noise introduced both at capture time and afterwards. Wiener and median filters are used to remove noise [5].

Next is feature extraction, which is considered the most prominent stage in a face recognition system, extracting discriminative facial features. Extracted features are then represented as a feature vector and fed to the verification stage. Feature selection is an optional stage before verification which reduces the feature vector dimensions using dimensionality reduction techniques [6]. The final stage is verification, which identifies an unknown face by finding the closest match in the gallery.

## 2. Existing face databases

There are a number of benchmark face databases for fair face recognition evaluation by researchers. These databases are designed with images or videos of a number of individuals with varying conditions and resolutions. A summary of benchmark face databases is tabulated in Table 1. A detailed structure of some of these face databases is provided below.

## 2.1 AT&T Database

The AT&T Database, originally known as the ORL database, contains face images captured in the interval April 1992 to April 1994. The database was collected by researchers of the Cambridge University Engineering Department for a face recognition project. There are in total 400 images in the AT&T database, captured by taking 10 different images of each of 40 individuals. All images were captured against a dark homogeneous background with resolution 92 × 112 pixels. The varying conditions under which images were captured are: time, lighting, open eyes, closed eyes, smiling, not smiling, glasses, no glasses; some images also have rotation variation. The database has 40 different directories, each with 10 images of an individual stored in .pgm format. Samples of images from the AT&T database are shown in Figure 2.

Figure 2. Samples of images of the AT&T database with 10 varying conditions [7].

Visual Object Tracking with Deep Neural Networks

## 2.2 CAS-PEAL-R1

The CAS-PEAL-R1 database was collected under the sponsorship of the National Hi-Tech Program and ISVISION by the Face Recognition Group of JDL, ICT, CAS. The database contains 30,900 images of 1040 individuals captured under different conditions: variation in pose, facial expression, accessory, illumination, background, distance, and time. For pose variation, each of the 1040 individuals has approximately 21 different poses. Facial expression is captured for 377 individuals with 6 different expressions; similarly, for accessory variation, 6 different images of 438 individuals with different accessories are used. Illumination variation has images of 233 individuals captured under a minimum of 10 and a maximum of 31 lighting variations. Background variation has images of 297 individuals with 2 to 4 different backgrounds. Further, the distance and time parameters cover 296 and 66 individuals, the latter at an interval of 6 months. Samples of images from the CAS-PEAL-R1 database are shown in Figure 3.



Figure 3. Samples of images of CAS-PEAL-R1 database [8].


## 2.3 CMU Multi-PIE Database

The CMU Multi-PIE Database was collected from October 2000 to December 2000 by taking 41,368 images of 68 individuals, covering 14 different poses, 43 illumination variations, and 4 different expressions. The database is known as CMU Multi-PIE for its varying conditions: pose, illumination, and expression. Image resolution is set to 640 × 486 pixels. Samples of images from the CMU Multi-PIE database are shown in Figure 4.

Figure 4. Samples of images of CMU Multi-PIE Database [9].

This chapter mainly focuses on the feature extraction stage of face recognition. It presents some well-known and recently explored spatial domain representations for face recognition. Scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), and local binary patterns (LBP) have been the most commonly used spatial feature representations over the past decade. Recently, other relevant feature representations, such as spatial pyramid learning (SPLE), local phase quantization (LPQ), and variants of LBP such as the improved local binary pattern (ILBP), compound local binary pattern (CLBP), local ternary pattern (LTP), three-patch local binary patterns (TPLBP), and four-patch local binary patterns (FPLBP), have been used effectively for face recognition.

## 3. Histogram of oriented gradients (HOG)

Histogram of oriented gradients (HOG) was introduced by Dalal and Triggs [13] in 2005 for human detection. HOG is an effective descriptor for face recognition, computing normalized histograms of face gradient orientations in a dense grid [14]. Basically, HOG characterizes the local appearance and shape of the face by the distribution of local intensity gradients. HOG is based on gradient computation, fine orientation binning, normalization and descriptor blocks.

A detailed implementation for extracting HOG features for face recognition is given as:

1. The facial image is first divided into small regions called cells. For an image of size 64 × 64, overlapping cells of 8 × 8 pixels are obtained. Gradient directions over pixels are computed for each cell. Simple 1-D derivatives are used in the horizontal and vertical directions with the following masks:

$$D_x = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \tag{1}$$

$$D_y = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \tag{2}$$

Results for a sample facial image using the horizontal (Dx) and vertical (Dy) derivative masks are shown in Figure 5.


Figure 5. Sample facial image and resultant derivatives. (a) Horizontal derivative. (b) Vertical derivative.
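Step 1's derivative masks, together with the orientation binning and block normalization of steps 2–4 below, can be combined into a compact NumPy sketch that reproduces the 1764-dimensional count of Eqs. (3) and (4). The stride-4 boundary convention and the per-block L2-hys normalization below are our assumptions, chosen to yield 196 blocks; they are not spelled out in the chapter.

```python
import numpy as np

def hog_features(img, cell=8, stride=4, bins=9):
    """Toy HOG: per-block orientation histograms with L2-hys normalization."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # Dx = [-1 0 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # Dy = [-1 0 1]^T
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned gradient, 0-180 deg
    feats, eps = [], 1e-6
    for r in range(0, img.shape[0] - cell, stride):   # stride 4 = 50% overlap
        for c in range(0, img.shape[1] - cell, stride):
            m = mag[r:r + cell, c:c + cell].ravel()
            a = ang[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hist = hist / np.sqrt(np.sum(hist ** 2) + eps ** 2)   # L2 norm
            hist = np.minimum(hist, 0.2)                          # clip (L2-hys)
            hist = hist / np.sqrt(np.sum(hist ** 2) + eps ** 2)   # renormalize
            feats.append(hist)
    return np.concatenate(feats)

face = np.random.default_rng(0).random((64, 64))
assert hog_features(face).shape == (196 * 9,)   # 1764 dimensions
```

With 14 block positions per axis, 14 × 14 = 196 blocks of 9 bins each give the 1764-dimensional vector claimed in Eq. (4).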


2. Next is fine orientation binning for extracting HOG features. Histogram channels are evenly selected in the range 0–180° for the unsigned gradient and 0–360° for the signed gradient. Each cell can contribute in the form of pixel magnitude, gradient magnitude, or the square root or square of the magnitude. In general, the gradient magnitude yields the best results, while the square root reduces performance [13].

3. Gradients in each cell are normalized for local contrast normalization. Cell gradients are normalized over all blocks and concatenated to form the HOG feature vector. Dalal and Triggs [13] proposed 9 histogram channels (bins) to be computed for the unsigned gradient. Hence, for a 64 × 64 image, a 1764-dimensional HOG feature vector is obtained representing the full facial appearance. It can be explained as:

$$\frac{64 \times 64}{8 \times 8} \times 50\% \text{ overlapping} = 196 \text{ blocks} \tag{3}$$

$$196 \text{ blocks} \times 9 \text{ bins} = 1764 \text{ dimensional HOG vector} \tag{4}$$

4. Different normalization schemes for block normalization are presented in [15]. Let ν represent the un-normalized block with ‖ν‖k as its kth norm for k = 1, 2, and ϱ a small constant. The normalization schemes used are L1-norm, L1-sqrt, L2-norm and L2-hys. Generally, L2-hys is used for block normalization; it is obtained by first computing the L2-norm, clipping so that the maximum value of ν is limited to 0.2, and then renormalizing.

Sample input facial image and resultant HOG features are shown in Figure 6.

Figure 6. Sample example of (a) Input facial image of size 64 × 64. (b) Resultant HOG features (1764 dimensions).

## 4. Scale invariant feature transform (SIFT)

Scale invariant feature transform (SIFT) was introduced by Lowe [16] for extracting discriminative invariant features from an image. The SIFT descriptor is widely used for facial feature representation, extracting blob-like local features [17]. These features are invariant to scale, translation and rotation, resulting in reliable matching. SIFT is described in four stages: (1) Detection of scale-space extrema, (2) Detection of local extrema, (3) Orientation assignment, and (4) Keypoint descriptor representation.

### 4.1 Detection of scale-space extrema

The first step is to identify keypoints in the scale-space of the grayscale input image f(a, b), which is defined as:

$$L(a, b, \sigma) = G(a, b, \sigma) \* f(a, b) \tag{5}$$

$$\text{such that } G(a, b, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\left(a^2 + b^2\right)/2\sigma^2} \tag{6}$$

where σ is the standard deviation of the Gaussian G(a, b, σ).

Two nearby scales of the image, separated by a constant multiplicative factor k, are used to detect extrema in scale-space effectively. The difference of Gaussian (DOG) is computed by taking the difference of these two scaled versions of the image convolved with the original image, given as:

$$\begin{split} D(a,b,\sigma) &= \left( G(a,b,k\sigma) - G(a,b,\sigma) \right) \* f(a,b) \\ &= L(a,b,k\sigma) - L(a,b,\sigma) \end{split} \tag{7}$$
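Eqs. (5)–(7) can be sketched as follows. The separable 1-D Gaussian filter, the kernel radius of 3σ, and the defaults σ = 1.6 and k = √2 are illustrative assumptions, not values prescribed by the chapter.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at a radius of 3*sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(img, sigma):
    """Eq. (5): L = G * f, via a separable 2-D Gaussian (rows, then columns)."""
    g = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda row: np.convolve(row, g, mode='same'), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, g, mode='same'), 0, tmp)

def difference_of_gaussian(img, sigma=1.6, k=2 ** 0.5):
    """Eq. (7): D(a, b, sigma) = L(a, b, k*sigma) - L(a, b, sigma)."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)
```

For a constant image the two blurred versions agree away from the (zero-padded) borders, so the DOG response is zero there, as expected.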

#### 4.2 Detection of local extrema

The local extrema (maxima/minima) of D(a, b, σ) are found by comparing each sample pixel with its eight neighbors in a 3 × 3 patch as well as its nine neighbors in each of the scaled images above and below. To be selected as a local minimum, a sample point should be smaller than all 26 neighbors, whereas a local maximum should be larger than all of them. After keypoint localization, low-contrast and poorly localized points are removed by computing |D(a, b, σ)| and discarding points whose value is below a defined threshold.
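The 26-neighbor comparison translates directly into code. Here `dog` is assumed to hold three adjacent DoG scales stacked along the first axis; the function name and layout are ours.

```python
import numpy as np

def local_extrema(dog):
    """Boolean map of 26-neighbour extrema; dog stacks 3 adjacent DoG scales as (3, H, W)."""
    mid = dog[1, 1:-1, 1:-1]
    H, W = dog.shape[1], dog.shape[2]
    neigh = []
    for s in range(3):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if s == 1 and dr == 0 and dc == 0:
                    continue   # skip the sample point itself
                neigh.append(dog[s, 1 + dr:H - 1 + dr, 1 + dc:W - 1 + dc])
    neigh = np.stack(neigh)    # the 26 shifted views
    return (mid > neigh.max(axis=0)) | (mid < neigh.min(axis=0))

# a single spike in the middle scale is the only extremum
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 5.0
assert local_extrema(dog).sum() == 1
```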

#### 4.3 Orientation assignment

Orientation assignment to each keypoint results in rotation invariance. For each Gaussian-smoothed image L(a, b), orientation is assigned by computing the gradient magnitude m(a, b) and gradient direction θ(a, b) from its neighbors using Eqs. (8) and (9) respectively.

$$m(a,b) = \sqrt{\left(L(a+1,b) - L(a-1,b)\right)^2 + \left(L(a,b+1) - L(a,b-1)\right)^2} \tag{8}$$


$$\theta(a,b) = \tan^{-1}\!\left(\frac{L(a,b+1) - L(a,b-1)}{L(a+1,b) - L(a-1,b)}\right) \tag{9}$$
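Eqs. (8) and (9) translate directly into code; the function name and array layout are ours.

```python
import numpy as np

def grad_mag_ori(L, a, b):
    """Gradient magnitude and orientation at (a, b) of a smoothed image L (Eqs. 8-9)."""
    da = L[a + 1, b] - L[a - 1, b]          # finite difference along a
    db = L[a, b + 1] - L[a, b - 1]          # finite difference along b
    m = np.hypot(da, db)                    # Eq. (8)
    theta = np.arctan2(db, da)              # Eq. (9), tan^-1 computed via atan2
    return m, theta
```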

#### 4.4 Keypoint descriptor representation

Finally, each detected keypoint is represented as a 128-dimensional feature vector. This is obtained by computing the magnitude and orientation of the gradient at each point in a 16 × 16 patch of the image. Each 16 × 16 patch is subdivided into 4 × 4 non-overlapping regions such that each 4 × 4 region is represented by 8 bins. Hence, each keypoint descriptor is represented by a 4 × 4 × 8 = 128 length vector.

Figure 7 shows an example of the assignment of a SIFT descriptor for an 8 × 8 neighborhood. The length of each arrow corresponds to the sum of the gradient magnitudes in a specific direction for a 4 × 4 region.
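The 4 × 4 × 8 = 128 bookkeeping of Section 4.4 can be checked with a short sketch; the orientation-binning convention (uniform bins over 0–360°) is assumed.

```python
import numpy as np

def keypoint_descriptor(mag, ang):
    """128-D descriptor from 16x16 gradient magnitude and orientation (degrees) arrays."""
    desc = []
    for r in range(0, 16, 4):          # 4 x 4 grid of subregions
        for c in range(0, 16, 4):
            m = mag[r:r + 4, c:c + 4].ravel()
            a = ang[r:r + 4, c:c + 4].ravel() % 360.0
            hist, _ = np.histogram(a, bins=8, range=(0, 360), weights=m)  # 8 bins
            desc.append(hist)
    return np.concatenate(desc)        # 4 * 4 * 8 = 128 values
```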

The processing flow to generate SIFT features for face recognition is shown in Figure 8. The input image is first preprocessed and a difference of Gaussian pyramid is generated as in Figure 8(c). The final resultant SIFT keypoints are then represented as a feature vector to be fed to a classifier for face recognition.

Figure 7. Example of (a) Image gradients of a 2 × 2 patch computed from an 8 × 8 neighborhood. (b) Resultant SIFT keypoint descriptor.

Figure 8. Processing flow of SIFT for face recognition. (a) Original image. (b) Processed image. (c) Difference of Gaussian pyramid. (d) SIFT keypoints.


## 5. Local phase quantization (LPQ)


Local phase quantization (LPQ), introduced by Ojansivu et al. [18, 19], is a blur-tolerant texture-based descriptor. LPQ is based on the blur-invariance property of the frequency-domain phase spectrum of an image. LPQ for face recognition was investigated by Ahonen et al. [20], who reported improved results for blurred facial images.

LPQ is applied to an image pixel by using the short-term Fourier transform (STFT) over an M × M patch centered on the pixel, with four scalar frequencies. The imaginary and real components are then whitened and binary quantized to generate the LPQ code for the respective pixel. The complete process is detailed in Figure 9, where the LPQ code is obtained for an image pixel [21]. Similarly, the final LPQ feature vector can be obtained by shifting the M × M patch over the entire image.

Spatial blurring is modeled by convolving the grayscale input image f(a, b) with a point spread function (PSF). The frequency-domain analysis can be represented as:

$$H(u, v) = F(u, v) \cdot P(u, v) \tag{10}$$

Here, F(u, v) and P(u, v) are the DFTs of the original image and the PSF respectively, and H(u, v) is the DFT of the resultant blurred image. The phase spectrum is obtained as:

$$
\angle H(u,v) = \angle F(u,v) + \angle P(u,v) \tag{11}
$$

Now, if the PSF is positive and even, then ∠P(u, v) must be either 0 or π, such that ∠P(u, v) = 0 for P(u, v) ≥ 0, while ∠P(u, v) = π for P(u, v) < 0.

Since the shape of P(u, v) is generally chosen to be similar to a Gaussian function, the low-frequency values of P(u, v) are positive. This results in ∠P(u, v) = 0, and Eq. (11)

Figure 9. LPQ encoding scheme. (a) Input 5 × 5 patch. (b) Frequency domain representation. (c) LPQ code.

becomes ∠H(u, v) = ∠F(u, v). Hence, it can be stated that LPQ possesses the blur-invariance property. A detailed mathematical analysis of LPQ can be obtained from [21].
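A simplified LPQ sketch is given below. It keeps the four STFT frequencies and the sign quantization, but omits the whitening/decorrelation step described above, uses a uniform window, and the particular frequency layout is our assumption.

```python
import numpy as np

def lpq_codes(img, M=5):
    """8-bit LPQ-style code per pixel from the signs of Re/Im of the STFT
    at four low frequencies (whitening step omitted, uniform window)."""
    r = M // 2
    x = np.arange(-r, r + 1)
    w0 = np.ones(M)
    w1 = np.exp(-2j * np.pi * x / M)           # 1-D basis at frequency a = 1/M
    # 2-D filters for frequencies (a, 0), (0, a), (a, a), (a, -a)
    filts = [np.outer(w0, w1), np.outer(w1, w0),
             np.outer(w1, w1), np.outer(w1, np.conj(w1))]
    H, W = img.shape
    out = np.zeros((H - M + 1, W - M + 1), dtype=np.uint8)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + M, j:j + M]
            bits = []
            for f in filts:
                v = np.sum(patch * f)
                bits += [v.real >= 0, v.imag >= 0]   # binary quantization
            out[i, j] = sum(int(b) << k for k, b in enumerate(bits))
    return out
```

Each pixel thus receives one byte (2 sign bits × 4 frequencies); a histogram of these codes over the image would form the LPQ feature vector.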

## 6. Local binary patterns (LBP)

Local binary patterns (LBP) were introduced by Ojala et al. [22] as a rotation-invariant texture-based feature descriptor. LBP as a feature representation for face recognition was proposed by Ahonen et al. [23], who stated that texture analysis of a local facial region represents its local appearance, and the fusion of all regions can generate an encoded global geometry of the face.

Consider an input image and let f(a, b) be its preprocessed version. The basic LBP operator on a 3 × 3 neighborhood of f(a, b) and the generated decimal code for the center pixel are shown in Figure 10. The LBP operator replaces each pixel of f(a, b) with a calculated decimal code, resulting in the LBP-encoded image f_LBP(a, b). This is done by thresholding each pixel of the 3 × 3 neighborhood against its center pixel. The result is a binary code, which is then converted into the corresponding decimal code; the center pixel is replaced by this decimal value. The LBP code assigned to the center pixel is given by Eq. (12), where i_c represents the center pixel, c_n is the gray level of a neighbor pixel, and c_p is the gray level of the center pixel.

$$LBP_{P,R}(i_c) = \sum_{m=0}^{P-1} s(c_n - c_p)\, 2^m, \qquad s(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
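Eq. (12) and the 3 × 3 thresholding of Figure 10 can be sketched as follows; the clockwise neighbor ordering starting at the top-left corner is one common convention, assumed here.

```python
import numpy as np

def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch, following Eq. (12)."""
    c = patch[1, 1]
    # neighbours ordered clockwise from the top-left corner (assumed convention)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r, cc] - c > 0 else 0 for r, cc in order]   # s(c_n - c_p)
    return sum(b << m for m, b in enumerate(bits))                 # sum of s * 2^m

patch = np.array([[10, 10, 10],
                  [10, 20, 30],
                  [40, 50,  5]])
assert lbp_code(patch) == 104   # bits set for neighbours 30, 50 and 40
```

Applying `lbp_code` at every interior pixel yields the encoded image f_LBP(a, b).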

Figure 10. Basic LBP operator on a 3 × 3 neighborhood of f(a, b). (a) Preprocessed image. (b) 3 × 3 neighborhood. (c) Corresponding gray levels of each pixel. (d) Result after thresholding. Finally, the center pixel is replaced by code 42.

Ahonen et al. [23] proposed that the LBP operator can be used with varying neighborhood size M × M and radius R to deal with different image scales. The notation (P, R) is used to represent P sampling points, or neighbor pixels, around the center pixel for radius R. Thresholding is then performed by comparing the center pixel with the P neighbor pixels. Examples of some selected values of (P, R) are shown in Figure 11.

Figure 11. Different P and R combinations for LBP operator.

LBP for face recognition proceeds by building local LBP descriptors to represent local regions, which are then combined to obtain a global representation of the entire face. The encoded image f_LBP(a, b) is evenly divided into non-overlapping blocks. A histogram for each block is calculated, and the final LBP feature vector is built by concatenating all regional histograms. The LBP operator provides essential spatial information that plays a key role in face recognition. The complete processing flow to generate the LBP feature vector is shown in Figure 12.

Figure 12. Processing flow of LBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) LBP encoded image. (d) Non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final LBP feature vector formed by concatenating the histograms of all patches in the image.

Major advantages of LBP over other spatial feature representations are its simple calculation, comparatively smaller feature vector size, and greater robustness to noise and illumination imbalance. In recent years, various variants of LBP have been widely implemented in texture analysis. Local ternary patterns (LTP), proposed by Tan et al. [24], are based on a ternary threshold operator; LTP improves on LBP by using two LBP vectors to build one LTP representation. Other variants of LBP are the compound local binary pattern (CLBP) [25], three-patch LBP (TPLBP) [26], four-patch LBP (FPLBP) [26] and improved local binary pattern (ILBP) [27]. These representations are verified to be more efficient than LBP under illumination and noise conditions.

## 7. Local ternary patterns (LTP)

Local ternary patterns (LTP) [24] are a generalization of LBP with reduced sensitivity to noise and illumination variations. LTP generates a 3-valued code by including a threshold around zero, which improves resistance to noise. LTP works well for noisy images and different lighting conditions.

Figure 12.

becomes ⎳H uð Þ¼ ; v ⎳F uð Þ ; v : Hence, it can be stated that LPQ possesses blur invariant property. Detailed mathematical analysis of LPQ can be obtained from [21].
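To make Eq. (12) concrete, the thresholding-and-weighting step can be sketched in Python. This is a minimal illustration, not the authors' implementation; the clockwise-from-top-left bit ordering and the helper names are assumptions, and any fixed neighbor ordering yields an equally valid LBP encoding.

```python
import numpy as np

def lbp_code(patch):
    """LBP code for the center pixel of a 3x3 patch, following Eq. (12)."""
    cp = patch[1, 1]
    # neighbor coordinates, clockwise from the top-left corner (assumed order)
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for m, (r, c) in enumerate(coords):
        if patch[r, c] >= cp:        # s(c_n - c_p) = 1 when c_n >= c_p
            code |= 1 << m           # weight 2^m
    return code

def lbp_image(img):
    """Encode every interior pixel of img; border pixels are left as 0."""
    out = np.zeros_like(img)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = lbp_code(img[r - 1:r + 2, c - 1:c + 2])
    return out

sample = np.array([[60, 20, 35],
                   [50, 40, 45],
                   [10, 70, 90]])
print(lbp_code(sample))  # → 185, i.e. binary 10111001 read from bit 7 down to bit 0
```

The per-block histograms of such an encoded image, concatenated, give the final feature vector described above.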


## 7. Local ternary patterns (LTP)

Local ternary patterns (LTP) [24] is a generalization of LBP with reduced sensitivity to noise and illumination variations. LTP generates a 3-valued code by including a threshold around zero and improves resistance to noise. LTP works well for noisy images and different lighting conditions.

In LBP, neighbor pixels are compared with the center pixel directly; hence, a small variation in pixel values due to noise can drastically change the LBP code. To overcome this limitation, LTP introduces a threshold ±t around the center pixel i_c, against which the neighbor pixels are compared to generate a 3-valued ternary code as:

$$LTP_{P,R}(i_c) = \sum_{m=0}^{P-1} s(c_n - c_p)\, 2^m \tag{13}$$

$$s = \begin{cases} 1, & c_n \ge c_p + t \\ 0, & |c_n - c_p| < t \\ -1, & c_n \le c_p - t \end{cases} \tag{14}$$

Here, c_p and c_n represent the gray levels of the center pixel and a neighbor pixel respectively. The LTP encoding scheme to generate the ternary LTP code is illustrated in Figure 13. Here, the threshold t is set to 5; hence, with center pixel value 40, the tolerance range is [35, 45]. Neighbor pixels with gray levels in this range are replaced by 0, those above it by 1, and those below it by −1, as described in Eq. (14).

Figure 13. LTP encoding scheme to generate the ternary LTP code. (a) Preprocessed image. (b) 3 × 3 neighborhood. (c) Corresponding gray levels of each pixel. (d) Ternary LTP code after thresholding.

The resultant ternary LTP code is split into two sub-LTP codes, which are treated as two separate channels as shown in Figure 14. The upper sub-LTP code keeps the '+1' entries of the ternary code and sets the rest to 0, while the lower sub-LTP code sets the '−1' entries to 1 and the rest to 0. Hence, LTP represents each original image by two encoded images.

Figure 14. Splitting of the ternary LTP code to generate lower and upper sub-LTP codes. (a) 3 × 3 neighborhood of an image. (b) Ternary LTP code. (c) Lower sub-LTP code. (d) Upper sub-LTP code. Finally, the lower and upper sub-LTP codes obtained are 7 and 168 respectively.
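The ternary quantization of Eq. (14) and the subsequent split into lower and upper binary codes can be sketched as follows. This is a hypothetical helper (the neighbor ordering, names and sample values are illustrative assumptions), not the authors' code.

```python
import numpy as np

def ltp_codes(patch, t=5):
    """Lower and upper sub-LTP codes for a 3x3 patch (Eqs. (13)-(14)).

    Each neighbor is quantized to +1 (c_n >= c_p + t), -1 (c_n <= c_p - t)
    or 0; the upper code keeps the +1 positions and the lower code the -1
    positions, as in the Figure 14 splitting step.
    """
    cp = int(patch[1, 1])
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    lower = upper = 0
    for m, (r, c) in enumerate(coords):
        d = int(patch[r, c]) - cp
        if d >= t:
            upper |= 1 << m
        elif d <= -t:
            lower |= 1 << m
    return lower, upper

sample = np.array([[60, 20, 35],
                   [50, 40, 45],
                   [10, 70, 90]])
print(ltp_codes(sample, t=5))  # → (70, 185)
```

The two encoded images are then processed as separate channels, as described above.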


## 8. Compound local binary pattern (CLBP)

Compound local binary pattern (CLBP), proposed by Ahmed et al. [25], is an improved variant of LBP that uses a 2P-bit code. CLBP overcomes a limitation of LBP by improving performance on flat images: LBP performs poorly on images with bright spots or dark patches, i.e., it fails for flat regions, as shown in Figure 15.

The original LBP generates a P-bit code from the gray-level differences between the center pixel and its P neighbor pixels (sampling points). CLBP extends LBP by generating a 2P-bit code for the P neighbor pixels, where the extra P bits encode the magnitude of the difference between the center pixel and each neighbor. In this way, CLBP increases the robustness of the texture representation, particularly for flat images.

To generate the 2P-bit code, CLBP represents each neighbor pixel with two bits carrying sign and magnitude information. The first bit is the same as the LBP bit and represents the sign of the difference between the neighbor pixel and the center pixel. The second bit encodes the magnitude of the difference with respect to a calculated threshold M_ab, obtained as the mean of the absolute differences between the center pixel and all P neighbors.

The first bit is set to '1' if the gray level of the neighbor pixel is greater than or equal to that of the center pixel, and '0' otherwise. The second bit is '1' if the absolute difference between the neighbor and center pixels is greater than the threshold, and '0' otherwise. The CLBP code is given by Eqs. (15) and (16).

$$CLBP_{P,R}(i_c) = \sum_{m=0}^{P-1} s(c_n - c_p)\, 2^m \tag{15}$$

$$\mathbf{s} = \begin{cases} \mathbf{0}\mathbf{0}\ c\_n - c\_p < \mathbf{0}, \quad \left| c\_n - c\_p \right| \le M\_{ab} \\\\ \mathbf{0}\mathbf{1}\ c\_n - c\_p < \mathbf{0} \quad \left| c\_n - c\_p \right| > M\_{ab} \\\\ \mathbf{1}\mathbf{0}\ c\_n - c\_p \ge \mathbf{0} \quad \left| c\_n - c\_p \right| \le M\_{ab} \\\\ \mathbf{1}\mathbf{1} & otherwise \end{cases} \tag{16}$$

The CLBP encoding scheme to generate the 2P-bit code for a 3 × 3 neighborhood of an image is shown in Figure 16. A 16-bit CLBP code is generated after thresholding using Eq. (16). The resultant CLBP code is then split into two 8-bit sub-CLBP codes to reduce the number of possible binary patterns from 2^16 to 2 × 2^8. The first 8-bit code is the concatenation of the bits from the pixels marked red in Figure 16(c), and the second 8-bit code is obtained by concatenating the bit values from the remaining pixels. Finally, these sub-CLBP codes are treated as channels for the final feature vector representation.

Figure 15. LBP code for a flat image. (a) 3 × 3 neighborhood of an image. (b) LBP encoded image.

Figure 16. CLBP encoding scheme to generate the 2P-bit code. (a) 3 × 3 neighborhood of an image. (b) 2P-bit CLBP code after thresholding. (c) Separated sub-CLBP codes. (d) Resultant two 8-bit sub-CLBP codes.

The processing flow to generate histograms of the CLBP encoded image for face recognition is shown in Figure 17, which illustrates how each pixel of the original image is converted into the CLBP encoded image. Figure 17(c) shows the two sub-CLBP encoded images. A histogram of each encoded image is obtained as in Figure 17(d). These histograms can be used individually as separate feature vectors for face recognition, or concatenated into a single final vector.

Figure 17. Processing flow of CLBP for face recognition. (a) Original image. (b) Preprocessed image. (c) Separated sub-CLBP encoded images. (d) Respective histograms of each encoded image. (e) Concatenated histogram.
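A sketch of the CLBP bit assignment of Eqs. (15)-(16): each neighbor yields a sign bit and a magnitude bit thresholded by the mean absolute difference M_ab. For simplicity, this sketch splits the 2P bits by role (all sign bits vs. all magnitude bits) rather than by the red/remaining pixel positions of Figure 16; the names and ordering are illustrative assumptions.

```python
import numpy as np

def clbp_codes(patch):
    """Sign and magnitude sub-codes of compound LBP for a 3x3 patch.

    The sign bit is 1 when c_n - c_p >= 0 (the ordinary LBP bit); the
    magnitude bit is 1 when |c_n - c_p| exceeds the threshold M_ab, the
    mean absolute difference over all P neighbors.
    """
    cp = int(patch[1, 1])
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    diffs = [int(patch[r, c]) - cp for r, c in coords]
    m_ab = sum(abs(d) for d in diffs) / len(diffs)   # threshold M_ab
    sign_code = mag_code = 0
    for m, d in enumerate(diffs):
        if d >= 0:
            sign_code |= 1 << m
        if abs(d) > m_ab:
            mag_code |= 1 << m
    return sign_code, mag_code

sample = np.array([[60, 20, 35],
                   [50, 40, 45],
                   [10, 70, 90]])
print(clbp_codes(sample))  # → (185, 112); here M_ab = 21.25
```

Note that for a flat patch every magnitude bit is 0, which is how the extra bits separate flat regions that plain LBP cannot distinguish.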

## 9. Three-patch LBP (TPLBP)

The original LBP and its variants generate a 1-bit value (or a 2-bit value, for CLBP) by comparing two pixels: the center pixel and one of the P neighbor pixels. Wolf et al. [26] proposed two further variants of LBP, namely three-patch LBP (TPLBP) and four-patch LBP (FPLBP), which compare the center pixel with more than one neighboring patch.


TPLBP produces one bit from each comparison of three patches. For each center pixel i_c, an M × M central patch is considered, along with P additional same-sized patches whose centers lie on a ring of radius R around i_c. The central patch is compared with pairs of ring patches that are δ apart along the ring; each such triple contributes one bit, so TPLBP generates a P-bit code for i_c as:

$$TPLBP_{P,R,M,\delta}(i_c) = \sum_{m=0}^{P-1} f\big(d(c_m, c_p) - d(c_{m+\delta \bmod P},\, c_p)\big)\, 2^m \tag{17}$$

here, c_p is the gray level of i_c, and c_m and c_{m+δ mod P} are the gray levels of the center pixels of the m-th and (m+δ)-th patches respectively. d(·) is the L2 norm and f is given as:

$$f(a) = \begin{cases} 1, & a \ge \tau \\ 0, & a < \tau \end{cases} \tag{18}$$

τ is a user-specified threshold chosen slightly greater than zero (say τ = 0.01) to provide stability in flat regions. Figure 18 shows a sample example of generating the TPLBP code for P = 8, δ = 2, M = 3. TPLBP code generation for the given sample using Eq. (17) is:

$$\begin{aligned} &f\left(d(c\_0, c\_p) - d(c\_2, c\_p)\right)2^0 + f\left(d(c\_1, c\_p) - d(c\_3, c\_p)\right)2^1 + \\ &f\left(d(c\_2, c\_p) - d(c\_4, c\_p)\right)2^2 + f\left(d(c\_3, c\_p) - d(c\_5, c\_p)\right)2^3 + \\ &f\left(d(c\_4, c\_p) - d(c\_6, c\_p)\right)2^4 + f\left(d(c\_5, c\_p) - d(c\_7, c\_p)\right)2^5 + \\ &f\left(d(c\_6, c\_p) - d\left(c\_0, c\_p\right)\right)2^6 + f\left(d(c\_7, c\_p) - d\left(c\_1, c\_p\right)\right)2^7 \end{aligned} \tag{19}$$

The processing flow to obtain the TPLBP feature vector for face recognition is shown in Figure 19. An input facial image of size 64 × 64 is first represented as a TPLBP encoded image, as in Figure 19(c). The TPLBP encoded image is then divided into non-overlapping patches of the same size, and a histogram for each patch is obtained. These histograms are then normalized, truncated at the value 0.2, and concatenated to obtain the final TPLBP feature vector.

Figure 19. Processing flow of TPLBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) TPLBP encoded image. (d) Divided non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final TPLBP feature vector by concatenating histograms of all patches in the image.

Figure 18. TPLBP code generation for selected P = 8, δ = 2, M = 3.
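Eq. (17) can be sketched as below. Placing the ring-patch centers at uniform angles and rounding to the pixel grid is an assumption for illustration (the original implementation may sample the ring differently); the modulus wraps the patch index around the ring of P patches, as in the wrap-around terms of Eq. (19).

```python
import numpy as np

def tplbp_code(img, r, c, P=8, R=2, M=3, delta=2, tau=0.01):
    """Three-patch LBP code for pixel (r, c), following Eqs. (17)-(18).

    Compares the central M x M patch with pairs of ring patches whose
    indices are delta apart; each comparison contributes one of P bits.
    (r, c) must lie at least R + M//2 pixels from the image border.
    """
    half = M // 2

    def patch(pr, pc):
        return img[pr - half:pr + half + 1, pc - half:pc + half + 1].astype(float)

    center = patch(r, c)
    # P same-sized patches with centers on a ring of radius R (assumed uniform angles)
    ring = []
    for m in range(P):
        a = 2 * np.pi * m / P
        ring.append(patch(r + int(round(R * np.sin(a))), c + int(round(R * np.cos(a)))))

    code = 0
    for m in range(P):
        d1 = np.linalg.norm(ring[m] - center)                # d(c_m, c_p)
        d2 = np.linalg.norm(ring[(m + delta) % P] - center)  # d(c_{m+delta mod P}, c_p)
        if d1 - d2 >= tau:                                   # f(.) from Eq. (18)
            code |= 1 << m
    return code

flat = np.full((11, 11), 7)
print(tplbp_code(flat, 5, 5))  # → 0: the threshold tau keeps flat regions stable
```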


## 10. Four-patch LBP (FPLBP)

Four-patch LBP (FPLBP) [26] is an extension of TPLBP that compares the center pixels of four patches to generate each bit. Two rings with radii R1 and R2 (R1 < R2), each carrying P patches of size M × M, are selected around the center pixel i_c. Pairs of center-symmetric patches on the inner ring are compared with the corresponding patches on the outer ring, δ patches further along the circle. In this way, FPLBP generates a P/2-bit code for i_c from the P/2 pairs as:

Figure 20. FPLBP code generation for selected P = 8, δ = 2, M = 3.


$$FPLBP_{P,R1,R2,M,\delta}(i_c) = \sum_{m=0}^{P/2-1} f\big(d(c_{i,m},\, c_{o,m+\delta \bmod P}) - d(c_{i,m+P/2},\, c_{o,m+P/2+\delta \bmod P})\big)\, 2^m \tag{20}$$

here, c_{i,m} and c_{o,m+δ mod P} are the gray levels of the center pixels of the m-th patch in the inner ring and the (m+δ)-th patch in the outer ring respectively. Similarly, c_{i,m+P/2} and c_{o,m+P/2+δ mod P} are the gray levels of the center pixels of the center-symmetric (m+P/2)-th patch in the inner ring and the (m+P/2+δ)-th patch in the outer ring. Figure 20 shows a sample example of generating the FPLBP code for P = 8, δ = 2, M = 3. FPLBP code generation for the given sample using Eq. (20) is:

$$\begin{aligned} &f(d(c_{i0}, c_{o1}) - d(c_{i4}, c_{o5}))\,2^0 + f(d(c_{i1}, c_{o2}) - d(c_{i5}, c_{o6}))\,2^1 + \\ &f(d(c_{i2}, c_{o3}) - d(c_{i6}, c_{o7}))\,2^2 + f(d(c_{i3}, c_{o4}) - d(c_{i7}, c_{o8}))\,2^3 \end{aligned} \tag{21}$$

The processing flow to obtain the FPLBP feature vector for a sample facial image, analogous to that of TPLBP, is shown in Figure 21.

Figure 21. Processing flow of FPLBP for face recognition. (a) Original input image. (b) Preprocessed image. (c) FPLBP encoded image. (d) Divided non-overlapping patches of the encoded image. (e) Histogram of a selected non-overlapping patch. (f) Final FPLBP feature vector by concatenating histograms of all patches in the image.
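Eq. (20) can be sketched in the same style as the TPLBP example. Uniform-angle ring sampling rounded to the pixel grid is again an illustrative assumption, not the authors' implementation, and the threshold `tau` reuses the role of τ from Eq. (18).

```python
import numpy as np

def fplbp_code(img, r, c, P=8, R1=2, R2=4, M=3, delta=1, tau=0.01):
    """Four-patch LBP code for pixel (r, c), following Eq. (20).

    For each of the P/2 center-symmetric pairs on the inner ring, the
    distance to the corresponding delta-shifted outer-ring patch is
    compared, giving one bit per pair. (r, c) must lie at least
    R2 + M//2 pixels from the image border.
    """
    half = M // 2

    def patch(radius, m):
        a = 2 * np.pi * m / P
        pr = r + int(round(radius * np.sin(a)))
        pc = c + int(round(radius * np.cos(a)))
        return img[pr - half:pr + half + 1, pc - half:pc + half + 1].astype(float)

    code = 0
    for m in range(P // 2):
        d1 = np.linalg.norm(patch(R1, m) - patch(R2, (m + delta) % P))
        d2 = np.linalg.norm(patch(R1, m + P // 2) - patch(R2, (m + P // 2 + delta) % P))
        if d1 - d2 >= tau:      # same f(.) as in TPLBP, Eq. (18)
            code |= 1 << m
    return code

flat = np.full((15, 15), 5)
print(fplbp_code(flat, 7, 7))  # → 0
```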


## 11. Improved LBP (ILBP)

Improved LBP (ILBP), originally named complete LBP, was proposed by Guo et al. [27]; it is referred to as ILBP here to distinguish its abbreviation from compound LBP (CLBP). In ILBP, a neighborhood is represented by its center pixel together with a local difference sign-magnitude transform (LDSMT).

Figure 22. Complete processing flow to generate ILBP code.

The complete processing flow to generate the ILBP code is shown in Figure 22. ILBP generates a 3P-bit code for P neighbor pixels. An original image is first represented in terms of a local threshold and a global threshold. The local part is then further decomposed into sign and magnitude components. Consequently, three representations of P bits each are obtained, namely ILBP\_Sign (ILBP\_S), ILBP\_Magnitude (ILBP\_M) and ILBP\_Global (ILBP\_G), which are combined to form the 3P-bit ILBP code.

Let c_p and c_n represent the gray levels of the center pixel i_c and a neighbor pixel respectively. The local difference is s_p = c_n − c_p, which is further decomposed into its magnitude (m_p) and sign (q_p) components as:

$$s_p = q_p \cdot m_p, \quad \text{where} \begin{cases} q_p = \mathrm{sign}(s_p) \\ m_p = |s_p| \end{cases} \tag{22}$$

$$q\_p = \begin{cases} \mathbf{1}, & s\_p \ge 0 \\ -\mathbf{1}, & s\_p < \mathbf{0} \end{cases} \tag{23}$$
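The decomposition of Eqs. (22)-(23) can be sketched as follows. The sample neighborhood is constructed so that it reproduces the differences quoted for Figure 23 ([−38, −15, 20, 15, 22, −6, −41, 35] around a center of 50); the actual pixel layout of the figure may differ, and the clockwise neighbor ordering is an assumption.

```python
import numpy as np

def ldsmt(patch):
    """Local difference sign-magnitude transform of a 3x3 neighborhood.

    Each difference s_p = c_n - c_p is decomposed into a sign q_p in
    {1, -1} (Eq. (23)) and a magnitude m_p = |s_p|, so s_p = q_p * m_p.
    """
    cp = int(patch[1, 1])
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    s = np.array([int(patch[r, c]) - cp for r, c in coords])
    q = np.where(s >= 0, 1, -1)   # sign component, Eq. (23)
    m = np.abs(s)                 # magnitude component
    return q, m

sample = np.array([[12, 35, 70],
                   [85, 50, 65],
                   [ 9, 44, 72]])
q, m = ldsmt(sample)
print((q * m).tolist())  # → [-38, -15, 20, 15, 22, -6, -41, 35]
```

Mapping the −1 entries of q_p to 0 recovers the ordinary LBP code [0, 0, 1, 1, 1, 0, 0, 1] for this block, while ILBP additionally thresholds the magnitude and global-intensity components to form ILBP\_M and ILBP\_G.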

12. Result analysis for face recognition

Comparative analysis of spatial domain feature representations.

Feature Advantages Disadvantages

LBP • High discriminative power.

• Computational simplicity.

Spatial Domain Representation for Face Recognition DOI: http://dx.doi.org/10.5772/intechopen.85382

LPQ • Performance is better as compare to

images.

CLBP • It gives better performance as

TPLBP • Rotation invariant for texture descriptor.

FPLBP • Rotation invariant for texture descriptor.

Table 2.

131

LBP in case of blurred illumination and facial expression variations

compared to LBP as it uses both difference sign and magnitude.

• Capture information for not only microstructure but also macrostructure.

• Capture information for not only microstructure but also macrostructure.

HOG • Rotation and scale invariant. • Very sensitive to image rotation. Not

SIFT • Rotation and scale invariant. • Mathematically complicated and

LTP • Resistant to noise. • Not invariant under gray-scale

good choice for classification of

textures or objects.

devices.

computationally heavy. • It is not effective for low powered

• Not invariant to rotations. • Size of feature vector increases exponentially with number of neighbors leading to an increase of computational complexity in terms of

• The structural information captured by it is limited. Only pixel difference is used, magnitude information ignored. • Performance decreases for flat images.

• LPQ vector is about four times longer than an LBP vector with 8 neighbor

transform of intensity values as its encoding is based on a fixed predefined

• Feature vector is too long so it increases computational time.

time and space.

pixels.

thresholding.

• More complex.

• Complexity increases.

Face recognition has been explored over last many years, hence there exists a large number of researches in this domain. In this section, we present existing face recognition results and analysis based on different spatial domain representations. Deniz et al. [28] proposed face recognition using HOG features by extracting features from varying image patches which resulted in an improved accuracy. Recognition accuracy is evaluated on FERET database with best result of 95.4%. Other related researches are [29] which used EBGM-HOG and showed robustness to change in illumination, rotation and small displacements. Some existing works on face recognition using SIFT features are [30, 31]. These works have also used

Understanding of ILPB encoding scheme to generate 3P bits ILBP code is shown in Figure 23. Figure 23(a) shows 3 � 3 neighborhood with center pixel value 50. ILBP encoded image after local thresholding is shown in Figure 23(b) as [�38, �15, 20, 15, 22, �6, �41, 35]. After LDSMT, sign and magnitude vectors are obtained. It is clearly seen that original LBP uses only sign as LBP encodes �1 as 0 in sign vector representation. LBP code for above sample block is [0, 0, 1, 1, 1, 0, 0, 1]. Hence, LBP considers only sign components of subtraction while ILBP combines three representations, ILBP\_S, ILBP\_M and ILBP\_G. Local region around center pixel is represented by LDSMT, assigning threshold value w.r.t sign leads ILBP\_S and assigning threshold value w.r.t. magnitude leads ILBP\_M. Similarly, image is also encoded using global threshold is termed as ILBP\_G.




#### Spatial Domain Representation for Face Recognition DOI: http://dx.doi.org/10.5772/intechopen.85382

generate ILBP code is shown in Figure 22. ILBP generates a 3P-bit code for P neighbor pixels. An original image is first represented in terms of a local threshold and a global threshold. The local threshold is then further decomposed into sign and magnitude components. Consequently, three representations of P bits are obtained, namely ILBP_Sign (ILBP_S), ILBP_Magnitude (ILBP_M) and ILBP_Global (ILBP_G), which are combined to form the 3P-bit ILBP code.

Let $c_p$ and $c_n$ represent the gray levels of the center pixel $i_c$ and the $P$ neighbor pixels, respectively. The local threshold is generated by taking the difference $s_p = c_n - c_p$. The subtracted vector $s_p$ is further divided into two components, namely the magnitude of subtraction ($m_p$) and the sign of subtraction ($q_p$), as:

$$s_p = q_p \cdot m_p, \quad \text{where } q_p = \operatorname{sign}(s_p) \text{ and } m_p = |s_p| \tag{22}$$

$$q_p = \begin{cases} 1, & s_p \geq 0 \\ -1, & s_p < 0 \end{cases} \tag{23}$$

Understanding of the ILBP encoding scheme to generate the 3P-bit ILBP code is shown in Figure 23. Figure 23(a) shows a 3 × 3 neighborhood with center pixel value 50. The ILBP encoded image after local thresholding is shown in Figure 23(b) as [−38, −15, 20, 15, 22, −6, −41, 35]. After LDSMT, sign and magnitude vectors are obtained. It is clearly seen that original LBP uses only the sign, as LBP encodes −1 as 0 in the sign vector representation. The LBP code for the above sample block is [0, 0, 1, 1, 1, 0, 0, 1]. Hence, LBP considers only the sign component of the subtraction, while ILBP combines three representations, ILBP_S, ILBP_M and ILBP_G. The local region around the center pixel is represented by LDSMT; thresholding with respect to the sign yields ILBP_S, and thresholding with respect to the magnitude yields ILBP_M. Similarly, the image encoded using the global threshold is termed ILBP_G.

A comparative analysis of various spatial domain feature representations is given in Table 2.

#### Figure 23.

ILBP encoding scheme. (a) 3 × 3 neighborhood of an image. (b) ILBP encoded image after thresholding. (c) Sign component. (d) Magnitude component.

#### Table 2.

Comparative analysis of spatial domain feature representations.
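The LDSMT decomposition and the resulting LBP code above can be sketched in a few lines. This is a minimal illustration only: the neighbor pixel values are hypothetical, chosen so the local differences reproduce the [−38, −15, 20, 15, 22, −6, −41, 35] vector listed for Figure 23, and are not taken from the original figure.

```python
# Sketch of the local difference sign-magnitude transform (LDSMT) used by
# ILBP, on a 3x3 neighborhood with center pixel 50 (as in Figure 23).
center = 50
neighbors = [12, 35, 70, 65, 72, 44, 9, 85]  # hypothetical values

s = [n - center for n in neighbors]       # local differences s_p = c_n - c_p
q = [1 if sp >= 0 else -1 for sp in s]    # sign component q_p, Eq. (23)
m = [abs(sp) for sp in s]                 # magnitude component m_p, so s_p = q_p * m_p

# Plain LBP keeps only the sign, encoding -1 as 0:
lbp_code = [1 if sp >= 0 else 0 for sp in s]

print(s)         # [-38, -15, 20, 15, 22, -6, -41, 35]
print(lbp_code)  # [0, 0, 1, 1, 1, 0, 0, 1]
```

ILBP_S is built from `q`, ILBP_M from thresholding `m`, and ILBP_G from a global threshold over the whole image; the three P-bit strings are then concatenated into the 3P-bit code.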

### 12. Result analysis for face recognition

Face recognition has been explored for many years, and hence there exists a large body of research in this domain. In this section, we present existing face recognition results and analysis based on different spatial domain representations. Deniz et al. [28] proposed face recognition using HOG features extracted from varying image patches, which resulted in improved accuracy. Recognition accuracy was evaluated on the FERET database, with a best result of 95.4%. A related work is [29], which used HOG-EBGM and showed robustness to changes in illumination, rotation and small displacements. Some existing works on face recognition using SIFT features are [30, 31]. These works also used variants of SIFT, such as volume-SIFT (VSIFT) and partial-descriptor-SIFT (PDSIFT), and learned SIFT features at specific locations to improve verification accuracy.


Face recognition using LPQ feature representation is inspired by [18, 19], which used LPQ as a blur-invariant descriptor. Damane et al. [32] presented face recognition using LPQ under varying conditions of light, blur, and illumination. Experiments were performed on the Extended Yale-B, CMU-PIE, and CAS-PEAL-R1 face databases, and the results showed that LPQ is more robust to light and illumination variation. Chan et al. [33] presented multiscale LPQ for face recognition and evaluated results on the FERET and BANCA face databases. Multiscale LPQ is obtained by applying varying filter sizes and combining the resulting LPQ images, which are then projected into LDA space. Best results of 99.2% for FB, 92% for DP1 and 88% for DP2 were achieved on the FERET probe sets.

Face recognition using LBP feature representation is one of the most researched areas [34–38]. Tan et al. [24] evaluated face recognition under varying lighting conditions using LTP feature representation on the Extended Yale-B and CMU-PIE face databases. They showed that LTP is more discriminant and less sensitive to noise in uniform regions, and gives improved results for flat images. Wolf et al. [26] proposed TPLBP and FPLBP features for face recognition. Accuracy results were validated on two well-known databases, Labeled Faces in the Wild (LFW) and Multi-PIE. They showed that combining several descriptors from the same LBP family boosts the recognition rate. The paper reported best accuracies of 80.75% for TPLBP and 75.57% for FPLBP, obtained by combining ITML with multi-OSS under ID and pose variation. Ahmed et al. [25] proposed CLBP features, an extension of LBP, for facial expression recognition. Results were verified on the Cohn-Kanade (CK) facial expression database, with CLBP features classified using an SVM classifier. They showed that the classification rate can be affected by adjusting the number of regions into which expression images are partitioned. For this, they considered three cases, dividing images into 3 × 3, 5 × 5, and 7 × 6 patches. The best accuracy for CLBP, 94.4%, was obtained with the 5 × 5 patch size.
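The effect of patch partitioning on a regional descriptor can be sketched as below. This is an illustrative sketch, not the authors' implementation: `regional_histogram`, the 105 × 105 image size, and the 256-bin histograms are all assumptions.

```python
import numpy as np

def regional_histogram(image, rows, cols, bins=256):
    """Split the encoded image into rows x cols patches, take a code
    histogram per patch, and concatenate them into one feature vector."""
    h, w = image.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            patch = image[r * h // rows:(r + 1) * h // rows,
                          c * w // cols:(c + 1) * w // cols]
            hist, _ = np.histogram(patch, bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))  # normalize per patch
    return np.concatenate(feats)

# Stand-in for an encoded (e.g. CLBP) facial expression image:
codes = np.random.randint(0, 256, (105, 105))
for rows, cols in [(3, 3), (5, 5), (7, 6)]:
    print((rows, cols), regional_histogram(codes, rows, cols).shape)
```

Finer partitions preserve more spatial layout but lengthen the feature vector (3 × 3 gives 9 × 256 = 2304 dimensions, 7 × 6 gives 10752), which is the trade-off behind the patch-size comparison above.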

### 13. Conclusion

This chapter presents well-known and some recently explored spatial feature representations for face recognition. These feature representations are scale, translation and rotation invariant for 2-D face images. The chapter covers HOG, SIFT and LBP feature representations and the complete processing flow to generate feature vectors using these representations for face recognition. SIFT and HOG, based on computing image gradients and local extrema, are commonly used feature representations for face recognition. LBP performs texture-based analysis to represent local facial appearance as an encoded facial image. Other relevant spatial domain representations, such as LPQ and variants of LBP, are explained and analyzed for face recognition. LPQ possesses the blur-invariance property and provides improved results for blurred facial images. Different variants of LBP, such as LTP, CLBP, TPLBP and FPLBP, are more robust to noise and lighting conditions. These representations characterize facial features more effectively and obtain discriminative feature vectors for face recognition.

## Acknowledgements

The research work is supported by the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India through a research grant. The sanctioned project title is "Design and development of an Automatic Kinship Verification system for Indian faces with possible integration of AADHAR Database", with reference no. ECR/2016/001659.

## Conflict of interest


The authors have no conflict of interest.

## Author details

Toshanlal Meenpal\*, Aarti Goyal and Moumita Mukherjee Electronics and Telecommunication Department, National Institute of Technology, Raipur, Chhattisgarh, India

\*Address all correspondence to: tmeenpal.etc@nitrr.ac.in

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## References

[1] Viola P, Jones MJ. Robust real-time face detection. International Journal of Computer Vision. 2004;57(2):137-154

[2] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. 2015. pp. 91-99

[3] King DE. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research. 2009;10(Jul): 1755-1758

[4] Dharavath K, Talukdar FA, Laskar RH. Improving face recognition rate with image preprocessing. Indian Journal of Science and Technology. 2014;7(8):1170-1175

[5] Gross R, Brajovic V. An image preprocessing algorithm for illumination invariant face recognition. In: International Conference on Audioand Video-Based Biometric Person Authentication. Berlin, Heidelberg: Springer; 2003. pp. 10-18

[6] Chandrashekar G, Sahin F. A survey on feature selection methods. Computers and Electrical Engineering. 2014;40(1):16-28

[7] Available from: http://www.cl. cam.ac.uk/research/dtg/attarchive/ facedatabase.html

[8] Gao W, Cao B, Shan S, Chen X, Zhou D, Zhang X, et al. The CAS-PEAL largescale Chinese face database and baseline evaluations. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans. 2008;38(1): 149-161

[9] Sim T, Baker S, Bsat M. The CMU pose, illumination, and expression (PIE) database. In: Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition. IEEE; 2002. pp. 53-58


[10] Phillips PJ, Moon H, Rauss P, Rizvi SA. The FERET evaluation methodology for face-recognition algorithms. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE; 1997. pp. 137-143

[11] Hwang BW, Roh MC, Lee SW. Performance evaluation of face recognition algorithms on Asian face database. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings. IEEE; 2004. pp. 278-283

[12] Available from: http://vision.ucsd. edu/datasets/yale\_face\_dataset\_original/ yalefaces.zip

[13] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: International Conference on computer vision & Pattern Recognition (CVPR'05). Vol. 1. IEEE Computer Society; 2005. pp. 886-893

[14] Shu C, Ding X, Fang C. Histogram of the oriented gradient for face recognition. Tsinghua Science and Technology. 2011;16(2):216-224

[15] Dadi HS, Pillutla GK. Improved face recognition rate using HOG features and SVM classifier. IOSR Journal of Electronics and Communication Engineering. 2016;11(04):34-44

[16] Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision. 2004;60(2):91-110

[17] Bicego M, Lagorio A, Grosso E, Tistarelli M. On the use of SIFT features for face authentication. In: 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06). IEEE; 2006. pp. 35-35


[18] Ojansivu V, Heikkilä J. Blur insensitive texture classification using local phase quantization. In: International Conference on Image and Signal Processing. Berlin, Heidelberg: Springer; 2008. pp. 236-243

[19] Rahtu E, Heikkilä J, Ojansivu V, Ahonen T. Local phase quantization for blur-insensitive image analysis. Image and Vision Computing. 2012;30(8): 501-512

[20] Ahonen T, Rahtu E, Ojansivu V, Heikkila J. Recognition of blurred faces using local phase quantization. In: 2008 19th International Conference on Pattern Recognition. IEEE; 2008. pp. 1-4

[21] Nguyen HT. Contributions to facial feature extraction for face recognition [Doctoral dissertation]. Université de Grenoble, Sep 2014

[22] Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition. 1996;29(1):51-59

[23] Ahonen T, Hadid A, Pietikainen M. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28(12):2037-2041

[24] Tan X, Triggs B. Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Transactions on Image Processing. 2010;19(6):1635-1650

[25] Ahmed F, Hossain E, Bari AS, Hossen MS. Compound local binary pattern (CLBP) for rotation invariant texture classification. International Journal of Computers and Applications. 2011;33(6):5-10

[26] Wolf L, Hassner T, Taigman Y. Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(10):1978-1990

[27] Guo Z, Zhang L, Zhang D. A completed modeling of local binary pattern operator for texture classification. IEEE Transactions on Image Processing. 2010;19(6): 1657-1663

[28] Déniz O, Bueno G, Salido J, De la Torre F. Face recognition using histograms of oriented gradients. Pattern Recognition Letters. 2011; 32(12):1598-1603

[29] Albiol A, Monzo D, Martin A, Sastre J, Albiol A. Face recognition using HOG–EBGM. Pattern Recognition Letters. 2008;29(10):1537-1543

[30] Križaj J, Štruc V, Pavešić N. Adaptation of SIFT features for robust face recognition. In: International Conference Image Analysis and Recognition. Berlin, Heidelberg: Springer; 2010. pp. 394-404

[31] Sadeghipour E, Sahragard N. Face recognition based on improved SIFT algorithm. International Journal of Advanced Computer Science and Applications. 2016;7(1):547-551

[32] Damane Local Phase-Context for Face Recognition under Varying Conditions. Procedia Computer Science. 2014;39:12-19

[33] Chan CH, Kittler J, Poh N, Ahonen T, Pietikäinen M. (Multiscale) local phase quantisation histogram discriminant analysis with score normalisation for robust face recognition. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops. IEEE; 2009. pp. 633-640

[34] Huang D, Shan C, Ardebilian M, Chen L. Facial image analysis based on local binary patterns: A survey. IEEE Transactions on Image Processing. 2011; 41:1-14

[35] Shan C, Gong S, McOwan PW. Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing. 2009;27(6):803-816

[36] Zhang G, Huang X, Li SZ, Wang Y, Wu X. Boosting local binary pattern (LBP)-based face recognition. In: Advances in Biometric Person Authentication. Berlin, Heidelberg: Springer; 2004. pp. 179-186

[37] Dadiz BG, Ruiz CR. Detecting depression in videos using uniformed local binary pattern on facial features. In: Computational Science and Technology. Singapore: Springer; 2019. pp. 413-422

[38] Liu L, Fieguth P, Zhao G, Pietikäinen M, Hu D. Extended local binary patterns for face recognition. Information Sciences. 2016;358:56-72

## **Chapter 7**

Extended Binary Gradient Pattern (eBGP): A Micro- and Macrostructure-Based Binary Gradient Pattern for Face Recognition in Video Surveillance Area

*Nuzrul Fahmi Nordin, Samsul Setumin, Abduljalil Radman and Shahrel Azmin Suandi*

**Abstract**

An excellent face recognition system for surveillance cameras requires a remarkable and robust face descriptor. The binary gradient pattern (BGP) descriptor is one of the ideal descriptors for facial feature extraction. However, exploiting local features merely from a smaller region or microstructure does not capture a complete facial feature. In this paper, an extended binary gradient pattern (eBGP) is proposed to capture both micro- and macrostructure information of a local region to boost the descriptor performance and discriminative power. Two topologies, the patch-based and circular-based topologies, are incorporated with the eBGP to test its robustness against illumination, image quality, and uncontrolled capture conditions using the SCface database. Experimental results show that the fusion between micro- and macrostructure information significantly boosts the descriptor performance. They also illustrate that the proposed eBGP descriptor outperforms the conventional BGP on both the patch-based topology and the circular-based topology. Furthermore, a fusion of information from two different image types, orientational image gradient magnitude (OIGM) and grayscale image, attained better performance than using the OIGM image only. The overall results indicate that the proposed eBGP descriptor improves the recognition performance with respect to the baseline BGP descriptor.

**Keywords:** surveillance system, face recognition, binary gradient pattern (BGP), facial feature extraction, patch-based topology, circular-based topology

**1. Introduction**

Face recognition is one of the biometric verification methods that offers a wide range of applications such as law enforcement, forensics, biometric authentication, surveillance, and health monitoring [1]. Face recognition has also been used

