**3. Object recognition based on histogram of oriented gradients and multi-layer perceptron**

The object recognition system based on the histogram of oriented gradients and a multi-layer perceptron (ORS HOG-MLP) proposed in this paper makes the following contributions: (1) it offers good performance in multiclass applications; (2) it determines the performance of an object recognition system that combines HOG with a neural-network-based classifier, particularly an MLP; and (3) it proposes a modification of the gradient computation algorithm that improves the representation of object properties and, consequently, the image characterization process.

The ORS HOG-MLP is an algorithm geared to automatic object recognition. **Figure 1** shows the elements of this system and the relationships between them.

Initially, a window detector process takes samples over the entire image; each sample has a fixed size and is called a detection window. The algorithm performs better when consecutive detection windows overlap. The image is sampled several times, and on each occasion the detection window size is different. This feature is intended to avoid image segmentation and to make the algorithm robust to the object's size.
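The multi-scale sampling step above can be sketched as a sliding-window generator. This is not the authors' code; the window size, step, scale set, and nearest-neighbour rescaling are assumptions made for illustration.

```python
import numpy as np

def detection_windows(image, win_size=(128, 64), step=32, scales=(1.0, 0.5)):
    """Yield fixed-size detection windows over the image at several scales.
    A step smaller than the window size produces overlapping windows."""
    h, w = win_size
    for s in scales:
        # Nearest-neighbour subsampling keeps the sketch dependency-free;
        # 1/s is assumed to be an integer factor here.
        f = int(round(1.0 / s))
        scaled = image[::f, ::f]
        for y in range(0, scaled.shape[0] - h + 1, step):
            for x in range(0, scaled.shape[1] - w + 1, step):
                yield scaled[y:y + h, x:x + w]
```

Because every window has the same size regardless of scale, the same descriptor and classifier can be applied to objects of different apparent sizes.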

Then the ORS HOG-MLP algorithm is applied to the extracted detection window. First, the gradient of the image is computed at each pixel. Let the discrete version of a detection window be represented as the matrix $\mathbf{A} = \left[a_{ij}\right]_{wi \times hi}$, where $wi$ and $hi$ are the width and height of the detection window, respectively, $a_{ij}$ is the $ij$-th pixel value, $0 \le a_{ij} \le 2^n - 1$, and $n$ is the number of bits needed to represent a pixel value. The aim of this process is to compute the magnitude and direction of the gradient at each pixel. For this purpose, a 1-D $[-1, 0, 1]$ mask with no scale smoothing ($\sigma = 0$) is applied over all image pixels. The magnitudes and directions obtained by this process are grouped into two matrices, $\mathbf{M}$ and $\mathbf{Th}$, defined as

*Multi-Object Recognition Using a Feature Descriptor and Neural Classifier DOI: http://dx.doi.org/10.5772/intechopen.106754*

**Figure 1.** *Block diagram of ORS HOG-MLP.*

$$\begin{aligned} \mathbf{M} &= \left[m_{ij}\right]_{wi \times hi}; \quad m_{ij} = \sqrt{\left[a_{i+1,j} - a_{i-1,j}\right]^2 + \left[a_{i,j+1} - a_{i,j-1}\right]^2} \\ \mathbf{Th} &= \left[th_{ij}\right]_{wi \times hi}; \quad th_{ij} = \tan^{-1} \frac{a_{i,j+1} - a_{i,j-1}}{a_{i+1,j} - a_{i-1,j}} \end{aligned} \tag{6}$$
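A minimal NumPy sketch of Eq. (6), assuming the window is a grayscale array whose first index corresponds to $i$ (the axis-to-index mapping is an assumption; the chapter does not fix it):

```python
import numpy as np

def gradients(a):
    """Gradient magnitude M and unsigned direction Th (degrees, [0, 180))
    computed with the centred 1-D mask [-1, 0, 1], sigma = 0 (Eq. 6)."""
    a = a.astype(np.float64)
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    gx[1:-1, :] = a[2:, :] - a[:-2, :]   # a_{i+1,j} - a_{i-1,j}
    gy[:, 1:-1] = a[:, 2:] - a[:, :-2]   # a_{i,j+1} - a_{i,j-1}
    m = np.sqrt(gx**2 + gy**2)
    # arctan2 resolves the sign ambiguity of tan^{-1}; mod 180 folds the
    # result into the unsigned orientation range used by the descriptor.
    th = np.degrees(np.arctan2(gy, gx)) % 180.0
    return m, th
```

Border pixels are left at zero here, a common simplification when the mask would reach outside the window.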

**Figure 2** shows the results of the gradient computation process. A detection window includes information that does not belong to the object of interest (information about other objects and the background); during the gradient computation process, this information corrupts the representation of the object of interest.

In order to improve the representation of the properties of the object of interest by reducing this information, we propose applying a threshold operation to the matrices **M** and **Th**,

$$m_{ij} = \begin{cases} 0, & \text{if } m_{ij} < \text{umbral\_grad} \\ m_{ij}, & \text{otherwise} \end{cases} \qquad th_{ij} = \begin{cases} 0, & \text{if } m_{ij} < \text{umbral\_grad} \\ th_{ij}, & \text{otherwise} \end{cases} \tag{7}$$
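The threshold of Eq. (7) is straightforward to express with NumPy; the threshold value used below is a hypothetical parameter, since the chapter does not fix it:

```python
import numpy as np

def threshold_gradients(m, th, umbral_grad=20.0):
    """Zero out weak gradients (Eq. 7): pixels whose magnitude falls below
    the threshold contribute neither magnitude nor direction."""
    weak = m < umbral_grad
    return np.where(weak, 0.0, m), np.where(weak, 0.0, th)
```

Note that both matrices are masked by the *magnitude* test, so a weak pixel is removed from the direction map as well.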

**Figure 2** also shows the result obtained by applying this process to the matrices. Now, both matrices $\mathbf{M}$ and $\mathbf{Th}$ are divided into $cx \times cy$ small connected regions of $px \times py$ pixels, called magnitude cells and direction cells, defined as $\mathbf{cm}_{lk} = \left[c\_m_{ij}^{lk}\right]_{px \times py}$ and $\mathbf{cth}_{lk} = \left[c\_th_{ij}^{lk}\right]_{px \times py}$, where $cx = wi/px$ and $cy = hi/py$ (see **Figure 3**).
Then, each pixel contributes to the histogram of the cell to which it belongs through a weighted function of its gradient magnitude, based on its gradient orientation. For this purpose, a vector of $p = 9$ bins uniformly spaced over 0-180° is defined for the quantization of the gradient orientations: $\{bin_0, bin_1, \ldots, bin_8\} = \{10°, 30°, \ldots, 170°\}$. The distance between bins is denoted by $d_{bins} = 20°$.

Let $\mathbf{MC} = \left[\mathbf{Mc}_{lk}\right]$ be a matrix of $cx \times cy$ elements, called the bins matrix, where each element $\mathbf{Mc}_{lk} = \left[mc_o^{lk}\right]$ is a $p$-dimensional vector containing the bins $mc_o^{lk}$ of the histogram of the $lk$-th cell, $o = 0, 1, \ldots, p - 1$ (see **Figure 3**).

**Figure 2.** *Results of the gradient computation process. (a) Original image. (b) Gradient magnitude. (c) Pixel intensity proportional to the gradient direction. (d) Improved gradient magnitude. (e) Improved gradient direction.*

**Figure 3.** *Details of the orientation binning and descriptor block phases.*

Furthermore, considering two adjacent bins $bin_o$ and $bin_{o+1}$ such that $bin_o \le c\_th_{ij}^{lk} \le bin_{o+1}$, the distances between $c\_th_{ij}^{lk}$ and each of the bins are defined as $d_o = c\_th_{ij}^{lk} - bin_o$ and $d_{o+1} = bin_{o+1} - c\_th_{ij}^{lk}$.

Thus, the result of this process is defined as

$$\begin{aligned} mc_o^{lk} &= \begin{cases} mc_o^{lk} + \left(1 - \dfrac{d_o}{d_{bins}}\right) \cdot c\_m_{ij}^{lk}, & \text{if } bin_o < c\_th_{ij}^{lk} \\ mc_o^{lk} + c\_m_{ij}^{lk}, & \text{if } bin_o = c\_th_{ij}^{lk} \end{cases} \\ mc_{o+1}^{lk} &= \begin{cases} mc_{o+1}^{lk} + \left(1 - \dfrac{d_{o+1}}{d_{bins}}\right) \cdot c\_m_{ij}^{lk}, & \text{if } c\_th_{ij}^{lk} < bin_{o+1} \\ mc_{o+1}^{lk} + c\_m_{ij}^{lk}, & \text{if } bin_{o+1} = c\_th_{ij}^{lk} \end{cases} \end{aligned} \tag{8}$$
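The interpolated voting of Eq. (8) can be sketched for a single cell as follows. The handling of orientations outside the range of bin centres is an assumption (they are clamped here; classic HOG implementations wrap around instead, and the chapter does not specify):

```python
import numpy as np

BIN_CENTRES = np.arange(10.0, 180.0, 20.0)   # bin_0..bin_8 = 10, 30, ..., 170 degrees
D_BINS = 20.0

def cell_histogram(c_m, c_th):
    """9-bin histogram of one cell: each pixel splits its gradient magnitude
    between the two bin centres nearest its orientation (Eq. 8)."""
    hist = np.zeros(9)
    for mag, ang in zip(c_m.ravel(), c_th.ravel()):
        if mag == 0.0:
            continue  # pixels suppressed by the threshold cast no vote
        ang = float(np.clip(ang, BIN_CENTRES[0], BIN_CENTRES[-1]))
        o = min(int((ang - BIN_CENTRES[0]) // D_BINS), 7)  # lower neighbour bin
        d_o = ang - BIN_CENTRES[o]
        hist[o] += (1.0 - d_o / D_BINS) * mag
        if d_o > 0.0:
            hist[o + 1] += (d_o / D_BINS) * mag
    return hist
```

When the orientation coincides with a bin centre ($d_o = 0$), the full magnitude goes to that single bin, matching the second case of Eq. (8).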

**Figure 4** shows the result of applying the orientation binning process.

In the next process, sets of $2 \times 2$ adjacent cells are grouped into spatial regions called descriptor blocks ($n\_\mathbf{Mc}$: number of $\mathbf{Mc}$ per block). In order to get better performance, the blocks are formed with an overlap of one cell along both the $x$-axis and the $y$-axis (see **Figures 3** and **4e**). The result of this process is the matrix $\mathbf{BL} = \left[\mathbf{bl}\right]$, composed of $bx \times by$ descriptor blocks, where $bx = 1, 2, \ldots, cx - 1$, $by = 1, 2, \ldots, cy - 1$, and a $\mathbf{bl}$ block is defined as

$$\mathbf{b}\mathbf{l}\_{bx,by} = \left\{ \mathbf{Mc}\_{bx,by}, \mathbf{Mc}\_{bx,by+1}, \mathbf{Mc}\_{bx+1,by}, \mathbf{Mc}\_{bx+1,by+1} \right\} \tag{9}$$

Then, each descriptor block must be normalized. For this purpose, we use the $L2\text{-}Hys$ block normalization scheme, which first applies the $L2\text{-}norm$ scheme

$$\mathbf{bl}_{bx,by} = \frac{\mathbf{bl}_{bx,by}}{\sqrt{\left\|\mathbf{bl}_{bx,by}\right\|_2^2 + \epsilon^2}} = \frac{\mathbf{bl}_{bx,by}}{\sqrt{\sum_o \left|mc_o^{bx,by}\right|^2 + \sum_o \left|mc_o^{bx,by+1}\right|^2 + \sum_o \left|mc_o^{bx+1,by}\right|^2 + \sum_o \left|mc_o^{bx+1,by+1}\right|^2 + \epsilon^2}} \tag{10}$$

Thus, each $mc$ (bin) of the $\mathbf{bl}_{bx,by}$ block is limited to a maximum value of 0.2 ($mc = 0.2$ if $mc > 0.2$), and then the block is renormalized with the $L2\text{-}norm$.
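The full normalize-clip-renormalize sequence (Eq. 10 plus the clipping step) can be sketched on the concatenated block vector; the value of $\epsilon$ below is an assumption, since the chapter does not specify it:

```python
import numpy as np

def l2_hys(block, eps=1e-5, clip=0.2):
    """L2-Hys block normalisation: L2-normalise the concatenated block
    vector (Eq. 10), clip every component at 0.2, then L2-normalise again."""
    v = block / np.sqrt(np.sum(block**2) + eps**2)
    v = np.minimum(v, clip)                       # limit each bin to 0.2
    return v / np.sqrt(np.sum(v**2) + eps**2)     # renormalise
```

The clipping step limits the influence of any single dominant gradient on the block, which is what makes L2-Hys more robust than a plain L2-norm.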

The final object descriptor, $\mathbf{Od}$, is an $r$-dimensional vector formed by all components (bins) of the normalized cell responses from all of the blocks in the detection window

$$\begin{aligned} \mathbf{Od} = [od]_r &= \left\{\mathbf{bl}_{1,1}, \mathbf{bl}_{1,2}, \ldots, \mathbf{bl}_{bx,by}\right\} \\ &= \{\{\mathbf{Mc}_{1,1}, \mathbf{Mc}_{1,2}, \mathbf{Mc}_{2,1}, \mathbf{Mc}_{2,2}\}, \{\mathbf{Mc}_{1,2}, \mathbf{Mc}_{1,3}, \mathbf{Mc}_{2,2}, \mathbf{Mc}_{2,3}\}, \ldots, \\ &\qquad \{\mathbf{Mc}_{bx,by}, \mathbf{Mc}_{bx,by+1}, \mathbf{Mc}_{bx+1,by}, \mathbf{Mc}_{bx+1,by+1}\}\} \\ &= \{\{\{mc_0^{11}, \ldots, mc_8^{11}\}, \{mc_0^{12}, \ldots, mc_8^{12}\}, \{mc_0^{21}, \ldots, mc_8^{21}\}, \{mc_0^{22}, \ldots, mc_8^{22}\}\}, \\ &\qquad \{\{mc_0^{12}, \ldots, mc_8^{12}\}, \{mc_0^{13}, \ldots, mc_8^{13}\}, \{mc_0^{22}, \ldots, mc_8^{22}\}, \{mc_0^{23}, \ldots, mc_8^{23}\}\}, \ldots, \\ &\qquad \{\{mc_0^{bx,by}, \ldots, mc_8^{bx,by}\}, \{mc_0^{bx,by+1}, \ldots, mc_8^{bx,by+1}\}, \{mc_0^{bx+1,by}, \ldots, mc_8^{bx+1,by}\}, \{mc_0^{bx+1,by+1}, \ldots, mc_8^{bx+1,by+1}\}\}\} \end{aligned} \tag{11}$$

where $r = bx \times by \times n\_\mathbf{Mc} \times p$ and $od_0 = mc_0^{11}$, $od_1 = mc_1^{11}$, …, $od_8 = mc_8^{11}$, $od_9 = mc_0^{12}$, …, $od_{17} = mc_8^{12}$, …, $od_{r-9} = mc_0^{bx+1,by+1}$, …, $od_{r-1} = mc_8^{bx+1,by+1}$.
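To make the dimension $r$ concrete, consider a hypothetical $64 \times 128$ detection window with $8 \times 8$-pixel cells (these sizes are assumptions for illustration; the chapter does not fix them):

```python
wi, hi = 64, 128               # detection-window size (assumed)
px, py = 8, 8                  # cell size in pixels (assumed)
cx, cy = wi // px, hi // py    # 8 x 16 cells
bx, by = cx - 1, cy - 1        # 7 x 15 overlapping 2x2-cell blocks
n_Mc, p = 4, 9                 # cells per block, bins per cell
r = bx * by * n_Mc * p
print(r)  # 3780
```

This reproduces the well-known 3780-component length of the HOG person descriptor for that window geometry.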

**Figure 4.** *Result of the orientation binning process. (a) Original image. (b) Gradient magnitude. (c) Gradient direction. (d) Histograms of the cells. (e) Descriptor blocks phase.*

**Figure 5** shows the sequence of processes that generate the HOG descriptor of the object.

Now, the MLP belonging to the ORS HOG-MLP must be trained to recognize the objects of interest. For this purpose, the processes described above are applied to $q$ objects of interest, and each descriptor obtained is associated with its corresponding class, $\mathbf{c} = [c_k]$; thus, the training set of the MLP is defined as

$$\left\{ \left( \mathbf{Od}^1, \mathbf{c}^1 \right), \left( \mathbf{Od}^2, \mathbf{c}^2 \right), \dots, \left( \mathbf{Od}^q, \mathbf{c}^q \right) \right\} = \left\{ \left( \mathbf{Od}^\mu, \mathbf{c}^\mu \right) \middle| \mu = 1, 2, \dots, q \right\} \tag{12}$$

The MLP structure is established as follows: the input layer has $r$ units, there is one hidden layer of $h$ units, and the output layer has $v$ units. Given the training set, the backpropagation learning algorithm is used to generate the bank of models, which captures the essential information of the objects in the training set. Essentially, the bank of models resides in the synaptic weights, $\mathbf{W}^1 = \left[w_{jo}^1\right]_{h \times r}$ and $\mathbf{W}^2 = \left[w_{kj}^2\right]_{v \times h}$, that define the connections between the neurons of the MLP.

Finally, the operation process of ORS HOG-MLP, when the **Od***<sup>μ</sup>* object is presented, is defined as

$$\boldsymbol{c}\_{k}^{\mu} = \mathbf{g} \left( \sum\_{j=0}^{h-1} \boldsymbol{w}\_{kj}^{2} \bullet \boldsymbol{f} \left( \sum\_{o=0}^{r-1} \boldsymbol{w}\_{jo}^{1} \bullet \boldsymbol{o} \boldsymbol{d}\_{o}^{\mu} - \boldsymbol{\theta}\_{j}^{1} \right) - \boldsymbol{\theta}\_{k}^{2} \right) \tag{13}$$

where $k = 0, 1, \ldots, v - 1$.
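The forward pass of Eq. (13) maps directly onto two matrix-vector products. In this sketch the activation functions $f$ and $g$ are assumed to be sigmoids, since the chapter leaves them unspecified:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(od, W1, th1, W2, th2, f=sigmoid, g=sigmoid):
    """Eq. (13): c_k = g( sum_j w2_kj * f( sum_o w1_jo * od_o - th1_j ) - th2_k ).
    W1 is h x r, W2 is v x h; th1, th2 are the layer thresholds (biases)."""
    hidden = f(W1 @ od - th1)     # h-dimensional hidden-layer response
    return g(W2 @ hidden - th2)   # v-dimensional class response c^mu
```

During operation, the index of the largest output component would typically be taken as the recognized class of the presented descriptor $\mathbf{Od}^{\mu}$.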

**Figure 5.** *Sequence of processes for generating the HOG descriptor of the object.*
