**Vectorial Signatures for Pattern Recognition**

Jesús Ramón Lerma-Aragón1 and Josué Álvarez-Borrego2

*1Facultad de Ciencias, Universidad Autónoma de Baja California, 2CICESE, División de Física Aplicada, Departamento de Óptica México* 

Published in: *Fourier Transform – Signal Processing*

### **References**

Lerma-Aragón, J. R. & Álvarez-Borrego, J. (2009a). Vectorial signatures for invariant recognition of position, rotation and scale pattern recognition. *Journal of Modern Optics*, Vol. 56, No. 14, 1598-1606, ISSN 0950-0340.

Lerma-Aragón, J. R. & Álvarez-Borrego, J. (2009b). Character recognition based on vectorial signatures. *e-Gnosis*, Vol. Concibe, No. 9, 1-7, ISSN 1665-5745.

Moses, K. R., Higgings, P., McCabe, M., Probhakar, S. & Swann, S. (2009). Automatic fingerprint identification systems (AFIS), In: *The Fingerprint Sourcebook*, International Association for Identification, National Institute of Justice, Washington, DC, USA.

Pech-Pacheco, J. L. & Álvarez-Borrego, J. (1998). Optical-digital system applied to the identification of five phytoplankton species. *Marine Biology*, Vol. 132, No. 3, 357-366, ISSN 0025-3162.

Ponce, J., Berg, T. L., Everingham, M., Forsyth, D. A., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B. C., Torralba, A., Williams, C. K. I., Zhang, J. & Zisserman, A. (2006). Dataset issues in object recognition, In: *Toward Category-Level Object Recognition (Lecture Notes in Computer Science: Image Processing, Computer Vision, Pattern Recognition, and Graphics)*, Ponce, J., Hebert, M., Schmid, C. & Zisserman, A. (Eds.), 29-48, Springer, ISBN 3-540-68794-7, Berlin.

Solorza, S. & Álvarez-Borrego, J. (2010). Digital system of invariant correlation to position and rotation. *Optics Communications*, Vol. 283, No. 19, 3613-3630, ISSN 0030-4018.

Zavala-Hamz, V. & Álvarez-Borrego, J. (1997). Circular harmonic filters for the recognition of marine microorganisms. *Applied Optics*, Vol. 36, No. 2, 484-489, ISSN 1559-128X.

### **1. Introduction**

Due to the variety of shapes and sizes presented by both living organisms and static objects, the need for automated identification systems has arisen in both industry and scientific research. Since the 1960s, the optics community has used the Fourier transform and other mathematical transformations for pattern recognition, taking advantage of their different properties; for example, invariance to position, rotation and scale.

Many invariant descriptors use the Fourier transform to extract invariant features, and it has long been a powerful tool for pattern recognition. One important property of the Fourier transform is that a shift in the spatial domain causes no change in the magnitude spectrum; this property can be exploited to extract invariant features for pattern recognition.
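As a quick numerical illustration (a NumPy sketch of ours, not code from the chapter), a circular shift of a signal changes only the phase of its discrete Fourier transform, leaving the magnitude spectrum untouched:

```python
import numpy as np

# A shift in the input changes only the phase of the DFT,
# so the magnitude spectrum is identical for both signals.
rng = np.random.default_rng(0)
f = rng.random(64)              # arbitrary 1-D signal
f_shifted = np.roll(f, 17)      # circular shift of 17 samples

mag = np.abs(np.fft.fft(f))
mag_shifted = np.abs(np.fft.fft(f_shifted))

print(np.allclose(mag, mag_shifted))  # True
```

This is the position-invariance ingredient of the algorithm described in this chapter: comparing magnitude spectra removes the dependence on where the object sits in the scene.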

In recent years, these transforms have been used as tools in digital pattern recognition. Several books have recently been published (Pratt, 2007; Gonzalez et al., 2008, 2009; Cheriet et al., 2007; Obinata, 2007) that give a general view of the progress and development of pattern recognition, as well as of different tools for image pre-processing; the extraction, selection and creation of features; and classification methods using different types of transforms. In addition, some studies have focused on solving the problem of invariance to rotation and scale (Cohen, 1993; Casasent, 1976a; Pech et al., 2001; Pech et al., 2003; Solorza & Álvarez-Borrego, 2010; Solorza & Álvarez-Borrego, 2011).

For a practical implementation of the theory, Casasent and Psaltis (1976a, 1976b, 1976c) considered a system made invariant to scale through the manipulation of the Fourier transform, directly changing the input function. We must also consider the practical realization of the Mellin transform, given by a logarithmic mapping of the input stage followed by a Fourier transform.

Schwartz (1994) found strong physiological and psychophysical evidence that many visual systems (including the human one) use such a logarithmic mapping between the retina and the visual cortex. The idea of converting a scaling into a shift by a logarithmic transformation of the coordinates occurs in many areas. In fact, the log-polar coordinate system described above seems to have a biological analogue. Since biological systems can shift objects to the centre of the field of view by moving the eyes, it seems logical that a mapping that facilitates scale- and rotation-invariant recognition would be most useful. This is not to say that biological systems also compute Fourier transforms of these representations.


Other methods can take advantage of representations in which these transformations are reduced to shifts.

The techniques in use today have difficulties, the main one being that the calculations involve a high computational complexity. Moreover, the complexity of the operations performed is directly proportional to the resolution of the images used, and the associated computational cost inevitably increases the time required for comparison. Emerging applications of computer vision and pattern recognition require the development of new algorithms. Vectorial signature techniques have an important role to play in this context, given their simplicity and low computational requirements.

In this chapter we examine the capabilities of vectorial signatures as a method of recognizing objects, building invariance to position, rotation and scale into the algorithm. The method is applied here to the recognition of alphabetical letters in Arial typeface, copepod species and some butterfly species (color images); however, the algorithm can be applied to any kind of object. The system was developed by taking advantage of the properties of the Fourier, Mellin and scale transforms.

### **2. Mathematical description**

In this section we present some basic concepts, such as the definitions of the Fourier, Mellin and scale transforms, tools that are of great usefulness in audio and image processing, noise reduction in signals (such as white noise), frequency analysis of any discrete signal, materials analysis, statistical synthesis by the inverse Fourier transform, and so on.

#### **2.1 Fourier transform**

The Fourier transform *F*(*u*) of a continuous function *f*(*x*) of a single variable is defined by

$$F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-j2\pi ux}\, dx \tag{1}$$

where *j* = √−1. Consequently, the inverse Fourier transform (IFT) of *F*(*u*) recovers the original function, described by

$$f(x) = \int_{-\infty}^{\infty} F(u)\, e^{j2\pi ux}\, du \tag{2}$$

#### **2.1.1 The Fourier-Mellin transform**

The Mellin transform is especially useful for scale-invariant analysis (Bracewell, 1978). It has been applied to spatially varying image restoration and to the analysis of networks that vary with time, among other problems.

The two-dimensional Mellin transform *M*(*ju*,*jv*) of a function *f*(*x*,*y*) along the imaginary axis is defined by Casasent and Psaltis (1976a) as

$$M(ju, jv) = \int_{0}^{\infty}\int_{0}^{\infty} f(x,y)\, x^{-ju-1}\, y^{-jv-1}\, dx\, dy \tag{3}$$
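Under the substitution *x* = e^*λ*, equation (3) in one dimension becomes an ordinary Fourier transform of *f*(e^*λ*), so scaling the input only shifts it along *λ* and the transform magnitude is unchanged. A numerical sketch of ours (the grid and test function are arbitrary choices, not from the chapter):

```python
import numpy as np

# With x = e**lam the 1-D Mellin transform along the imaginary axis,
#   M(jw) = integral f(x) x**(-jw-1) dx = integral f(e**lam) e**(-j*w*lam) dlam,
# is a Fourier transform, so f(a*x) only shifts f(e**lam) along lam
# and |M(jw)| is invariant to scale.
lam = np.linspace(-10.0, 10.0, 4001)        # lam = ln x
dlam = lam[1] - lam[0]
w = np.linspace(-5.0, 5.0, 11)              # a few transform variables

def mellin_imag(f_log):
    # direct Riemann sum of the integral above
    return np.array([np.sum(f_log * np.exp(-1j * wi * lam)) * dlam for wi in w])

f_log = np.exp(-lam**2)                     # f(x) = exp(-(ln x)**2) on the lam grid
a = 3.0
g_log = np.exp(-(lam + np.log(a))**2)       # g(x) = f(a*x): a pure shift in lam

Mf = mellin_imag(f_log)
Mg = mellin_imag(g_log)
print(np.allclose(np.abs(Mf), np.abs(Mg)))  # True
```

This log-mapping-plus-Fourier realization is exactly the practical route attributed to Casasent and Psaltis in the introduction.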

#### **2.1.2 The scale transform**


Cohen (1993) introduced the "scale transform." This transform is scale invariant, meaning that signals differing only by a scale transformation (compression or expansion with energy preservation) have the same transform magnitude distribution. Cohen showed that the scale transform is a restriction of the Mellin transform to the vertical line *p* = −*jc* + 1/2, with *c* ∈ ℝ.

We choose the scale transform, which is a restriction of the Fourier-Mellin transform (Casasent, 1976c); its main property is scale invariance. If we call *c* the scale variable, then in two dimensions the scale transform and its inverse are given by Cristobal & Cohen (1996) as

$$D(c_x, c_y) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty}\int_{0}^{\infty} f(x,y)\, e^{\left(-jc_x \ln x - jc_y \ln y\right)}\, \frac{dx\, dy}{\sqrt{xy}}, \tag{4}$$

$$f(x,y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} D(c_x, c_y)\, e^{\left(jc_x \ln x + jc_y \ln y\right)}\, \frac{dc_x\, dc_y}{\sqrt{xy}}. \tag{5}$$
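In one dimension, substituting *λ* = ln *x* in equation (4) gives *D*(*c*) = (1/√(2π)) ∫ *f*(e^*λ*) e^{λ/2} e^{−jcλ} d*λ*, and an energy-preserving scaling *g*(*x*) = √*a* *f*(*ax*) leaves |*D*(*c*)| unchanged. A sketch of ours checking this numerically (test function and grids are our choices):

```python
import numpy as np

# 1-D scale transform on the log grid lam = ln x:
#   D(c) = (1/sqrt(2*pi)) * integral f(e**lam) e**(lam/2) e**(-j*c*lam) dlam.
# Energy-preserving scaling g(x) = sqrt(a) f(a*x) must give the same |D(c)|.
lam = np.linspace(-12.0, 12.0, 4001)
dlam = lam[1] - lam[0]
c = np.linspace(-4.0, 4.0, 9)

def scale_transform(f_log):
    kernel = np.exp(lam / 2.0)[None, :] * np.exp(-1j * np.outer(c, lam))
    return (kernel @ f_log) * dlam / np.sqrt(2.0 * np.pi)

f_log = np.exp(-lam**2)                             # f(x) = exp(-(ln x)**2)
a = 2.0
g_log = np.sqrt(a) * np.exp(-(lam + np.log(a))**2)  # g(x) = sqrt(a) f(a*x)

Df = scale_transform(f_log)
Dg = scale_transform(g_log)
print(np.allclose(np.abs(Df), np.abs(Dg)))          # True: |D| is scale invariant
```

The factor e^{λ/2} (the 1/√x in equation (4)) is precisely what upgrades the Mellin construction to Cohen's energy-preserving scale invariance.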

### **3. Implementation**

In this section we use the two-dimensional scale transform written in polar coordinates, *D*(*c*<sub>r</sub>, *c*<sub>θ</sub>), because in this form it is invariant to changes in size and rotation:

$$D(c_r, c_\theta) = \frac{1}{\sqrt{2\pi}} \int_{0}^{2\pi}\int_{0}^{\infty} f(r,\theta)\, r^{\left(-jc_r - \frac{1}{2}\right)}\, e^{-jc_\theta \theta}\, dr\, d\theta, \tag{6}$$

and taking the logarithm of the radial coordinate, that is, *λ* = ln *r*, we get

$$D(c_\lambda, c_\theta) = \frac{1}{\sqrt{2\pi}} \int_{0}^{2\pi}\int_{-\infty}^{\infty} f(\lambda,\theta)\, e^{-jc_\lambda \lambda}\, e^{\frac{\lambda}{2}}\, e^{-jc_\theta \theta}\, d\lambda\, d\theta. \tag{7}$$
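The polar-logarithmic resampling with bilinear interpolation used in step 5 of the algorithm can be sketched as follows (our code; the grid sizes and radius range are arbitrary choices, not the chapter's):

```python
import numpy as np

# Resample an image from Cartesian to polar-logarithmic coordinates
# with bilinear interpolation at the sample points.
def log_polar(img, n_r=64, n_theta=128):
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cx, cy)
    lam = np.linspace(np.log(1.0), np.log(r_max), n_r)      # lam = ln r
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    r = np.exp(lam)[:, None]
    x = cx + r * np.cos(theta)[None, :]
    y = cy + r * np.sin(theta)[None, :]
    # bilinear interpolation between the four neighbouring pixels
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx, fy = x - x0, y - y0
    return (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x0 + 1] * fx * (1 - fy)
            + img[y0 + 1, x0] * (1 - fx) * fy + img[y0 + 1, x0 + 1] * fx * fy)

# A centred, radially symmetric image maps to rows that are constant in theta.
yy, xx = np.mgrid[0:101, 0:101]
blob = np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2.0 * 15.0 ** 2))
lp = log_polar(blob)
print(lp.shape, float(np.ptp(lp, axis=1).max()))  # (64, 128) and a tiny spread
```

For a centred, radially symmetric image every row of the map (fixed radius) is nearly constant in *θ*, which is a quick sanity check: rotations of the input become cyclic shifts along the *θ* axis, and scalings become shifts along the *λ* axis.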

Applying the transformation described above, we built a new algorithm. Fig. 1 shows the block diagram of the methodology used. The target (*I<sub>i</sub>*) to be recognized is denoted by the function *f*(*x*,*y*) (step 1), and the modulus of its Fourier transform (*F*) is obtained (step 2). A parabolic filter is applied to the modulus of *F* (step 3). In this way, low frequencies are attenuated and high frequencies are enhanced in proportion to (*w<sub>x</sub>*)<sup>2</sup>, (*w<sub>y</sub>*)<sup>2</sup> (Pech-Pacheco et al., 2003), where *w<sub>x</sub>* and *w<sub>y</sub>* are the angular frequencies. The scale factor *r* (step 4) is applied to the result of step 3 (*r* is the radial spatial frequency); this is what differentiates the scale transform from the Mellin transform. After these steps, we map the Cartesian coordinates to polar-logarithmic coordinates (step 5) to obtain invariance to rotation. In this step we introduce a bilinear interpolation of the data resulting from the coordinate conversion; this is done to avoid the aliasing due to the log-polar sampling. A subimage denoted by *M*(*λ*,*θ*) (Lerma-Aragón & Álvarez-Borrego, 2008), which contains most of the information of the target, is chosen from step 5, and a bilinear interpolation is done again in order to resize the subimage data to its original size (step 6). From the subimage, we obtain two 1-D vectors by projecting *M*(*λ*,*θ*) onto the *x* and *y* axes (steps 7 and 8). In other words, we compute the marginals, according to the equations

$$M(\lambda_m) = \sum_{k=1}^{n} M(\lambda_m, \theta_k), \tag{8}$$

$$M(\theta_n) = \sum_{k=1}^{m} M(\lambda_k, \theta_n). \tag{9}$$

Fig. 1. Block diagram of the procedure used

The modulus of the Fourier transform of each marginal is calculated in order to obtain the rotation and scale signatures (equations 10 and 11), which will be the two one-dimensional vectors for the target (steps 9 and 10).

$$V_1(w_\lambda) = \left| F\left[ M(\lambda_m) \right] \right|, \tag{10}$$

$$V_2(w_\theta) = \left| F\left[ M(\theta_n) \right] \right|. \tag{11}$$

To evaluate this algorithm, the target image (*I<sub>i</sub>*) and a problem image (*I<sub>j</sub>*) are selected (*I<sub>j</sub>*, denoted by the function *g*(*x*,*y*), is either rotated or scaled), and the procedure described above is applied to both. To determine the similarity between the images *I<sub>i</sub>* and *I<sub>j</sub>* in the database, the Euclidean distance (*E<sub>d</sub>*) between their signatures is calculated with the following equation


$$E_d = \sqrt{\sum \left[ V_{1f}(w_\lambda) - V_{1g}(w_\lambda) \right]^2 + \left[ V_{2f}(w_\theta) - V_{2g}(w_\theta) \right]^2}, \tag{12}$$

Thus, the signatures of all the *j* images of the database can be compared with any image *i* to be recognized.
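The complete chain of steps 1-10 together with equation (12) can be sketched in NumPy as follows. This is our reconstruction from the description above, not the authors' Matlab code: the grid sizes are illustrative, and nearest-neighbour sampling stands in for the chapter's bilinear interpolation and subimage selection:

```python
import numpy as np

# End-to-end sketch of the vectorial-signature pipeline and Eq. (12).
def log_polar_nn(img, n_r=64, n_th=128):
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    lam = np.linspace(0.0, np.log(min(cx, cy)), n_r)   # lam = ln r
    th = np.linspace(0.0, 2 * np.pi, n_th, endpoint=False)
    r = np.exp(lam)[:, None]
    x = np.clip(np.rint(cx + r * np.cos(th)).astype(int), 0, w - 1)
    y = np.clip(np.rint(cy + r * np.sin(th)).astype(int), 0, h - 1)
    return img[y, x]

def signatures(img):
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))              # step 2: |FT|
    wy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))[:, None]
    wx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))[None, :]
    filt = F * (wx**2 + wy**2)                                 # step 3: parabolic filter
    filt = filt * np.sqrt(wx**2 + wy**2)                       # step 4: radial factor r
    M = log_polar_nn(filt)                                     # step 5: log-polar map
    V1 = np.abs(np.fft.fft(M.sum(axis=1)))                     # steps 7-10: marginals,
    V2 = np.abs(np.fft.fft(M.sum(axis=0)))                     # then |FT| of each
    return V1, V2

def euclidean_distance(sf, sg):                                # Eq. (12)
    return np.sqrt(((sf[0] - sg[0])**2).sum() + ((sf[1] - sg[1])**2).sum())

# A rectangle, as in Fig. 2(a), compared with itself and with a rotated copy
img = np.zeros((128, 128))
img[48:80, 32:96] = 1.0
same = euclidean_distance(signatures(img), signatures(img))
rot = euclidean_distance(signatures(img), signatures(np.rot90(img)))
print(same)  # 0.0: identical images have zero distance
```

As in the text, a small *E<sub>d</sub>* indicates that the problem image matches the target; an image can then be classified against a database by taking the signature with the minimum distance.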

#### **3.1 Application of the algorithm**

To apply the algorithm based on the theory described above, a target image is considered first; in this particular case we consider a simple image (a rectangle) of size 256 × 256 pixels on a black background, as shown in Fig. 2(a), for which the magnitude of the Fourier transform is calculated, shown in Fig. 2(b). We used the Matlab colormap command to change to a colour figure and in this way clearly visualize each step of the procedure. When these values are scaled linearly for display, the brightest pixels dominate the display at the expense of lower (and just as important) values of the spectrum. If, instead of displaying the values in this manner, we apply a logarithmic transformation (Gonzalez, 2009) to the spectrum values, then the range of values of the result is more manageable. Fig. 3 shows the result of scaling this new range linearly and displaying the spectrum. The wealth of detail visible in this image compared to a straight display of the spectrum is evident from these pictures.
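The logarithmic display transformation mentioned above (cf. Gonzalez, 2009) amounts to showing *s* = *c*·log(1 + |*F*|) instead of |*F*|; a minimal NumPy sketch of ours, using the same rectangle geometry:

```python
import numpy as np

# Compress the spectrum's dynamic range with s = log(1 + |F|) before display.
img = np.zeros((256, 256))
img[112:144, 96:160] = 1.0                       # centred rectangle on black
spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))

log_spec = np.log1p(spec)                        # c = 1: log(1 + |F|)
print(float(spec.max()), float(log_spec.max()))  # 2048.0 and about 7.6
```

The DC term dominates the raw spectrum by three orders of magnitude; after the log transformation the whole range fits comfortably into a linear display scale, which is what makes the detail of Fig. 3 visible.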

Fig. 2. (a) Simple image of a rectangle. (b) The corresponding centered spectrum

Fig. 3. Result showing increased detail after a log transformation


A parabolic filter (Fig. 4) is applied to the modulus of *F*; in this way, low frequencies are attenuated and high frequencies are enhanced, as shown in Fig. 5(a). The scale factor *r* is applied to the previous result (Fig. 5b). A subimage is chosen from Fig. 6(a), corresponding to the zone that contains the greatest quantity of information; the result is shown in Fig. 6(b).
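The parabolic filter of Fig. 4 can be sketched as *P*(*w<sub>x</sub>*, *w<sub>y</sub>*) = *w<sub>x</sub>*<sup>2</sup> + *w<sub>y</sub>*<sup>2</sup>, zero at the origin so that low frequencies are attenuated and high ones boosted; the frequency normalisation below is our own choice:

```python
import numpy as np

# P(wx, wy) = wx**2 + wy**2: zero at the DC term and growing quadratically,
# so multiplying the spectrum by P attenuates low frequencies and boosts high ones.
n = 256
w = np.fft.fftshift(np.fft.fftfreq(n))   # centred frequency axis (normalisation ours)
P = w[None, :]**2 + w[:, None]**2        # centred parabolic filter, shape (n, n)
print(P[n // 2, n // 2], P.max())        # 0.0 at the centre, largest at the corners
```

In use, the filter simply multiplies the centred magnitude spectrum, e.g. `filtered = np.abs(F) * P`, which is step 3 of the algorithm.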

From the subimage *M*(*λ*,*θ*) seen in Fig. 6(b), we compute the marginal functions, using the following equations

$$g(x) = \sum_{\theta} M(\lambda, \theta), \tag{13}$$

$$h(y) = \sum_{\lambda} M(\lambda, \theta). \tag{14}$$

Fig. 7(a) shows the subimage of Fig. 6(b) in three dimensions, and in Figs. 7(b) and 7(c) the marginal functions can be observed, from which the scale and rotation vectors are obtained.

Fig. 4. Parabolic filter


Fig. 5. (a) Modulus with parabolic-effect. (b) Image with scale factor


Finally, Fig. 8 shows the modulus of the Fourier transform of the vectors shown in Figs. 7(b) and 7(c), respectively, which defines the vectorial signatures of the image; these are used for comparison with respect to another image.

Fig. 6. (a) Image 5(b) in polar-logarithmic coordinates. (b) Selected subimage

Fig. 7. (a) Tridimensional image of Fig. 6(b). (b) Vector of the summation on the rotation axis. (c) Vector of the summation on the scale axis

### **4. Computer simulations in grayscale images**

To evaluate the performance of the vectorial signatures, images of different types were used as input images.

Fig. 8. (a) Scale signature vector. (b) Rotation signature vector

#### **4.1 Simulation using letters**

The first case analyzed corresponds to images of the different letters of the alphabet. Each letter is an image of 256 × 256 pixels with a black background and a centered white Arial letter of size 72. The methodology described above was applied to these letters. The behavior of the Euclidean distance (*E<sub>d</sub>*) for the target E versus itself, with respect to rotation from 0 to 359º in steps of one degree, was studied. Because of the distortions the letter suffers when it is rotated, due to the square pixels, the curve representing the *E<sub>d</sub>* is cyclic, with a symmetry of 180º (Fig. 9).

Fig. 9. Behavior of the Euclidean distance where the target and the input scene is the letter E

Fig. 10. Euclidean distance of the letter E compared with other letters

Fig. 11. Statistical distance behavior

To evaluate the ability to distinguish between the 26 letters of the alphabet, each one was rotated through 360 degrees in steps of one degree, and a simulation was performed to determine the differences between them, using the 9360 images (26 × 360). Figure 10 shows the behavior of the letters that have the greatest similarity with the reference (E); the result shows a very clear separation between the values of the Euclidean distance of the letter E and those of the rest of the letters, which makes it possible to identify it. When an image is similar to another one, the *E<sub>d</sub>* has a minimum value. A statistical analysis was performed and the mean value ± 2SE (two standard errors) was calculated. This algorithm has at least a 95.4% level of confidence for this case (Fig. 11 shows the comparison with all the letters).

To analyze the behavior under changes in scale, scalings from 90% to 110% of the target image were tested, in increments of 0.5%. Fig. 12 shows the results obtained with the letter E as the reference, again compared with all the letters of the alphabet.

Fig. 12. Changes of distance versus scale

#### **4.2 Simulation using copepod species**

Copepods are the dominant group in marine zooplankton; they constitute at least 70% of the planktonic fauna. This group of organisms has a great diversity. There are about 11,500 species of copepods described by Humes (1994), and the number is steadily increasing with the description of new species, such as those from anchialine caves (Fosshagen, 1991) and hydrothermal vents (Humes, 1991), and those previously reported under the name of another species (Bradford, 1976; Soh & Suh, 2000).

Copepods are of prime importance in marine ecosystems. The majority of copepods feed on phytoplankton, forming a direct link between primary production and commercially important fish, such as sardine, herring, and pilchard. Copepods are also the main food source for a great variety of invertebrates. Studies on copepod abundance and species composition are particularly relevant, because most larvae of commercial fish feed on copepods. Hence, changes in the abundance of these plankters from year to year may determine interannual population fluctuations of the commercially exploited fish stocks in a particular region.

For this study, adult stages of different copepod species were separated from several plankton samples. The specimens were observed with an optical microscope and their images were digitally captured using a charge-coupled device (CCD) camera. We used 14 images from seven different species of copepods, each with male and female samples (Fig. 13). Since the background noise in all the images is mostly repetitive, the images were cleaned using some of the methods described in Guerrero & Álvarez-Borrego (2009). Fig. 14 shows the original and the pre-processed image for a female specimen of *Rhincalanus nasutus*.

Fig. 13. Species of copepod used in the study: (a) *Calanus pacificus* female; (b) *C. pacificus* male; (c) *Rhincalanus nasutus* female; (d) *R. nasutus* male; (e) *Centropages furcatus* female; (f) *C. furcatus* male; (g) *Pleuromamma gracilis* female; (h) *P. gracilis* male; (i) *Temora discaudata* female; (j) *T. discaudata* male; (k) *Acartia tonsa* female; (l) *A. tonsa* male; (m) *Centropages hamatus* female; (n) *C. hamatus* male

The image of one of the calanoids, *Calanus pacificus* female, Fig. 13(a), was used as a target to discriminate between copepod species and sex from the rest of the organisms; the image size

For this study, adult stages of different copepod species were separated from several plankton samples. The specimens were observed with an optical microscope and their images were digitally captured using a charge coupled device camera (CCD). We used 14 images from seven different species of copepods each with male and female samples (Fig. 13). Since the background noise in all the images is mostly repetitive, the images where cleaned using some methods described en Guerrero & Álvarez-Borrego (2009). Fig. 14 shows the original and its pre-processed image for a female specimen *Rhincalanus nasutus.* 

Fig. 13. Species of copepod used in the study: (a) *Calanus pacificus* female; (b) *C. pacificus* male; (c) *Rhincalanus nasutus* female; (d ) *R. nasutus* male; (e) *Centropages furcatus* female; ( f ) *C. furcatus* male; (g) *Pleuromamma gracilis* female; (h) *P. gracilis* male; (i) *Temora discaudata* female; (j) *T. discaudata* male; (k) *Acartia tonsa* female; (l) *A. tonsa* male; (m) *Centropages hamatus* female; (n) *C. hamatus* male

The image of one of the calanoides *Calanus pacificus* female Fig. 13(a) was used as a target to discriminate between copepod species and sex from the rest of the organisms, the image size

Vectorial Signatures for Pattern Recognition 329

Fig. 16. Statistical behavior of the *Ed*, for a *Calanus pacificus* female specimen

its average, which is now the reference vectorial signature (Fig. 17).

of the target with the confidence level was about 95.4%.

the second method is pattern matching using the correlation.

**4.4 Comparison with other algorithms** 

procedure is approximately 20% lower.

Like a second set of test images 30 females and 30 males from each of the 7 different species of copepods were used, thus forming a database of 420 images. To have an identification system that contains information from several copepods, vector composed signatures were designed, which is derived from adding the signatures of 10 different copepods and to get

In Fig. 18 is shown the application of this method, we can see an excellent separation of the Euclidean distance mean, which corresponds to Centropages furcatus female (Group e), used like target, with respect to the calculation of the *Ed* from the rest of the groups. Statistic was realized and the mean value ± 2SE was calculated. We can identify the species and sex

To evaluate the performance of the algorithm, we compare it with respect to that published by Álvarez-Borrego and Castro-Longoria, considering that both are similar in application to pattern recognition and methodology based on the properties of the Fourier transform, in

Comparing the results of the two algorithms applied to the same sets of images, we observed that in the second is not possible to distinguish the selected image as reference, with respect to the others. Regarding the computational cost, the processing time of our

**4.3 Average signature** 

is 256 X 256 pixels of black background with a centered copepod. The image was rotated 360º in increments of 1º to recognize a target in an input scene; signatures were compared using the Euclidean distance (*Ed*) between the target and the input image and, because there is no variation in the size of copepod adults, we did not analyze the changes of scale.

In order to see if this methodology has a good performance for recognizing the target image when compared with other images that don't correspond to the copepod chosen as reference, numerical calculations were performed for the entire set of 14 images showed in Fig. 13.

Fig. 14. (a) Original image. (b) Cleaned images

Fig. 15. shows the performance of *Ed* for the fourteen pictures rotated 360 degrees. The Fig. 13(a) was used as a target, corresponding to a female *Calanus pacificus*. We can see a very good separation of the *Ed* for the target with respect to the calculation of the *Ed* for the nontarget images. As can be seen in Fig. 15, the curve of the copepod (Fig. 13a) represents an *Ed* with minimum values, which distinguish it from the rest of the copepods. Statistic was realized and the mean value ±2 SE was calculated. We can see this algorithm has at least a 95.4% level of confidence for this case (Fig. 16).

Fig. 15. Change the angle of rotation against *Ed*
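The *Ed* comparison used throughout these experiments can be illustrated with a short sketch. The signature values below are hypothetical and the function names are ours, not the chapter's; the point is only that the target is recognized as the candidate with minimum Euclidean distance.

```python
import numpy as np

def euclidean_distance(sig_a, sig_b):
    """Euclidean distance Ed between two 1-D vectorial signatures."""
    return float(np.linalg.norm(np.asarray(sig_a, dtype=float) -
                                np.asarray(sig_b, dtype=float)))

def best_match(target_sig, candidate_sigs):
    """Index of the candidate signature with minimum Ed, plus all distances."""
    dists = [euclidean_distance(target_sig, s) for s in candidate_sigs]
    return int(np.argmin(dists)), dists

# Hypothetical signatures: the target should match candidate 1 exactly.
target = np.array([1.0, 2.0, 3.0, 4.0])
candidates = [np.array([4.0, 3.0, 2.0, 1.0]),
              np.array([1.0, 2.0, 3.0, 4.0]),
              np.array([0.0, 0.0, 0.0, 0.0])]
idx, dists = best_match(target, candidates)
```

In the rotation experiments the same comparison is simply repeated for every one-degree rotation of the input image, and the minimum *Ed* identifies the target.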


Fig. 16. Statistical behavior of the *Ed* for a *Calanus pacificus* female specimen

### **4.3 Average signature**

As a second set of test images, 30 females and 30 males from each of the seven different species of copepods were used, forming a database of 420 images. To obtain an identification system that contains information from several copepods, composite vectorial signatures were designed: the signatures of 10 different copepods are added and averaged, and this average becomes the reference vectorial signature (Fig. 17).

Fig. 18 shows the application of this method: there is an excellent separation of the mean Euclidean distance corresponding to *Centropages furcatus* female (Group e), used as target, with respect to the *Ed* calculated for the rest of the groups. A statistical analysis was performed and the mean value ± 2SE was calculated. The species and sex of the target can be identified with a confidence level of about 95.4%.
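The averaging step that builds the composite reference signature can be sketched as follows; the base signature and noise model are hypothetical stand-ins for the ten copepod signatures.

```python
import numpy as np

def average_signature(signatures):
    """Average several 1-D vectorial signatures into one composite
    reference signature (the chapter's averaging step)."""
    return np.stack([np.asarray(s, dtype=float) for s in signatures]).mean(axis=0)

# Hypothetical example: 10 noisy copies of an underlying signature.
rng = np.random.default_rng(1)
base = np.linspace(0.0, 1.0, 16)
noisy = [base + 0.01 * rng.standard_normal(16) for _ in range(10)]
reference = average_signature(noisy)
```

Averaging suppresses the specimen-to-specimen variation, so the resulting reference lies close to the underlying common signature.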

### **4.4 Comparison with other algorithms**

To evaluate the performance of the algorithm, we compared it with the method published by Álvarez-Borrego and Castro-Longoria (2003), considering that both have similar applications in pattern recognition and methodologies based on the properties of the Fourier transform; the second method performs pattern matching using correlation.

Comparing the results of the two algorithms applied to the same sets of images, we observed that the second cannot distinguish the image selected as reference from the others. Regarding computational cost, the processing time of our procedure is approximately 20% lower.

Fig. 17. Average vectorial signature

Fig. 18. Statistical behavior of the Euclidean distance, in the case of vectorial signatures composed of 10 images of *Centropages furcatus* female (Group e)

### **5. Computer simulations in color images**

One of the most important characteristics for visual pattern recognition is color, and it has been used in a wide range of applications in different areas of knowledge, such as the evaluation and identification of textiles, flowers, microscopic images, surface corrosion, species recognition, and so on. In our daily life, our vision and actions are influenced by a variety of shapes and colors. The introduction of color increases the amount of information available for pattern recognition, so that discrimination can be improved considerably.

Color is a perceived phenomenon and not a physical dimension like length or temperature. A suitable form of representation must be found for storing, displaying, and processing color images. This representation must be well suited to the mathematical demands of a color image processing algorithm, to the technical conditions of a camera, printer, or monitor, and to human color perception as well. These various demands cannot all be met equally well simultaneously. For this reason, different representations are used in color image processing according to the goal; these models are known as color spaces (Gonzalez & Woods, 2008; Koschan & Abidi, 2008; Westland & Ripamonti, 2004).

One of the most widely used color spaces is RGB; this model is based on the additive mixture of the three primary colors: red (R), green (G) and blue (B). When dealing with color images, several concepts are inherently quite different. For example, if we treat the RGB signals at a pixel as a three-dimensional vector, a color image becomes a vector field, while a monochromatic image is a scalar field.

### **5.1 Methodology**

Fig. 19 shows the block diagram of the methodology applied to the RGB color space. In the first step, the color image to be used as a target (*Ii*) is selected; it is denoted by *fi(x, y)*, where *i* = R, G, B, and the image is decomposed into its respective RGB channels [*fR(x, y)*, *fG(x, y)*, *fB(x, y)*]. To obtain an identification system that contains information from all the components *fi(x, y)*, composite vectorial signatures are computed, giving the average vectorial signatures *V1f(wλ)* and *V2f(wθ)*. For the test image (*Ij*), represented by *gj(x, y)*, the same process is performed, obtaining *V1g(wλ)* and *V2g(wθ)*. To determine the similarity between the two images *Ii* and *Ij*, the value of the *Ed* is calculated.
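The per-channel decomposition and averaging can be sketched in Python. The chapter's actual V1/V2 signature functions are defined in earlier sections and are not reproduced here; a simple magnitude-spectrum profile stands in for them, and all names are illustrative.

```python
import numpy as np

def channel_signature(channel):
    """Stand-in 1-D signature: column-averaged magnitude spectrum of one
    channel. (A placeholder for the chapter's V1/V2 signatures, used only
    to show the RGB averaging step.)"""
    spectrum = np.abs(np.fft.fft2(channel))
    return spectrum.mean(axis=0)

def rgb_average_signature(image):
    """Average the signatures of the R, G and B channels of an H x W x 3 image."""
    sigs = [channel_signature(image[..., c]) for c in range(3)]
    return np.mean(sigs, axis=0)

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))   # hypothetical small color image
sig = rgb_average_signature(img)
```

The same routine is applied to the target and to each test image, and the two average signatures are then compared with the Euclidean distance.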

### **5.2 Simulation for butterflies**

To test this algorithm, the set of images shown in Fig. 20, containing different types of butterflies, was used; each image has a size of 256 × 256 pixels. It is estimated that the number of species of butterflies in the world varies between 15,000 and 20,000, of which, according to the North American Butterfly Association (NABA), about 2000 are located in México.

Fig. 21 shows the statistical behavior of the Euclidean distance for rotation of the image through 360 degrees in one-degree increments, using as reference butterfly (10), which corresponds to a specimen of *Archaeoprepona amphimachus amphiktion*. At first glance it is very similar to other butterflies; for example, butterfly (11) belongs to the same genus but a different species. As expected, the minimum value of the distance corresponds to the target. The algorithm thus performs very well in recognizing the target when compared with the other butterflies. Numerical simulations were performed

considering as a reference each of the 18 butterflies, and in all cases the reference could be distinguished from the rest.

Fig. 19. Block diagram of the procedure used

Fig. 20. Butterflies used. (1) *Actinote guatemalena guerrerensis*. (2) *Actinote stratonice oaxaca*. (3) *Agraulis vanillae incarnata*. (4) *Agrias amydon oaxacata*. (5) *Anaea aidea*. (6) *Ancyluris inca mora*. (7) *Anetia thirza*. (8) *Ancyluris jurgennseni*. (9) *Arawacus sito*. (10) *Archaeoprepona amphimachus amphiktion*. (11) *Archaeoprepona demophon centralis*. (12) *Baeotus baeotus*. (13) *Basilarchia archippus*. (14) *Biblis hyperia aganissa*. (15) *Bolboneura sylphis sylphis*. (16) *Caligo eurylochus sulanos*. (17) *Caligo memnon*. (18) *Caligo oileus scamander*

To analyze the invariance to scale, changes from 70% to 130% of each of the test images were made, with variations of one percent. The statistical behavior of the distance is shown in Fig. 22 for the case where butterfly (3) is used as target. The results indicate that there is a clear separation between the reference image and the rest of the butterflies, so the reference can be identified without difficulty, regardless of changes of scale.
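The 70%-130% sweep can be sketched on a one-dimensional signature. Resampling a signature by linear interpolation is only a simplified stand-in for scaling the actual image; the loop structure, one *Ed* value per one-percent step, is the point being illustrated.

```python
import numpy as np

def rescale_signature(sig, factor):
    """Resample a 1-D signature as if the underlying pattern were scaled
    by `factor` (linear interpolation; a simplified stand-in for
    rescaling the image and recomputing its signature)."""
    x = np.arange(sig.size)
    return np.interp(x / factor, x, sig)

# Sweep scales from 70% to 130% in 1% steps and record Ed to the original.
sig = np.sin(np.linspace(0.0, 4.0 * np.pi, 128))
dists = {s: float(np.linalg.norm(sig - rescale_signature(sig, s / 100.0)))
         for s in range(70, 131)}
```

At 100% the resampled signature is identical to the original, so its *Ed* is zero, while every other scale yields a nonzero distance.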

Fig. 21. Statistical behavior of the *Ed* for an *Archaeoprepona amphimachus amphiktion* specimen

Fig. 22. Statistical behavior of the *Ed* for an *Agraulis vanillae incarnata* specimen
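The mean ± 2SE bands quoted throughout these experiments (roughly a 95.4% confidence level) can be computed as follows; the sample values are hypothetical.

```python
import numpy as np

def mean_2se(values):
    """Mean and two standard errors of the mean (the mean +/- 2SE band
    corresponds to roughly a 95.4% confidence level)."""
    v = np.asarray(values, dtype=float)
    mean = float(v.mean())
    se = float(v.std(ddof=1) / np.sqrt(v.size))
    return mean, 2.0 * se

# Hypothetical Ed samples for one group of images.
m, half_width = mean_2se([1.0, 2.0, 3.0])
```

Two groups are considered separable when their mean ± 2SE intervals do not overlap, which is the criterion used in Figs. 16, 18, 21 and 22.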

## **6. Conclusion**

In this chapter, a system for pattern recognition was developed using vectorial signatures, based on the well-known relation between scale and the Fourier transform, and it has proven to be practical and accurate. To test the model, simulations were made and the results were analyzed. In various numerical tests with letters of the alphabet, with variations in scale and rotation, the comparison of Euclidean distances shows that identification and discrimination are possible with a high confidence level (at least 95.4%). The system was applied to the recognition of copepod species and sex: with vectorial signatures it is possible to identify, with a confidence level of 95.4%, the species and sex of various species of copepods. A modification of the algorithm for color image recognition was made, using the average vectorial signature of the information contained in the RGB channels. For this case, images of real butterflies were used, and the system was found to identify the species chosen as a reference, when compared with images of different butterflies, for any angle of rotation, as well as for changes in scale from 70% to 130%.

This work contributes to increasing the potential of recognition systems for use in applications requiring the assessment and interpretation of data, tasks that were traditionally performed by human vision or by trained specialists and are still far from automation.

### **7. Acknowledgment**

This document is based on work partially supported by UABC and CONACYT under Project grants 102007 and 169174.

### **8. References**

Álvarez-Borrego, J. & Castro-Longoria, E. (2003). Discrimination between Acartia (Copepoda: Calanoida) species using their diffraction pattern in a position, rotation invariant digital correlation, *Journal of Plankton Research*, Vol. 25, No. 2, (February 2003), pp. 229-233, ISSN: 0142-7873.

Bracewell, R. N. (1999). *The Fourier Transform and Its Applications*, (3rd edition), McGraw-Hill, ISBN-10: 0073039381, New York.

Bradford, J. M. (1976). Partial revision of the Acartia subgenus Acartiura (Copepoda: Calanoida: Acartidae), *New Zealand Journal of Marine and Freshwater Research*, Vol. 10, No. 1, (March 1976), pp. 159-202, ISSN: 0028-8330.

Casasent, D. & Psaltis, D. (1976a). Scale invariant optical correlation using Mellin transforms, *Optics Communications*, Vol. 17, No. 1, (April 1976), pp. 59-63, ISSN: 0030-4018.

Casasent, D. & Psaltis, D. (1976b). Scale invariant optical transforms, *Optical Engineering*, Vol. 15, (May-June 1976), pp. 258-261, ISSN: 0091-3286.

Casasent, D. & Psaltis, D. (1976c). Position, rotation, and scale invariant optical correlation, *Applied Optics*, Vol. 15, No. 7, (July 1976), pp. 1795-1799, ISSN: 0003-6935.

Cheriet, M., Kharma, N., Liu, C. & Suen, C. (2007). *Character Recognition Systems: A Guide for Students and Practitioners*, John Wiley & Sons, Inc., ISBN: 978-0-471-41570-1, Hoboken, New Jersey.

Cohen, L. (1993). The scale representation, *IEEE Transactions on Signal Processing*, Vol. 41, No. 12, (December 1993), pp. 3275-3292, ISSN: 1053-587X.

Cristobal, G. & Cohen, L. (1996). Scale in images, *SPIE Proceedings*, Vol. 2846, pp. 251-261, Denver, CO, USA, August 1996.

Fosshagen, A. & Iliffe, T. (1991). A new genus of calanoid copepod from an anchialine cave in Belize, *Bull. Plankton Soc. Japan*, Spec. Vol., pp. 339-346, ISSN: 0387-8961.

Gonzalez, R. & Woods, R. (2008). *Digital Image Processing*, (3rd edition), Pearson Prentice Hall, ISBN: 978-0-13-168728-8, Upper Saddle River, New Jersey.

Gonzalez, R. C., Woods, R. E. & Eddins, S. L. (2009). *Digital Image Processing Using MATLAB®*, (2nd edition), Gatesmark Publishing, ISBN: 978-0-9820854-0-0, USA.

Guerrero, R. E. & Álvarez-Borrego, J. (2009). Nonlinear composite filter performance, *Optical Engineering*, Vol. 48, No. 6, pp. 067201 1-11, ISSN: 0091-3286.

Humes, A. G. (1991). Zoogeography of copepods at hydrothermal vents in the eastern Pacific Ocean, *Bull. Plankton Soc. Japan*, pp. 383-389, ISSN: 0387-8961.

Humes, A. G. (1994). How many copepods?, *Hydrobiologia*, Vol. 292-293, No. 1, pp. 1-7, ISSN: 0018-8158.

Koschan, A. & Abidi, M. (2008). *Digital Color Image Processing*, (1st edition), John Wiley & Sons, Inc., ISBN: 978-0-470-14708-5, Hoboken, New Jersey.

Lerma-Aragón, J. & Álvarez-Borrego, J. (2009). Vectorial signatures for invariant recognition of position, rotation and scale pattern recognition, *Journal of Modern Optics*, Vol. 56, No. 14, pp. 1598-1606, ISSN: 0950-0340.

Obinata, G. & Morris, G. (2007). *Vision Systems, Segmentation and Pattern Recognition*, I-Tech Education and Publishing, ISBN: 978-3-902613-01-1, Vienna, Austria.

Pech, J., Cristobal, G., Álvarez-Borrego, J. & Cohen, L. (2001). Automatic system for phytoplanktonic algae identification, *Limnetica*, Vol. 20, No. 1, pp. 143-158, ISSN: 0213-8409.

Pratt, W. K. (2007). *Digital Image Processing*, (4th edition), John Wiley & Sons, Inc., ISBN: 978-0-471-76777-0, Hoboken, New Jersey.

Schwartz, E. (1994). Topographic mapping in primate visual cortex: anatomical and computational approaches, In: *Visual Science and Engineering: Models and Applications*, Marcel Dekker, ISBN: 0-8247-9185-1, New York.

Soh, H. Y. & Suh, H. L. (2000). A new species of Acartia (Copepoda, Calanoida) from the Yellow Sea, *Journal of Plankton Research*, Vol. 22, (February 2000), pp. 321-337, ISSN: 0142-7873.

Solorza, S. & Álvarez-Borrego, J. (2010). Digital system of invariant correlation to position and rotation, *Optics Communications*, Vol. 283, No. 19, (October 2010), pp. 3613-3630, ISSN: 0030-4018.

Solorza, S. & Álvarez-Borrego, J. (2011). Digital system of invariant correlation to position and scale using adaptive ring masks and unidimensional signatures, *Proceedings of SPIE*, 22nd Congress of the International Commission for Optics: Light for the Development of the World, Vol. 8011, ISBN: 978-0-8194-8585-4, Puebla, México, August 2011.

Westland, S. & Ripamonti, C. (2004). *Computational Colour Science using MATLAB*, John Wiley & Sons Ltd, ISBN: 0-470-84562-7, West Sussex.



## **Multidimensional Features Extraction Methods in Frequency Domain**

Jesus Olivares-Mercado, Gualberto Aguilar-Torres, Karina Toscano-Medina, Gabriel Sanchez-Perez, Mariko Nakano-Miyatake and Hector Perez-Meana *National Polytechnic Institute Mexico*

### **1. Introduction**


Pattern recognition has been a topic of active research during the last 30 years, due to the high performance that these schemes present when used in the solution of many practical problems in several fields of science, medicine and engineering. The efficiency of pattern recognition algorithms strongly depends on an accurate feature extraction scheme that is able to represent the pattern under analysis using as few parameters as possible, while keeping a large intra-pattern similarity and a very low inter-pattern similarity. These requirements have led to the development of several feature extraction methods, which can be divided into three groups: feature extraction methods in the time domain, the spatial domain and the frequency domain. In all cases the proposed feature extraction methods strongly depend on the specific application. Thus, feature extraction methods that perform well in some applications may not perform well in others; for example, the feature extraction methods used for speech or speaker recognition are quite different from those used for fingerprint or face recognition. This chapter presents an analysis of some successful frequency domain feature extraction methods that have been proposed for applications involving audio, speech and image pattern recognition. Evaluation results are also provided to show the effectiveness of such feature extraction methods.

### **2. Feature extraction in speaker recognition**

Speech is a widely used biometric feature for person recognition, where a person is recognized through his or her voice. To develop this kind of system, several frequency domain methods have been proposed, such as the LPCepstral coefficients described below, and the LPCepstral coefficients combined with dynamic features and with the Cepstral Mean Normalization (CMN). In this section an analysis of these feature extraction methods is provided, together with some evaluation results that show their recognition performance.

### **2.1 Speaker recognition system**

A general speaker recognition system, shown in Fig. 1, consists mainly of three stages: the feature extraction stage, where appropriate information is estimated, in a suitable form and size, from the speech signal to obtain a good representation of the speaker features; the classifier stage, where the speaker models are adapted using the feature vectors; and the decision stage, where the recognition decision is taken. The SRS under analysis first extracts the feature vector from the speaker's voice. To this end, it first estimates the LPC-Cepstral coefficients (Ganchev et al., 2002) using only the voiced parts of the speech signal. Then, using the estimated LPC-Cepstral coefficients, the dynamic features are estimated to enhance the speaker feature vector. Next, the estimated dynamic feature vector is fed to a Gaussian Mixture Model (GMM), which is used to obtain a representative model for each speaker. An SRS can be improved by using only the voiced part of the speech signal, because it contains the main information relative to the speaker identity (El-Solh, 2007; Markov & Nakagawa, 1999; Plumper et al., 1999). For this reason, in this chapter the feature vector is derived from the LPC-Cepstral coefficients extracted only from the voiced part of the speech signal.

Fig. 1. General Speaker Recognition System

### **2.2 Feature vector extraction**

A good performance of any pattern recognition system strongly depends on the extraction of a suitable feature vector that allows an unambiguous representation of the pattern under analysis with as few parameters as possible. A simple way to estimate the speaker characteristics is to use the linear prediction coefficients (LPC) of the speech signal, since these parameters satisfactorily represent the structure of the vocal tract. However, it has been reported that better performance can be obtained if the LPC are combined with some frequency domain representation. One of these representations combines the LPC features with cepstral analysis, which provides a robust speaker characterization with low sensitivity to the distortion introduced when the signal is transmitted through conventional communication channels (**?**).

The feature vectors extracted from the whole speech signal provide a fairly good performance. However, when the LPCepstral coefficients are obtained from the LPC analysis, useful information about the speaker is still ignored, such as the pitch, which is a specific feature of the individual speaker identity widely used to represent the glottal flow information. The performance of an SRS can be seriously degraded when it uses a speech signal transmitted through some communication channel, such as a telephone line, due to the frequency response of the communication channel as well as the environment or the microphone characteristics. The LPCepstral coefficients have shown to be robust for reducing the problem of low speech quality. Even if most SRS have a good performance when evaluated with the same training data (closed test), their performance considerably degrades when the systems are used with a different data set (open test), because the data for the closed and open tests of each speaker may have different acoustic conditions. Thus, channel normalization techniques may have to be used to reduce the distortion of the speaker features, keeping in such a way a good recognition performance. Among the channel normalization techniques we have the Cepstral Mean Normalization (CMN), which can provide a considerable environmental robustness at a negligible computational cost (Liu et al., 1993). On the other hand, the use of more than one speaker feature, as well as their combination, has been proposed to obtain a more robust feature vector. To improve the SRS performance, the LPCepstral coefficients obtained using the CMN can be combined with the pitch information, because the pitch is a very important speaker feature.


#### **2.2.1 Feature vector derived from LPCepstral**

To estimate the LPCepstral coefficients, the speech signal is first divided into segments of 20 ms length with 50% overlap using a Hamming window. Next, the LPC coefficients are estimated using the Levinson algorithm, such that the mean square value of the prediction error, given by

$$E[e^2(n)] = E\left[\left(S(n) - \sum_{i=1}^{P} a_i S(n-i)\right)^2\right] \tag{1}$$

becomes a minimum, where *E*[·] is the expectation operator, *P* is the predictor order and *a<sub>i</sub>* is the *i*-th linear prediction coefficient (LPC). Once the LPC vector has been estimated, the LPCepstral coefficients can be obtained recursively as follows (Simancas-Acevedo et al., 2001):

$$c_n = -a_n + \frac{1}{n} \sum_{i=1}^{n} (n-i)\, a_i\, c_{n-i}, \qquad n > 0 \tag{2}$$

where *c<sub>n</sub>* is the *n*-th LPCepstral coefficient. Thus the SRS feature vector becomes

$$X_t = \left[\, c_{1,t},\, c_{2,t},\, c_{3,t},\, \cdots,\, c_{d,t} \,\right] \tag{3}$$

where *t* denotes the frame number.
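The estimation chain of eqs. (1)–(3) can be sketched as follows. This is a minimal illustration, not the authors' code: `lpc_levinson` solves eq. (1) by the autocorrelation (Levinson-Durbin) method, and `lpc_to_cepstrum` applies the recursion of eq. (2) using the chapter's sign convention; both function names are ours.

```python
import numpy as np

def lpc_levinson(frame, order):
    """LPC coefficients a_1..a_P of eq. (1), via Levinson-Durbin on the
    frame's autocorrelation sequence (frame assumed already windowed)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -np.dot(a[:m], r[m:0:-1]) / err      # reflection coefficient
        a[1:m + 1] = a[1:m + 1] + k * a[m - 1::-1]
        err *= (1.0 - k * k)
    # A(z) = 1 + sum alpha_i z^-i; the predictor of eq. (1) uses a_i = -alpha_i
    return -a[1:]

def lpc_to_cepstrum(a, d):
    """LPCepstral coefficients c_1..c_d from the recursion of eq. (2)."""
    P = len(a)
    c = np.zeros(d + 1)                          # c[0] is unused
    for n in range(1, d + 1):
        a_n = a[n - 1] if n <= P else 0.0
        acc = sum((n - i) * a[i - 1] * c[n - i]
                  for i in range(1, min(n - 1, P) + 1))
        c[n] = -a_n + acc / n
    return c[1:]                                 # the vector X_t of eq. (3)
```

Applied frame by frame over the 20 ms windows, `lpc_to_cepstrum(lpc_levinson(frame, P), d)` yields the per-frame vector *X<sub>t</sub>* of eq. (3).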

#### **2.2.2 Feature vector derived from LPCepstral of voiced segments**

The pitch and voiced part detection play a very important role in speaker recognition systems, because the pitch value and the voiced segments of the speech signal contain the most important information about the speaker identity. The feature vector can therefore be extracted only from the voiced segments of the speech signal (Rabiner & Gold, 1975). To this end, the pitch period is first detected using the autocorrelation method (Campbell, 1997) as follows. Initially, the speech signal is segmented in frames of 20 ms with 10 ms overlap using a Hamming window; next, the center clipper method (Ganchev et al., 2002) is applied to the windowed frame to reduce the effect of the additive noise intrinsic to the speech signal. Subsequently, the autocorrelation of the center-clipped segment is obtained. Finally, the pitch value is estimated as the distance between two consecutive positions in which the normalized autocorrelation sequence is larger than a given threshold, as proposed in (Seung-Jin et al., 2007). Using the pitch information, the speech segment is then classified as voiced or unvoiced, because the pitch only appears in the voiced segments. Thus, if the pitch does not exist, the speech segment is considered an unvoiced segment (Campbell, 1997; **?**); if the pitch exists, the speech segment is classified as a voiced segment. Fig. 2 shows the result of this procedure, which confirms that the detection of the voiced part is correctly done.

Fig. 2. Pitch period detection in two speech signals

If only the voiced segments of the speech signal are taken into account for feature extraction, the original speech signal is transformed into a new speech signal containing only voiced parts, neglecting in such a way the unvoiced and noisy silence parts, as shown in Fig. 3. This may improve the feature extraction, because the unvoiced and silence parts provide no useful information (Pool & Preez, 1999).

Fig. 3. Speech signal reduction taking only the voiced part

The new signal, with only the voiced parts, has fewer samples than the original one, but contains the essential parts required to estimate the principal features that identify the speaker; as shown in Fig. 3, the number of samples is reduced, in many cases, by as much as 50%. Once the new signal is constructed with only the voiced parts, it is divided into segments of 20 ms length with 50% overlap using a Hamming window. Next, the LPC coefficients are estimated using the Levinson algorithm as mentioned above, and then the LPCepstral coefficients are estimated using eq. (2). Here two different feature vectors can be estimated. The first one consists only of the LPCepstral coefficients of the voiced segments:

$$X_t^v = \left[\, c_{1,t}^v,\, c_{2,t}^v,\, c_{3,t}^v,\, \cdots,\, c_{d,t}^v \,\right] \tag{4}$$

and the second one consists of the LPCepstral coefficients and the pitch information:

$$X_t^p = \left[\, c_{1,t}^v,\, c_{2,t}^v,\, c_{3,t}^v,\, \cdots,\, c_{d,t}^v,\, \log_{10} F_{0,t} \,\right] \tag{5}$$

where *c<sup>v</sup><sub>n,t</sub>* is the *n*-th LPCepstral coefficient and *F*<sub>0,t</sub> is the inverse of the pitch period at block *t*. Here log<sub>10</sub> *F*<sub>0,t</sub> is used instead of the pitch period, because the probability distribution of log<sub>10</sub> *F*<sub>0,t</sub> is close to the normal distribution.
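The voiced/unvoiced decision described above can be sketched as a center-clipped autocorrelation pitch detector. This is a hedged sketch, not the chapter's exact algorithm: the helper name, the clipping ratio and the threshold value are our assumptions, and the strongest normalized-autocorrelation peak in a plausible lag range stands in for the consecutive-crossings rule.

```python
import numpy as np

def detect_pitch(frame, fs, clip_ratio=0.3, threshold=0.3,
                 fmin=50.0, fmax=400.0):
    """Center-clipped autocorrelation pitch detector (Sec. 2.2.2 sketch).
    Returns F0 in Hz, or None for an unvoiced frame."""
    cl = clip_ratio * np.max(np.abs(frame))
    # center clipping: zero out the low-level part of the waveform
    clipped = np.where(frame > cl, frame - cl,
                       np.where(frame < -cl, frame + cl, 0.0))
    r = np.correlate(clipped, clipped, mode="full")[len(frame) - 1:]
    if r[0] <= 0:
        return None                       # silent frame
    r = r / r[0]                          # normalized autocorrelation
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(r))
    lag = lo + int(np.argmax(r[lo:hi]))   # strongest peak in the pitch range
    if r[lag] < threshold:
        return None                       # no clear peak -> unvoiced
    return fs / lag
```

Frames returning `None` are treated as unvoiced and discarded; for voiced frames the detected *F*<sub>0,t</sub> enters eq. (5) as log<sub>10</sub> *F*<sub>0,t</sub>.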

#### **2.2.3 Reinforcing and enhancing feature vectors**

In long distance speaker recognition, the speech signal is transmitted through a communication channel and then processed by the SRS. However, the speech signal suffers some distortion or variation due to the communication channel effects, the noisy environment, etc. Because these distortions are added to the principal components of the speech signal, it is necessary to remove the undesirable information before proceeding with the recognition process. To this end it is convenient to enhance the estimated feature vector by subtracting the global average vector from all feature vector components. In this process, known as Cepstral Mean Normalization (CMN) (Murthy et al., 1999; Reynolds, 1995; Hardt & Fellbaum, 1997), it is assumed that the mean values of the LPCepstral coefficients of clean speech are zero, so that if the mean value is subtracted from the feature vector components their mean value becomes zero. This removes the distortion introduced by the additive noise when the signal passes through the communication channel. The technique is equivalent to a high-pass filtering of the LPCepstral coefficients, because the CMN estimates the mean value of the LPCepstral vector coefficients and subtracts it from each component, as shown in eq. (6):

$$CMN_{n,t} = c_{n,t} - \frac{1}{T} \sum_{t=1}^{T} c_{n,t}, \qquad 1 \le n \le d \tag{6}$$

where *c<sub>n,t</sub>* is the *n*-th LPCepstral coefficient at block *t* and *T* is the total number of frames into which the speech signal was divided to extract the feature vectors. Fig. 4a and Fig. 4b show the effect produced in the feature vector when the Cepstral Mean Normalization (CMN) technique is applied to one of the LPCepstral coefficients extracted from only the voiced part of the speech signal. In this situation the feature vector becomes

$$CMN_t = \left[\, CMN_{1,t},\, CMN_{2,t},\, CMN_{3,t},\, \cdots,\, CMN_{d,t} \,\right] \tag{7}$$

Fig. 4. The second LPC-Cepstral coefficient extracted from the speech signal: (a) clean speech; (b) speech transmitted through a communications channel

### **2.3 Results**

Here we present some results of an SRS evaluated using a feature vector with 16 LPCepstral coefficients extracted only from the voiced part of the speech signals. The first evaluation of the baseline system uses these 16 LPCepstral coefficients extracted only from the voiced part. The second evaluation uses the CMN technique to enhance the quality of the feature vector, which has been affected by the communication channel and the environmental noise. The third evaluation uses a combination of LPCepstral coefficients and pitch information, and the fourth evaluation uses the combination of LPCepstral coefficients and pitch information applying the CMN technique. All system evaluations, which are discussed in the respective sections, are presented in Table 1 and Table 2, where for the closed test the same data used for training was employed, and for the open test two different repetitions, recorded at different times, were used, giving a total of 3658 phrases.

| Feature vector | LPCepstral (from whole speech signal) | LPCepstral (from voiced part) |
|---|---|---|
| Closed test (7147 phrases) | 96.61% | 97.13% |
| Open test (3658 phrases) | 82.34% | 83.57% |

Table 1. Results using the whole and the voiced part of the speech signal

| Feature vector | LPCepstral (from voiced part) | LPCepstral (from voiced parts using CMN) | LPCepstral (from voiced part and pitch information) | LPCepstral (from voiced parts using pitch and CMN) |
|---|---|---|---|---|
| Closed test (6581 phrases) | 93.31% | 80.72% | 99.18% | 97.29% |
| Open test (3282 phrases) | 76.97% | 70.88% | 80.29% | 77.57% |

Table 2. Results with different feature vectors
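The normalization of eq. (6) used in the evaluations above amounts to subtracting, for each coefficient index, its mean over all frames of the utterance. A minimal sketch (the matrix layout, one row per frame, is our assumption):

```python
import numpy as np

def cepstral_mean_normalization(C):
    """Eq. (6): CMN_{n,t} = c_{n,t} - (1/T) * sum_t c_{n,t}.
    C has shape (T, d): row t holds the frame's d LPCepstral coefficients."""
    return C - C.mean(axis=0, keepdims=True)
```

Applied per utterance, this removes any constant offset added by the channel to the coefficient trajectories, which is why it behaves like a high-pass filter on the LPCepstral coefficients.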

### **3. Feature extraction in face recognition**

Face recognition has a large number of biometric applications. The intra-person variations of the face image derive mainly from changes in facial expression and illumination conditions, as well as from the use of accessories such as eyeglasses, mufflers, etc. These variations make face recognition a very difficult task. Most approaches to face recognition work in the image domain, whereas we believe that there are more advantages in working directly in the spatial frequency domain. By going to the spatial frequency domain, the image information gets distributed across frequencies, providing tolerance to reasonable deviations and also a graceful degradation against distortions of the images (e.g., occlusions) in the spatial domain.

### **3.1 Face recognition system**


This section provides a detailed description of a general face recognition system, which consists of three stages. Figure 5 shows the block diagram of this system. First, in the pre-processing stage, the input image is normalized, equalized or enhanced by some other method. In the feature extraction stage, the phase spectrum is extracted; after that, Principal Components Analysis (PCA) (Kriegman et al., 1997; Hager et al., 1999) is applied to the phase spectrum to obtain the dominant features of the faces. Next, the features in the principal components space are fed into the classifier, and the image is assigned to the class giving the maximum likelihood.

Fig. 5. General Face Recognition System
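The three-stage pipeline just described can be sketched end-to-end. This is an illustrative skeleton, not the chapter's implementation: the helper names are ours, and a nearest-neighbour rule in the principal-components space stands in for the maximum-likelihood classifier.

```python
import numpy as np

def phase_feature(img):
    """Feature extraction stage: phase spectrum of the 2-D DFT,
    flattened to a vector (Sec. 3.2)."""
    return np.angle(np.fft.fft2(img)).ravel()

def fit_pca(X, k):
    """PCA on the feature vectors: mean and top-k eigenvectors of the
    covariance matrix, obtained via SVD of the centered data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def classify(img, mu, W, gallery, labels):
    """Decision stage: nearest gallery projection (a stand-in for the
    maximum-likelihood classifier of the chapter)."""
    z = W @ (phase_feature(img) - mu)
    dists = [np.linalg.norm(z - g) for g in gallery]
    return labels[int(np.argmin(dists))]
```

Here `gallery` holds the PCA projections of the enrolled face images, so recognition reduces to a distance comparison in the reduced space.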

### **3.2 Feature vector extraction**

The method proposed by Savvides (Savvides et al., 2004) considers the combination of principal component analysis and the phase spectrum of an image. Oppenheim (Oppenheim et al., 1980; Lim & Oppenheim, 1981) showed that the image phase spectrum contains the most important information required for face image recognition, making the use of the magnitude spectrum less relevant. His research also shows that, using only the phase spectrum of an image, the original image can be reconstructed up to a scale factor; therefore, the phase information is the most important in the representation of a 2D signal in the Fourier domain. This is also demonstrated by a simple experiment, shown in Figure 6, in which the face of A (a) is reconstructed using the phase spectrum of A (c) and the magnitude spectrum of face B (f),

denote its Fourier transform as *X*[*k*, *l*] whose Fourier transform pair is defined as follows:

Multidimensional Features Extraction Methods in Frequency Domain 345

*<sup>x</sup>*[*m*, *<sup>n</sup>*]*exp*−*i*2*πkm*

If we have N training images then the covariance matrix of the Fourier transform is obtained

now formulate the space domain PCA and it shows that the space in the domain of the

*Cs* = Σ

Comparing equation 13 with equation 16 we can see that is a relationship between space and frequency domain of the eigenvectors (*vsandvf*) related by an inverse Fourier transform as

Here present some results, where the Face Recognition System was evaluated using the "AR Face Database", which has a total of 9,360 face images of 120 people (65 men and 55 women). This database includes 78 face images of each people with different illuminations, facial expression and partial occluded face images with sunglasses and scarf. To evaluate the performance of the proposed methods under several illumination and occlusion conditions, two different training sets are used. The first one consists of 1200 images, 10 images of each person, with different illumination and expression conditions; while the second set consists of

Σ

{*F*(*xi* <sup>−</sup> *<sup>μ</sup>*}{*F*(*xi* <sup>−</sup> *<sup>μ</sup>*}*<sup>T</sup>* <sup>=</sup> *<sup>F</sup>*<sup>Σ</sup>

*<sup>X</sup>*[*k*, *<sup>l</sup>*]*exp*−*i*2*πkm*

*X*[*k*, *l*] =

*<sup>X</sup>*[*m*, *<sup>n</sup>*] = <sup>1</sup>

Σ *<sup>f</sup>* <sup>=</sup> <sup>1</sup> *N*

multiplying both sides by *F*−<sup>1</sup> have:

covariance matrix is:

follows:

**3.3 Results**

once the PCA was normalized, the eigenvectors *wf* de Σ

where *vs* is the space in the domain of the eigenvectors.

where *<sup>i</sup>* <sup>=</sup> √−<sup>1</sup>

by:

*M*−1 ∑ *m*=0

*MN*

*N* ∑ *i*=1

*F*Σ

Σ

*N*−1 ∑ *n*=0

*M*−1 ∑ *m*=0

*N*−1 ∑ *n*=0

*x*[*m*, *n*] *X*[*k*, *l*] (8)

*<sup>M</sup> exp*−*i*2*πln*

*<sup>f</sup>* are obtained by:

*sF*−<sup>1</sup>*wf* <sup>=</sup> *<sup>λ</sup>wf* (12)

*sF*−<sup>1</sup>*wf* <sup>=</sup> *<sup>λ</sup>F*−<sup>1</sup>*wf* (13)

*<sup>s</sup>* (14)

*Csvs* = *λvs* (15)

*svs* = *λvs* (16)

*vs* = *<sup>F</sup>*−<sup>1</sup>*vf* (17)

*<sup>N</sup>* (9)

*<sup>N</sup>* (10)

*sF*−<sup>1</sup> (11)

*<sup>M</sup> exp*−*i*2*πln*

Fig. 5. General Face Recognition System

and the face of B (e) is reconstructed using the phase spectrum of B (g) and the magnitude spectrum of A (b). The reconstructed images, figures 2(d) and 2(h), show that the synthesized face image clearly resemble A (a) and B (e), respectively.

However, the performance of PCA in the frequency domain alone does not constitute any progress, because the eigenvectors obtained in the frequency domain are simply the Fourier transform of the spatial-domain eigenvectors. We begin this derivation by defining the standard 2-D discrete Fourier transform (DFT). Given an input 2-D discrete signal *x*[*m*, *n*] of size *M* × *N*, we denote its Fourier transform as *X*[*k*, *l*]; the Fourier transform pair is defined as follows:

$$x[m, n] \iff X[k, l] \tag{8}$$

$$X[k,l] = \sum\_{m=0}^{M-1} \sum\_{n=0}^{N-1} x[m,n] \exp\frac{-i2\pi km}{M} \exp\frac{-i2\pi ln}{N} \tag{9}$$

$$x[m,n] = \frac{1}{MN} \sum\_{k=0}^{M-1} \sum\_{l=0}^{N-1} X[k,l] \exp\frac{i2\pi km}{M} \exp\frac{i2\pi ln}{N} \tag{10}$$

where *<sup>i</sup>* <sup>=</sup> √−<sup>1</sup>

Fig. 5. General Face Recognition System

Fig. 6. Oppenheim Experiment

If we have *N* training images, then the covariance matrix of the Fourier-transformed images is obtained by:

$$\widehat{\Sigma\_f} = \frac{1}{N} \sum\_{i=1}^{N} \{ F(\mathbf{x}\_i - \widehat{\boldsymbol{\mu}}) \} \{ F(\mathbf{x}\_i - \widehat{\boldsymbol{\mu}}) \}^T = F \widehat{\Sigma\_s} F^{-1} \tag{11}$$

Once the PCA is normalized, the eigenvectors *wf* of Σ*f* are obtained from:

$$F \widehat{\Sigma\_s} F^{-1} w\_f = \lambda w\_f \tag{12}$$

Multiplying both sides by *F*<sup>−1</sup> we have:

$$\widehat{\Sigma\_s} F^{-1} w\_f = \lambda F^{-1} w\_f \tag{13}$$

We now formulate PCA in the space domain, where the space-domain covariance matrix is:

$$C\_s = \widehat{\Sigma\_s} \tag{14}$$

$$C\_s v\_s = \lambda v\_s \tag{15}$$

$$\widehat{\Sigma\_s} v\_s = \lambda v\_s \tag{16}$$

where *vs* denotes the space-domain eigenvectors.

Comparing equation (13) with equation (16), we see that the space-domain and frequency-domain eigenvectors (*vs* and *vf*) are related by an inverse Fourier transform, as follows:

$$v\_s = F^{-1} v\_f \tag{17}$$
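The relationship in Eq. (17) can be checked numerically. The following NumPy sketch (synthetic data and 1-D signals for brevity; all names are illustrative) builds a space-domain covariance, maps it to the frequency domain as in Eq. (11), and verifies that an eigenvector of Σs, once Fourier transformed, is an eigenvector of *F*Σs*F*<sup>−1</sup> with the same eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 40                      # vector length, number of "images"
X = rng.standard_normal((d, n))
mu = X.mean(axis=1, keepdims=True)
S = (X - mu) @ (X - mu).T / n      # space-domain covariance (cf. Eq. 11 without F)

# 1-D DFT matrix F (for 2-D images F would be the 2-D DFT operator)
F = np.fft.fft(np.eye(d), axis=0)
S_f = F @ S @ np.linalg.inv(F)     # frequency-domain covariance (Eq. 11)

lam, W = np.linalg.eigh(S)         # space-domain eigenpairs (Eq. 16)
v_f = F @ W[:, -1]                 # map the top eigenvector to the frequency domain
# v_f is an eigenvector of S_f with the same eigenvalue (Eqs. 12-13),
# i.e. v_s = F^{-1} v_f as in Eq. (17)
assert np.allclose(S_f @ v_f, lam[-1] * v_f)
```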

#### **3.3 Results**

Here we present some results, where the Face Recognition System was evaluated using the "AR Face Database", which has a total of 9,360 face images of 120 people (65 men and 55 women). This database includes 78 face images of each person with different illumination, facial expressions, and partially occluded face images with sunglasses and a scarf. To evaluate the performance of the proposed methods under several illumination and occlusion conditions, two different training sets are used. The first one consists of 1200 images, 10 images of each person, with different illumination and expression conditions; while the second set consists of


1200 face images, 10 for each person, with different illumination conditions, expressions and occlusions, which are the result of wearing sunglasses and a scarf. The remaining images of the AR face database are used for testing. Table 3 shows some results of face identification and Table 4 some results of face verification.


| | Training set 1 | Training set 2 |
|---|---|---|
| Eigenphases | 80.63 | 96.28 |

Table 3. Results using Eigenphases for identification.

| | False acceptance | False reject |
|---|---|---|
| Training set 1 | 0.5 | 19.07 |
| Training set 2 | 0.5 | 3.80 |

Table 4. Results using Eigenphases for verification.

### **4. Feature extraction in sound recognition**

Sound recognition (Goldhor, 1993) has a large number of civil as well as military applications, such as engine diagnostics and airplane or ship recognition. These applications also depend on the feature extraction methods mentioned above. This section describes a sound recognition system using a frequency-domain feature extraction method, LPC-Cepstral.

### **4.1 Sound recognition system**

Figure 7 shows the proposed system. This system consists of four sequential processes: first, a common database of environmental sounds is obtained; second, a segmentation algorithm is applied to each token (file) of this database; third, LPC-Cepstral features are extracted from each segmented file and the DFT is computed from these coefficients; finally, the DFT magnitude is computed and a training strategy is adopted. The decision is taken at the final process and a recognition percentage is computed.

Fig. 7. General Sound Recognition System

#### **4.2 Feature vector extraction**


Figure 8 shows the processes applied in the signal analysis. This signal analysis yields highly efficient feature extraction, which facilitates the recognition process performed by the neural network; as a result, higher verification and identification percentages can be obtained.

Fig. 8. Methodology used in the signal processing

#### **4.2.1 End-pointing algorithm**

In the time domain, magnitude, energy, power, maxima and minima can be computed; of these, the energy is used. Once the energy is calculated, a reference is obtained with which the signal can be limited. In the discrete case the energy is defined as:

$$E = \sum\_{n = -\infty}^{\infty} s^2[n] \tag{18}$$

Now a gamma constant is defined; this constant indicates the number of samples taken into account from the signal. The following step is to establish a relationship between the sound signal and the gamma constant:

$$E[n] = (1 - \gamma) E[n-1] + \gamma\, y\_n^2 \tag{19}$$

An end-pointing algorithm was applied to each file stored in the database. In order to limit the signal, two thresholds at 20% and 10% of the maximum energy must be defined; these correspond to the percentage taken from the signal. This algorithm compares each sample of the energy with the thresholds until a sample is greater than or equal to them, indicating that the signal is beginning.
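The end-pointing step can be sketched as follows. This is a minimal illustration, assuming the two thresholds are 20% and 10% of the maximum of the running energy of Eq. (19); the function name, gamma value and test signal are hypothetical:

```python
import numpy as np

def end_point(signal, gamma=0.01, thr_hi=0.20, thr_lo=0.10):
    """Locate where the useful signal begins: the running energy estimate
    E[n] = (1-gamma)*E[n-1] + gamma*y[n]^2 (Eq. 19) is compared against
    thresholds at 20% and 10% of the maximum energy."""
    E = np.zeros(len(signal))
    for n in range(1, len(signal)):
        E[n] = (1 - gamma) * E[n - 1] + gamma * signal[n] ** 2
    hi, lo = thr_hi * E.max(), thr_lo * E.max()
    start = np.argmax(E >= hi)          # first sample crossing the 20% threshold
    # back off to where the energy first rose above the 10% threshold
    while start > 0 and E[start - 1] >= lo:
        start -= 1
    return start

# near-silence followed by a burst: the detected start falls at the burst onset
sig = np.concatenate([0.01 * np.ones(500), np.ones(500)])
assert 450 <= end_point(sig) <= 600
```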

#### **4.2.2 Frame blocking and windowing**

The sound signal, *S*[*n*], is blocked into frames of 240 samples, which correspond to 30 ms, over which voice is considered stationary (Kitamura & Hayahara, 1988), with adjacent frames being separated by 120 samples. The use of frames involves three parameters: frame size, frame increment and frame overlap:

$$S\_f = I\_f + O\_f \tag{20}$$

where *Sf* is the frame size, *If* the frame increment and *Of* the frame overlap.


A windowing algorithm was applied to the sequence of analysis frames generated from each end-pointed file; a Hamming window was used:

$$\widehat{S}\_w[n] = \widehat{S}[n] W[n] \tag{21}$$

where 0 ≤ *n* ≤ *N* − 1, *N* is the number of samples in the analysis frame and *W*[*n*] is a Hamming window.
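Frame blocking (Eq. 20) and Hamming windowing (Eq. 21) can be sketched together. An 8 kHz sampling rate is assumed from the 240-sample / 30 ms correspondence stated above, and the function name is illustrative:

```python
import numpy as np

def frame_and_window(s, frame_size=240, increment=120):
    """240-sample frames with a 120-sample increment, i.e.
    S_f = I_f + O_f = 120 + 120 (Eq. 20); each frame is multiplied
    by a Hamming window (Eq. 21)."""
    w = np.hamming(frame_size)                      # W[n], 0 <= n <= N-1
    n_frames = 1 + (len(s) - frame_size) // increment
    frames = np.stack([s[i * increment : i * increment + frame_size]
                       for i in range(n_frames)])
    return frames * w                               # S_w[n] = S[n] W[n]

s = np.random.default_rng(2).standard_normal(8000)  # 1 s of audio at 8 kHz
fw = frame_and_window(s)
assert fw.shape == (65, 240)                        # (8000-240)//120 + 1 = 65
```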

#### **4.2.3 LPC and LPC-Cepstral parameters**

In each window, 17 LPC coefficients were calculated with the Levinson-Durbin recursion. LPC-Cepstral coefficients can be derived directly from the set of LPC coefficients using the algorithm:

$$C[n] = -a[n] - \frac{1}{n} \sum\_{k=1}^{n-1} k\, C[k]\, a[n-k] \tag{22}$$

where *n* > 0, *C*0 = *a*0 = 1, *a*[*n* − *k*] = 0 for *n* − *k* > *p*, and *a*[*n*] represents the linear prediction coefficients, estimated using a linear filter, as shown in Figure 9, where a given sound sample can be approximated or predicted as a linear combination of its past *p* samples, as shown in eq. 23. The number of frames generated for each signal was 64. The result, in effect, was that each signal was represented by a 64 by 17 array of the Cepstral coefficients, with the 64 rows representing time and the 17 columns representing frequency.
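A minimal sketch of the recursion in Eq. (22), using hypothetical LPC coefficients; sign conventions for LPC-to-cepstral conversion vary between texts, so this follows the form given above:

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Derive cepstral coefficients C[1..n_ceps] from LPC coefficients
    a = [1, a1, ..., ap] via the recursion of Eq. (22)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(1, n):            # k = 1 .. n-1
            if n - k <= p:               # a[n-k] = 0 beyond the filter order
                acc += k * c[k] * a[n - k]
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]

a = np.array([1.0, -0.9, 0.2])           # hypothetical LPC coefficients, p = 2
c = lpc_to_cepstrum(a, 4)
# hand check against Eq. (22): C1 = -a1; C2 = -a2 - (1/2) C1 a1
assert np.isclose(c[0], 0.9)
assert np.isclose(c[1], -0.2 - 0.5 * 0.9 * (-0.9))
```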

Fig. 9. Linear Prediction Filter used for the LPC-Cepstral Estimation

$$\widehat{S}(n) = \sum\_{i=1}^{P} a(i)S(n-i) \tag{23}$$

where *S*(*n*) is the estimated signal at time *n*, *P* is the filter order and *a* is the vector of filter coefficients.

A 64-point DFT was then calculated for each column in the matrix and the first 32 points of this symmetric transform were retained. The resulting matrix is a two-dimensional cepstral representation of the input signal. Each column corresponds to a particular spectral frequency, and each row corresponds to a temporal frequency. The first column contains the DFT of the power envelope of the signal. The first row contains the DFT of the average signal spectrum. The first element of the first column contains the average signal power level. It is typical of two-dimensional cepstral representations of acoustic signals, and certainly of our signals, that this corner element is the largest component and that the components in the first row and first column are larger than the interior matrix components. After this, the DFT magnitude for each column in the matrix is computed. The LPC-Cepstral coefficients, which are a Fourier transform representation of the spectrum, have been shown to be more robust for speech recognition than the LPC coefficients, which is why we applied this method. The cepstral transforms were calculated for two variants of the spectrum within each frame: a linear variant and one in which the frequency scale was warped using the mel frequency transformation.
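The column-wise DFT step can be sketched as follows, assuming a 64 × 17 matrix of frame-wise cepstral coefficients (synthetic values stand in for real features):

```python
import numpy as np

# 64 frames in time (rows) x 17 cepstral coefficients (columns)
ceps = np.random.default_rng(3).standard_normal((64, 17))  # hypothetical input

spec = np.fft.fft(ceps, n=64, axis=0)[:32]  # 64-point DFT per column, keep 32
features = np.abs(spec)                     # DFT magnitude as the feature matrix
assert features.shape == (32, 17)
```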

### **4.3 Results**

Fig. 10. Percent of correct classification of test.

The model used is a backpropagation artificial neural network, trained with the traditional error backpropagation algorithm. For each sound pattern, 50 sound files were used in the network training process. The sound samples are first normalized so that the average magnitude becomes zero and the standard deviation one. Clusters, or classes, were formed by grouping the feature vectors for each type of sound. For the network training, the ideal number of hidden-layer neurons was chosen from the experimental work. The hope, of course, is that the samples of each sound will cluster together in that space and that clusters of different sounds will be well separated.

Two neural networks per sound source were trained. Four stages (one per neural network) were necessary for the network training, and each stage corresponds to a file stored in the database. 32 input-layer neurons were necessary for the neural network training; 10, 15 and 20 hidden-layer neurons were tried, and the best results were obtained with 20 neurons; 1 output-layer neuron per network was necessary to verify the source sounds, and 4 output-layer neurons were used in the identification process.
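The verification network's shape can be sketched as a single forward pass; the sigmoid activations and random weights are assumptions for illustration, since the text specifies only the layer sizes (32 inputs, 20 hidden neurons, 1 output):

```python
import numpy as np

rng = np.random.default_rng(5)
W1, b1 = rng.standard_normal((20, 32)), np.zeros(20)  # 32 -> 20 hidden neurons
W2, b2 = rng.standard_normal((1, 20)), np.zeros(1)    # 20 -> 1 output neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):                       # x: 32-dim feature vector
    h = sigmoid(W1 @ x + b1)          # hidden layer
    return sigmoid(W2 @ h + b2)       # verification score in (0, 1)

y = forward(rng.standard_normal(32))
assert y.shape == (1,) and 0.0 < y[0] < 1.0
```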

### **5. Feature extraction in fingerprint recognition**

Fingerprint identification (Aguilar et al., 1994) is one of the most reliable and popular personal identification methods. The performance of minutiae extraction algorithms and other fingerprint recognition techniques relies heavily on the quality of the input fingerprint images. In an ideal fingerprint image, ridges and valleys alternate and flow in a locally constant direction. In such situations, the ridges can be easily detected and the minutiae can be accurately located in the image. In practice, however, because of skin conditions (e.g., wet or dry skin, cuts, and bruises), sensor noise, incorrect finger pressure, and inherently low-quality fingers (e.g., elderly people, manual workers), a significant number of fingerprint images are of poor quality.


### **5.1 Fingerprint recognition system**

Figure 10 shows the main stages of a biometric system. As mentioned above, good input image quality ensures higher recognition rates. Therefore, we can use the Fourier transform to enhance the quality of the input images of a biometric system.

Fig. 11. General Fingerprint Recognition System

### **5.2 Feature vector extraction**

A good fingerprint image will have high contrast and well-defined ridges and valleys, while a poor quality fingerprint will have low contrast and poorly defined ridges and valleys. In a biometric system we can have images with different qualities, as shown in Figure 11. Thus the goal of an enhancement algorithm is to improve the clarity of the ridge structures in the recoverable regions and to classify the unrecoverable regions as too hazy for further processing. Because of the regularity and continuity properties of fingerprint images, the occluded and corrupted regions can be recovered using the contextual information from the surrounding neighborhood. This section describes contextual filtering performed completely in the frequency domain, where each image is convolved with a precomputed filter whose size is equal to the image size; as a consequence, the algorithm does not use the full contextual information provided by the fingerprint image.

Fig. 12. Set of Fingerprints


An advantage of this approach is that it does not require the computation of intrinsic images. It has the effect of increasing the dominant spectral components while attenuating the weak ones; however, in order to preserve the phase, the enhancement also retains the original spectrum *F*(*u*, *v*). In this algorithm the fingerprint is transformed from the spatial domain to the frequency domain by means of the Fourier transform.

### **5.2.1 Fourier domain filtering**

Sherlock (Sherlock et al., 1994) performs contextual filtering completely in the frequency domain, where each image is convolved with a precomputed filter whose size is equal to the image size. However, the algorithm assumes that the ridge frequency is constant throughout the image in order to avoid having a large number of precomputed filters; therefore it does not use the full contextual information provided by the fingerprint image. Watson (Watson, Candela & Grother) proposed another approach for performing image enhancement completely in the frequency domain, based on the root filtering technique. In this approach the image is divided into overlapping blocks, and in each block the enhanced image is obtained as:

$$I\_{enh}(\mathbf{x}, \mathbf{y}) = FFT^{-1}\left[F(\mathbf{u}, \mathbf{v})|F(\mathbf{u}, \mathbf{v})|^{k}\right] \tag{24}$$

$$F(\mathfrak{u}, \mathfrak{v}) = FFT(I(\mathfrak{x}, \mathfrak{y})) \tag{25}$$

where *k* in formula (24) is an experimentally determined constant, which we choose equal to 0.45. While a higher *k* improves the appearance of the ridges, filling up small holes in them, a too large value of *k* may result in false joining of ridges, such that a termination might become a bifurcation. Another advantage of this approach is that it does not require the computation of intrinsic images; it has the effect of increasing the dominant spectral components while attenuating the weak ones. However, in order to preserve the phase, the enhancement also retains the original spectrum *F*(*u*, *v*). Figure 12 shows fingerprint images after the FFT.
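Root filtering (Eqs. 24-25) can be sketched directly with NumPy. For brevity this illustration enhances a single block rather than tiling overlapping blocks, and the input is a synthetic stand-in for a fingerprint region:

```python
import numpy as np

def root_filter(block, k=0.45):
    """Enhance one block as I_enh = FFT^-1[ F(u,v) |F(u,v)|^k ]."""
    F = np.fft.fft2(block)                    # Eq. (25)
    enh = np.fft.ifft2(F * np.abs(F) ** k)    # Eq. (24)
    return np.real(enh)                       # result is real up to round-off

img = np.random.default_rng(4).random((32, 32))  # stand-in fingerprint block
out = root_filter(img)
assert out.shape == img.shape
```

Because |F| shares the Hermitian symmetry of F, the product F·|F|^k still inverse-transforms to a (numerically) real image.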

### **5.3 Results**

In this section, a fingerprint recognition algorithm using the FFT was evaluated. The tests consisted of the recognition of 125 people. Table 5 shows the recognition results obtained using the FFT, which show an 11.3% improvement in recognition rate over a set of 1000

El-Solh A., Cuhadar A., Goubran R. A. (2007). Evaluation of Speech Enhancement Techniques for Speaker Identification in Noisy Environments. Ninth IEEE International Symposium on Multimedia Workshops (ISMW '07), 235-239, ISBN 978-0-7695-3084-0.

Markov K. P., Nakagawa S. (1999). Integrating Pitch and LPC-Residual Information with LPC-Cepstral for Text-independent Speaker Recognition. J. Acoustic Society of Japan (E), Vol. 20, No. 4, 281-291.

Plumpe M. D., Quatieri T. F., Reynolds D. A. (1999). Modeling of the Glottal Flow Derivative Waveform with Application to Speaker Identification. IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 5.

Liu F., Stern R. M., Huang X., Acero A. (1993). Efficient Cepstral Normalization for Robust Speech Recognition. Proceedings of the HLT '93 Workshop on Human Language Technology, Stroudsburg, PA, USA, ISBN 1-55860-324-7.

Simancas-Acevedo E., Kurematsu A., Nakano-Miyatake M., Perez-Meana H. (2001). Speaker Recognition Using Gaussian Mixtures Model. Lecture Notes in Computer Science, Bio-Inspired Applications of Connectionism, Springer Verlag, Berlin, 287-294.

Campbell J. P. (1997). Speaker Recognition: A Tutorial. Proceedings of the IEEE, Vol. 85, No. 9, 1437-1462.

Seung-Jin Jang, Seong-Hee Choi, Hyo-Min Kim, Hong-Shik Choi, Young-Ro Yoon (2007). Evaluation of Performance of Several Established Pitch Detection Algorithms in Pathological Voices. 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2007), 620-623, ISSN 1557-170X, ISBN 978-1-4244-0787-3.

Pool J., du Preez J. A. (1999). HF Speaker Recognition. Thesis notes, Digital Signal Processing Group, Department of Electrical and Electronic Engineering, University of Stellenbosch, March 1999.

Murthy H. A., Beaufays F., Heck L. P., Weintraub M. (1999). Robust Text-Independent Speaker Identification over Telephone Channels. IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 5.

Reynolds D. A. (1995). Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models. IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, 72-83.

Hardt D., Fellbaum K. (1997). Spectral Subtraction and Rasta Filtering in Text-Dependent HMM-based Speaker Verification. Proceedings of ICASSP, Vol. 2, 867-870.

Kriegman D., Belhumeur P., Hespanha J. (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 711-720.

Hager J., Ekman P., Sejnowski T., Donato G., Bartlett M. (1999). Classifying Facial Actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, 974-989.

Savvides M., Kumar B.V.K.V., Khosla P.K. (2004). Eigenphases vs. Eigenfaces. Proceedings of the 17th International Conference on Pattern Recognition, Vol. 3, 810-813.

Oppenheim A. V., Hayes M. H., Lim J. S. (1980). Signal Reconstruction from Phase or Magnitude. IEEE Trans. Acoust., Speech, Signal Processing, Vol. 28, 672-680.

Rabiner L., Gold B. (1975). Digital Processing of Speech Signals. Prentice Hall, Englewood Cliffs, NJ.

Fig. 13. Set of Fingerprints

images. True recognition means that the person was effectively recognized; the rest is divided in two: false recognition and without recognition. False recognition occurs when one person is confused with another, and without recognition when the system does not deliver any possible identified person.


| | True recognition | False recognition | Without recognition |
|---|---|---|---|
| Without FFT | 81.8% | 9.4% | 8.8% |
| Using FFT | 93.1% | 3.8% | 3.1% |

Table 5. Test results on 1000 images.

The ROC curves before and after enhancement are shown in Figure 13.

Fig. 14. Set of Fingerprints

#### **6. References**

Ganchev T., Tsopanoglou A., Fakotakis N., Kokkinakis G. (2002). Probabilistic Neural Networks Combined with GMMs for Speaker Recognition over Telephone Channels. 14th International Conference on Digital Signal Processing (DSP 2002), Santorini, Greece, Volume II, 1081-1084.

16 Will-be-set-by-IN-TECH

images. True recognition means that the person was effectively recognized, the rest is divided into two: false recognition and without recognition. False recognition occurs when a person is confused and without recognition when the system does not deliver any possible identified

> Total True False without percentage recognition recognition recognition Without FFT 81.8% 9.4% 8.8% Using FFT 93.1% 3.8% 3.1%

Ganchev T., Tsopanoglou A., Fakotakis N., Kokkinakis G. (2002). Probabilistic Neural

Networks Combined with GMMs For Speaker Recognition over Telephone Channels. 14-th International Conference On Digital Signal Processing (DSP 2002), Santorini,

The ROC curves before and after enhancement are as shown in the Figure 13.

Fig. 13. Set of Fingerprints

Fig. 14. Set of Fingerprints

Greece, Volume II, 1081U1084.

˝

**6. References**

Table 5. Test results made to 1000 images

person.


18 Will-be-set-by-IN-TECH
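The three outcome categories defined above can be tallied mechanically once every test image is labelled with the system's decision. A minimal sketch (the trial data and function name here are hypothetical; the chapter does not show the matcher's actual output format):

```python
from collections import Counter

def tally(results):
    """Classify each trial as true, false, or without recognition.

    `results` is a list of (true_identity, returned_identity) pairs,
    where returned_identity is None when the system delivers no candidate.
    """
    counts = Counter()
    for truth, returned in results:
        if returned is None:
            counts["without recognition"] += 1   # system returned nobody
        elif returned == truth:
            counts["true recognition"] += 1      # correct person found
        else:
            counts["false recognition"] += 1     # confused with someone else
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

# Hypothetical outcomes for five trials
trials = [("ana", "ana"), ("ben", "ben"), ("carl", "dana"),
          ("eva", None), ("ana", "ana")]
print(tally(trials))  # true 60.0%, false 20.0%, without 20.0%
```

Each row of Table 5 is such a tally over the 1000 test images, computed once without and once with the FFT enhancement step.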

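The "Using FFT" row of Table 5 reflects an enhancement step performed in the frequency domain. A common formulation of FFT-based fingerprint enhancement multiplies each block's spectrum by a power of its own magnitude, reinforcing the dominant ridge frequency relative to noise. A NumPy sketch of that general idea (block size and exponent `k` are illustrative choices, not values from the chapter):

```python
import numpy as np

def fft_enhance(img, block=32, k=0.45):
    """Block-wise spectral enhancement: B' = IFFT( FFT(B) * |FFT(B)|^k )."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            b = img[y:y + block, x:x + block].astype(float)
            F = np.fft.fft2(b)
            # raising the magnitude to a small power k boosts the
            # dominant ridge frequency of the block
            Fe = F * (np.abs(F) ** k)
            out[y:y + block, x:x + block] = np.real(np.fft.ifft2(Fe))
    # rescale to [0, 1] for display
    out -= out.min()
    if out.max() > 0:
        out /= out.max()
    return out

# Synthetic ridge-like pattern (vertical ridges, period 8 px) plus noise
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
ridges = 0.5 + 0.5 * np.sin(2 * np.pi * xx / 8)
noisy = ridges + 0.3 * rng.standard_normal((64, 64))
enhanced = fft_enhance(noisy)
print(enhanced.shape)  # (64, 64)
```

On a real fingerprint the same per-block filtering sharpens ridges before matching, which is what moves the true-recognition rate from 81.8% to 93.1% in Table 5.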


## *Edited by Salih Mohammed Salih*

The field of signal processing has seen explosive growth during the past decades; almost all textbooks on signal processing have a section devoted to Fourier transform theory. For this reason, this book focuses on applications of the Fourier transform in signal processing. The book chapters cover the DFT, the FFT, OFDM, estimation techniques and image processing techniques. It is hoped that this book will provide the background, references and incentive to encourage further research and results in this area, as well as tools for practical applications. It offers an applications-oriented treatment of signal processing written primarily for electrical engineers, communication engineers, signal processing engineers and mathematicians; graduate students will also find it useful as a reference for their research activities.

Fourier Transform - Signal Processing
