**7. Results**

256 Bio-Inspired Computational Algorithms and Their Applications

**6.2.2 Genetic algorithm set up**

**6.2.2.1 Coding the chromosome**

The chromosome comprises the parameters of the filter bank described in equation (43). The number of sub-bands is fixed, and the center and bandwidth of each sub-band are subject to the genetic algorithm. Real numbers are used.

**6.2.2.2 The cost function**

The cost function is given by equation (32), criterion *J2*. The goal is to maximize *J2* as a function of the centers and bandwidths.

**6.2.2.3 Restrictions**

The following restrictions apply:

*R1*. Sub-band overlap ≤ 50 Hz,

*R2*. Bandwidth is limited to the range from 40 to 400 Hz, varying according to the performance of the genetic algorithm.

**6.2.2.4 Operating parameters of the genetic algorithm**

The main parameters of the genetic algorithm are summarized in Table 1. The parameter values are those that gave the best results in experimentation. The genetic algorithm was run in Matlab®, using the genetic algorithms toolbox and the *gatool* guide. The nomenclature of the parameters in Table 1 is the one used by Matlab®.

**6.2.2.5 Application's algorithm**

To make the methodology described so far operational, the following steps apply:

1. *Vocabulary definition*. 2 to K words. In many real-life applications, a K of 8 to 15 words does the job.

2. *Database acquisition*. 15 to 20 utterances of each word from the vocabulary, for learning purposes. The sampling frequency can be set from 6000 to 8000 Hz. Human voice accommodates easily here.

3. *s(t)* to *S(w)* transformation. Apply the Fourier transform to the data, normalize to unit amplitude and to a fixed length of 2000.

4. *Data preparation for the GA.* Set up the size (eq. (43)) and restrictions (subsection 6.2.2.3) of the filter bank (chromosome).

5. *Running the GA.* Run the GA to find the sub-bands whose *J2* (eq. (32)) is the maximum.

6. *Filters realization.* For each sub-band, compute the coefficients of the respective bandpass filter. For real-time implementation, elliptic IIR filters of order 4 to 8 are recommended. Elliptic filters achieve great discrimination and selectivity. Implementation details can be found in [2].

7. *Modeling the commands.* To make comparisons and therefore classification, a Gaussian statistical model is to be constructed for each command in the vocabulary. Proceed as follows for each command:

   a. Construct a matrix *C* of 15-20 rows of *Srn(w)* and *M* columns (one row per sample of the command, one column per sub-band selected by the GA),

   b. Compute the mean over all the samples to find the average energy per sub-band; this is the feature vector of the command (μi).
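The chromosome layout and the restrictions of Section 6.2.2 can be sketched in code. The following Python fragment is a minimal illustration, not the authors' Matlab implementation. It assumes that the chromosome interleaves center and bandwidth values in hertz, as the best chromosome reported in Section 7.1.1 suggests, and it interprets *R1* as a limit on the overlap between the edges of adjacent sub-bands, with each band spanning center ± bandwidth/2. The function names `decode_chromosome` and `satisfies_restrictions` are hypothetical.

```python
def decode_chromosome(chromosome):
    """Split an interleaved [c1, bw1, c2, bw2, ...] chromosome (assumed layout)."""
    if len(chromosome) % 2 != 0:
        raise ValueError("chromosome must hold (center, bandwidth) pairs")
    centers = list(chromosome[0::2])     # even positions: sub-band centers (Hz)
    bandwidths = list(chromosome[1::2])  # odd positions: sub-band bandwidths (Hz)
    return centers, bandwidths


def satisfies_restrictions(centers, bandwidths,
                           max_overlap=50.0, bw_min=40.0, bw_max=400.0):
    """Check R1 and R2 under the band-edge interpretation assumed above."""
    # R2: every bandwidth must lie in [bw_min, bw_max].
    if any(bw < bw_min or bw > bw_max for bw in bandwidths):
        return False
    # R1: sort bands by lower edge and compare each band with the next one.
    bands = sorted((c - bw / 2.0, c + bw / 2.0)
                   for c, bw in zip(centers, bandwidths))
    for (_, upper), (lower, _) in zip(bands, bands[1:]):
        if upper - lower > max_overlap:  # positive difference means overlap
            return False
    return True


# Best chromosome reported in Section 7.1.1 for the L3 vocabulary.
BF = [254, 180, 526, 132, 744, 118, 1196, 141,
      1483, 115, 2082, 86, 2295, 171, 2828, 46]
centers, bandwidths = decode_chromosome(BF)
feasible = satisfies_restrictions(centers, bandwidths)
```

A check of this kind could be used to reject or penalize candidate chromosomes during the search; how the restrictions were actually enforced in the Matlab toolbox is not stated in the text.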

To test the system, the following lexicons were used: *L0* = {faster, slower, left, right, stop, forward, reverse, brake}, *L1* = {zero, one, two, three, four, five, six, seven, eight, nine}, *L2* = {rápido, lento, izquierda, derecha, alto, adelante, reversa, freno}, *L3* = {uno, dos, tres, cuatro, cinco}. For all the lexicons, 3 male and 3 female volunteers were enrolled. They provided 116 samples of each word, 16 for training and 100 for testing. To demonstrate the power of our approach, we used the minimum distance classifier with the Mahalanobis distance. In all cases, the genetic algorithm was run 30 times to find the best response in the training set. During the training phase, the leave-one-out method was used to exploit the limited size of the training set [1]. Table 2 summarizes the results. Columns 5 and 7 show the comparison against a backpropagation neural network that uses cepstral coefficients as features. The experiments were done using Matlab® and its associated toolboxes for genetic algorithms, neural networks and digital signal processing. The real-time implementation was done with a Texas Instruments® TMS320LF2407 digital signal processor mounted on an experimentation card.


¹Gender, ²Lexicon, ³Minimum distance classifier, ⁴Backpropagation neural network [1] with 8-32-K neurons per layer, K according to the experiment (one neuron for each word in L).

Table 2. Percentage (%) of correct classification with 4 lexicons, 2 languages, 6 persons, male and female voices, 2 classifiers. Simulations and real time implementation.
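The minimum distance classifier used in these experiments can be illustrated with a short sketch. The Python fragment below is a minimal example under stated assumptions, not the authors' Matlab code. It builds a per-command model from a matrix of sub-band energies (one row per training utterance, one column per sub-band, as in step 7 of Section 6.2.2.5) and assigns a new feature vector to the command with the smallest Mahalanobis distance. To stay dependency-free it assumes a diagonal covariance matrix (per-sub-band variances only), which is a simplification of the general Mahalanobis distance. The function names and the toy numbers are hypothetical.

```python
def train_model(samples):
    """Return (mean, variance) per sub-band from rows of sub-band energies."""
    n = len(samples)
    m = len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(m)]
    # Unbiased sample variance per sub-band (diagonal covariance assumption).
    var = [sum((row[j] - mean[j]) ** 2 for row in samples) / (n - 1)
           for j in range(m)]
    return mean, var


def mahalanobis_sq(x, model, eps=1e-12):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    mean, var = model
    # eps guards against division by zero when a variance is exactly 0.
    return sum((xi - mu) ** 2 / (v + eps) for xi, mu, v in zip(x, mean, var))


def classify(x, models):
    """Pick the command whose model is closest to the feature vector x."""
    return min(models, key=lambda word: mahalanobis_sq(x, models[word]))


# Toy training data: two commands, two sub-bands, three utterances each.
training = {
    "uno": [[0.9, 0.1], [1.1, 0.2], [1.0, 0.15]],
    "dos": [[0.2, 0.8], [0.1, 1.0], [0.3, 0.9]],
}
models = {word: train_model(rows) for word, rows in training.items()}
predicted = classify([0.95, 0.12], models)
```

With a full covariance matrix, the per-sub-band division would be replaced by a product with the inverse covariance; the diagonal form is used here only to keep the example short.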

Optimal Feature Generation with

Genetic Algorithms and FLDR in a Restricted-Vocabulary Speech Recognition System 259

**7.1 Results and discussion**

**7.1.1 Results in the L3 Spanish vocabulary**

The genetic algorithm was executed 30 times, and the maximum Fisher's ratio obtained was 62. The resulting best chromosome was:

BF = [254 180 526 132 744 118 1196 141 1483 115 2082 86 2295 171 2828 46]

from which the corresponding centers and bandwidths were:

C = [254 526 744 1196 1483 2082 2295 2828]

BW = [180 132 118 141 115 86 171 46]

The recognition rates in the training and testing sets were 100% and 99%, respectively. In real conditions the correct classification rate was 93.5% in 40 repetitions of each word to the microphone.

Fig. 10. Normalized spectra of the words "faster" and "slower".

**7.1.4 Results of the real-time implementation**

A minimum distance classifier was implemented on a Texas Instruments TMS320LF2407 digital signal processor for each of the four lexicons **L**={L0, L1, L2, L3}, in order to verify the performance in a nearly real-life application. The voiced/non-voiced segmentation was performed using a push-button to start and finish capturing the voice. The DSP has a built-in 10-bit analog-to-digital converter, which facilitated the interfacing task. The digital filters used were 8th-order elliptic IIR filters. The filter coefficients (*A, B*) were calculated using Matlab®. The analog-to-digital conversion was set up to acquire one sample every T seconds (T = 1/6000); each time a sample came into the device, the filters were applied and the respective output was squared and accumulated to calculate the energy of the signal. Scaling issues had to be solved, since the model was created in a real-valued [-1, 1] scale, while the DSP only "sees" integer values. Once a whole command was processed, it took just a few milliseconds to apply the minimum distance classifier and provide the classification. The correct classification rate in this case was of the order of 94.5%, in a total of 1200 repetitions of the words in **L**.

**8. Conclusions and future work**

This chapter presented a method to implement a high-performance, real-time, restricted-vocabulary speech recognition system, combining a genetic algorithm and Fisher's Linear Discriminant Ratio (FLDR) in its matrix formulation. The concepts of variable and feature selection, as well as feature generation, were reviewed; some concepts related to speech processing were also presented, such as the LPC formulation and the DTW method for template matching.

One of the conceptual tools used here was the energy of the signal in certain sub-bands in the frequency domain; thanks to Parseval's theorem, the same amounts of energy can be calculated in the time domain via a bank of digital filters, which enables a very fast recognizer, since the processing takes place while the word is being uttered. Two main experiments were shown, in Spanish and English, with male and female participants; in both cases high performance was attained, beyond 94% at the