**Optimal Feature Generation with Genetic Algorithms and FLDR in a Restricted-Vocabulary Speech Recognition System**

Julio César Martínez-Romo¹, Francisco Javier Luna-Rosas², Miguel Mora-González³, Carlos Alejandro de Luna-Ortega⁴ and Valentín López-Rivas⁵
*¹,²,⁵Instituto Tecnológico de Aguascalientes*
*³Universidad de Guadalajara, Centro Universitario de los Lagos*
*⁴Universidad Politécnica de Aguascalientes*
*Mexico*

### **1. Introduction**


Every pattern recognition problem involves variable and feature selection and, in many cases, feature generation. In pattern recognition, the term *variable* usually denotes the raw measurements or raw values taken from the subjects to be classified, while *feature* refers to the result of transformations applied to the variables to carry them into another domain or space in which greater discriminant capability is expected. Two popular examples of feature generation are principal component analysis (PCA), in which the variables are projected onto a lower-dimensional space where the new features can be used to visualize the underlying class distributions in the original data [1], and the Fourier transform, in which a few of its coefficients can serve as new features [2], [3]. Some of the literature makes no distinction between variables and features, using the terms interchangeably [4], [5].
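As a hedged illustration of these two generation routes (a Python sketch on synthetic data; the dimensions and library choices are assumptions for illustration only, not taken from this chapter's experiments):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical raw variables: 100 subjects, 64 measurements each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))

# Feature generation via PCA: project the variables onto the two
# leading principal components; the projected features can be plotted
# to visualize the underlying class distributions.
pca_features = PCA(n_components=2).fit_transform(X)

# Feature generation via the Fourier transform: keep the magnitudes
# of the first few Fourier coefficients of each measurement vector.
fourier_features = np.abs(np.fft.rfft(X, axis=1))[:, :5]

print(pca_features.shape)      # (100, 2)
print(fourier_features.shape)  # (100, 5)
```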

Although many variables and features can be obtained for classification, not all of them possess discriminant capability; moreover, some of them may confuse a classifier. For this reason, the designer of a classification system must refine the choice of variables and features. Several specific techniques for this purpose are available [1], and some of them are reviewed later in this chapter.

Optimal feature generation is the generation of features under some optimality criterion, usually embodied in a cost function that guides a search through the solution space of the problem at hand toward the best option for the classification task. Examples of such search techniques are genetic algorithms [6] and simulated annealing [1]. In particular, genetic algorithms are used in this work.
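As a hedged illustration of how such a search is organized (a generic Python skeleton; the operators and parameters are placeholders, and the chapter's actual genetic algorithm is described in section 5):

```python
import random

def genetic_search(cost, random_solution, crossover, mutate,
                   pop_size=30, generations=50):
    # Generic genetic algorithm skeleton: evolve a population toward
    # solutions with the best cost (here, higher is better).
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=cost, reverse=True)
        parents = scored[:pop_size // 2]          # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=cost)
```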

Speech recognition has been a topic of high interest in the pattern recognition community since the beginnings of the modern computing age [7], [8]. This is due, in part, to the many practical applications it enables in artificial intelligence, such as natural language understanding [9], man-machine interfaces, and aids for the impaired; on the other hand, it is an intriguing intellectual challenge in which new mathematical methods for feature generation and ever more sophisticated classifiers appear nearly every year [10], [11]. Practical problems arising in the implementation of speech recognition algorithms include real-time requirements, the need to lower the computational complexity of the algorithms, and noise cancellation in general or specific environments [12]. Speech recognition systems can be user dependent or user independent.

A specific case of speech recognition is word recognition, aimed at recognizing isolated words from a continuous speech signal; it finds applications in command-and-control systems such as wheelchairs, TV sets, industrial machinery, computers, cell phones, toys, and many others. A particularity of this niche is that the vocabulary usually comprises a relatively small number of words; for instance, see [13] and [14].

In this chapter we present an innovative method for the restricted-vocabulary speech recognition problem in which a genetic algorithm optimally generates the design parameters of a filter bank by searching the frequency domain for a specific set of sub-bands, using Fisher's linear discriminant ratio as the class separability criterion in the feature space. In this way we use genetic algorithms to create optimal feature spaces in which the patterns from **N** classes are distributed in distant, compact clusters. In our context, each class {ω0, ω1, ω2, …, ωN−1} represents one word of the lexicon. Another requirement of this work is that the algorithm run in real time on dedicated hardware, not necessarily a personal computer or similar platform, so the algorithm developed must have low computational requirements.
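Sections 4 to 6 develop this construction in detail; the following Python sketch only illustrates the idea for a two-class case, under assumed names and parameters (the band edges, sampling rate, and function signatures are hypothetical, not the chapter's implementation). A candidate solution, a set of sub-band edges, is scored by computing the sub-band energies of each utterance's spectrum (via Parseval's relation) and measuring their class separability with the FDR:

```python
import numpy as np

def band_energies(signal, edges, fs):
    # Squared magnitude spectrum; by Parseval's relation, summing it
    # over a frequency band approximates the signal energy in that band.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def fdr_fitness(edges, class0, class1, fs=8000):
    # One feature vector (one energy per sub-band) per utterance.
    f0 = np.array([band_energies(s, edges, fs) for s in class0])
    f1 = np.array([band_energies(s, edges, fs) for s in class1])
    # Two-class Fisher discriminant ratio summed over the features;
    # a genetic algorithm would maximize this over candidate edges.
    num = (f0.mean(axis=0) - f1.mean(axis=0)) ** 2
    den = f0.var(axis=0) + f1.var(axis=0) + 1e-12
    return float(np.sum(num / den))
```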

This chapter is organized as follows: section 2 presents the main ideas behind variable and feature selection; section 3 gives an overview of the most representative speech recognition methods; section 4 explains some of the mathematical foundations of our method, including the Fourier transform, Fisher's linear discriminant ratio, and Parseval's theorem; section 5 presents our algorithmic foundations, namely genetic algorithms and backpropagation neural networks, a powerful classifier used here for performance comparison; section 6 describes the implementation of our speech recognition approach; finally, conclusions and future work are drawn in section 7.

### **2. Optimal variable and feature selection**

Feature selection refers to the problem of selecting the features that are most predictive of a given outcome. Optimal feature generation, in contrast, refers to the derivation of features from the input variables that are optimal in terms of class separability in the feature space. Optimal feature generation is of particular relevance to pattern recognition because it is the basis for achieving high correct classification rates: the better the discriminant features are represented, the better the classifier will categorize new incoming patterns. Feature generation determines how the patterns lie in the feature space, thereby shaping the decision boundary of every pattern recognition problem; linear as well as non-linear classifiers benefit from well-shaped feature spaces.

The recent appearance of new and robust classifiers such as support vector machines (SVM), optimum margin classifiers, relevance vector machines [4], and other robust kernel classifiers suggests that new developments are directed towards classifiers which, although powerful, must be preceded by reliable feature generation techniques. In some cases the classifier is preceded by a feature selection filter, as in the Recursive Feature Elimination Support Vector Machine [15], which eliminates features recursively, in a manner similar to the backward/forward variable selection methods [1].
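A minimal scikit-learn sketch of this filtering stage (synthetic data; an illustration of RFE in general, not the pipeline developed later in this chapter):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic problem: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Recursively drop the feature with the smallest linear-SVM weight
# until only 5 features remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))  # indices of retained features
```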

**2.1 Methods for variable and feature selection and generation** 

Methods for variable and feature selection follow two approaches: the first considers the features as scalars (*scalar feature selection*), and the other considers the features as vectors (*feature vector selection*). In both approaches a class separability criterion must be adopted; examples include the receiver operating characteristic (ROC) curve, the Fisher discriminant ratio (FDR), and the one-dimensional divergence [1]. The goal is to select a subset of *k* out of a total of *K* variables or features. In the sequel, the term *features* is used to refer to both variables and features.

**2.1.1 Scalar feature selection**

The first step is to choose a class separability criterion, C(K). The value of C(K) is computed for each of the available features, and the features are ranked in descending order of their C(K) values. The *k* features with the *k* best C(K) values are selected to form the feature vector. This approach is simple, but it does not take into consideration existing correlations between the features.
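A minimal sketch of this ranking procedure, assuming the two-class FDR as the criterion C(K) (any of the criteria named above could be substituted):

```python
import numpy as np

def scalar_feature_selection(X, y, k):
    # Score every feature independently with the two-class Fisher
    # discriminant ratio, then keep the k best-scoring features.
    X0, X1 = X[y == 0], X[y == 1]
    scores = ((X0.mean(axis=0) - X1.mean(axis=0)) ** 2
              / (X0.var(axis=0) + X1.var(axis=0) + 1e-12))
    return np.argsort(scores)[::-1][:k]
```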

**2.1.2 Vector feature selection** 

Scalar feature selection may not be effective for features with high mutual correlation; another disadvantage is that verifying all possible combinations of the features, in the spirit of optimality, imposes a computational burden that is evidently a major limiting factor. To reduce this complexity, some suboptimal procedures have been suggested [1].

*Sequential Backward Selection.* The following steps comprise this method:

a. Select a class separability criterion, and compute its value for the feature vector formed by all *K* features.

b. Eliminate one feature, and for each possible combination of the remaining features recalculate the corresponding criterion value. Select the combination with the best value.

c. From the selected *K−1*-dimensional feature vector, eliminate one feature, and for each of the resulting combinations compute the criterion value; select the one with the best value.

d. Continue until the feature vector consists of only *k* features, where *k* is the predefined size.

The number of criterion evaluations is 1 + ½((*K*+1)*K* − *k*(*k*+1)); for example, reducing *K* = 10 features to *k* = 3 requires 1 + ½(110 − 12) = 50 evaluations, far fewer than an exhaustive search over all subsets.
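A minimal sketch of sequential backward selection; `criterion` stands for any class separability measure evaluated on a candidate feature subset (a hypothetical callable, not something defined in this chapter):

```python
import itertools

def sequential_backward_selection(features, k, criterion):
    # Start from the full feature set and repeatedly drop the feature
    # whose removal leaves the best criterion value, until k remain.
    selected = list(features)
    while len(selected) > k:
        candidates = itertools.combinations(selected, len(selected) - 1)
        selected = list(max(candidates, key=criterion))
    return selected
```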
