**Meet the editor**

Dr. S. Ramakrishnan is a Professor and the Head of the Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, India. Dr. Ramakrishnan is a reviewer for 25 international journals, including *IEEE Transactions on Image Processing*, *IET Journals* (formerly IEE), *ACM Computing Reviews*, Elsevier Science journals, Springer journals, and Wiley journals. He is a guest editor of special issues in three international journals, including Springer's *Telecommunication Systems*. He has published 116 papers in international and national journals and conference proceedings. Dr. S. Ramakrishnan has published a book on wireless sensor networks with CRC Press, USA, three books on speech processing with InTech, Croatia, and a book on computational techniques with Lambert Academic Publishing, Germany.



## Preface

Pattern recognition continues to be one of the most important research fields in computer science and electrical engineering. Many new applications are emerging, and hence pattern analysis and synthesis have become significant subfields of pattern recognition. This book is an edited volume and has six chapters arranged into two sections, namely, pattern recognition analysis and pattern recognition applications. Each section contains three chapters.

Chapter 1 is on motif discovery in protein sequences. This chapter covers the basics of motif representation and provides methods for motif discovery using probabilistic models. It also describes the tools available for motif discovery.

Chapter 2 is on metaheuristics for classification problems. The authors explain the need for metaheuristics and how they can be applied to classification problems. The chapter particularly focuses on hybridizing metaheuristics and suggests various methods for hybridization.

Chapter 3 is on synthesized phase objects in optical pattern recognition. This chapter is highly comprehensive and covers the required basics, such as definitions and properties. The authors provide a clear account of the required experimental setups and present results with discussion. The methodologies used for optical pattern recognition are also well explained.

Chapter 4 is on face recognition. The authors provide a thorough overview of face recognition systems and critically study and address several challenges in such systems, namely, pose variations, the presence/absence of structuring elements/occlusions, facial expression changes, aging of the face, varying illumination conditions, image resolution, and modality.

Chapter 5 is on the classification of brain tissues using textures. The authors critically study texture analysis and statistical methods and apply them to the classification of CT images using histogram-based features and neural networks. Experimental results and discussion are also presented clearly.

The last chapter is on structural damage detection using machine learning techniques. The authors examine the role of pattern recognition techniques in structural health monitoring. They successfully apply PCA and k-NN to damage detection and present the experimental results.

Overall, this book is brief yet comprehensive and will be a useful resource for graduate students, researchers, and practicing engineers in the fields of machine vision, computer science, and engineering.

I would like to express my sincere thanks to all authors for their contribution and effort to compile this wonderful book, and my earnest gratitude and appreciation to the InTech publisher, in particular Ms. Maja Bozicevic, Publishing Process Manager, who has drawn together the authors to publish this book. I would also like to express my heartfelt thanks to the management, secretary, director, and principal of my institute. Finally, I extend my dearest thanks to my family members and in particular to my sweet daughter Abirami.

> **Dr. S. Ramakrishnan**
> Professor and Head, Department of Information Technology
> Dr. Mahalingam College of Engineering and Technology, Pollachi, India

**Pattern Recognition: Analysis**


#### **Chapter 1**

#### **Motif Discovery in Protein Sequences**

Salma Aouled El Haj Mohamed, Mourad Elloumi and Julie D. Thompson

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/65441

#### **Abstract**

Biology has become a data‐intensive research field. Coping with the flood of data from the new genome sequencing technologies is a major area of research. The exponential increase in the size of the datasets produced by "next‐generation sequencing" (NGS) poses unique computational challenges. In this context, motif discovery tools are widely used to identify important patterns in the sequences produced. Biological sequence motifs are defined as short, usually fixed-length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences, such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They can occur in an exact or approximate form within a family or a subfamily of sequences. Motif discovery is therefore an important field in bioinformatics, and numerous methods have been developed for the identification of motifs shared by a set of functionally related sequences. This chapter will review the existing motif discovery methods for protein sequences, their ability to discover biologically important features, and their limitations for the discovery of new motifs. Finally, we will propose new horizons for motif discovery in order to address the shortcomings of the existing methods.

**Keywords:** motif discovery, bioinformatics, biological sequences, protein sequences, bioinspired algorithms

#### **1. Introduction**

Biology has been transformed by the availability of numerous complete genome sequences for a wide variety of organisms, ranging from bacteria and viruses to model plants and animals to humans. Genome sequencing and analysis is constantly evolving and plays an increasingly important role in biological and biomedical research. This has led to new challenges related to the development of the most efficient and effective ways to analyze data and to use them to generate new insights into the function of biological systems. The completion of the genome sequences is just a first step toward the beginning of efforts to decipher the meaning of the genetic "instruction book." Whole‐genome sequencing is commonly associated with sequencing human genomes, where the genetic data represent a treasure trove for discovering how genes contribute to our health and well‐being. However, the scalable, flexible nature of next‐generation sequencing (NGS) technology makes it equally useful for sequencing any species, such as agriculturally important livestock, plants, or disease‐related microbes.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The exponential increase in the size of the datasets produced by this new generation of instruments clearly poses unique computational challenges. A single run of an NGS machine can produce terabytes of data, and even after image processing, base calling, and assembly, there will be hundreds of gigabytes of uncompressed primary data that must be stored either in flat files or in a database. Efficient treatment of all these data will require new computational approaches to data storage and management, but also new effective algorithms to analyze the data and extract useful knowledge.

The major challenge today is to understand how the genetic information encoded in the genome sequence is translated into the complex processes involved in the organism and the effects of environmental factors on these processes. Bioinformatics plays a crucial role in the systematic interpretation of genome information, associated with data from other high‐throughput experimental techniques, such as structural genomics, proteomics, or transcriptomics.

A widely used tool in all these stages is the comparison (or alignment) of the new genetic sequences with existing sequences. During genome assembly, short read sequences are often aligned to a reference genome to form longer contigs. Identification of coding regions then involves alignment of known genes to the new genomic sequence. Finally, functional significance is most often assigned to the protein coding regions by searching public databases for similar sequences and by transferring the pertinent information from the known to the unknown protein. A wide variety of computational algorithms have been applied to the sequence comparison problem in diverse domains, notably in natural language processing. Nevertheless, the analysis of biological sequences involves more than abstract string parsing, for behind the string of bases or amino acids is the whole complexity of molecular and evolutionary biology.

One major problem is the identification of important features, such as regulatory sites in genomes, or functional domains or active sites in proteins, that are conserved within a family of sequences, without prior alignment of the sequences. In this context, motif recognition and discovery tools are widely used. The retrieved motifs are often compiled in databases, including DNA regulatory motifs in TRANSFAC [1], JASPAR [2], or RegulonDB [3], and protein motifs in PRINTS [4], PROSITE [5], or ELM [6]. These well‐characterized motifs can be used as a starting point for the identification of known motifs in other sequences. This is otherwise known as the pattern recognition problem. The challenge associated with *de novo* pattern discovery, or the detection of previously unknown motifs [7], is far more difficult due to the nature of the motifs.


Biological sequence motifs are defined as short, usually fixed-length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences, such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They occur in an exact or approximate form within a family or a subfamily of sequences. Motif discovery is therefore an important challenge in bioinformatics, and numerous methods have been developed for the identification of motifs shared by a set of functionally related sequences.

Consequently, much effort has been applied to *de novo* motif discovery, for example, in DNA sequences, with a large number of specialized methods that were reviewed recently in [8]. One interesting aspect is the development of nature‐inspired algorithms; for example, particle swarm optimization has been used to find gapped motifs in DNA sequences [9], while DNA motifs have been discovered using an artificial immune system (AIS) [10]. Unfortunately, far fewer tools have been dedicated to the *de novo* search for protein motifs. This is due to the combinatorial explosion created by the large alphabet size of protein sequences, as well as the degeneracy of the motifs, i.e., the large number of wildcard symbols within the motifs. Some tools, such as Teiresias [11] or the MEME suite [12], can discover motifs in both DNA and protein sequences. Other work has been dedicated to the discovery of specific types of protein motifs, such as patterns containing large irregular gaps ("eukaryotic linear motifs") with SLiMFinder [13], or phosphorylation sites [14]. Many studies have been conducted to compare these specific motif discovery tools, such as [15].

In most cases, *de novo* motif discovery algorithms take as input a set of related sequences and search for patterns that are unlikely to occur by chance and that might represent a biologically important sequence pattern. Since protein motifs are usually short and can be highly variable, a challenging problem for motif discovery algorithms is to distinguish functional motifs from random patterns that are overrepresented. One solution to this challenge is to first construct a global multiple alignment of the sequences and then search for motifs in the aligned sequences. This reduces the search space to the aligned regions of the sequences, but also severely limits the possibilities of finding new motifs.

Furthermore, existing motif discovery methods are able to find motifs that are conserved within a complete family, but most of them are still unable to find motifs that are conserved only within a subfamily of the sequences. These subfamily‐specific motifs, which we will call "rare" motifs, are often conserved within groups of proteins that perform the same function (specificity groups) and vary between groups with different functions/specificities. These sites generally determine protein specificity either by binding specific substrates/inhibitors or through interactions with other proteins.

In Section 2, we will provide a brief description of protein sequences and the motifs that characterize them. Then, in Section 3, the main approaches used for motif recognition in protein sequences will be presented. In Section 4, the main approaches used for the more difficult problem of *de novo* motif discovery will be presented. Finally, in Section 5, we will propose new horizons for motif discovery in order to address the shortcomings of the existing methods.

#### **2. Protein sequences, active sites, and motifs**

Some basic concepts in protein biology are necessary for understanding the rest of this chapter. For many readers this will be familiar territory, in which case they may want to skip this section and go directly to Section 3.

The genetic information encoded in the genome sequence of any organism contains the blueprint for its potential development and activity. However, the translation of this information into cellular or organism‐level behavior depends on the gene products, especially proteins. Proteins perform a wide variety of cellular functions, ranging from catalysis of reactions, nutrient transport, and signal transmission to structural and mechanical roles. A protein is composed of a single chain of amino acids (of which there are 20 different kinds), represented by their single-letter codes. This "primary structure" or sequence is none other than a string of characters that we can read from left to right, i.e., from the NH2 (*N*‐terminal) end to the COOH (*C*‐terminal) end.

Every protein molecule has a characteristic three‐dimensional (3D) shape or conformation, known as its native state. The process by which a protein sequence assumes its 3D structure is known as folding. Protein folding can be considered as a hierarchical process, in which the primary sequence defines secondary structure, which in turn defines the tertiary structure. Individual protein molecules can then interact with other proteins to form complex quaternary structures. The precise 3D structure of a protein molecule is generally required for proper biological function since a specific conformation is needed that the cell factors can recognize and interact with.

During evolution, random mutagenesis events take place, which change the genomic sequences that encode proteins. There are several different types of mutation that can occur. A single amino acid can be substituted for another one. Insertions and deletions also occur, involving a single amino acid up to several hundred amino acids. Some of these evolutionary changes will adversely affect the function of a protein, e.g., mutations of active sites in an enzyme, or mutations that disrupt the 3D structure of the protein. If this happens to a protein that takes part in a crucial process for the cell, it will result in cell death. As a result, amino acids that are essential for a protein's function, or that are needed for the protein to fold correctly, are conserved over time. Occasionally, mutations occur that give rise to new functions. This is one of the ways that new traits and eventually species may come about during evolution.

By comparing related sequences and looking for those amino acids that remain the same in all of the members in the family, we can predict the sites that might be essential for function. Some examples of important functional sites include the following:

**•** Enzyme active sites: to catalyze a reaction, an enzyme will bind to one or more reactant molecules, known as its substrates. The active site consists of the enzyme's amino acids that form temporary bonds with the substrate, known as the binding site, and the amino acids that catalyze the reaction of that substrate.


An example of a simple functional site is the N‐glycosylation site, which is a posttranslational modification where a carbohydrate is attached to a hydroxyl or other functional group of a protein molecule. The sequence motif representing this site can be indicated by N‐X‐S/T. The first amino acid is asparagine (N), the second amino acid can be any of the 20 amino acids (X), and the third amino acid is either serine (S) or threonine (T). This example introduces the first complication in protein motif discovery: the motifs can contain both exact and ambiguous elements. Asparagine is a necessary amino acid, since this is the site that will be glycosylated, and is represented by an exact element. The third position should be a hydroxyl‐containing amino acid (serine or threonine), while the second position is a "wild card." Nevertheless, the N‐glycosylation motif shown here is uninterrupted, and so it is relatively easy to recognize. The spacing between the elements in many other sequence motifs can vary considerably, but the presence of such motifs is generally detected from the structure rather than the sequence, and this kind of motif will not be discussed in detail here. Finally, it should be pointed out that, just because this motif appears in a protein sequence, it does not mean that the site is glycosylated. The functional implications of a motif will depend on the neighboring amino acids and the surrounding 3D context. Therefore, in practice, identifying functional motifs from a protein sequence is far from straightforward.
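Because the N‐X‐S/T motif is uninterrupted, it maps directly onto a regular expression. The following is a minimal sketch (the input sequence is invented purely for illustration) that scans a protein string for candidate sites:

```python
import re

# Scan a protein sequence for candidate N-glycosylation sites (N-X-S/T).
# A lookahead is used so that overlapping candidates are not missed;
# reported positions are 0-based.
def find_nglyc_sites(seq):
    return [(m.start(), seq[m.start():m.start() + 3])
            for m in re.finditer(r"(?=(N[A-Z][ST]))", seq)]

# Hypothetical sequence, for illustration only.
print(find_nglyc_sites("MKNATLLNGSAV"))  # [(2, 'NAT'), (7, 'NGS')]
```

As the chapter stresses, a match only marks a candidate site; whether the site is actually glycosylated depends on the surrounding 3D context.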

#### **3. Motif recognition in protein sequences**

The motif recognition problem takes as input a set of known patterns or features that in some way define a class of proteins. The goal is then to search in an unsupervised or supervised way for other instances of the same patterns. As mentioned in the Introduction, the known motifs in biological sequences are generally compiled in databases that are publicly available over the Internet. For example, the PRINTS database (www.bioinf.manchester.ac.uk/dbbrowser/PRINTS) contains "protein fingerprints," where a fingerprint is composed of a group of motifs that characterize a given set of protein sequences with the same molecular function. In contrast, the PROSITE (prosite.expasy.org) and ELM (elm.eu.org) databases contain single motifs that correspond to known functionally or structurally important amino acids, such as those involved in an active site or a ligand binding site. The motifs contained in these resources are generally manually curated, and the entries in the databases include extensive documentation of the specific biological function associated with the sites.

#### **3.1. Motif representation**

Over the years, a variety of motif representation models have been developed to take into account the complexity of protein motifs. The models are attempts to construct generalizations based on known functional motifs, and are used to help characterize the functional sites and to facilitate their identification in unknown protein sequences. They can be divided into two main categories.

#### *3.1.1. Deterministic models*

Consensus sequences are the simplest model for representing protein motifs. They can be constructed easily by selecting the amino acid found most frequently at each position in the signal. The number of matches between a consensus and an unknown candidate sequence can be used to evaluate the significance of a potential functional site. However, consensus sequences are limited models, since they do not capture the variability of each position. To support some degree of ambiguity, regular expressions can be used. Regular expressions are typically composed of exact symbols, ambiguous symbols, fixed gaps, and/or flexible gaps [16]. For example, the IQ motif is an extremely basic unit of about 23 amino acids, whose conserved core can be represented by the regular expression:

`[FILV]Qxxx[RK]Gxxx[RK]xx[FILVWY]`

where x signifies any amino acid, and the square brackets indicate an alternative.
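This notation translates almost verbatim into standard regular expression syntax: each x becomes the wildcard `.`, and the square-bracket alternatives are already valid character classes. A small sketch, using a hypothetical fragment containing one conserved core:

```python
import re

# IQ-motif core [FILV]Qxxx[RK]Gxxx[RK]xx[FILVWY] as a Python regex:
# 'x' (any amino acid) becomes '.', bracketed alternatives stay as-is.
IQ_CORE = re.compile(r"[FILV]Q...[RK]G...[RK]..[FILVWY]")

fragment = "AAIQAAARGAAARAAFAA"  # invented fragment, for illustration only
match = IQ_CORE.search(fragment)
print(match.group() if match else "no match")  # IQAAARGAAARAAF
```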

#### *3.1.2. Probabilistic models*

Although deterministic models provide useful ways to construct human‐readable representations of motifs, their main drawback is that they lose some information. For instance, in the IQ motif discussed above, the first position is usually I and both [RK] positions are most often R. Probabilistic models can be used to overcome such loss of information. The position‐specific scoring matrix (PSSM) [17], also known as the position weight matrix (PWM), is undoubtedly one of the most widely used probabilistic models. This model is represented by a matrix where each entry (*i*, *a*) is the probability of finding amino acid *a* at the *i*th position in the sequence motif. For example, for a set of motifs:

**•** WSEW

**3. Motif recognition in protein sequences**

8 Pattern Recognition - Analysis and Applications

of the specific biological function associated with the sites.

core can be represented by the regular expression:

**3.1. Motif representation**

*3.1.1. Deterministic models*

*3.1.2. Probabilistic models*

main categories.

The motif recognition problem takes as input a set of known patterns or features that in some way define a class of proteins. The goal is then to search in an unsupervised or supervised way for other instances of the same patterns. As mentioned in the Introduction, the known motifs in biological sequences are generally compiled databases that are publically available over the Internet. For example, the PRINTS database (www.bioinf.manchester.ac.uk/dbbrowser/ PRINTS) contains "protein fingerprints," where a fingerprint is composed of a group of motifs that characterize a given set of protein sequences with the same molecular function. In contrast, the PROSITE (prosite.expasy.org) and ELM (elm.eu.org) databases contain single motifs that correspond to known functionally or structurally important amino acids, such as those involved in an active site or a ligand binding site. The motifs contained in these resources are generally manually curated and the entries in the databases include extensive documentation

Over the years, a variety of motif representation models have been developed to take into account the complexity of protein motifs. The models are attempts to construct generalizations based on known functional motifs, and are used to help characterize the functional sites and to facilitate their identification in unknown protein sequences. They can be divided into two

Consensus sequences are the simplest model for representing protein motifs. They can be constructed easily by selecting the amino acid found most frequently at each position in the signal. The number of matches between a consensus and an unknown candidate sequence can be used to evaluate the significance of a potential functional site. However, consensus sequences are limited models, since they do not capture the variability of each position. To support some degree of ambiguity, regular expressions can be used. Regular expressions are typically composed of exact symbols, ambiguous symbols, fixed gaps, and/or flexible gaps [16]. For example, the IQ motif is an extremely basic unit of about 23 amino acids, whose conserved core can be written as the regular expression:

[FILV]Qxxx[RK]Gxxx[RK]xx[FILVWY]

where x signifies any amino acid, and the square brackets indicate an alternative.

Although deterministic models provide useful ways to construct human-readable representations of motifs, their main drawback is that they lose some information. For instance, in the IQ motif discussed above, the first position is usually I and both [RK] are most often R.
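As an illustration, a pattern of this kind translates directly into standard regular expression syntax, with x becoming the wildcard `.`; the sample sequence below is made up for demonstration:

```python
import re

# The IQ motif pattern above, translated into Python regex syntax:
# [FILV] and [RK] stay as character classes, and x becomes '.'.
IQ_MOTIF = re.compile(r"[FILV]Q...[RK]G...[RK]..[FILVWY]")

# A made-up amino acid sequence containing one match (illustration only).
seq = "MSTKIQAAARGQQQKAAWGLE"

for m in IQ_MOTIF.finditer(seq):
    print(m.start(), m.group())  # reports position 4, match "IQAAARGQQQKAAW"
```

Scanning a database of sequences with such a pattern is then a simple loop over `finditer` per sequence.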



An example PSSM is shown in **Table 1**.

**Table 1.** Example of a position specific scoring matrix (PSSM).

**Figure 1.** An example of a sequence logo for representing patterns in biological sequences. The logo represents the Pribnow box, a conserved region found upstream of some genes in prokaryotic genomes.

Although in this example the PSSM contains entries with a value of 0, in general, pseudocounts are applied, especially when using a small dataset, in order to allow the calculation of probabilities for new motifs.
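As a sketch of this idea, the following builds a log-odds PSSM from a handful of aligned motif instances, adding a pseudocount to every cell; the toy instances and the uniform background distribution are illustrative assumptions:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(instances, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length motif instances.

    A pseudocount is added to every cell so that amino acids never
    observed at a position still receive a small nonzero probability.
    A uniform background (1/20 per amino acid) is assumed for simplicity.
    """
    length = len(instances[0])
    n = len(instances)
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in instances)
        column = {}
        for aa in AMINO_ACIDS:
            prob = (counts[aa] + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            column[aa] = math.log2(prob / background)
        pssm.append(column)
    return pssm

# Toy motif instances (hypothetical, for illustration only).
instances = ["IQARG", "LQKRG", "IQSRG", "VQARG"]
pssm = build_pssm(instances)
print(round(pssm[1]["Q"], 2))  # Q dominates position 2, so its score is high
```

Note that thanks to the pseudocount, an amino acid never seen at a position (e.g., W at position 1) still gets a finite, slightly negative log-odds score rather than minus infinity.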

The information summarized in the PSSM can also be represented by a sequence logo [18], which is a graphical representation of the motif conservation as shown in **Figure 1**. A logo consists of a stack of letters at each position in the motif, where the relative sizes of the letters indicate their frequency in the sequences. The total height of the letters corresponds to the information content of the position, in bits.

Another widely used probabilistic model is the hidden Markov model (HMM), a statistical model that is generally applicable to time series or linear sequences. HMMs were first introduced in bioinformatics for DNA sequences [19]. An HMM can be visualized as a finite state machine that moves through a series of states and produces some kind of output. The HMM generates a protein sequence by emitting amino acids as it progresses through a series of states. Each state has a table of amino acid emission probabilities, and transition probabilities for moving from state to state.

All of the representations mentioned so far inherently assume that positions within the motif are independent of each other. However, in some cases, this strong independence assumption may not be reasonable. Markov models of higher order, permuted Markov models, or Bayesian networks can be used to capture local dependencies by considering how each position depends on the others.

#### **3.2. Motif detection**

The models described in the previous section can be applied to the task of scanning a user‐ submitted sequence for matches to known motifs, thus providing evidence for the function of the protein and contributing to its classification in a given protein family. Ideally, a motif model would recognize all and only the members of the family. Unfortunately, this is seldom the case in practice.

In the case of deterministic models, including consensus sequences and regular expressions, the models are often either too specific, leading to a large number of false negative predictions, or too degenerate, resulting in many false positives. The statistical power of such models can be estimated using standard measures, such as the positive and negative predictive values (PPV and NPV, respectively).

In the case of probability matrices or HMM-based methods, a log-odds score can be calculated that is a measure of how probable it is that a sequence is generated by a model rather than by a random null model representing the universe of all sequences (also known as the "background"). The log-odds score of a motif is defined as:

$$\text{score}(s) = \log_2 \frac{P_m(s)}{P_{\varnothing}(s)} \tag{1}$$

where $P_m$ is the probability that the sequence was generated by the motif model *m* and $P_{\varnothing}$ is the probability that the sequence was generated by the null model. The logarithm is usually base 2, and the score is given in bits. A log-odds score greater than zero indicates that the sequence fits the motif model better than the null model.
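Equation (1) can be sketched in code as follows, assuming a uniform null model and a hypothetical two-position motif model (both are illustrative assumptions):

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_score(window, motif_probs, background=None):
    """Compute score(s) = log2(Pm(s) / Pnull(s)) for one sequence window.

    motif_probs is a list (one dict per motif position) of amino acid
    probabilities; the null model is uniform over the 20 amino acids
    unless a background distribution is supplied.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    score = 0.0
    for pos, aa in enumerate(window):
        # Summing per-position log ratios equals log2 of the product ratio.
        score += math.log2(motif_probs[pos][aa] / background[aa])
    return score

# Hypothetical two-position motif strongly preferring "IQ".
motif_probs = [
    {aa: (0.8 if aa == "I" else 0.2 / 19) for aa in AMINO_ACIDS},
    {aa: (0.9 if aa == "Q" else 0.1 / 19) for aa in AMINO_ACIDS},
]
print(log_odds_score("IQ", motif_probs) > 0)  # True: fits the motif model
print(log_odds_score("GG", motif_probs) > 0)  # False: fits the null better
```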

#### **4. Motif discovery in protein sequences**

#### **4.1. Methods for motif discovery**

Given a set of functionally related sequences, the main aim of motif discovery algorithms is to find new and *a priori* unknown motifs that are frequent, unexpected, or interesting according to some formal criteria. The methods used to discover such motifs follow the same general schema, as shown in **Figure 2**. They can be grouped into two main categories: alignment‐based methods and methods that search for motifs in unaligned sequences.

**Figure 2.** General motif discovery process.

#### *4.1.1. Alignment‐based methods*

Alignment‐based methods for motif discovery first construct a multiple sequence alignment of the set of sequences, where each sequence of amino acids is typically represented as a row within a matrix. Gaps are inserted between the amino acids so that identical or similar characters are aligned in successive columns. Once the multiple alignments are constructed, the patterns are extracted from the alignment by combining the substrings common to most of the sequences.

One of the first automatic methods for the identification of conserved positions in a multiple alignment was the AMAS program [20], using a set-based description of amino acid properties. Since then, a large number of different methods have been proposed. For example, AL2CO [21] calculates a conservation index at each position in a multiple sequence alignment using weighted amino acid frequencies at each position. The DIVAA method [22] is based on a statistical measure of the diversity at a given position. The diversity measures the proportion of the 20 possible amino acids that are observed.

The advantage of the alignment‐based approach is that no upper limit has to be imposed on the length of the motifs. Moreover, these algorithms usually do not need as input a maximum threshold value for the motif distance from the sequences. In general, this approach works well if the sequences are sufficiently similar and the patterns occur in the same order in all of the sequences. Unfortunately, this is not usually the case and therefore most methods for motif discovery in protein sequences assume that the input sequences are unaligned.

#### *4.1.2. Alignment‐free methods*

The vast majority of motif discovery methods in bioinformatics are alignment-free approaches that do not rely on the initial construction of a multiple sequence alignment. Instead, they generally search for patterns that are overrepresented in a given set of sequences. The simplest solution is to generate all possible motifs up to a maximum length *l*, and then to search separately for the approximate occurrences of each motif in the set of sequences. Once a list of candidate patterns is obtained, the ones with the highest significance scores are selected. This approach is guaranteed to find all motifs that satisfy the input constraints. Moreover, the sequences can be organized in suitable indexing structures, such as suffix trees, so that the time needed by the algorithm to search for a single motif is linear in the overall length of the sequences.
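The core enumeration step can be sketched as follows; the toy sequences, with a planted 3-mer, are illustrative only:

```python
from collections import Counter

def count_kmers(sequences, k):
    """Enumerate every substring of length k (k-mer) and count its
    occurrences across all sequences: the exhaustive approach above."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

# Toy input: the 3-mer "IQA" is planted in every sequence (illustrative only).
sequences = ["MKIQALST", "GGIQAWWT", "PLIQAKEV"]
counts = count_kmers(sequences, 3)
print(counts.most_common(1))  # [('IQA', 3)] - the planted, overrepresented k-mer
```

In a real tool, the raw counts would then be converted into significance scores against a background model before ranking.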

This simplistic approach has an evident disadvantage: the number of candidate motifs, and therefore the time required by the algorithm, grows exponentially with the length of the sequences. Computing a significance score for each motif further increases the time required by the algorithm. A number of more efficient tools have been developed to address these issues, and in the next section, we will discuss some of the more widely used ones.

#### **4.2. Tools for motif discovery**

In this section, we will present some of the programs that are specifically designed to search for biologically significant motifs in protein sequences. The search for motifs in a set of unaligned sequences is a complex problem because many factors come into play, such as the precise start and end boundaries of the motif, the size variability (presence of gaps or not), or stronger or weaker motif conservation during evolution.

*De novo* motif discovery programs are generally based on one of the following three algorithms:

**•** Enumeration is a method that involves counting all substrings of a certain length (known as words or *k*-mers) and then seeking overrepresentations. Such exhaustive motif finding approaches are guaranteed to report all instances of motifs in a set of sequences. However, the exponential complexity of such searches means that the problem quickly becomes intractable for large alphabets.

**•** Deterministic optimization is based on the expectation-maximization (EM) algorithm that estimates the likelihood of a motif from existing data in two stages repeated iteratively. The first uses a set of parameters to reconstruct the hidden motif structure. The second uses this structure to reestimate the parameters. This method allows finding alternate sequences representing the motif and updating the motif model.

**•** Probabilistic optimization is an iterative method in which a random subsequence is extracted from each sequence to build an initial model. In each subsequent iteration, the *i*th sequence is removed and the model is recalculated. Then, a new motif is extracted from the *i*th sequence. This process is repeated until convergence.
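The probabilistic optimization strategy can be illustrated with a minimal Gibbs-sampling sketch. The function, the scoring scheme (simple counts with a pseudocount instead of a full PSSM), and the toy sequences below are illustrative assumptions, not any specific published algorithm:

```python
import random
from collections import Counter

def gibbs_motif_search(sequences, width, iterations=200, seed=0):
    """Minimal Gibbs-sampling sketch: hold out one sequence at a time,
    rebuild the motif model from the rest, and resample the held-out
    sequence's motif position in proportion to its fit to the model."""
    rng = random.Random(seed)
    # Start from a random window in each sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        i = rng.randrange(len(sequences))  # sequence to hold out
        others = [s[p:p + width]
                  for j, (s, p) in enumerate(zip(sequences, starts)) if j != i]
        # Model: per-position amino acid counts with a pseudocount of 1.
        columns = [Counter(w[pos] for w in others) for pos in range(width)]
        seq = sequences[i]
        weights = []
        for p in range(len(seq) - width + 1):
            w = 1.0
            for pos in range(width):
                w *= columns[pos][seq[p + pos]] + 1
            weights.append(w)
        # Resample the held-out start position in proportion to model fit.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return [s[p:p + width] for s, p in zip(sequences, starts)]

# Toy sequences with the planted motif "IQAR" (illustration only).
seqs = ["MSTIQARGG", "KKIQARLLL", "GAIQARWCA"]
print(gibbs_motif_search(seqs, 4))
```

Being stochastic, the sampler is not guaranteed to converge to the planted motif on every run; real tools restart many times and keep the best-scoring model.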

Below, and in **Table 2**, we present the most used motif discovery programs and discuss their advantages and limitations.

Teiresias [11] is based on an enumeration algorithm. It operates in two phases: scanning and convolution. During the scanning phase, elementary motifs with sufficient support are identified. These elementary motifs constitute the building blocks for the convolution phase. They are combined into progressively larger motifs until all the existing maximal motifs are generated.

MEME [12] is an example of a deterministic optimization algorithm. It allows discovery of motifs in DNA or protein sequences based on expectation maximization (EM). MEME discovers at least three motifs, each of which may be present in some or all of the input sequences. MEME chooses the width and number of occurrences of each motif automatically in order to minimize the "E-value" of the motif, i.e., the probability of finding a similarly well-conserved pattern in random sequences. With default parameters, only motif widths between 6 and 50 are considered, but the user has the possibility to change this, as well as several other parameters (options) of the motif discovery.

Pratt [23] is based on probabilistic optimization. It first searches the space of motifs, as constrained by the user, and compiles a list of the most significant patterns that match at least the user-defined minimum number of sequences. If the user has not switched off the refinement, these motifs will be input to one of the motif refinement algorithms. The most significant motifs resulting from this are then output to a file.

qPMS [24] stands for quorum planted motif search. The program searches for motifs in either DNA or protein sequences. It uses the (*l*, *d*) motif search algorithm known as the planted motif search. qPMS takes as input a set of sequences and two values, *l* and *d*. It returns all motifs *M* of length *l* that appear, with at most *d* mismatches, in at least *q*% of the sequences.
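The (*l*, *d*) occurrence test at the heart of planted motif search can be sketched as follows; the function name and toy strings are illustrative only:

```python
def occurs_with_mismatches(motif, sequence, d):
    """True if some length-l window of `sequence` differs from `motif`
    in at most d positions (the (l, d) occurrence criterion)."""
    l = len(motif)
    for i in range(len(sequence) - l + 1):
        mismatches = sum(a != b for a, b in zip(motif, sequence[i:i + l]))
        if mismatches <= d:
            return True
    return False

print(occurs_with_mismatches("IQARG", "MKIQSRGLV", d=1))  # True: IQSRG differs in 1
print(occurs_with_mismatches("IQARG", "MKWWWWGLV", d=1))  # False: no close window
```

A motif would then be reported if this test succeeds for at least *q*% of the input sequences.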

SLiMFinder [13] identifies novel short linear motifs (SLiMs) in a set of sequences. SLiMs are microdomains that have important functions in many diverse biological pathways. SLiM-mediated functions include posttranslational modification, subcellular localization, and ligand binding. SLiMs are generally less than 10 amino acids long, many of which will be "flexible" in terms of the conserved amino acid. SLiMFinder constructs such motifs by grouping dimers into longer patterns: motifs with fixed amino acid positions are identified and then grouped to include amino acid ambiguity and variable-length wildcards. Finally, motifs that are overrepresented in a set of unrelated proteins are identified.

Dilimot [25] proceeds as follows: in the first step, a user-provided set of protein sequences is filtered to eliminate repetitive sequences as well as the regions least likely to contain linear motifs. In the second step, overrepresented motifs are identified in the nonfiltered sequences and ranked according to scores that take into account the background probability of the motif, the number of sequences containing the motif, the size of the sequence set, and the degree to which the motif is conserved in other orthologous proteins.


MotifHound [26] is suitable for the discovery of small and degenerate linear motifs. The method needs two input datasets: a background set of protein sequences and a subset of this background set that represents the query sequences. MotifHound first enumerates all possible motifs present in the query sequences, and then calculates the frequency of each motif in both the query and the background sets.

FIRE-pro [27] stands for finding informative regulatory elements in proteins. Its main goal is to discover protein motifs that correlate with the biological behavior of the corresponding proteins. FIRE-pro calculates a mutual information measure between frequent *k*-mer motifs and a "protein behavior profile" containing experimental data about the function of the proteins.

| Program | Description | Advantages | Disadvantages |
|---|---|---|---|
| Teiresias | Finds motifs that are frequent in a set of related sequences | Does not need background sequences; very fast | Too many redundant motifs discovered |
| MEME | Finds motifs in related sequences using Gibbs sampling and expectation maximization | Does not need background sequences; fast; multi-thread version available; user friendly output | User defines the number of motifs to discover |
| Pratt | Discovers flexible motifs in related sequences | Does not need background sequences | Unable to discover effectively exact motifs |
| qPMS | Finds overrepresented motifs in a set of sequences based on Quorum Planted Motif Search | Fast; multi-thread version available | Limited to 20 protein sequences |
| SlimFinder | Finds overrepresented motifs in a set of unrelated sequences relative to background sequences | Well documented; can use filters | Needs background sequences |
| MotifHound | Exhaustively finds motifs overrepresented in a set of unrelated sequences relative to background sequences | Exhaustive exploration of motifs; can use filters | Needs background sequences |
| Dilimot | Finds overrepresented motifs in a set of unrelated sequences relative to a background set of sequences | User friendly output | Needs background sequences |
| FirePro | Correlates overrepresented motifs in a set of sequences with specific functions or behaviors | Integrates several types of sequence information on motifs; fast; low memory consumption | Needs background sequences; source code not available |

**Table 2.** Advantages and limitations of the most used motif discovery programs.

Most of these programs need prior knowledge about either the input sequences or the motif structure. Furthermore, they are generally designed to discover frequent motifs that occur in all or most of the sequences. The subfamily-specific motifs, which differentiate a specific subset of sequences, pose a greater challenge due to the statistical nature of these programs or the default choice of parameters used. Nevertheless, these "rare" motifs are often characteristic of important biological functions or context-specific modifications, including substrate binding sites, protein-protein interactions, or posttranslational modification sites.

In the final section of this chapter, we will discuss the use of "intelligent algorithms" that should be more reliable for the discovery of significant rare motifs in addition to the conserved and known ones.

## **5. Intelligent algorithms for protein motif discovery**

Intelligent algorithms include optimization and nature-inspired algorithms. Among these, artificial immune systems (AIS) are especially adapted to pattern discovery, and have been used recently for motif discovery in DNA sequences. The high complexity and dimensionality of the problems in bioinformatics are an interesting challenge for testing and validating new computational intelligence techniques. Similarly, the application of AIS to bioinformatics may bring important contributions to the biological sciences, providing an alternative form of analyzing and interpreting the huge volume of data from molecular biology and genomics [28].

Artificial immune systems are a class of computationally intelligent systems inspired by the principles and processes of the vertebrate immune system. The algorithms typically apply the structure and function of the immune system to solving hard computational problems. Since their introduction in the 1990s, a number of common techniques have been developed, including:

**•** clonal selection algorithms, modeled on the proliferation and affinity maturation of immune cells in response to an antigen;

**•** negative selection algorithms, based on the discrimination between self and non-self;

**•** immune network algorithms, inspired by the idiotypic network theory;

**•** dendritic cell algorithms, derived from the behavior of antigen-presenting cells.


Although a number of these different AIS can be used for pattern recognition, the clonal selection algorithm seems to be particularly well suited for protein motif discovery in large sets of sequences. In particular, the capabilities for self‐organization of huge numbers of immune cells mean that no prior information is needed. In addition, the system does not require outside intervention and so it can automatically classify pathogens (motifs) and it can react to pathogens that the body has never seen before. Another advantage of AIS is the fact that there are varying types of elements that protect the body from invaders, and there are different lines of defense, such as innate and adaptive immunity. These features can be abstracted to model the diverse types of motifs found in protein molecules (see Section 1). These different mechanisms are organized in multiple layers that act cooperatively to provide high noise tolerance and high overall security.
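As a rough illustration of how clonal selection could be applied to motif discovery, the sketch below treats candidate motifs as antibodies, scores affinity as the total number of matched positions across the sequence set, and hypermutates clones of the fittest candidates. All function names, parameters, and toy sequences are assumptions for illustration, not a published algorithm:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def affinity(motif, sequences):
    """Affinity of a candidate motif: the best (highest match count)
    window in each sequence, summed over all sequences."""
    l = len(motif)
    total = 0
    for seq in sequences:
        total += max(
            sum(a == b for a, b in zip(motif, seq[i:i + l]))
            for i in range(len(seq) - l + 1)
        )
    return total

def clonal_selection(sequences, width=4, pop=20, generations=50, seed=0):
    """Clonal-selection sketch: keep a population of candidate motifs
    (antibodies), clone the fittest, hypermutate the clones (lower-ranked
    antibodies mutate more positions), and reselect each generation."""
    rng = random.Random(seed)
    random_motif = lambda: "".join(rng.choice(AMINO_ACIDS) for _ in range(width))
    population = [random_motif() for _ in range(pop)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: affinity(m, sequences), reverse=True)
        elite = ranked[: pop // 4]
        clones = []
        for rank, antibody in enumerate(elite):
            for _ in range(3):
                mutated = list(antibody)
                for _ in range(1 + rank):  # hypermutation grows with rank
                    mutated[rng.randrange(width)] = rng.choice(AMINO_ACIDS)
                clones.append("".join(mutated))
        # Refill with fresh random antibodies to maintain diversity.
        population = elite + clones + [
            random_motif() for _ in range(pop - len(elite) - len(clones))
        ]
    return max(population, key=lambda m: affinity(m, sequences))

# Toy sequences with the planted motif "IQAR" (illustration only).
seqs = ["MSTIQARGG", "KKIQARLLL", "GAIQARWCA"]
print(clonal_selection(seqs))
```

The random refill step plays the role of the innate layer (diversity), while cloning and hypermutation of the elite mirror the adaptive response converging on a pathogen (motif).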

The use of such intelligent algorithmic approaches should improve the whole motif discovery process: from the selection of suitable sets of sequences, via data cleaning and preprocessing, motif identification and evaluation, to the final presentation and visualization of the results. Nevertheless, a number of issues remain to be addressed before such systems can be applied to the very large datasets produced by next-generation sequencing (NGS) technologies. In particular, the substantial time and memory requirements of AIS are a limiting factor, although these can be significantly reduced thanks to the inherently parallel nature of the algorithms.

#### **Acknowledgements**

We would like to thank the members of the BICS and BISTRO Bioinformatics Platforms in Strasbourg for their support. This work was supported by Institute funds from the CNRS, the Université de Strasbourg and the Faculté de Médecine de Strasbourg.

#### **Author details**

Salma Aouled El Haj Mohamed1,2,3, Mourad Elloumi2 and Julie D. Thompson3\*

\*Address all correspondence to: thompson@unistra.fr

1 Faculty of Science, Doctoral School of Mathematics, Computer Science and Material Science and Technology, University of Tunis El Manar, Tunis, Tunisia

2 Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), University of Tunis, Tunis, Tunisia

3 Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Strasbourg Federation of Translational Medicine (FMTS), Strasbourg, France

#### **References**


[12] Bailey TL, Bodén M, Whitington T, Machanick P. The value of position-specific priors in motif discovery using MEME. BMC Bioinformat. 2010; 11:179.

[13] Edwards RJ, Davey NE, Shields D. SLiMFinder: A probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS One. 2007; 2:e967.

[14] Frades I, Resjö S, Andreasson E. Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis. BMC Bioinformat. 2015; 16:239.

[15] Bhowmick P, Guharoy M, Tompa P. Bioinformatics approaches for predicting disordered protein motifs. Adv Exp Med Biol. 2015; 870:291–318.

[16] Brazma A, Jonassen I, Eidhammer I, Gilbert D. Approaches to the automatic discovery of patterns in biosequences. J Comp Biol. 1998; 5:279–305.

[17] Henikoff S, Henikoff JG. Position-based sequence weights. J Mol Biol. 1994; 243:574–578.

[18] Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990; 18:6097–6100.

[19] Churchill GA. Stochastic models for heterogeneous DNA sequences. Bull Math Biol. 1989; 51:79–94.

[20] Livingstone CD, Barton GJ. Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput Appl Biosci. 1993; 9:745–756.

[21] Pei J, Grishin NV. AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics. 2001; 17:700–712.

[22] Rodi DJ, Mandava S, Makowski L. DIVAA: analysis of amino acid diversity in multiple aligned protein sequences. Bioinformatics. 2004; 20:3481–3489.

[23] Jonassen I, Collins JF, Higgins DG. Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995; 4:1587–1595.

[24] Dinh H, Rajasekaran S, Davila J. qPMS7: a fast algorithm for finding (l,d) motifs in DNA and protein sequences. PLoS One. 2012; 7:e41425.

[25] Neduva V, Russell RB. DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res. 2006; 34:W350–W355.

[26] Lieber DS, Elemento O, Tavazoie S. Large-scale discovery and characterization of protein regulatory motifs in eukaryotes. PLoS One. 2010; 5:e14444.

[27] Kelil A, Dubreuil B, Levy ED, Michnick SW. Fast and accurate discovery of degenerate linear motifs in protein sequences. PLoS One. 2014; 9:e106081.

[28] Al-Enezi A, Abbod MF, Alsharhan Al-Enezi S. Artificial immune systems - models, algorithms and applications. IJRRAS. 2010; 3:118–131.

#### **Hybrid Metaheuristics for Classification Problems**

Nadia Abd-Alsabour


Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/65253

#### **Abstract**

Many classification problems, including real-world ones such as crime detection, require solutions that are both highly accurate and fast. Due to the practical importance of such problems, many algorithms have been developed to tackle them. For years, metaheuristics (MHs) have been successfully used for solving classification problems. Recently, hybrid metaheuristics have been successfully applied to many real-world optimization problems such as flight scheduling and load balancing in telecommunication networks. This chapter investigates the use of this new interdisciplinary field for classification problems. Moreover, it demonstrates the forms of metaheuristic hybridization as well as the design of a new hybrid metaheuristic.

**Keywords:** metaheuristics, hybrid metaheuristics, classification problems

#### **1. Introduction**

Before starting this chapter, let us review the path that led to the appearance of hybrid metaheuristics. Traditionally, rigorous approaches (based on hypotheses, characterizations, deductions, and experiments) were used for solving many optimization problems.

However, in order to find good solutions for new, complex optimization problems, researchers turned to heuristics. Heuristics are rules of thumb, trial and error, common sense, and so on. Many of these heuristic strategies are independent of the optimization problem at hand and share common aspects. This gave rise to the term metaheuristics, which refers to general techniques that are not specific to a particular problem [1]. Metaheuristics are approximate algorithms, and each of them has its own historical background [2–4]. A metaheuristic is a set of algorithmic concepts used for defining heuristic methods that can

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

be applied to a variety of optimization problems with relatively few modifications in order to adapt them to particular optimization problems [5, 6].

Metaheuristics have successfully found high-quality solutions for a wide spectrum of NP-hard optimization problems [1], that is, problems that are hard to solve: in the worst case, the time needed to solve an instance grows exponentially with the instance size. These optimization problems are complex in the sense that no known algorithm can solve them in polynomial time, yet they still have to be solved in a huge number of practical settings. Therefore, a large number of optimization algorithms have been proposed to tackle them [5, 6].

Of great importance for the success of a new metaheuristic design is ensuring that the metaheuristic explores the search space effectively and efficiently. The search process should be intelligent, intensively exploring areas of the search space that contain high-quality solutions while also moving to unexplored areas; these behaviors are called intensification and diversification, respectively. Intensification is the exploitation of the information gathered by the metaheuristic at a given time, while diversification is the exploration of areas imperfectly taken into account. The use of these two important characteristics of a metaheuristic can lead to high-quality solutions. Crucial for the success of a metaheuristic is a well-adjusted balance between these two features: on the one hand, to quickly identify search areas with high-quality solutions, and on the other hand, to avoid spending too much time in areas that contain poor-quality solutions or have already been well explored [1, 7–11].

**There are many classifications for metaheuristics as follows:**

**•** Nature-inspired metaheuristics [such as ant colony optimization (ACO) algorithms, genetic algorithms (GAs), particle swarm optimization (PSO), and simulated annealing (SA)] vs. non-nature-inspired metaheuristics [such as iterated local search (ILS) and tabu search (TS)]. This classification is based on the origins of a metaheuristic.

**•** Memory-based metaheuristics vs. memory-less metaheuristics. This is based on the use of the search history, that is, whether they use memory or not. The use of memory is considered one of the crucial elements of a powerful metaheuristic.

**•** Population-based metaheuristics vs. single-point metaheuristics. This is based on how many solutions are used at any given time by a metaheuristic. Population metaheuristics manipulate a set of solutions (at each iteration) from which the population of the next iteration is produced; examples are evolutionary algorithms and scatter search, and construction-oriented techniques such as ant colony optimization and the greedy randomized adaptive search procedure. Metaheuristics that deal with only one solution at any given time are called trajectory metaheuristics, as the search process describes a trajectory in the search space [1, 2, 4]. These classifications are shown in **Figure 1** [12].

**Figure 1.** Metaheuristics classification [12].

When they first appeared, pure metaheuristics quickly became state-of-the-art algorithms for many optimization problems, as they found high-quality solutions for these problems; this was reported in many dedicated conferences and workshops. This success motivated researchers toward finding answers to questions such as:

**•** Why is a given metaheuristic successful?

Despite this success, it has recently become evident that focusing on pure metaheuristics is restrictive when tackling particular optimization problems, such as real-world and large-scale optimization problems [2]. A skilled combination of a metaheuristic with components from other metaheuristics or with other optimization techniques, such as operations research techniques (mathematical programming), artificial intelligence techniques (constraint programming), or complete algorithms (branch and bound), can lead to much better solutions for these optimization problems. This interdisciplinary field is called hybrid metaheuristics, and it goes beyond the scope of a pure metaheuristic [1]. Over the years, many algorithms that do not purely follow the paradigm of a pure metaheuristic have been developed; they combine various algorithmic components originating from different optimization algorithms [2]. This is explained in Section 3.

The rest of this chapter is organized as follows. The next section introduces classification problems. Section 3 explains the main forms of hybridizing metaheuristics. Section 4 discusses designing a hybrid metaheuristic. The fifth section demonstrates hybrid metaheuristics for classification problems. A discussion is given in Section 6. The last section concludes this chapter and highlights future work in this area.

#### **2. Classification problems**

Classification involves training and testing data, which consist of data instances (objects). Each instance in the training set contains one class label (called the target, dependent, or response feature) and other features (called attributes, inputs, predictors, or independent features) [13–15]. Classification consists of examining the features of a new object and then assigning it to one of a predefined set of classes. The objects to be classified are generally represented by records in a dataset. The classification task is to build a model that will be applied to unclassified data in order to classify it, that is, to predict the target values of the instances in the testing set (which are given only the input features) [15, 16]. The classification task (determining which of a fixed set of classes an example belongs to) is illustrated in **Figure 2**.

**Figure 2.** The classification task.
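The classification task just described can be sketched in a few lines of Python. The credit-risk records and the simple nearest-neighbor rule below are illustrative assumptions, not a method used in the chapter:

```python
# Toy illustration of the classification task: a training set of labeled
# instances, an unclassified instance, and a classifier (here a simple
# 1-nearest-neighbor rule) that assigns it to one of the predefined classes.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, instance):
    # train: list of (features, label); instance: features only
    _, label = min(train, key=lambda row: euclidean(row[0], instance))
    return label

# Hypothetical credit-risk records: (income, debt ratio) -> risk class
train = [((60, 0.2), "low risky"),
         ((35, 0.5), "medium risky"),
         ((20, 0.9), "high risky")]

print(predict_1nn(train, (58, 0.25)))  # → low risky
```

The model here is simply the stored training set; building a real model (a decision tree, a rule set, etc.) replaces `predict_1nn` while the task itself stays the same.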

Examples of classification problems are:

**•** classifying credit applications such as low, medium, or high risky,

**•** determining whether a customer with a given profile will buy a new computer,

**•** predicting which of three specific treatments a breast cancer patient should receive,

**•** diagnosing whether a particular illness is present or not,

**•** choosing particular contents to be displayed on a web page,

**•** determining which phone numbers correspond to fax machines,

**•** placing a new student into a particular track based on special needs,

**•** identifying whether a behavior indicates a possible terrorist threat,

**•** determining whether a will was written by the real person or somebody else, and

**•** spotting fraudulent insurance claims.

In these examples, the classifier is built to predict categorical labels such as "low risky," "medium risky," or "high risky" for the first example; "yes" or "no" for the second example; "treatment A," "treatment B," or "treatment C" for the third example; etc. [16–18].

The accuracy of a classifier refers to how well it can predict the value of the target feature for previously unseen data and how well it captures the dependencies among the input features. Classifier accuracy is the main measure for classification and is widely used; it is also the usual criterion when comparing different classifiers [18–20].

The classifier is the basic component of any classification system, and its task is to partition the feature space into class-labeled decision regions (one for each category). Classifiers' performance is sensitive to the choice of the features used for constructing them. This choice affects the accuracy of the classifiers, the time needed for learning, and the number of examples needed for learning. Feature selection (FS) can be seen as an optimization problem that involves searching the space of possible solutions (feature subsets) to identify the optimal one. Many metaheuristics (such as ant colony optimization algorithms, particle swarm optimization, genetic algorithms, simulated annealing, and tabu search) have been used for solving the feature selection problem [20, 21].

Feature selection (deleting a column from a dataset) is one of the two main aspects of dimension reduction, the other being instance reduction (deleting a row from a dataset). This is illustrated in **Figure 3** [18]. Both should preserve the characteristics of the original input data after excluding some of it.

**Figure 3.** Data reduction [18].

**Figure 4** [22] illustrates the revised classification process with a dimension reduction phase as an intermediate step: dimension reduction is first applied to the given data, and the prediction methods are then applied to the reduced data.

**Figure 4.** The role of dimension reduction [22].
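Framing feature selection as a search over feature subsets, as described in Section 2, can be sketched as follows. The tiny dataset and the leave-one-out 1-nearest-neighbor evaluation are illustrative assumptions; a metaheuristic would search this subset space instead of enumerating it exhaustively:

```python
from itertools import combinations

# Feature selection viewed as a search over feature subsets.  Each subset
# is scored by the accuracy of a classifier built on it (here, leave-one-out
# 1-nearest-neighbor on a made-up dataset); a metaheuristic would explore
# this space rather than enumerate all subsets as done below.

DATA = [((1.0, 9.0, 0.2), "A"), ((1.1, 2.0, 0.3), "A"),
        ((5.0, 8.5, 0.2), "B"), ((5.2, 1.5, 0.3), "B")]

def accuracy(subset):
    def project(x):
        return [x[i] for i in subset]
    correct = 0
    for i, (x, y) in enumerate(DATA):
        rest = [(project(a), b) for j, (a, b) in enumerate(DATA) if j != i]
        # 1-NN on the remaining instances, using only the chosen features
        pred = min(rest, key=lambda r: sum((p - q) ** 2
                                           for p, q in zip(r[0], project(x))))[1]
        correct += (pred == y)
    return correct / len(DATA)

n = len(DATA[0][0])
best = max((s for k in range(1, n + 1) for s in combinations(range(n), k)),
           key=accuracy)
print(best, accuracy(best))
```

In this toy dataset only the first feature separates the classes, so the search settles on the subset containing it alone, which is exactly the "eliminate irrelevant and redundant features without reducing accuracy" goal described above.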

#### **3. Hybridization of metaheuristics**

Although combining different algorithms dates back to the 1980s, hybrid metaheuristics have become commonly used only in recent years. Since then, the advantage of combining different algorithms has been widely recognized [1, 4]. Forms of hybridization can be classified into two categories (as in **Figure 5**): (1) combining components of one metaheuristic with another metaheuristic (examples: using trajectory methods within population algorithms, or using a specific local search method within a more general trajectory algorithm such as iterated local search) and (2) combining metaheuristics with other techniques from artificial intelligence and operations research (examples: combining metaheuristics with constraint programming (CP), integer programming (IP), tree-based search methods, data mining techniques, etc.) [1]. The following two subsections explain these two types.

**Figure 5.** Forms of hybridization [4].

#### **3.1. Hybridizing metaheuristics with metaheuristics**

This category represents the beginning of metaheuristic hybridization. It later became widely used, especially the integration of nature-inspired metaheuristics with local search methods. This is best illustrated by the most common type in this category: ant colony optimization algorithms and evolutionary algorithms often use local search methods to refine the solutions generated during the search process. The reason is that these nature-inspired metaheuristics explore the search space well and identify the regions containing high-quality solutions (they first capture a global picture of the search space and then successively focus the search on the promising regions). However, they are not effective at exploiting the accumulated search experience, which can be achieved by adding local search methods to them. The resulting hybrid metaheuristic therefore works as follows: the nature-inspired metaheuristic identifies the promising search areas, from which the local search method can then quickly determine the best solutions. For this reason, a hybrid metaheuristic combining the strengths of both components is often very successful. There are other hybrids apart from this one; we mention this one here because it is considered the standard way of hybridization [1, 2].
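The standard hybridization just described can be sketched as a genetic algorithm whose offspring are refined by a greedy local search. The bit-string encoding and the toy fitness function (counting ones) are illustrative assumptions standing in for a real objective:

```python
import random

# Sketch of hybridizing a nature-inspired metaheuristic (a genetic
# algorithm, which explores the search space) with a local search method
# (which exploits, i.e. refines, each generated solution).

random.seed(0)
N, POP_SIZE, GENERATIONS = 16, 8, 20

def fitness(bits):              # placeholder objective: number of 1-bits
    return sum(bits)

def local_search(bits):
    # exploitation step: greedily flip any bit that improves fitness
    bits = bits[:]
    for i in range(len(bits)):
        flipped = bits[:]
        flipped[i] ^= 1
        if fitness(flipped) > fitness(bits):
            bits = flipped
    return bits

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):
    return [b ^ (random.random() < rate) for b in bits]

population = [[random.randint(0, 1) for _ in range(N)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP_SIZE)]
    population = [local_search(c) for c in children]   # the hybrid step

best = max(population, key=fitness)
print(fitness(best))  # → 16
```

On this trivial objective the local search alone already reaches the optimum; on harder, multimodal objectives the GA supplies the global exploration that the greedy step lacks, which is exactly the division of labor described above.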

#### **3.2. Hybridizing metaheuristics with other algorithms**

There are many possible ways of integration between metaheuristics and other algorithms. For example, metaheuristics and tree search methods can be interleaved or sequentially applied. This can be achieved by using a tree search method for generating a partial solution that a metaheuristic can then complete. Alternatively, a metaheuristic improves a solution generated by a tree search method. Another example is that constraint programming techniques can be used to reduce the search space (or the neighborhoods) that will be explored by a local search method [1, 4].

It should be noted that all of the hybrid metaheuristics mentioned above are integrative combinations, in which there is some kind of master algorithm with one or more subordinate components (either embedded or called). Another form of combination is the collaborative (or cooperative) combination, in which the search is performed by different algorithms that exchange information about states, models, entire subproblems, solutions, or search space characteristics. Cooperative search consists of the parallel execution of search algorithms, which can be different algorithms or instances of the same algorithm working on different models or running with different parameter settings. Therefore, the control strategy in hybrid metaheuristics can be integrative or collaborative, and the combined parts can be executed sequentially, in parallel, or interleaved [1, 4, 12]. These options are shown in **Figures 6** and **7**.

**Figure 6.** The control strategy in hybrid metaheuristics [4].


**Figure 7.** The order of executing the combined algorithms in hybrid metaheuristics [4].

#### **4. Designing a hybrid metaheuristic**

The main motivation behind combining various algorithmic ideas from different metaheuristics is to obtain a better-performing system that exploits and includes the advantages of the combined algorithms [3, 4]. These advantages should complement each other so that the resulting hybrid metaheuristic can benefit from them [2, 3, 23]. The key to achieving high performance with the resulting hybrid metaheuristic (especially when tackling hard optimization problems) is choosing suitable combinations of complementary algorithmic concepts. The task of developing a highly effective hybrid metaheuristic is therefore complicated and not easy [3]. The reasons for that are as follows:


According to Blum et al. [2], before starting to develop a hybrid metaheuristic, we should consider whether it is the appropriate choice for the given optimization problem. This can be decided by answering the following questions:

**•** What is the optimization objective? Do we need a reasonably good solution, and is it needed very quickly, or can we sacrifice computation time in order to get a very good solution? (These questions generally guide us toward using either metaheuristics or complete methods.) When a very good solution is needed and it cannot be obtained by the existing complete algorithms in reasonable time, we need to answer the next question in order to decide whether to develop a hybrid metaheuristic.

**•** Is there any existing metaheuristic that can get the required solution for the given optimization problem? If not, can we enhance any of the existing metaheuristics to better suit this optimization problem? If not, then the decision is to develop a hybrid metaheuristic, and we will need to answer the following question.

**•** Which hybrid metaheuristic will work well for this optimization problem? Unfortunately, there is as yet no answer to this question, as it is difficult to set guidelines for developing a well-performing hybrid metaheuristic, but the following can help:

- **◦** Searching the literature carefully for the most successful optimization algorithms for the given optimization problem or for similar optimization problems,
- **◦** Studying different ways of combining the most promising characteristics of the selected optimization algorithms to be combined [2, 3], and
- **◦** Identifying special characteristics of the given optimization problem and finding effective ways to exploit them [4].

Besides, in order to set guidelines for developing a new hybrid metaheuristic, it is crucial to improve the research methodology currently in use, which combines different algorithmic components without identifying the contributions of these components to the performance of the resulting hybrid metaheuristic. The improved methodology should include theoretical models of the characteristics of hybrid metaheuristics; it can also be experimental, like the methodologies used in the natural sciences. Moreover, testing and statistical assessment of the obtained results should be included as well [2].

#### **5. Hybrid metaheuristics for classification problems**

The first category of using hybrid metaheuristics for classification problems concerns using a metaheuristic for the feature selection problem alongside the classifier. This is because selecting the most relevant set of input features for building the classifier plays an important role in classification. The most common metaheuristics for the feature selection problem are genetic algorithms, ant colony optimization algorithms, and particle swarm optimization algorithms [24], which are hybridized with the classifier used in each application. This is explained below.

The feature selection problem arises in many applications, from choosing the most important socio-economic parameters in order to identify who can repay a bank loan, to dealing with a chemical process and selecting the best set of ingredients. It is used to simplify datasets by eliminating irrelevant and redundant features without reducing the classification accuracy. Examples of these applications are face recognition, speaker recognition, face detection, bioinformatics, web page classification, and text categorization [24, 25].

The idea of using genetic algorithms for solving optimization problems is that they start with a population of individuals, each of which represents a solution to the given optimization problem. Initially, the population consists of randomly generated solutions (the first generation). Then, the various genetic operators are applied to the population to produce a new population. Within a population, the goodness of a solution (measured by a fitness function) varies from individual to individual [26].

Genetic algorithms are one of the most common approaches to the feature selection problem. They are typically used first to select the most relevant features from the given dataset, which are then used for building the classifier. Examples are the work of Yang and Honavar [27] and Tan et al. [28].

There are other directions for using genetic algorithms for the feature selection problem, for instance, hybridizing the genetic algorithm with another metaheuristic in order to select the most appropriate feature subset before building the given predictor (such as Oh et al. [29], who embedded local search into the genetic algorithm they used). Another example is the work of Salcedo-Sanz et al. [30], who used an extra genetic operator in order to fix (in each iteration) the number of features to be chosen from the available ones.

Similar to the way genetic algorithms are used for the feature selection problem, ant colony optimization algorithms have been widely used for this optimization problem. Examples are the work of Yang and Honavar [27] and the work of Abd-Alsabour and Randall [31].

There are other ways of using ant colony optimization algorithms for the feature selection problem. An example is the work of Vieira et al. [32], who used two cooperative artificial ant colonies: one for determining the number of features to be selected and the second for selecting the features based on the cardinality given by the first colony. Another direction is to use an ensemble of classifiers (more than one classifier is built, and the classifiers are then combined to produce a better classification; these are called ensemble techniques [33]) to perform the classification, besides the metaheuristic used for the feature selection problem.

Another metaheuristic that has also been used for the feature selection problem is particle swarm optimization. Researchers have developed variants of PSO to make it suitable for the feature selection problem, such as the work of Chuang et al. [34], who proposed a variant of PSO called complementary PSO (CPSO) used with a k-nearest neighbor classifier. Another example is Zahran and Kanaan [35], who implemented a binary PSO for feature selection. Jacob and Vishwanath [36] proposed a multi-objective PSO that outperformed a multi-objective GA in the same authors' experiments. Moreover, Yan et al. [37] proposed a new discrete PSO algorithm with a multiplicative likeliness enhancement rule for unordered feature selection. Sivakumar and Chandrasekar [38] developed a modified continuous PSO for the feature selection problem with a k-nearest neighbor classifier that served as the fitness function for the PSO.
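As a rough sketch of how a binary PSO can drive feature selection, the following assumes a hypothetical set of relevant features and a stand-in fitness function; a real application would use classifier accuracy (for example, k-nearest neighbor accuracy, as in [38]) as the fitness instead:

```python
import math
import random

# Minimal binary PSO in the spirit of the feature-selection variants cited
# above.  Each particle is a bit vector (1 = feature selected).  The fitness
# is a stand-in that rewards features from an assumed "relevant" set and
# penalizes the rest, in place of real classifier accuracy.

random.seed(3)
N_FEATURES, N_PARTICLES, ITERATIONS = 10, 6, 40
RELEVANT = {0, 3, 7}                      # hypothetical ground truth

def fitness(bits):
    return sum(1 if i in RELEVANT else -1
               for i, b in enumerate(bits) if b)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

positions = [[random.randint(0, 1) for _ in range(N_FEATURES)]
             for _ in range(N_PARTICLES)]
velocities = [[0.0] * N_FEATURES for _ in range(N_PARTICLES)]
pbest = [p[:] for p in positions]         # personal bests
gbest = max(positions, key=fitness)[:]    # global best

for _ in range(ITERATIONS):
    for p in range(N_PARTICLES):
        for d in range(N_FEATURES):
            r1, r2 = random.random(), random.random()
            velocities[p][d] = (0.7 * velocities[p][d]
                                + 1.4 * r1 * (pbest[p][d] - positions[p][d])
                                + 1.4 * r2 * (gbest[d] - positions[p][d]))
            # binary PSO: the sigmoid of the velocity gives the probability
            # that the bit (feature) is selected
            positions[p][d] = 1 if random.random() < sigmoid(velocities[p][d]) else 0
        if fitness(positions[p]) > fitness(pbest[p]):
            pbest[p] = positions[p][:]
        if fitness(positions[p]) > fitness(gbest):
            gbest = positions[p][:]

print(gbest, fitness(gbest))
```

The swarm is pulled toward the best subsets found so far, so `gbest` tends toward the assumed relevant features; with classifier accuracy as the fitness, the same loop performs wrapper-style feature selection.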

There are other ways for using particle swarm optimization algorithms for the feature selection problem. An example is the work of Wahono and Suryana [39] who used a combination of PSO and a bagging of classifiers (bagging is an ensemble technique where many classifiers are built and the final classification decision is made based on voting of the committee of the combined classifiers. It is used in order to improve the classification accuracy [33]). Another example is the work of Nazir et al. [40] who combined a PSO and a GA to perform together the feature selection.
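The bagging scheme mentioned above can be sketched as follows; the base learner (a 1-nearest-neighbor rule) and the data are illustrative assumptions, not the setup of the cited works:

```python
import random
from collections import Counter

# Sketch of bagging: several classifiers are trained on bootstrap samples
# (drawn with replacement) of the training set, and the final label is
# decided by majority vote of the committee.

def nn_predict(train, x):
    # base learner: 1-nearest neighbor by squared Euclidean distance
    return min(train, key=lambda r: sum((a - b) ** 2
                                        for a, b in zip(r[0], x)))[1]

def bagging_predict(train, x, n_models=9, seed=42):
    rng = random.Random(seed)
    votes = [nn_predict([rng.choice(train) for _ in train], x)
             for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]    # majority vote

train = [((0.0, 0.0), "no"),
         ((1.0, 0.9), "yes"), ((0.9, 1.1), "yes"), ((1.1, 1.0), "yes")]
print(bagging_predict(train, (1.0, 1.0)))
```

Each committee member sees a slightly different resampling of the data, so the vote smooths out the variance of the individual classifiers, which is the accuracy improvement bagging aims for [33].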

The classification task involves other subtasks besides the feature selection problem, and many metaheuristics have been used for solving these subtasks. One example is the use of ant colony optimization for designing a classifier ensemble, such as the work of Palanisamy and Kanmani [41], who used the main concepts of the ant algorithm proposed in Abd-Alsabour and Randall [31] for designing an ensemble of classifiers. Another example is the use of particle swarm optimization algorithms for producing good classification rules, such as Kumar [42], who combined a PSO with a GA to produce them, and Holden and Freitas [43], who later proposed several modifications to their earlier work in Holden and Freitas [44]. A further example is the work of Revathil and Malathi [45], who proposed a novel simplified swarm optimization algorithm as a rule-based classifier.

#### **6. Discussion**

The previous section closely explored the different ways of using hybrid metaheuristics for classification problems. In light of that, we can make the following comments:


**1.** For solving many applications, hybrid metaheuristics were crucial for obtaining high-quality solutions, especially for real-world applications (such as personnel and machine scheduling, educational timetabling, routing, cutting and packing, and protein alignment). An example of a real-world classification problem is the work of Tantar et al. [46], who developed a hybrid metaheuristic (GA and SA) for predicting protein structure. Examples from other real-world optimization problems are the work of Atkin et al. [47], who proposed a hybrid metaheuristic for runway scheduling at London Heathrow airport, and the work of Xu and Qu [48], who used a hybrid metaheuristic to solve routing problems.

**2.** However, there are other situations where the hybridization was not important for the prediction accuracy. An example is the use of an extra metaheuristic (besides the two algorithms already used: one performing the feature selection and the classifier) to determine the number of features to be selected. Similar to this is the use of two instances of a metaheuristic: one to determine the number of features to be selected and the other to perform the feature selection. These two scenarios can lead to worse results in addition to the extra computational cost, so the authors should have avoided using an extra metaheuristic, or an extra instance of the one already used. The reason is revealed by the work of Abd-Alsabour et al. [49], who showed that fixing the length of the selected feature subsets can lead to worse classification accuracy than leaving the length free (besides its extra computation). We should avoid selecting fewer or more features than necessary: selecting insufficient features degrades the information content needed to capture the concept of the data, while selecting too many features decreases the classification accuracy because of the interference of irrelevant features. Subset problems such as the feature selection problem do not have a fixed length [49]. Another example is the use of more than one classifier (ensemble methods) rather than only one, which adds computational cost, especially when similar previous applications had already been solved successfully using only one classifier (besides the metaheuristic used for finding the best feature subset). This has been evidenced by many authors who compared their work with previous ones and showed that their results were no better. One more example is the use of two metaheuristics to perform the feature selection when it had already been solved using only one. These examples emphasize that a sufficient literature search before hybridizing or adding extra computational steps can avoid extra computation, useless hybridization, or even moving toward a misleading research direction, as illustrated in Section 4.

Therefore, choosing a suitable hybrid metaheuristic can achieve top performance on many optimization problems, but this does not imply that more complex algorithms are always the best choice, because increased complexity brings disadvantages of its own.

Hence, an important design aim is to keep the proposed algorithm as simple as possible and to include extensions only if they yield a real benefit [4].

Beyond the difficulty of developing a new hybrid metaheuristic, it is also nontrivial to generalize one: a hybrid metaheuristic that works well for a particular optimization problem might not work well for another. This means that research on hybrid metaheuristics has moved toward being problem-specific rather than algorithm-oriented, as it was when new metaheuristics were being promoted [1, 2].

#### **7. Conclusions and future work**

This chapter addressed the use of hybrid metaheuristics for classification problems. It also demonstrated how to hybridize metaheuristics and how to design them, and it presented the hybrid metaheuristics most commonly used for classification problems in the literature.

As a research direction, more applications of hybrid metaheuristics to different optimization problems in general, and to real-world classification problems in particular, will be considered. Another research direction is to move toward setting specific methodologies and general guidelines for developing new hybrid metaheuristics. Moreover, comparisons between hybrid metaheuristics on similar classification problems should be conducted.

#### **Author details**

Nadia Abd-Alsabour

Address all correspondence to: nadia.abdalsabour@cu.edu.eg

Cairo University, Cairo, Egypt

#### **References**



[1] Blum C, and Roli A. Hybrid Metaheuristics: An Introduction. In: Blum C, Aguilera M, Roli A, and Sampels M, editors. Hybrid Metaheuristics – An Emerging Approach to Optimization. Studies in Computational Intelligence. Springer-Verlag Berlin Heidelberg; 2008; 114, p. 1–30.

[2] Blum C, Puchinger J, Raidl G, and Roli A. Hybrid metaheuristics in combinatorial optimization: A survey. Applied Soft Computing. 2011; 11: 4135–4151.

[3] Blum C, Puchinger J, Raidl G, Roli A. A brief survey on hybrid metaheuristics. In: Filipic B, Silc J, editors. Proceedings of BIOMA 2010 – 4th International Conference on Bio-Inspired Optimization Methods and their Applications, Jozef Stefan Institute, Ljubljana, Slovenia; 2010. p. 3–18.

[4] Raidl G. Decomposition based hybrid metaheuristics. European Journal of Operational Research. 2015; 244: 66–76.

[5] Dorigo M, and Stutzle T. Ant Colony Optimization. MIT Press, Cambridge, MA; 2004.

[6] Blum C. Ant colony optimization: Introduction and recent trends. Physics of Life Reviews. 2005; 2: 353–373.

[7] Blum C, and Roli A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys. 2003; 35(3): 268–308.

[8] Méndez P. Development of a hybrid metaheuristic for the efficient solution of strategic supply chain management problems: Application to the energy sector. M.Sc. Thesis. Polytechnic University of Catalonia, Catalonia, Spain; 2011.

[9] Blum C. ACO Applied to Group Shop Scheduling: A Case Study on Intensification and Diversification. In: Dorigo M, et al., editors. Lecture Notes in Computer Science; 2463: p. 14–27. Springer; 2002.

[10] Randall M, and Tonkes A. Intensification and diversification strategies in ant colony search. Technical Report TR00-02, School of Information Technology, Bond University, Australia; 2002.


[28] Tan K C, Teoh E J, Yu Q, and Goh K C. A hybrid evolutionary algorithm for attribute selection in data mining. Expert Systems with Applications. 2009; 36: 8616–8630.

[29] Oh I, Lee J, and Moon B. Hybrid GAs for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004; 26(11): 1424–1437.

[30] Salcedo-Sanz S, Camps-Valls G, and Pérez-Cruz F. Enhancing genetic feature selection through restricted search and Walsh analysis. IEEE Transactions on Systems, Man, and Cybernetics, Part C. 2004; 34(4): 398–406.

[31] Abd-Alsabour N, and Randall M. Feature selection for classification using an ant colony system. In Proceedings of the 6th IEEE International Conference on e-Science Workshops. IEEE Press; 2010. p. 86–91.

[32] Vieira S M, Sousa J M C, and Runkler T A. Two cooperative ant colonies for feature selection using fuzzy models. Expert Systems with Applications. 2010; 37: 2714–2723.

[33] Liu B. Web Data Mining. Springer; 2010.

[34] Chuang L, Jhang H, and Yang C. Feature selection using complementary PSO for DNA microarray data. In Proceedings of the International Multi-Conference of Engineers and Computer Scientists, Hong Kong; 2013; 1. p. 291–294.

[35] Zahran B M, and Kanaan G. Text feature selection using particle swarm optimization algorithm. World Applied Sciences Journal. 2009; 7: 69–74.

[36] Jacob M, and Vishwanath N. Multi-objective evolutionary PSO algorithm for matching surgically altered face images. International Journal of Emerging Trends in Engineering and Development. 2014; 2(4): 640–648.

[37] Yan Y, Kamath G, and Osadciw L. Feature selection optimized by discrete particle swarm optimization for face recognition. In Proceedings of Optics and Photonics in Global Homeland Security and Biometric Technology for Human Identification. SPIE; 2009; 7306.

[38] Sivakumar S, and Chandrasekar C. Modified PSO based feature selection for classification of lung CT images. International Journal of Computer Science and Information Technologies. 2014; 5(2): 2095–2098.

[39] Wahono R, and Suryana N. Combining PSO based feature selection and bagging technique for software defect prediction. International Journal of Software Engineering and Its Applications. 2013; 7(5): 153–166.

[40] Nazir M, Majid-Mirza A, and Ali-Khan S. PSO-GA based optimized feature selection using facial and clothing information for gender classification. Journal of Applied Research and Technology. 2014; 12: 145–152.

[41] Palanisamy S, and Kanmani S. Artificial Bee Colony Approach for Optimizing Feature Selection. International Journal of Computer Science Issues. 2012; 9(3): 432–438.

[42] Kumar K. Intrusion Detection system for malicious traffic by using PSO-GA algorithm. IJCSET. 2013; 3(6): 236–238.
