**Meet the editor**

Helio J.C. Barbosa (www.lncc.br/~hcbm) is a Senior Technologist at the Laboratório Nacional de Computação Científica, Brazil. He received a Civil Engineering degree (1974) from the Federal University of Juiz de Fora, where he is an Associate Professor in the Computer Science Department, and M.Sc. (1978) and D.Sc. (1986) degrees in Civil Engineering from the Federal University of Rio de Janeiro, Brazil. During 1988-1990 he was a visiting scholar at the Division of Applied Mechanics, Stanford University, USA. He is currently mainly interested in the design and application of nature-inspired metaheuristics in engineering and biology.

Contents

**Preface VII**

**Section 1 Techniques 1**

Chapter 1 **Ant Colony Optimization Toward Feature Selection 3**
Monirul Kabir, Md Shahjahan and Kazuyuki Murase

Chapter 2 **Parallel Ant Colony Optimization: Algorithmic Models and Hardware Implementations 45**
Pierre Delisle

Chapter 3 **Strategies for Parallel Ant Colony Optimization on Graphics Processing Units 63**
Jaqueline S. Angelo, Douglas A. Augusto and Helio J. C. Barbosa

**Section 2 Applications 85**

Chapter 4 **An Ant Colony Optimization Algorithm for Area Traffic Control 87**
Soner Haldenbilen, Ozgur Baskan and Cenk Ozan

Chapter 5 **ANGEL: A Simplified Hybrid Metaheuristic for Structural Optimization 107**
Anikó Csébfalvi

Chapter 6 **Scheduling in Manufacturing Systems – Ant Colony Approach 129**
Mieczysław Drabowski and Edward Wantuch

Chapter 7 **Traffic-Congestion Forecasting Algorithm Based on Pheromone Communication Model 163**
Satoshi Kurihara

Chapter 8 **Ant Colony Algorithm with Applications in the Field of Genomics 177**
R. Rekaya, K. Robbins, M. Spangler, S. Smith, E. H. Hay and K. Bertrand

Preface

Scientists have long been interested in understanding the behavior of ants and other social insects that, in spite of the relative simplicity of each individual, are able to collectively accomplish complex tasks required for the survival of the colony. For instance, biologists have shown that behavior at the colony level, such as foraging, can be explained via the concept of stigmergy, a form of indirect communication mediated by modifications of the environment.

Ant Colony Optimization (ACO) is thus one of the best examples of how studies aimed at modeling such complex natural systems can provide inspiration for the development of computational algorithms for the solution of hard mathematical problems. The first ACO system, inspired by the observation of pheromone trails, was introduced by Marco Dorigo in his Ph.D. thesis (1992) and initially applied to the travelling salesman problem. Since then, the field has experienced continuous growth, with the development of many ACO variants, the organization of specialized conferences and scientific journals, and the emergence of several successful applications. Today ACO stands as an important nature-inspired stochastic metaheuristic for many difficult optimization problems.

This book is divided into two parts, (I) Techniques and (II) Applications, and presents state-of-the-art ACO methods and recent contributions to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics.

I would like to thank all contributing authors for their effort in preparing their chapters, and to acknowledge the assistance provided by the InTech Publishing Process Managers Mr. Marijan Polic, Mr. Vedran Greblo, and Mr. Dejan Grgur during the entire editing process of this book.

> **Dr. Helio J.C. Barbosa,** Federal University of Juiz de Fora, Computer Science Department, LNCC - National Laboratory for Scientific Computation Petrópolis, Brazil

**Section 1**

**Techniques**


**Chapter 1**


## **Ant Colony Optimization Toward Feature Selection**

Monirul Kabir, Md Shahjahan and Kazuyuki Murase

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51707

## **1. Introduction**

Over the past decades, the rapid growth of computer and database technologies has led to an explosion of data. Ordinarily, the useful information is hidden in this vast collection of raw data; because of that, we are now drowning in information, but starving for knowledge [1]. As a solution, data mining extracts knowledge from these data mountains by means of data preprocessing [1]. In data preprocessing, feature selection (FS) is a widely used technique for reducing the dimensionality of a dataset. It removes spurious information, that is, irrelevant, redundant, and noisy features, from the original feature set, eventually retaining a subset of the most salient features. As a result, a number of good outcomes can be expected in applications, such as speeding up data mining algorithms and improving mining performance (including predictive accuracy) and the comprehensibility of results [2].

The literature addresses different types of data mining tasks, such as regression, classification, and clustering [1]. The task of interest in this study is classification: the problem of assigning a data point to a predefined class or group according to its predictive characteristics. In practice, data mining classification techniques are significant in a wide range of domains, such as financial engineering, medical diagnosis, and marketing.

FS is a search process or technique in data mining that selects a subset of salient features for building robust learning models, such as neural networks and decision trees. Irrelevant and/or redundant features in the learning data not only make learning harder, but also degrade the generalization performance of the learned models. Good FS techniques can detect and ignore noisy and misleading features; as a result, the dataset quality might even increase after selection. Two feature qualities need to be considered in FS methods: relevancy and redundancy. A feature is said to be relevant if it is predictive of the decision feature(s); otherwise, it is irrelevant. A feature is considered redundant if it is highly correlated with other features. An informative feature is one that is highly correlated with the decision concept(s), but highly uncorrelated with the other features.

© 2013 Kabir et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

For a given classification task, the problem of FS can be described as follows: given the original set *N* of *n* features, find a subset *F* consisting of *f* relevant features, where *F* ⊂ *N* and *f* < *n*. The aim of selecting *F* is to maximize the classification accuracy of the learning models built on it. The selection of relevant features is important in the sense that the generalization performance of learning models depends greatly on the selected features [3-6]. Moreover, FS assists in visualizing and understanding the data, reducing storage requirements, reducing training times, and so on [7].
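This subset-search formulation can be made concrete with a small sketch (not from the chapter; the function names and the tiny 1-nearest-neighbour scorer are illustrative assumptions). It exhaustively scores every candidate subset *F* ⊆ *N* by the classification accuracy it yields:

```python
from itertools import combinations

def nn_accuracy(train, test, subset):
    """Accuracy of a 1-nearest-neighbour classifier that sees only the
    features listed in `subset` (squared Euclidean distance)."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in subset)
    correct = 0
    for x, y in test:
        pred = min(train, key=lambda t: dist(t[0], x))[1]
        correct += pred == y
    return correct / len(test)

def best_subset(train, test, n_features):
    """Exhaustive wrapper-style search: score every non-empty subset of
    the feature indices and keep the most accurate one."""
    candidates = [s for k in range(1, n_features + 1)
                  for s in combinations(range(n_features), k)]
    return max(candidates, key=lambda s: nn_accuracy(train, test, s))

# Toy data: feature 0 separates the classes, feature 1 is noise.
train = [((0.0, 0.9), 0), ((0.1, 0.1), 0), ((1.0, 0.8), 1), ((0.9, 0.2), 1)]
test = [((0.05, 0.7), 0), ((0.95, 0.3), 1)]
print(best_subset(train, test, 2))  # (0,) -> feature 0 alone suffices
```

The exhaustive search is exponential in *n*, which is precisely why heuristic searches such as ant colony optimization are attractive for FS.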

Two features can be useless individually and yet highly predictive when taken together. In FS terminology, each of them may be redundant or irrelevant on its own, while their combination provides important information. For instance, in the Exclusive-OR problem the classes are not linearly separable. The two features on their own provide no information concerning this separability, because they are uncorrelated with each other. Considered together, however, the two features are highly informative and can provide good predictive accuracy. Therefore, FS must search for high-quality feature subsets and not only for a ranking of individual features.
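The Exclusive-OR situation can be reproduced in a few lines (a synthetic sketch; the helper name is ours): the best possible prediction from either feature alone is no better than chance, while the pair determines the class exactly.

```python
import random
from collections import Counter

def best_accuracy(data, idxs):
    """Best achievable accuracy when predicting the majority label for
    each distinct value combination of the feature subset `idxs`."""
    groups = {}
    for x, y in data:
        key = tuple(x[i] for i in idxs)
        groups.setdefault(key, Counter())[y] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(data)

random.seed(0)
data = []
for _ in range(1000):
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    data.append(((x1, x2), x1 ^ x2))  # XOR target

print(best_accuracy(data, (0,)))    # about 0.5: feature 1 alone is useless
print(best_accuracy(data, (1,)))    # about 0.5: feature 2 alone is useless
print(best_accuracy(data, (0, 1)))  # 1.0: together they are fully predictive
```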

## **2. Applications of Feature Selection**

Feature selection has had a wide range of applications in various fields since the 1970s, because many systems deal with datasets of large dimensionality. The main areas in which the FS task is applied can be categorized as follows (see Figure 1).

**Figure 1.** Applicable areas of feature selection.

In the pattern recognition paradigm, FS tasks are mostly concerned with classification problems. Basically, pattern recognition is the study of how machines can monitor the environment, learn to distinguish patterns of interest, and make correct decisions about the categories of the patterns. A pattern ordinarily contains features on which the classification of a target or object is based. As an example, consider the classification problem of sorting incoming fish on a conveyor belt in a fish-processing industry according to species. Assume that only two kinds of fish are available, salmon and sea bass, as exhibited in Figure 2. A machine classifies the fishes automatically after being trained on some features, for example, length, width, weight, number and shape of fins, tail shape, and so on. The problem is that, if irrelevant, redundant, or noisy features are present, the classification performance might be degraded. In such cases, FS can recognize the useless features in the patterns and delete them, significantly improving the classification performance in the context of pattern recognition.

**Figure 2.** Picture taken by a camera in a fish-processing industry, adapted from [8].

FS techniques have been successfully implemented in mobile robot vision to generate efficient navigation trajectories with an extremely simple neural control system [9]. In this case, evolved mobile robots select the salient visual features and actively maintain them on the same retinal position, while the useless image features are discarded. The analysis of the evolved solutions shows that the robots develop simple and very efficient edge detection to detect obstacles and move away from them. Furthermore, FS has a significant role in image recognition systems [10]. In these systems, patterns are built from image data, specifically describing the image pixel data. There can be hundreds of different features for an image, including *color* (in various channels), *texture* (dimensionality, line likeness, contrast, roughness, coarseness), *edge*, *shape*, *spatial relations*, *temporal information*, and *statistical measures* (moments: mean, variance, standard deviation, skewness, kurtosis). The FS expert can identify a subset of relevant features from the whole feature set.

In the analysis of the human genome, gene expression microarray data have increased many-fold in recent years. These data provide the opportunity to analyze the expression levels of thousands or tens of thousands of genes in a single experiment. A typical classification task is to distinguish between healthy and cancer patients based on their gene expression profiles. On the other hand, typical gene expression data suffer from three problems:

**a.** a limited number of available examples,

**b.** the very high-dimensional nature of the data,

**c.** the noisy characteristics of the data.

Therefore, suitable FS methods (e.g., [11, 12]) are applied to these datasets to find a minimal set of genes that, together with some initial filtering, has sufficient classifying power to distinguish the subgroups.

Text classification is nowadays a vital task because of the proliferation of texts in digital form, which we need to access in flexible ways. A major problem in text classification is the high dimensionality of the feature space: a text feature space may have several tens of thousands of features, most of which are irrelevant and spurious for the classification task. This high number of features reduces the classification accuracy and the learning speed of the classifiers, and may render some classifiers inapplicable. FS is therefore a very efficient technique for text classification, reducing the feature dimensionality and improving the performance of the classifiers [13].

Knowledge discovery (KD) is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large collections of data [14]. Indeed, the popularity of KD stems from our daily demands on federal agencies, banks, insurance companies, retail stores, and so on. One of the important KD steps is data mining. In this context, feature selection cleans up the dataset by removing the least significant features. This step ultimately helps to extract rules, such as *if---then* rules, from the dataset. Such rules support a proper understanding of the data and increase the human capability to predict what is happening inside the data.

It is now clear that the FS task plays an important role in various settings, where one can easily obtain better performance from a system by distinguishing the salient features. Among the various applications, in this chapter we are particularly interested in discussing "pattern recognition", and how the FS task can play an important role in the classification problem, because in recent years solving classification problems using FS has become a key resource for the data mining and knowledge discovery paradigms.

## **3. Feature Selection for Classification**

In recent years, real-world classification problems have created a high demand for FS, since their datasets contain many irrelevant and redundant features. In practice, FS tasks work on publicly available classification datasets. The most popular benchmark collection is the University of California, Irvine (UCI) machine learning repository [15], which mostly contains raw data that must be preprocessed before use in neural networks (NNs). Collections of preprocessed datasets include Proben1 [16]. The characteristics and partitions of the datasets used in the experiments of this chapter are summarized in Table 1, which shows a considerable diversity in the number of examples, features, and classes among the datasets. All datasets were partitioned into three sets (a training set, a validation set, and a testing set) following the suggestion in [16]: for each dataset, the first *P*1 examples were used for the training set, the following *P*2 examples for the validation set, and the final *P*3 examples for the testing set. These datasets have been used widely in many previous studies and represent some of the most challenging datasets in NN and machine learning research [12, 17].

| Datasets | Features | Classes | Examples | Training | Validation | Testing |
|---|---|---|---|---|---|---|
| Cancer | 9 | 2 | 699 | 349 | 175 | 175 |
| Glass | 9 | 6 | 214 | 108 | 53 | 53 |
| Vehicle | 18 | 4 | 846 | 424 | 211 | 211 |
| Thyroid | 21 | 3 | 7200 | 3600 | 1800 | 1800 |
| Ionosphere | 34 | 2 | 351 | 175 | 88 | 88 |
| Credit Card | 51 | 2 | 690 | 346 | 172 | 172 |
| Sonar | 60 | 2 | 208 | 104 | 52 | 52 |
| Gene | 120 | 3 | 3175 | 1587 | 794 | 794 |
| Colon cancer | 2000 | 2 | 62 | 30 | 16 | 16 |

**Table 1.** Characteristics and partitions of different classification datasets (the last three columns are the partition sets).

The description of the datasets reported in Table 1 is available in [15], except for colon cancer, which can be found in [18]. There are also some other gene expression datasets like colon cancer (e.g., lymphoma and leukemia), which are described in [19] and [20].
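The P1/P2/P3 partitioning described above is straightforward to express in code (a sketch; the function name and the Glass example are illustrative):

```python
def partition(examples, p1, p2, p3):
    """Proben1-style split: the first p1 examples form the training set,
    the following p2 the validation set, and the final p3 the testing set."""
    if p1 + p2 + p3 != len(examples):
        raise ValueError("partition sizes must cover the whole dataset")
    return examples[:p1], examples[p1:p1 + p2], examples[p1 + p2:]

# Glass (Table 1): 214 examples split into 108 / 53 / 53
train, valid, test = partition(list(range(214)), 108, 53, 53)
print(len(train), len(valid), len(test))  # 108 53 53
```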

## **4. Existing Works for Feature Selection**

A number of approaches have been proposed for solving the FS problem; they can broadly be divided into the following three categories [2]: filter, wrapper, and hybrid approaches.

Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707
## **4. Existing Works for Feature Selection**

Search strategy considerations for any FS algorithm are a vital part of finding the salient features of a given dataset [2]. Numerous algorithms have been proposed to address the problem of searching. Most use either a sequential search (e.g., [4,5,24,26,30]) or a global search (e.g., [11,23,31-35]). On the basis of how they guide the search and evaluate subsets, in contrast, the existing FS algorithms can be grouped into three approaches: wrapper (e.g., [4,6,30,36-38]), filter (e.g., [40,41]), and hybrid (e.g., [23,42]). It is well known that wrapper approaches generally return features with a higher saliency than filter approaches, as the former exploit the association of features collectively during the learning process, but they are computationally more expensive [2].

Subsets can be generated and the search process carried out in a number of ways. One method, called sequential forward search (SFS [24,25]), starts the search with an empty set and successively adds features; another, called sequential backward search (SBS [4,26]), starts with the full set and successively removes features. A third alternative, called bidirectional selection [27], starts at both ends and adds and removes features simultaneously. A fourth method [28,29] starts the search from a randomly selected subset using a sequential or bidirectional strategy. Yet another strategy, called complete search [2], may give the best solution to an FS task owing to the thoroughness of its search, but it is not feasible when dealing with a large number of features. The sequential strategy, by contrast, is simple to implement and fast, but it suffers from the nesting effect [3]: once a feature is added (or deleted), it cannot be deleted (or added) later. To overcome this disadvantage of the sequential search strategy, the floating search strategy [3] has been introduced.

A number of approaches proposed for solving the FS problem can broadly be categorized into the following three classifications [2]:

**a.** wrapper,

**b.** filter, and

**c.** hybrid.

**d.** Beyond these three classifications, there is also another one, namely the meta-heuristic approach.

In the wrapper approach, a predetermined learning model is assumed, and the features selected are those that justify the learning performance of that particular model [21], whereas in the filter approach a statistical analysis of the feature set is performed without utilizing any learning model [22]. The hybrid approach attempts to combine the complementary strengths of the wrapper and filter approaches [23]. The meta-heuristics (or global search approaches) search for a salient feature subset in the full feature space, seeking a high-quality solution through the mutual cooperation of individual agents, as in genetic algorithms, ant colony optimization, and so on [64]. Schematic diagrams of how the filter, wrapper, and hybrid approaches find relevant (salient) features are given in Figures 3(a-c); these figures summarize the findings of different FS-related works.

In solutions for FS, filter approaches are faster to implement, since they estimate the performance of features without assuming any actual model between the outputs and inputs of the data. A feature can be selected or deleted on the basis of some predefined criterion, such as mutual information [39], principal component analysis [43], independent component analysis [44], a class separability measure [45], or variable ranking [46]. Filter approaches have the advantage of computational efficiency, but the saliency of the selected features is insufficient, because they do not take into account the biases of classification models.

To implement the wrapper approach, a number of algorithms ([4,24,26,30,47]) have been proposed that use sequential search strategies to find a subset of salient features. In [24], features are added to a neural network (NN) according to SFS during training, and the addition process terminates when the performance of the trained classifier degrades. The approach recently proposed by Kabir et al. [47] has drawn much attention among SFS-based feature selections. In this approach, correlated (distinct) features from two groups, namely similar and dissimilar, are added to the NN training model sequentially. At the end of the training process, when the NN classifier has captured all the necessary information of a given dataset, a subset of salient features is generated with reduced redundancy of information. In a number of other studies (e.g., [4,26,30]), SBS is incorporated in FS using a NN, where the least salient features are deleted in stepwise fashion during training. In this context, different algorithms employ different heuristic techniques for measuring the saliency of features. In [24], saliency is measured using a NN training scheme in which only one feature is used in the input layer at a time. Two different weight-analysis-based heuristic techniques are employed in [4] and [26] for computing the saliency of features. Furthermore, in [30], a full-feature-set NN training scheme is used, where each feature is temporarily deleted with a cross-check of NN performance.

The value of a loss function, consisting of cross entropy with a penalty function, is considered directly for measuring the saliency of a feature in [5] and [6]. In [5], the penalty function encourages small weights to converge to zero, or prevents weights from converging to large values. After the penalty function has finished running, the features with smaller weights are sequentially deleted during training as irrelevant. In [6], on the other hand, the penalty function forces the network to keep the derivatives of its neurons' transfer functions low; the aim of this restriction is to reduce output sensitivity to input changes. In the FS process, feature removal operations are performed sequentially, specifically for those features that do not degrade the accuracy of the NN upon removal. A class-dependent FS algorithm in [38] selects a desirable feature subset for each class. It first divides a *C*-class classification problem into *C* two-class classification problems. Then the features are integrated to train a support vector machine (SVM) using an SFS strategy in order to find the feature subset of each binary classification problem. Pal and Chintalapudi [36] have proposed an SBS-based FS technique that multiplies each feature by an attenuation function before the features enter NN training; this technique is the root of another FS algorithm proposed in [48]. Rakotomamonjy [37] has proposed new FS criteria that are derived from SVMs and based on the sensitivity of generalization error bounds with respect to the features.
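The sequential strategies that recur throughout these approaches can be sketched generically. Below is a minimal, hypothetical illustration of SFS in Python; the `evaluate` function stands in for whatever subset-quality measure (classifier accuracy, mutual information, etc.) a concrete algorithm would use, and the greedy stopping rule makes the nesting effect visible: once added, a feature is never removed.

```python
def sfs(features, evaluate, max_size=None):
    """Sequential forward search: start from the empty set and greedily add
    the feature that most improves `evaluate`, stopping when no single
    addition improves the current score (or when max_size is reached)."""
    selected, best_score = [], float("-inf")
    max_size = max_size if max_size is not None else len(features)
    while len(selected) < max_size:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # score every one-feature extension of the current subset
        score, feature = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best_score:
            break  # nesting effect: features are only ever added, never removed
        selected.append(feature)
        best_score = score
    return selected, best_score
```

SBS is the mirror image: it starts from the full set and greedily removes the feature whose removal most improves (or least degrades) the score.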

Unlike sequential search-based FS approaches, global search approaches (or meta-heuristics) start the search in the full feature space instead of a partial feature space in order to find a high-quality solution. The strategy of these algorithms is based on the mutual cooperation of individual agents. A standard genetic algorithm (GA) has been used for FS [35], where fixed-length strings in a population represent feature subsets. The population evolves over time, converging to an optimal solution via crossover and mutation operations. A number of other algorithms exist (e.g., [22,23]) in which GAs are used for solving FS. A hybrid approach [23] has been proposed that incorporates the filter and wrapper approaches in a cooperative manner: a filter approach involving mutual-information computation is used as a local search to rank features, and a wrapper approach involving GAs is used as a global search to find a subset of salient features from the ranked features. In [22], two basic operations, namely deletion and addition, seek the least significant and most significant features, making the local search during FS stronger.
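The fixed-length string representation used by GA-based FS can be made concrete with a small sketch. The code below is a hypothetical, minimal GA (tournament selection, one-point crossover, bit-flip mutation) over bit strings in which bit *i* = 1 means feature *i* is selected; `evaluate` is again a stand-in for the actual fitness function, and all parameter values are illustrative only.

```python
import random

def ga_feature_select(n_features, evaluate, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Minimal GA for FS: each individual is a fixed-length bit string where
    bit i = 1 means feature i is selected. Tournament selection, one-point
    crossover, and bit-flip mutation evolve the population; the best
    individual seen over all generations is returned."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=evaluate)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if evaluate(a) >= evaluate(b) else b

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < crossover_rate:   # one-point crossover
                cut = rng.randrange(1, n_features)
                p1 = p1[:cut] + p2[cut:]
            for i in range(n_features):         # bit-flip mutation
                if rng.random() < mutation_rate:
                    p1[i] = 1 - p1[i]
            new_pop.append(p1)
        pop = new_pop
        gen_best = max(pop, key=evaluate)
        if evaluate(gen_best) > evaluate(best):
            best = gen_best[:]
    return best
```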

ACO is a particularly useful modern algorithm that has been applied in several studies (e.g., [11,31,42,49-52]) to select salient features. During its operation, a number of artificial ants traverse the feature space to construct feature subsets iteratively. During subset construction (SC), some existing approaches ([11,42,50,52]) define the size of the constructed subsets by a fixed number for each iteration, whereas the SFS strategy is followed in [31,49], and [51]. To measure the heuristic values of features during FS, some of the algorithms ([11,31,50,52]) use filter tools. Evaluating the constructed subsets is, on the other hand, a vital part of ACO-based FS, since most algorithms design their pheromone update rules on the basis of the outcomes of subset evaluations. In this regard, a scheme of training classifiers (i.e., wrapper tools) is used in almost all of the above ACO-based FS algorithms, except in two cases, [11] and [31], where rough set theory and the latent variable model (i.e., filter tools), respectively, are considered.
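Because the mechanics described here (ants building subsets under pheromone guidance, with pheromone updates driven by subset evaluation) recur in the cited algorithms, a compact sketch may help. Everything below is an assumed, simplified design (fixed subset size, iteration-best pheromone deposit, uniform initial heuristic values) rather than a reproduction of any one cited algorithm; features are chosen by roulette-wheel selection with probability proportional to the pheromone-heuristic product.

```python
import random

def aco_feature_select(n_features, subset_size, evaluate, n_ants=10,
                       n_iterations=30, alpha=1.0, beta=1.0, rho=0.2,
                       heuristic=None, seed=0):
    """Minimal ACO for FS: each ant builds a feature subset of fixed size by
    roulette-wheel selection with weight tau_i^alpha * eta_i^beta; the best
    subset of each iteration deposits pheromone, and all pheromone trails
    evaporate at rate rho."""
    rng = random.Random(seed)
    tau = [1.0] * n_features                  # pheromone trails
    eta = heuristic or [1.0] * n_features     # heuristic desirability
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iterations):
        iter_best, iter_score = None, float("-inf")
        for _ in range(n_ants):
            subset = []
            while len(subset) < subset_size:
                feasible = [i for i in range(n_features) if i not in subset]
                weights = [(tau[i] ** alpha) * (eta[i] ** beta) for i in feasible]
                subset.append(rng.choices(feasible, weights=weights)[0])
            score = evaluate(subset)
            if score > iter_score:
                iter_best, iter_score = subset, score
        # evaporation, then reinforcement of the iteration-best subset
        tau = [(1 - rho) * t for t in tau]
        for i in iter_best:
            tau[i] += iter_score if iter_score > 0 else 0.0
        if iter_score > best_score:
            best_subset, best_score = iter_best, iter_score
    return sorted(best_subset), best_score
```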

A recently proposed FS approach [34] is based on rough sets and a particle swarm optimization (PSO) algorithm. PSO is used to find a subset of salient features over a large and complex feature space. The main heuristic strategy of PSO in FS is that particles fly through the feature space up to a certain velocity. PSO finds an optimal solution through the interaction of individuals in the population, locating the best solution in FS as the particles fly within the subset space. This approach is more efficient than a GA in the sense that it does not require crossover and mutation operators; only simple mathematical operators are required.
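The "particles flying with a velocity" description maps onto binary PSO, where a sigmoid of each velocity component gives the probability that the corresponding feature bit is set. The following is a hypothetical minimal sketch using the standard inertia/cognitive/social velocity update, not the rough-set-based algorithm of [34] itself; `evaluate` and all parameter values are placeholders.

```python
import math
import random

def bpso_feature_select(n_features, evaluate, n_particles=15, n_iterations=40,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal binary PSO for FS: each particle is a bit vector (1 = feature
    selected). Velocities are updated from the personal and global bests and
    squashed through a sigmoid to give each bit's probability of being 1."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [evaluate(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid of the velocity gives the probability of bit = 1
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            score = evaluate(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score
```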

**Figure 3.** (a) Schematic diagram of the filter approach. Each approach incorporates specific search strategies. (b) Schematic diagram of the wrapper approach. Each approach incorporates specific search strategies and classifiers. Here, NN, KNN, SVM, and MLHD refer to the neural network, K-nearest neighbour, support vector machine, and maximum likelihood classifier, respectively. (c) Schematic diagram of the hybrid approach. Each approach incorporates specific search strategies and classifiers. Here, LDA, ROC, SU, MI, CI, and LVM refer to the linear discriminant analysis classifier, receiver operating characteristic method, symmetrical uncertainty, mutual information, correlation information, and latent variable model, respectively.


## **5. Common Problems**

Most of the afore-mentioned search strategies, however, find solutions in FS that range between the sub-optimal and near-optimal regions, since they use local search throughout the entire process instead of global search. Moreover, these search algorithms perform only a partial search over the feature space and suffer from computational complexity; consequently, near-optimal to optimal solutions are quite difficult to achieve with them. As a result, many research studies now focus on global search algorithms (or meta-heuristics) [31]. The significance of global search algorithms is that they can find a solution in the full search space through the activities of multi-agent systems, combining a global search ability with appropriately applied local search, which significantly increases the chance of finding very high-quality solutions within a reasonable period of time [53]. To achieve global search, researchers have applied simulated annealing [54], genetic algorithms [35], ant colony optimization ([49,50]), and particle swarm optimization [34] to FS tasks.

On the other hand, most of the global search approaches discussed above do not use a bounded scheme to decide the size of the constructed subsets. Accordingly, in these algorithms the selected subsets might be larger and include a number of least significant features. Furthermore, most of the ACO-based FS algorithms do not consider the random and probabilistic behavior of ants during SCs, so the solutions found by these algorithms might be incomplete in nature. The sequential search-based FS approaches, for their part, suffer from the nesting effect, as they try to find subsets of salient features using a sequential search strategy; this effect is said to harm the generalization performance of the learning model [3].

## **6. A New Hybrid ACO-based Feature Selection Algorithm-ACOFS**

It is found that hybridization of several components gives better overall performance on the FS problem. The reason is that hybrid techniques are capable of finding a good solution even where a single technique often becomes trapped with an incomplete solution [64]. Furthermore, incorporating a global search strategy into a hybrid system (called a hybrid meta-heuristic approach) is likely to provide a high-quality solution to the FS problem.

In this chapter, a new hybrid meta-heuristic approach for feature selection (ACOFS) is presented that utilizes ant colony optimization. The main focus of this algorithm is to generate subsets of salient features of reduced size. ACOFS utilizes a hybrid search technique that combines the wrapper and filter approaches, modifying the standard pheromone update and heuristic information measurement rules on the basis of these two approaches. The novelty and distinctness of ACOFS with respect to previous algorithms (e.g., [11,31,42,49-52]) lie in the following two aspects.

**Figure 4.** Major steps of ACOFS, adapted from [64].


First, ACOFS emphasizes not only the selection of a number of salient features, but also the attainment of a reduced number of them. ACOFS selects a reduced number of salient features using a subset size determination scheme. This scheme works within a bounded region and provides constructed subsets of small size; following it, an ant attempts to traverse the node (or feature) space to construct a path (or subset). This approach is quite different from the existing schemes ([31,49,51]), where the ants are guided by the SFS strategy in selecting features during feature subset construction. A problem there is that SFS requires an appropriate stopping criterion to end the SC; otherwise, a number of irrelevant features may be included in the constructed subsets and the solutions may not be effective. To solve this problem, some algorithms ([11,42,50,52]) fix the size of the constructed subset for each iteration for all ants and increment it at a fixed rate in subsequent iterations. This technique can be inefficient if the fixed number becomes too large or too small. Therefore, deciding the subset size within a reduced range may be a good step for constructing the subset while the ants traverse the feature space.
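The bounded, small-size-favouring behaviour described here can be illustrated with an assumed stand-in. The chapter's actual formula is a modification of [32] and is not reproduced here; the sketch below simply assumes linearly decreasing weights over a reduced range, so that the minimum size is the most probable draw. The function name, bounds, and weighting are all hypothetical.

```python
import random

def determine_subset_size(n_features, lower=2, upper_fraction=0.5, rng=None):
    """Hypothetical stand-in for a bounded subset-size determination scheme:
    sizes are drawn only from the reduced range [lower, upper], and the
    probability of a size r decreases as r grows, so small subsets are
    favoured. The exact ACOFS formula (modified from [32]) differs."""
    rng = rng or random.Random(0)
    upper = max(lower, int(upper_fraction * n_features))
    sizes = list(range(lower, upper + 1))
    weights = [upper + 1 - r for r in sizes]  # minimum size gets highest weight
    return rng.choices(sizes, weights=weights)[0]
```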

Second, ACOFS utilizes a hybrid search technique for selecting salient features that combines the advantages of the wrapper and filter approaches; an alternative name for this technique is "ACO search". It is designed with two sets of new rules for pheromone update and heuristic information measurement. The idea behind these rules is based mainly on the random and probabilistic behaviors of the ants while selecting features during SC. The aim is to provide correct information to the features and to maintain an effective balance between exploitation and exploration during SC. Thus, ACOFS achieves a strong search capability that helps to select a small number of the most salient features from a feature set. In contrast, the existing approaches ([11,31,42,49-52]) design rules without distinguishing between the random and probabilistic behaviors of the ants during the construction of a subset; consequently, the ants may be deprived of the opportunity to make use of enough previous experience or to investigate more salient features during SC.

The main structure of ACOFS is shown in Figure 4; a detailed description can be found in [64]. In the first stage, as each of the *k* ants attempts to construct a subset, it first decides the subset size *r* according to the subset size determination scheme, which guides the ants to construct subsets in a reduced form. It then follows the conventional probabilistic transition rule [31] for selecting features, as follows:

$$P\_i^k(t) = \begin{cases} \dfrac{\left[\tau\_i(t)\right]^\alpha \left[\eta\_i(t)\right]^\beta}{\sum\_{u \in j^k} \left[\tau\_u(t)\right]^\alpha \left[\eta\_u(t)\right]^\beta} & \text{if } i \in j^k \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where $j^k$ is the set of feasible features that can be added to the partial solution, $\tau\_i(t)$ and $\eta\_i(t)$ are the pheromone and heuristic values associated with feature *i* (*i* = 1, 2, ..., *n*), and $\alpha$ and $\beta$ are two parameters that determine the relative importance of the pheromone value and the heuristic information. Note that, since the initial values of $\tau\_i(t)$ and $\eta\_i(t)$ are equal for all individual features, Eq. (1) initially shows random behaviour in SC. The way the ants construct individual subsets during SC can be seen in Figure 5.

**Figure 5.** Representation of subset constructions by individual ants in the ACO algorithm for FS. Here, *n1, n2, ..., n5* represent the individual features. As an example, one ant placed at *n1* constructed the subset {*n1, n2, n3*}.

ACOFS imposes a restriction in determining the subset size, which is not an inherent constraint. Without such a restriction, the determination scheme would, like the conventional approaches, work over an extended boundary beyond a certain range, resulting in ineffective solutions for FS. To solve another problem, namely the incomplete solutions of ACO-based FS algorithms, ACOFS incorporates a hybrid search strategy (i.e., a combination of the wrapper and filter approaches), designing different rules to strengthen the global search ability of the ants. The incorporation of these two approaches results in an ACOFS that achieves high-quality solutions for FS from a given dataset. For better understanding, details of each aspect of ACOFS are given in the following sections.

## **6.1. Determination of Subset Size**

In an ACO algorithm, the activities of the ants are significant for solving different combinatorial optimization problems. Therefore, in solving the FS problem, guiding the ants in the correct directions is very advantageous. In contrast to other existing ACO-based FS algorithms, ACOFS uses a straightforward mechanism to determine the subset size *r*: it employs a simple probabilistic formula with a constraint and a random function. The aim of using such a probabilistic formula is to provide information to the random function in such a way that the minimum subset size has a higher probability of being selected. This is important in the sense that ACOFS can be guided in a particular direction, by which a reduced-size subset of salient features is likely to be generated. The subset size determination scheme can be described in two steps, as follows.

First, ACOFS uses a probabilistic formula modified from [32] to decide the size of a subset *r* (≤ *n*) as follows:

construction. However, a problem is that, SFS requires an appropriate stopping criterion to stop the SC. Otherwise, a number of irrelevant features may be included in the constructed subsets, and the solutions may not be effective. To solve this problem, some algorithms ([11,42,50,52]) define the size of a constructed subset by a fixed number for each iteration for all ants, which is incremented at a fixed rate for following iterations. This technique could be inefficient if the fixed number becomes too large or too small. Therefore, deciding the subset size within a reduced area may be a good step for constructing the subset while the ants tra‐

Second, ACOFS utilizes a hybrid search technique for selecting salient features that com‐ bines the advantages of the wrapper and filter approaches. An alternative name for such a search technique is "ACO search". This technique is designed with two sets of new rules for pheromone update and heuristic information measurement. The idea of these rules is based mainly on the random and probabilistic behaviors of ants while selecting features during SC. The aim is to provide the correct information to the features and to maintain an effective balance between exploitation and exploration of ants during SC. Thus, ACOFS achieves a strong search capability that helps to select a smaller number of the most salient features among a feature set. In contrast, the existing approaches ([11,31,42,49-52]) try to design rules without distin‐ guishing between the random and probabilistic behaviors of ants during the construction of a subset. Consequently, ants may be deprived of the opportunity of utilizing enough previ‐

ous experience or investigating more salient features during SC in their solutions.

tional probabilistic transition rule [31] for selecting features as follows,

The main structure of ACOFS is shown in Figure 4, in which the detailed description can be found in [64]. However, at the first stage, while each of the *k* ants attempt to construct sub‐ set, it decides the subset size *r* first according to the subset size determination scheme. This scheme guides the ants to construct subsets in a reduced form. Then, it follows the conven‐

> [ ] [ ] [ ] [ ] () ()

the pheromone and heuristic values associated with feature *i* (i 1, 2,…..,n), and α and β are two parameters that determine the relative importance of the pheromone value and heuris‐ tic information. Note that, since the initial value of and for all individual features are equal, Eq. (1) shows random behaviour in SC initially. The approach used by the ants in construct‐

a

*t t*

a

 h

 h  b

> b

> > (1)

and η<sup>i</sup> are

*i i*

() () ( )

t

*<sup>k</sup> u u <sup>i</sup> u j*

where *j* k is the set of feasible features that can be added to the partial solution, τ<sup>i</sup>

*P t t t*

t

0 *k*

ì ï ï = í ï ïî

Î

å

*k*

*if i j*

ing individual subsets during SC can be seen in Figure 5.

Î

verse through the feature space.

14 Ant Colony Optimization - Techniques and Applications
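The conventional probabilistic transition rule of Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the pheromone array `tau`, heuristic array `eta`, and feasible-feature list are hypothetical:

```python
import random

def transition_probabilities(tau, eta, feasible, alpha=1.0, beta=1.0):
    """Eq. (1): probability of adding each feasible feature i to the
    partial subset, from its pheromone tau[i] and heuristic eta[i]."""
    weights = {i: (tau[i] ** alpha) * (eta[i] ** beta) for i in feasible}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def pick_feature(tau, eta, feasible, rng=random.random):
    """Roulette-wheel draw over the probabilities of Eq. (1)."""
    probs = transition_probabilities(tau, eta, feasible)
    u, acc = rng(), 0.0
    for i, p in probs.items():
        acc += p
        if u <= acc:
            return i
    return max(probs)  # guard against floating-point round-off
```

With all pheromone and heuristic values equal, the rule degenerates to a uniform random choice, matching the random initial behavior of SC noted in the text.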

In an ACO algorithm, the activities of the ants are central to solving combinatorial optimization problems, so in the FS problem it is very advantageous to guide the ants in the correct directions. In contrast to other existing ACO-based FS algorithms, ACOFS uses a straightforward mechanism to determine the subset size *r*: a simple probabilistic formula together with a constraint and a random function. The aim of the probabilistic formula is to inform the random function in such a way that smaller subset sizes have a higher probability of being selected. This is important because it steers ACOFS toward generating reduced-size subsets of salient features. The subset size determination scheme can be described in two steps as follows.

First, ACOFS uses a probabilistic formula modified from [32] to decide the size of a subset *r (≤n)* as follows:

$$P_r = \frac{n-r}{\sum_{i=1}^{l} (n-i)}, \qquad l = n-r \tag{2}$$


Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707


Here, *Pr* increases linearly as *r* decreases, and the value of *r* is restricted by the constraint 2 ≤ *r* ≤ δ. Therefore, *r* = 2, 3, …, δ, where δ = μ × *n* and *l* = *n* − *r*. Here, μ is a user-specified parameter that controls δ, and its value depends on *n* for the given dataset. If δ is close to *n*, the search space for finding the salient features becomes larger, which certainly incurs a high computational cost and raises the risk that ineffective feature subsets might be generated. Since the aim of ACOFS is to select a subset of salient features within a smaller range, the length of the selected subset is preferred to be between 3 and 12, depending on the given dataset. Thus, μ is set within [0.1, 0.6]. All the values of *Pr* are then normalized so that they sum to 1.

Second, ACOFS feeds all the values of *Pr* into the random selection scheme shown in Figure 6 to finally determine the subset size *r*. This selection scheme is similar to the classical roulette-wheel procedure [55].

**Figure 6.** Pseudo-code of the random selection procedure.
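The two steps above (Eq. (2) followed by the roulette-style draw of Figure 6) can be sketched as follows. Since the probabilities are renormalized to sum to 1, the sketch uses the numerator *n* − *r* directly; the parameter values and function names are hypothetical:

```python
import random

def subset_size_probabilities(n, mu=0.4):
    """Eq. (2): P_r proportional to (n - r) for r = 2..delta, delta = mu*n,
    normalised so the probabilities sum to 1 (smaller r -> larger P_r)."""
    delta = max(2, int(mu * n))
    raw = {r: (n - r) for r in range(2, delta + 1)}
    total = sum(raw.values())
    return {r: v / total for r, v in raw.items()}

def draw_subset_size(n, mu=0.4, rng=random.random):
    """Roulette-style selection of the subset size r (cf. Figure 6)."""
    probs = subset_size_probabilities(n, mu)
    u, acc = rng(), 0.0
    for r, p in sorted(probs.items()):
        acc += p
        if u <= acc:
            return r
    return max(probs)
```

For a 10-feature dataset with μ = 0.4, the sizes 2, 3, 4 receive probabilities 8/21, 7/21, 6/21, so the minimum size is the most likely draw, as the scheme intends.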

## **6.2. Subset Evaluation**

Subset evaluation plays a significant role, along with the other basic operations of ACO, in selecting salient features in FS tasks. In common practice, filter or wrapper approaches are used for the evaluation task; however, it is found in [7] that the performance of a wrapper approach is consistently better than that of a filter approach. Therefore, the constructed subsets are evaluated by training a feed-forward NN in each iteration. The NN classifier is not an inherent constraint; any other type of classifier, such as an SVM, can be used for this evaluation task instead. In this study, the evaluation of a subset is represented by the percentage NN classification accuracy (CA) on the testing set. A detailed discussion of the evaluation mechanism integrated into ACOFS follows.


First, the NN is trained partially, for τp epochs, on the features of a constructed subset. Training is performed sequentially using the examples of the training set and a backpropagation (BP) learning algorithm [56]. The number of training epochs, τp, is specified by the user. In partial training, which was first used in conjunction with an evolutionary algorithm [17], the NN is trained for a fixed number of epochs, regardless of whether the algorithm has converged on a result.

Second, check the progress of training to determine whether further training is necessary. If the training error has been reduced by a predefined amount, ε, after the τp training epochs (see Eq. (4)), we assume that the training process is progressing well, that further training is thus necessary, and we return to the first step. Otherwise, we go to the next step and add a hidden neuron. The error, *E*, is calculated as follows:

$$E = \frac{1}{2} \sum\_{p=1}^{P} \sum\_{c=1}^{C} \left( o\_c(p) - t\_c(p) \right)^2 \tag{3}$$

where *oc(p)* and *tc(p)* are the actual and target responses of the c-th output neuron for training example *p*. The symbols *P* and *C* represent the total number of training examples and the number of output neurons, respectively. The reduction of the training error can be described as follows:

$$E\left(t\right) - E\left(t + \tau_p\right) > \varepsilon, \qquad t = \tau_p, \ 2\tau_p, \ 3\tau_p, \dots \tag{4}$$

On the other hand, when adding a hidden neuron, the addition operation is guided by computing the contributions of the current hidden neurons. If the contributions are high, it is assumed that one more hidden neuron is required; otherwise, the extension of the hidden layer is frozen and partial training of the NN continues. The computation of the contribution of previously added hidden neurons is based on the CA on the validation set. The CA can be calculated as follows:

$$CA = 100 \left( \frac{P_{vc}}{P_v} \right) \tag{5}$$

where *Pvc* refers to the number of examples in the validation set correctly classified by the NN and *Pv* is the total number of patterns in the validation set.

At this stage, ACOFS measures the error and CA on the validation set using Eqs. (3) and (5) after every τp epochs of training. It then terminates training when the validation CA decreases, the validation error increases, or both, for *T* successive checks, measured at the end of *T* successive blocks of τp training epochs [16]. Finally, the testing accuracy of the resulting NN architecture, with its selected hidden neurons, is checked on the examples of the testing set according to Eq. (5).
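The three quantities driving this evaluation loop (Eqs. (3)–(5)) can be sketched as follows; the function names are hypothetical and the NN training itself is omitted:

```python
def squared_error(outputs, targets):
    """Eq. (3): E = 1/2 * sum over training examples p and output
    neurons c of (o_c(p) - t_c(p))^2."""
    return 0.5 * sum((o - t) ** 2
                     for out, tgt in zip(outputs, targets)
                     for o, t in zip(out, tgt))

def training_progressing(prev_error, curr_error, eps):
    """Eq. (4): training progressed if the error fell by more than eps
    over the last tau_p epochs; if not, a hidden neuron is added."""
    return (prev_error - curr_error) > eps

def classification_accuracy(n_correct, n_total):
    """Eq. (5): CA = 100 * (P_vc / P_v)."""
    return 100.0 * n_correct / n_total
```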



The idea behind this evaluation process is straightforward: minimize the training error, and maximize the validation accuracy. To achieve these goals, ACOFS uses a constructive approach to determine NN architectures automatically. Although other approaches, such as pruning [57] and regularization [58], could be used in ACOFS, the selection of an initial NN architecture in these approaches is difficult [59]. This selection, however, is simple in the case of a constructive approach: the initial network architecture can consist of a hidden layer with one neuron, an input layer with *r* neurons (one for each feature of the corresponding subset), and an output layer with *c* neurons (one for each class). If this minimal architecture cannot solve the given task, hidden neurons are added one by one. Due to this simplicity of initialization, the constructive approach is widely used in multi-objective learning tasks [60].
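The constructive loop described above can be sketched as follows. It assumes a hypothetical `train_fn(hidden)` that performs τp further epochs with the given number of hidden neurons and returns the training error reached, and it grows the hidden layer only when the progress test of Eq. (4) fails; the bounds are illustrative, not the authors' settings:

```python
def construct_network(train_fn, eps=1e-3, max_hidden=10, max_rounds=100):
    """Start from the minimal r-1-c architecture (one hidden neuron) and
    add hidden neurons one by one whenever partial training stalls."""
    hidden = 1
    prev = train_fn(hidden)
    for _ in range(max_rounds):
        err = train_fn(hidden)
        if prev - err > eps:         # Eq. (4): still progressing
            prev = err
        elif hidden < max_hidden:    # stalled: add one hidden neuron
            hidden += 1
            prev = train_fn(hidden)
        else:
            break                    # stalled at the size limit
    return hidden
```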

## **6.3 Best Subset Selection**

Generally, finding salient subsets of reduced size is always preferable, due to the lower cost of hardware implementation and the shorter operation time. Unlike other existing algorithms (e.g., [49,50]), ACOFS recognizes the best salient feature subset as a combination of the local best and global best selections, as follows:

Local best selection: Determine the local best subset, *Sl(t)*, for a particular iteration *t* (*t* = 1, 2, 3, …) as the best-performing subset, Max(*Sk(t)*), among the subsets *Sk(t)* constructed by the *k* ants, *k* = 1, 2, …, *n*.

Global best selection: Determine the global best subset, *Sg*, that is, the best subset of salient features among all the local best solutions. *Sg* is compared with the currently decided local best subset, *Sl(t)*, at every iteration *t* by their classification performances; if *Sl(t)* is found to be better, then *Sg* is replaced by *Sl(t)*. If, during this selection process, the performances are found to be similar at any time, then the one of the two subsets (*Sg* and *Sl(t)*) with the smaller size is selected as the best subset. Note that at the first iteration *Sl(t)* is considered to be *Sg*.
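The two selections can be sketched as follows; subsets are represented as (CA, feature-list) pairs, with ties broken toward the smaller subset as described above, and all names are hypothetical:

```python
def better(a, b):
    """Prefer the higher CA; on a (near-)tie, prefer the smaller subset."""
    (ca_a, feats_a), (ca_b, feats_b) = a, b
    if abs(ca_a - ca_b) < 1e-9:
        return a if len(feats_a) <= len(feats_b) else b
    return a if ca_a > ca_b else b

def best_subset(iterations):
    """iterations: for each iteration t, the list of (CA, features) pairs
    produced by the k ants. Local best = best pair of one iteration;
    global best = running best across iterations (S_l(1) seeds S_g)."""
    s_g = None
    for ants in iterations:
        s_l = ants[0]
        for cand in ants[1:]:
            s_l = better(cand, s_l)
        s_g = s_l if s_g is None else better(s_l, s_g)
    return s_g
```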

## **6.4 Hybrid Search Process**

The new hybrid search technique incorporated in ACOFS consists of wrapper and filter approaches. A significant advantage of this technique is that the ants gain a significant ability to utilize previous successful moves and to express the desirability of moves toward a high-quality solution in FS. The search process is composed of two sets of newly designed rules, namely, the pheromone update rule and the heuristic information rule, which are described as follows.

## **6.4.1. Pheromone Update Rule**


Pheromone updating in the ACO algorithm is a vital aspect of FS tasks. Through the pheromone update rule, consisting of a local update and a global update, the ants exploit the features that have proved most suitable in prior iterations. More precisely, the global update applies only to those features that are part of the best feature subset of the current iteration; it allows these features to receive a large amount of pheromone in equal shares. The aim of the global update is to encourage the ants to construct subsets with a significant CA. In contrast, the local update not only makes irrelevant features less desirable, but also helps the ants to select features that have never been explored before. This update either decreases the strength of the pheromone trail or maintains the same level, depending on whether a particular feature has been selected.

In ACOFS, a set of new pheromone update rules has been designed on the basis of the two basic behaviors (that is, random and probabilistic) of the ants during SCs. These rules are modified from the standard rules in [49] and [53], and aim to provide a proper balance between exploration and exploitation of the ants in the next iteration. Exploration is reported to prevent the ants from converging on a common path. Actual ants also exhibit this behavioral characteristic [61], which is an attractive property: if different paths can be explored by different ants, then there is a higher probability that one of the ants will find a better solution, as opposed to all ants converging on the same tour.

Random case: The rule presented in Eq. (6) is modified only in the second term, which is divided by *mi*. This modification provides sufficient exploration for the ants in the following constructions. The reason is that under the random behavior of the transition rule, features are in practice chosen randomly rather than according to their experience. Thus, to provide an exploration facility for the ants, the following modification has been adopted:

$$\begin{aligned} \tau_i(t+1) &= (1-\rho)\tau_i(t) + \frac{1}{m_i} \sum_{k=1}^{n} \Delta \tau_i^k(t) + e\,\Delta \tau_i^g(t) \\ \Delta \tau_i^k(t) &= \begin{cases} \gamma(S^k(t)) & \text{if } i \in S^k(t) \\ 0 & \text{otherwise} \end{cases} \\ \Delta \tau_i^g(t) &= \begin{cases} \gamma(S^l(t)) & \text{if } i \in S^l(t) \\ 0 & \text{otherwise} \end{cases} \end{aligned} \tag{6}$$

Here, *i* refers to the feature number (*i* = 1, 2, …, *n*), and *mi* is the number of times feature *i* has been selected in the current iteration. Δτ<sub>i</sub><sup>k</sup>(*t*) is the amount of pheromone received through the local update by feature *i* when it is included in *Sk(t)* at iteration *t*. Similarly, the global update Δτ<sub>i</sub><sup>g</sup>(*t*) is the amount of pheromone for feature *i* when it is included in *Sl(t)*. Finally, ρ and *e* refer to the pheromone decay value and the elitist parameter, respectively.
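A sketch of the random-case update of Eq. (6); here `gamma` stands for the CA-based reward γ(·) of a subset, and all names and parameter values are hypothetical illustrations rather than the authors' settings:

```python
def update_pheromone_random(tau, subsets, local_best, gamma, m,
                            rho=0.1, e=1.0):
    """Eq. (6): evaporate, add the 1/m_i-damped local deposits from all
    subsets containing i, and an elitist deposit for the local best."""
    new_tau = []
    for i, t_i in enumerate(tau):
        local = sum(gamma(S) for S in subsets if i in S)
        local = local / m[i] if m[i] else 0.0
        glob = gamma(local_best) if i in local_best else 0.0
        new_tau.append((1.0 - rho) * t_i + local + e * glob)
    return new_tau
```

Dividing the local deposit by the selection count *mi* keeps frequently (randomly) chosen features from accumulating pheromone too quickly, which is the exploration effect the text describes.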

Probabilistic case: Eq. (7) shows the modified pheromone rule for the probabilistic case. The rule is similar to the original form, but the actual modification affects only the inner portions of the second and third terms.

$$\begin{aligned} \tau_i(t+1) &= (1-\rho)\tau_i(t) + \sum_{k=1}^{n} \Delta \tau_i^k(t) + e\,\Delta \tau_i^g(t) \\ \Delta \tau_i^k(t) &= \begin{cases} \gamma(S^k(t)) \times \lambda_i & \text{if } i \in S^k(t) \\ 0 & \text{otherwise} \end{cases} \\ \Delta \tau_i^g(t) &= \begin{cases} \gamma(S^l(t)) \times \lambda_i & \text{if } i \in S^l(t) \\ 0 & \text{otherwise} \end{cases} \end{aligned} \tag{7}$$

( )

Probabilistic case: In the following iterations, when ants complete the feature SCs on the ba‐ sis of the probabilistic behavior, the following formula is used to estimate for all features *i* :

*k k n*

*S t e if i S t*

( )

( ( )) (1 ) ( )

*k k n*

the subsets that are constructed within the currently completed iterations, except for the ini‐

on measurement of information gain can be seen in [64]. However, the aim of including is

Thus, different features may get an opportunity to be selected in the SC for different itera‐ tions, thus definitely enhancing the exploration behavior of ants. Furthermore, one addition‐ al exponential term has been multiplied by these rules in aiming for a reduced size subset.

In order to understand the actual computational cost of a method, an exact analysis of com‐ putational complexity is required. In this sense, the big-O notation [62] is a prominent ap‐ proach in terms of analyzing computational complexity. Thus, ACOFS here uses the above process for this regard. There are seven basic steps in ACOFS, namely, information gain measurement, subset construction, subset evaluation, termination criterion, subset determina‐ tion, pheromone update, and heuristic information measurement. The following paragraphs present the computational complexity of ACOFS in order to show that inclusion of different

techniques does not increase computational complexity in selecting a feature subset.

once, specifically, before starting the FS process.

**i.** Information Gain Measurement: In this step, information gain (IG) for each feature

is measured according to [64]. If the number of total features for a given dataset is n, then the cost of measuring IG is O(n × P), where P denotes the number of exam‐ ples in the given dataset. It is further mentioning that this cost is required only

 j*m S t e if i S t* -

and φ<sup>i</sup>

=+ Î å (8)

= +Î å (9)

refers to the number of a particular selected feature i that is a part of

refers to the information gain for feature *i*. A detailed discussion

is to provide a proper exploitation capability

Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707

21

<sup>1</sup> ( ( ))(1 ) ( )

*<sup>k</sup> S t <sup>n</sup>*

 j-

*<sup>k</sup> S t <sup>n</sup>*

 l

**a.** reducing the greediness of some particular feature i in n during SCs, and

Here, is the user specified parameter that controls the exponential term.

1

1

=

**b.** increasing the diversity between the features in n.

 fg

*i ii a i k*

=

*i k*

*m* hg

*i*

h

tial iteration. The aim of multiplying *mi*

**6.5. Computational Complexity**

based on the following two factors:

In these two rules, φ<sup>i</sup>

for the ants during SCs. λ<sup>i</sup>

Here, feature *i* is rewarded by the global update, and Δ τ<sup>g</sup> is in the third term, where *i Sl (t)i*. It is important to emphasize that, *i* is maintained strictly here. That is, *i* at iteration *i tt i* is compared with *i* at iteration (*tt -τp*), where *tt* = *t + τp, and τp* 1, 2, 3,……In this regard, if γ(*Sl* (*tt* )) max((γ*Sl (ttp*ε)),), where ε refers to the number of CAs for those local best subsets that maintain |*Sl (tt* )| = |*Sl (ttp*)|, then a number of features, *nc* are ignored to get Δτ*<sup>g</sup>* , since those features are available in *Sl (tt* ), which causes to degrade its performance. Here, *nc* ∈*Sl (tt )* but *nc*∉ *Slb*, where *Slb* provides max((γ*Sl (ttp*)),), and |*Sl (tt )*| implies the size of the subset *Sl (tt )*. Note that, the aim of this restriction is to provide Δτ<sup>g</sup> only to those features that are actually significant, because, global update has a vital role in selecting the salient features in ACOFS. Distinguish such salient features and allow them to receive Δτ*<sup>g</sup>* by imposing the above re‐ striction.

### **6.4.2. Heuristic Information Measurement**

A heuristic value,*η* , for each feature generally represents the attractiveness of the features, and depends on the dependency degree. It is therefore necessary to use ; otherwise, the algo‐ rithm may become too greedy, and ultimately a better solution may not be found [31]. Here, a set of new rules is introduced for measuring heuristic information using the advantages of wrapper and filter tools. More precisely, the outcome of subset evaluations using the NN is used here as a wrapper tool, whereas the value of information gain for each feature is used as a filter tool. These rules are, therefore, formulated according to the random and probabil‐ istic behaviors of the ants, which are described as follows.

Random case: In the initial iteration, while ants are involved in constructing the feature sub‐ sets randomly, the heuristic value of all features i can be estimated as follows:

$$\eta_i = \frac{1}{m_i} \sum_{k=1}^{n} \gamma(S^k(t)) \left(1 + \rho\, e^{-\left|S^k(t)\right|/a}\right) \qquad \text{if } i \in S^k(t) \tag{8}$$

Probabilistic case: In the following iterations, when ants complete the feature SCs on the basis of the probabilistic behavior, the following formula is used to estimate *η*<sub>i</sub> for all features *i*:

$$\eta_i = m_i\, \phi_i \sum_{k=1}^{n} \gamma(S^k(t))\, \lambda_i \left(1 + \rho\, e^{-\left|S^k(t)\right|/a}\right) \qquad \text{if } i \in S^k(t) \tag{9}$$

In these two rules, *φ*<sub>i</sub> refers to the number of times a particular feature *i* has been selected as part of the subsets constructed within the currently completed iterations, except for the initial iteration. The aim of multiplying *m*<sub>i</sub> and *φ*<sub>i</sub> is to provide a proper exploitation capability for the ants during SCs. *λ*<sub>i</sub> refers to the information gain for feature *i*; a detailed discussion of the measurement of information gain can be found in [64]. The aim of including *λ*<sub>i</sub> is to make each feature's attractiveness reflect how informative that feature is.


Thus, different features may get an opportunity to be selected in the SC in different iterations, which definitely enhances the exploration behavior of the ants. Furthermore, one additional exponential term is multiplied into these rules, aiming for a reduced-size subset. Here, *a* is the user-specified parameter that controls this exponential term.
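The two heuristic rules can be sketched as follows. This is an illustration of Eqs. (8) and (9) as reconstructed above, not the authors' code: `m[i]` is the per-feature count defined earlier in the chapter, `phi[i]` the number of times feature *i* was selected in completed iterations, `lam[i]` its information gain, and `rho` and `a` the decay value and the exponential-term control parameter; the exact parameter roles are our reading of the garbled formulas.

```python
import math

# Sketch of the heuristic information rules (Eqs. (8) and (9)); illustrative
# only. Each term of the sum counts only the subsets S^k(t) that contain
# feature i, and the (1 + rho * exp(-|S|/a)) factor mildly rewards smaller
# subsets.

def eta_random(i, subsets, gammas, m, rho=0.4, a=10.0):
    """Eq. (8): heuristic value of feature i after the initial, random iteration."""
    total = sum(g * (1.0 + rho * math.exp(-len(s) / a))
                for s, g in zip(subsets, gammas) if i in s)
    return total / m[i]

def eta_prob(i, subsets, gammas, m, phi, lam, rho=0.4, a=10.0):
    """Eq. (9): heuristic value of feature i in later, probabilistic iterations."""
    total = sum(g * lam[i] * (1.0 + rho * math.exp(-len(s) / a))
                for s, g in zip(subsets, gammas) if i in s)
    return m[i] * phi[i] * total
```

A feature that appears in no subset of the current iteration receives a heuristic value of zero, matching the "otherwise" case of the rules.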

## **6.5. Computational Complexity**


20 Ant Colony Optimization - Techniques and Applications


In order to understand the actual computational cost of a method, an exact analysis of computational complexity is required. In this sense, big-O notation [62] is a prominent approach for analyzing computational complexity, and ACOFS adopts it here. There are seven basic steps in ACOFS, namely, information gain measurement, subset construction, subset evaluation, termination criterion, subset determination, pheromone update, and heuristic information measurement. The following paragraphs present the computational complexity of ACOFS in order to show that the inclusion of different techniques does not increase the computational complexity of selecting a feature subset.

**i.** Information Gain Measurement: In this step, information gain (IG) for each feature is measured according to [64]. If the number of total features for a given dataset is *n*, then the cost of measuring IG is O(*n* × *P*), where *P* denotes the number of examples in the given dataset. It is worth mentioning that this cost is incurred only once, specifically, before starting the FS process.

**ii.** Subset Construction: Subset construction shows two different types of phenomena according to Eq. (1). For the random case, if the total number of features for a given dataset is *n*, then the cost of an ant constructing a single subset is O(*r* × *n*), where *r* refers to the size of the subsets. Since the total number of ants is *k*, the computational cost is O(*r* × *k* × *n*) operations. However, in practice, *r* < *n*; hence, the cost becomes O(*k* × *n*) ≈ O(*n*<sup>2</sup>). In the probabilistic case, ACOFS uses Eq. (1) for selecting the features in SC, which shows a constant computational cost of O(1) for each ant. If the number of ants is *k*, then the computational cost becomes O(*k*).

**iii.** Subset Evaluation: In ACOFS, five types of operations are necessarily required for evaluating a single subset using a constructive NN training scheme: (a) partial training, (b) stopping criterion, (c) further training, (d) contribution computation, and (e) addition of a hidden neuron. The subsequent paragraphs describe these types in detail.

**a.** Partial training: In case of training, standard BP [56] is used. During each training epoch, BP takes O(*W*) operations per example, where *W* is the number of weights in the current NN. Thus, training all examples in the training set for τp epochs requires O(τp × *P*t × *W*) operations, where *P*t denotes the number of examples in the training set.

**b.** Stopping criterion: During training, the stopping criterion uses either validation accuracy or validation errors for subset evaluation. Since training error is computed as a part of the training process, evaluating the termination criterion takes O(*P*v × *W*) operations, where *P*v denotes the number of examples in the validation set. Since *P*v < *P*t, O(*P*v × *W*) < O(τp × *P*t × *W*).

**c.** Further training: ACOFS uses Eq. (4) to check whether further training is necessary. The evaluation of Eq. (4) takes a constant number of computational operations O(1), since the error values used in Eq. (3) have already been evaluated during training.

**d.** Contribution computation: ACOFS computes the contribution of the added hidden neuron using Eq. (5). This computation takes O(*P*v) operations, which is less than O(τp × *P*t × *W*).

**e.** Addition of a hidden neuron: The computational cost for adding a hidden neuron is O(*r* × *c*) for initializing the connection weights, where *r* is the number of features in the current subset, and *c* is the number of neurons in the output layer. Also note that O(*r* × *c*) < O(τp × *P*t × *W*).

The aforementioned computation is done for a partial training session consisting of τp epochs. In general, ACOFS requires a number, say *M*, of such partial training sessions for evaluating a single subset. Thus, the cost becomes O(τp × *M* × *P*t × *W*). Furthermore, considering all subsets, the computational cost required is O(*k* × τp × *M* × *P*t × *W*) operations.

**iv.** Termination criterion: A termination criterion is employed in ACOFS for eventually terminating the FS process. Since only one criterion is required to be executed (i.e., the algorithm achieves a predefined accuracy, or reaches an iteration threshold, *I*), the execution of such a criterion requires a constant computational cost of O(1).

**v.** Subset determination: ACOFS requires two steps to determine the best subset, namely, finding the local best subset and the global best subset. In order to find the local best subset in each iteration *t*, ACOFS requires O(*k*) operations; the total computational cost for finding the local best subsets thus becomes O(*k* × *t*). In order to find the global best subset, ACOFS requires O(1) operations. Thus, the total computational cost for subset determination becomes O(*k* × *t*), which is less than O(*k* × τp × *M* × *P*t × *W*).

**vi.** Pheromone update rule: ACOFS executes Eqs. (6) and (7) to update the pheromone trails for each feature in terms of the random and probabilistic cases. Since the number of features is *n* for a given learning dataset, the computation takes O(*n*) operations, which is less than O(*k* × τp × *M* × *P*t × *W*).

**vii.** Heuristic information measurement: Similar to the pheromone update operation, ACOFS uses Eqs. (8) and (9) to update the heuristic value of the *n* features. The computational cost is thus O(*n*). Note that O(*n*) << O(*k* × τp × *M* × *P*t × *W*).

In accordance with the above analysis, the different parts of the entire computational cost can be summarized as O(*n* × *P*) + O(*n*<sup>2</sup>) + O(*k*) + O(*k* × τp × *M* × *P*t × *W*). It is important to note here that the first and second terms, namely, *n* × *P* and *n*<sup>2</sup>, are the cost of operations performed only once, and are much less than *k* × τp × *M* × *P*t × *W*. On the other hand, O(*k*) << O(*k* × τp × *M* × *P*t × *W*). Hence, the total computational cost of ACOFS is O(*k* × τp × *M* × *P*t × *W*), which is similar to the cost of training a fixed network architecture using BP [56]; the total cost is also similar to that of other existing ACO-based FS approaches [42]. Thus, it can be said that the incorporation of several techniques in ACOFS does not increase the computational cost.

Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707

## **7. Experimental Studies**

The performance of ACOFS is presented here on eight well-known benchmark classification datasets, namely the breast cancer, glass, vehicle, thyroid, ionosphere, credit card, sonar, and gene datasets, and on one gene expression classification dataset, the colon cancer dataset. These datasets have been the subject of many studies in NNs and machine learning, covering examples of small-, medium-, high-, and very high-dimensional datasets. The characteristics of these datasets, summarized in Table 1, show a considerable diversity in the number of features, classes, and examples. The experimental details, the results, and the roles in FS of the subset size determination scheme, the user-specified parameter μ, and the hybrid search are described below. Finally, one additional experiment on ACOFS, concerning FS performance on real-world datasets mixed with some noisy features, and comparisons of ACOFS with other existing works are also discussed.

## **7.1. Experimental Setup**

In order to ascertain the effectiveness of ACOFS for FS, extensive experiments have been carried out, adapted from [64]. To accomplish the FS task suitably in ACOFS, two basic steps need to be considered, namely, dimensionality reduction of the datasets and assigning values to the user-specified parameters. In case of dimensionality reduction, in contrast to the other datasets used in this study, colon cancer is a very high-dimensional dataset containing a very large number of genes (features). The number of genes in colon cancer (i.e., 2000 genes) is too high to manipulate in the learning classifier, and not all genes are useful for classification [63]. To remove such difficulties, we first reduced the dimension of the colon cancer dataset to within 100 features using an information gain (IG) measurement technique. Ordinarily, IG measurement statistically determines those features that are informative for classifying a target. On the basis of this concept, we used such a technique to reduce the dimension of the colon cancer dataset. Details about IG measurement can be found in [64].
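The IG-based reduction step can be sketched as follows. This is a minimal illustration, not the authors' code: it computes IG(*f*) = H(*C*) − H(*C* | *f*) for discrete-valued feature columns and keeps the top *k* features; it assumes continuous gene-expression values have already been discretized, and all function names are ours.

```python
import math
from collections import Counter

# Minimal sketch of information-gain feature ranking, as used here to cut the
# colon cancer data down to its top-100 genes. IG(f) = H(C) - H(C | f).
# Assumes discrete feature values; illustrative only.

def entropy(labels):
    """Shannon entropy H(C) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_column, labels):
    """IG of one feature: class entropy minus class entropy conditioned on it."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_column):
        sub = [l for f, l in zip(feature_column, labels) if f == v]
        cond += (len(sub) / n) * entropy(sub)
    return entropy(labels) - cond

def top_k_features(X_columns, labels, k=100):
    """X_columns: list of feature columns; returns indices of the k best by IG."""
    ranked = sorted(range(len(X_columns)),
                    key=lambda j: info_gain(X_columns[j], labels), reverse=True)
    return ranked[:k]
```

Scoring all *n* features over *P* examples this way costs on the order of O(*n* × *P*), matching the one-off cost stated in the complexity analysis above.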

In case of user-specified parameters, we used a number of parameters, common to all datasets, reported in Table 2. It should be noted that these parameters are not specific to our algorithm, but are usual for any ACO-based FS algorithm using an NN. We chose these parameters after some preliminary runs; they were not meant to be optimal. It is worth mentioning that, among the parameters in Table 2, proper selection of the values of α and β is helpful for achieving a balance between exploitation and exploration of ants in selecting salient features. For example, if α = 0, then no pheromone information is used, that is to say, previous search experience is neglected, and the search changes to a greedy search. If β = 0, then attractiveness, the potential benefit of moves, is neglected. In this work, the values of α and β were chosen according to the suggestion of [53].

| **Parameter** | **Value** |
|---|---|
| Initial pheromone level for all features, τ | 0.5 |
| Initial heuristic value for all features, η | 0.1 |
| μ, used in subset size determination | 0.08 to 0.6 |
| Strength of pheromone level, α | 1 |
| Strength of heuristic value, β | 3 |
| Pheromone decay parameter, ρ | 0.4 |
| Exponential term control parameter, φ | 0.1 |
| Iteration threshold, *I* | 10 to 18 |
| Accuracy threshold | Depends on dataset |
| Training error threshold, λ | Depends on dataset |
| Learning rate for BP algorithm | 0.1 to 0.2 |
| Momentum term for BP algorithm | 0.5 to 0.9 |
| Initial weights of NNs | −1.0 to 1.0 |
| Number of epochs for partial training, τp | 20 to 40 |
| Training threshold for terminating NN training, *T* | 3 |

**Table 2.** Common parameters for all datasets.

## **7.2. Experimental Results**

Table 3 shows the results of ACOFS over 20 independent runs on nine real-world benchmark classification datasets. The classification accuracy (CA) in Table 3 refers to the percentage of exact classifications produced by trained NNs on the testing set of a classification dataset. In addition, the weights of features for the above nine datasets over 20 independent runs are exhibited in Tables 4-11. On the other hand, Figure 7 shows how the best solution was selected in ACOFS for the glass dataset. In order to observe whether the internal process of FS in ACOFS is being performed appropriately, Figures 8-11 have been considered. Now, the following observations can be made from Tables 3-11 and Figures 7-11.

| **Dataset** | *n* | SD | CA (%) | SD | *n*s | SD | CA (%) | SD |
|---|---|---|---|---|---|---|---|---|
| Cancer | 9.00 | 0.00 | 97.97 | 0.42 | 3.50 | 1.36 | 98.91 | 0.40 |
| Glass | 9.00 | 0.00 | 76.60 | 2.55 | 3.30 | 1.14 | 82.54 | 1.44 |
| Vehicle | 18.00 | 0.00 | 60.71 | 11.76 | 2.90 | 1.37 | 75.90 | 0.64 |
| Thyroid | 21.0 | 0.00 | 98.04 | 0.58 | 3.00 | 1.34 | 99.08 | 0.11 |
| Ionosphere | 34.0 | 0.00 | 97.67 | 1.04 | 4.15 | 2.53 | 99.88 | 0.34 |
| Credit card | 51.0 | 0.00 | 85.23 | 0.67 | 5.85 | 1.76 | 87.99 | 0.38 |
| Sonar | 60.0 | 0.00 | 76.82 | 6.97 | 6.25 | 3.03 | 86.05 | 2.26 |
| Gene | 120.0 | 0.00 | 78.97 | 5.51 | 7.25 | 2.53 | 89.20 | 2.46 |
| Colon cancer | 100.0 | 0.00 | 59.06 | 5.75 | 5.25 | 2.48 | 84.06 | 3.68 |

**Table 3.** Performance of ACOFS for different classification datasets. Results were averaged over 20 independent runs. Here, *n* and *n*s refer to the total number of original features and selected features, respectively; CA and SD signify the classification accuracy and standard deviation. The first group of value columns reports the average result with all features, and the second group the average result with the selected features.

**i.** As can be seen from Table 3, ACOFS was able to select a smaller number of features for solving different datasets. For example, ACOFS selected, on average, 3.00 features from a set of 21 features in solving the thyroid dataset. It also selected, on average, 7.25 genes (features) from a set of 120 genes in solving the gene dataset. On the other hand, a very large-dimensional dataset, that of colon cancer, was preprocessed from the original one to be utilized in ACOFS. In this manner, the original 2000 features of colon cancer were reduced to within 100 features. ACOFS then obtained a small number of salient genes, 5.25 on average, from the set of 100 genes for solving the colon cancer dataset. In fact, ACOFS selected a small number of features for all other datasets having more features. Feature reduction in such datasets was several orders of magnitude (see Table 3).

**ii.** The positive effect of a small number of selected features (*n*s) is clearly visible when we observe the CA. For example, for the vehicle dataset, the average CA of all features was 60.71%, whereas it had been 75.90% with 2.90 features. Similarly, ACOFS produced an average CA of 86.05% with the average number of features of 6.25

substantially reduced for the sonar dataset, while the average CA had been 76.82% with all 60 features. Other similar types of scenarios can also be seen for all remaining datasets in ACOFS. Thus, it can be said that ACOFS has a powerful searching capability for providing high-quality solutions. CA improvement for such datasets was several orders of magnitude (see Table 3). Furthermore, the use of *n*s caused a relatively small standard deviation (SD), as presented in Table 3 for each entry. The low SDs imply the robustness of ACOFS; robustness is represented by the consistency of an algorithm under different initial conditions.

**Figure 7.** Finding the best subset of the glass dataset for a single run. Here, the classification accuracy is the accuracy of the local best subset.

**Figure 8.** Number of selections of each feature by different ants for different iterations in the glass dataset for a single run.

**iii.** The method of determination for the final solution of a subset in ACOFS can be seen in Figure 7. We can observe that, for the performances of the local best subsets, the CAs varied together with the size of those subsets. There were also several points where the CAs were maximized, but the best solution was selected (indicated by a circle) by considering the reduced-size subset. It can also be seen in Figure 7 that CAs varied due to size variations of the local best subsets in different iterations. Furthermore, different features that were included in different local best subsets caused variations in CAs.

**Figure 9.** Distribution of the pheromone level of some selected features of the glass dataset in different iterations for a single run.

**Figure 10.** Distribution of the heuristic level of some selected features of the glass dataset in different iterations for a single run.

**iv.** In order to observe how the selection of salient features progresses over different iterations in ACOFS, Figure 8 shows this information for the glass dataset for a single run. We can see that features 1, 7, 8, 6, and 2 received most of the selections by ants during SCs compared to the other features. The selection of features was basically performed based on the values of the pheromone level (*τ*) and heuristic information (*η*) for individual features. Accordingly, those features that had higher values of τ and η ordinarily obtained a higher priority of selection, as can be seen in Figures 9 and 10. For clarity, these figures represent five features, of which four (features 1, 7, 8, 6) had a higher rate of selection by ants during SCs and one (feature 2) had a lower rate.

Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707

| Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Cancer | 0.186 | 0.042 | 0.129 | 0.142 | 0.129 | 0.2 | 0.115 | 0.042 | 0.015 |
| Glass | 0.258 | 0.045 | 0.258 | 0.107 | 0.06 | 0.015 | 0.182 | 0.06 | 0.015 |

**Table 4.** Weights of the features selected by ACOFS for the cancer and glass datasets. Columns are indexed by feature number.

| Feature | 1 | 2 | 4 | 7 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|
| Weight | 0.189 | 0.103 | 0.069 | 0.051 | 0.086 | 0.086 | 0.103 | 0.086 |

**Table 5.** Weights of the features selected by ACOFS for the vehicle dataset.

| Feature | 1 | 7 | 17 | 19 | 20 | 21 |
|---|---|---|---|---|---|---|
| Weight | 0.052 | 0.052 | 0.332 | 0.1 | 0.069 | 0.15 |

**Table 6.** Weights of the features selected by ACOFS for the thyroid dataset.

| Feature | 1 | 3 | 4 | 5 | 7 | 8 | 12 | 27 | 29 |
|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.108 | 0.036 | 0.036 | 0.036 | 0.06 | 0.12 | 0.06 | 0.12 | 0.036 |

**Table 7.** Weights of the features selected by ACOFS for the ionosphere dataset.

| Feature | 5 | 8 | 29 | 41 | 42 | 43 | 44 | 49 | 51 |
|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.042 | 0.06 | 0.034 | 0.051 | 0.17 | 0.111 | 0.128 | 0.034 | 0.12 |

**Table 8.** Weights of the features selected by ACOFS for the credit card dataset.

| Feature | 2 | 9 | 10 | 11 | 12 | 15 | 17 | 18 | 44 |
|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.037 | 0.046 | 0.056 | 0.084 | 0.112 | 0.037 | 0.037 | 0.037 | 0.06 |

**Table 9.** Weights of the features selected by ACOFS for the sonar dataset.

| Feature | 22 | 59 | 60 | 61 | 62 | 63 | 64 | 69 | 70 | 119 |
|---|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.027 | 0.064 | 0.045 | 0.1 | 0.073 | 0.073 | 0.119 | 0.110 | 0.128 | 0.036 |

**Table 10.** Weights of the features selected by ACOFS for the gene dataset.

| Feature | 47 | 72 | 249 | 267 | 493 | 765 | 1247 | 1325 | 1380 | 1843 |
|---|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.051 | 0.038 | 0.051 | 0.038 | 0.051 | 0.038 | 0.038 | 0.038 | 0.051 | 0.051 |

**Table 11.** Weights of the features selected by ACOFS for the colon cancer dataset.

**Figure 11.** Training process for evaluating the subsets constructed by ants in the ionosphere dataset: (a) training error on the training set, (b) training error on the validation set, (c) classification accuracy on the validation set, and (d) the hidden neuron addition process.

**v.** Upon completion of the entire FS process, the most salient features could be identified by computing a weight for each individual feature. That is to say, features with higher weight values were more significant. For a particular feature, having the maximum weight value implied that it was selected by the ants the maximum number of times in most of the runs. Tables 4-11 show the feature weights for the cancer, glass, vehicle, thyroid, ionosphere, credit card, sonar, gene, and colon cancer datasets, respectively, over 20 independent runs. We can see in Table 4 that ACOFS selected features 6, 1, 4, 3, 5, and 7 from the cancer dataset very frequently, that these features had relatively higher weight values, and that they performed well as discriminators. Similarly, ACOFS selected features 42, 44, 51, 43, 8, and 5 as the most important from the credit card dataset (Table 8), as well as features 70, 64, 69, 61, 63, 59, and 60 from the gene dataset (Table 10). Note that Tables 5-11 report weights only for certain features; weights of negligible value for the remaining features of each dataset are not included.
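The weight computation above can be sketched as a normalized selection count: each feature's weight is its share of all selections accumulated over the subsets chosen across runs (so the weights of a dataset sum to 1, as in Tables 4-11). The exact formula is not restated here, so this reconstruction is an assumption; the `feature_weights` helper and the toy subsets are hypothetical.

```python
from collections import Counter

def feature_weights(selected_subsets):
    """Hypothetical reconstruction: weight of a feature = its number of
    selections divided by the total number of selections over all subsets."""
    counts = Counter(f for subset in selected_subsets for f in subset)
    total = sum(counts.values())
    return {f: counts[f] / total for f in counts}

# Toy example: feature 6 appears in every subset, so it gets the top weight.
subsets = [[6, 1, 4], [6, 1, 3], [6, 5, 7], [6, 9]]
w = feature_weights(subsets)
```

With 11 selections in total, feature 6 (picked 4 times) receives weight 4/11, and the weights sum to 1, matching the column sums in Table 4.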

**vi.** Finally, we wish to note that a successful evaluation function leads to finding high-quality solutions for ACOFS in FS. Our ACOFS uses a constructive NN model that evaluates the subsets constructed by the ants at every step during training. As the training process progresses, the training error on the training set converges to a certain limit (Figure 11(a)). However, there is an instance in which the training error increases; this is due to the addition of one unnecessary hidden neuron. Such an addition also hampers the training error on the validation set (Figure 11(b)). Therefore, ACOFS deletes the unnecessary hidden neuron (Figure 11(d)) from the NN architecture, since it cannot improve the classification accuracy on the validation set (Figure 11(c)).

## **7.3. Effects of Subset Size Determination**

The results presented in Table 3 show the ability of ACOFS to select salient features. However, the effect of determining the subset size, which controls the ants so that they construct subsets within a reduced boundary, was not clear. To observe this effect, we carried out a new set of experiments. The setups of these experiments were almost exactly the same as those described before. The only difference was that ACOFS did not determine the subset size using the bounded scheme; instead, the size of the subset for each ant was decided randomly.

| Dataset | *ns* | SD | CA (%) | SD | *ns* | SD | CA (%) | SD |
|---|---|---|---|---|---|---|---|---|
| Vehicle | 6.05 | 4.76 | 75.73 | 0.48 | 2.90 | 1.37 | 75.90 | 0.64 |
| Credit card | 15.30 | 8.25 | 88.34 | 0.22 | 5.85 | 1.76 | 87.99 | 0.38 |

**Table 12.** Effect of determining subset size on the average performances of ACOFS. The left block of columns refers to ACOFS without the bounded scheme; the right block refers to ACOFS.

Table 12 shows the average results of the new experiments for the vehicle and credit card datasets over 20 independent runs. The positive effect of determining the subset size during the FS process is clearly visible. For example, for the credit card dataset, the average values of *ns* for ACOFS without and with subset size determination were 15.30 and 5.85, respectively. A similar scenario can also be seen for the other dataset. In terms of CA, the average CAs of ACOFS with subset size determination were either better than or comparable to those of ACOFS without subset size determination for these two datasets.

## **7.4. Effect of µ**

The essence of the proposed techniques in ACOFS for recognizing subsets of salient features from the given datasets can be seen in Table 3; however, the effect of the inner component μ of subset size determination (see Section 6.1) on the overall results was not clear. The reason is that the size of the subsets constructed by the ants depends roughly on the value of μ. To observe this effect, we conducted a new set of experiments. The setups of these experiments were almost exactly the same as those described before. The only difference was that the value of μ was varied within the range of 0.2 to 0.94 by a small threshold value over 20 individual runs.

| Initial *μ* | Final *μ* | *ns* | SD | CA (%) | SD |
|---|---|---|---|---|---|
| 0.40 | 0.64 | 2.60 | 0.91 | 80.09 | 2.69 |
| 0.50 | 0.74 | 3.05 | 1.16 | 82.16 | 1.51 |
| 0.60 | 0.84 | 3.30 | 1.14 | 82.54 | 1.44 |
| 0.70 | 0.94 | 3.45 | 1.39 | 81.98 | 1.39 |

**Table 13.** Effect of varying the value of µ on the average performance of ACOFS for the glass dataset. The value is incremented by a threshold value of 0.01 over 20 individual runs.

| Initial *μ* | Final *μ* | *ns* | SD | CA (%) | SD |
|---|---|---|---|---|---|
| 0.20 | 0.30 | 4.70 | 2.59 | 99.54 | 0.83 |
| 0.23 | 0.33 | 3.65 | 2.32 | 99.65 | 0.63 |
| 0.26 | 0.36 | 4.15 | 2.53 | 99.88 | 0.34 |
| 0.29 | 0.39 | 6.00 | 3.78 | 99.48 | 0.76 |

**Table 14.** Effect of varying the value of µ on the average performance of ACOFS for the ionosphere dataset. The value is incremented by a threshold value of 0.005 over 20 individual runs.

Tables 13 and 14 show the average results of these new experiments over 20 independent runs. The significance of the effect of varying *μ* can be seen from these results. For example, for the glass dataset (Table 13), the average CA improved as the value of *μ* increased up to a certain point; afterwards, the CA degraded as *μ* increased further, and larger feature subsets were selected. A similar scenario can be seen for the ionosphere dataset (Table 14). It is clear that the significance of the FS results in ACOFS depends on the value of *μ*, and that the determination of subset size in ACOFS is an important aspect of suitable FS.
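One plausible reading of the bounded scheme discussed above is that μ caps how large an ant's subset may grow relative to the full feature count. The chapter defers the exact definition to Section 6.1, so the sketch below is an assumption, with a hypothetical `bounded_subset_size` helper: each ant draws its subset size uniformly from [1, ⌈μ·n⌉] rather than from [1, n].

```python
import math
import random

def bounded_subset_size(n_features, mu, rng=random):
    """Hypothetical sketch of the bounded scheme: the subset size is drawn
    uniformly from [1, ceil(mu * n_features)], so mu bounds subset growth
    instead of letting sizes range over the whole feature set."""
    upper = max(1, math.ceil(mu * n_features))
    return rng.randint(1, upper)

# Glass has 9 features; mu = 0.6 caps constructed subsets at ceil(5.4) = 6.
random.seed(0)
sizes = [bounded_subset_size(9, 0.6) for _ in range(1000)]
```

Under this reading, raising μ widens the cap, which is consistent with the larger average *ns* values in the lower rows of Tables 13 and 14.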

## **7.5. Effect of Hybrid Search**

The capability of ACOFS for FS can be seen in Table 3, but the effect of using hybrid search in ACOFS for FS is not clear. Therefore, a new set of experiments was carried out to observe this effect. The setups of these experiments were almost exactly the same as those described before. The only difference was that ACOFS did not use the modified rules of pheromone update and heuristic value for each feature; instead, the standard rules were used. That is, we omitted not only the incorporation of the information gain term but also the random and probabilistic behaviors during SC from both rules, and we ignored the exponential term in the heuristic measurement rule.

| Dataset | *ns* | SD | CA (%) | SD | *ns* | SD | CA (%) | SD |
|---|---|---|---|---|---|---|---|---|
| Glass | 4.05 | 1.35 | 81.22 | 1.39 | 3.30 | 1.14 | 82.54 | 1.44 |
| Credit card | 6.15 | 2.21 | 87.26 | 0.66 | 5.85 | 1.76 | 87.99 | 0.38 |
| Sonar | 6.50 | 2.80 | 84.42 | 3.03 | 6.25 | 3.03 | 86.05 | 2.26 |
| Colon cancer | 6.35 | 4.05 | 82.18 | 4.08 | 5.25 | 2.48 | 84.06 | 3.68 |

**Table 15.** Effect of considering hybrid search on the average performances of ACOFS. The left block of columns refers to ACOFS without hybrid search; the right block refers to ACOFS. Results were averaged over 20 independent runs.

Table 15 shows the average results of these new experiments for the glass, credit card, sonar, and colon cancer datasets over 20 independent runs. The positive effect of using hybrid search in ACOFS is clearly visible. For example, for the credit card dataset, the average CAs of ACOFS with and without hybrid search were 87.99% and 87.26%, respectively. A similar classification improvement for ACOFS with hybrid search was also observed for the other datasets. In terms of *ns*, for the glass dataset the average values for ACOFS and ACOFS without hybrid search were 3.30 and 4.05, respectively; for the other datasets, ACOFS likewise selected a smaller number of salient features. We used the t-test here to determine whether the difference in classification performance between ACOFS and ACOFS without hybrid search was statistically significant. We found that ACOFS performed significantly better than ACOFS without the local search operation at a 95% confidence level for all the datasets except the colon cancer dataset. The t-test was also used to determine whether the difference between the two approaches in the number of selected salient features was statistically significant; here, ACOFS was significantly better than ACOFS without hybrid search at a 95% confidence level for all four datasets.

In order to understand precisely how hybrid search plays an important role in ACOFS for FS tasks, an additional set of experiments was conducted. The setups of these experiments were similar to those described before, and the initial conditions were kept the same in both experiments. Figures 12 and 13 show the CAs of ACOFS without and with hybrid search, respectively. These CAs were produced by the local best subsets in different iterations of a single run. The positive role of hybrid local search in ACOFS can clearly be seen in these figures. In Figure 12, we can see that a better CA was found only in the initial iteration, owing to the rigorous survey by the ants in finding salient features. In the following iterations, the CAs fluctuated up to a late iteration (19) but were not able to reach a best state. This occurred due to the absence of hybrid search, which weakened the search in ACOFS. The opposite scenario can be seen in Figure 13, where the search was sufficiently powerful that within a very low number of iterations (5), ACOFS was able to achieve the best accuracy (99.42%) for the salient feature subset; thereafter, ACOFS terminated the search for salient features. The reason for such high FS performance was precisely the incorporation of the hybrid search.

**Figure 12.** Classification accuracies (CAs) of the cancer dataset without considering hybrid search for a single run. Here, CA is the accuracy of a local best subset.
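The significance test reported for Table 15 can be reproduced from the published summary statistics alone. The chapter does not state which t-test variant was used, so this sketch assumes Welch's unpooled two-sample form; the glass-dataset numbers are taken directly from Table 15 (20 runs each).

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (Welch's unpooled form)."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Glass CA from Table 15: ACOFS 82.54 (SD 1.44) vs. ACOFS without hybrid
# search 81.22 (SD 1.39), each averaged over 20 independent runs.
t = welch_t(82.54, 1.44, 20, 81.22, 1.39, 20)
```

The statistic comes out near 2.95, beyond the two-tailed 95% critical value of roughly 2.02 for about 38 degrees of freedom, which is consistent with the significance claimed above.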

**Figure 13.** Classification accuracies (CAs) of the cancer dataset in ACOFS for a single run. Here, CA is the accuracy of a local best subset.

## **7.6. Performance on Noisy Features**

The results presented in Table 3 exhibit the ability of ACOFS to select salient features from real-valued datasets. In this study, we examine the sensitivity of ACOFS to noisy features that have been synthetically inserted into a number of real-valued datasets. In order to generate these noisy features, we followed the process discussed in [32]. Briefly, we first considered four features, namely *fn1*, *fn2*, *fn3*, and *fn4*, whose values were generated randomly. Specifically, the values of *fn1* and *fn2* were bounded to [0, 1] and [-1, +1], respectively. For the domains of *fn3* and *fn4*, we first randomly selected two different features from the dataset; the data points of these two selected features were then taken as a random basis for the domains of *fn3* and *fn4*.

| Dataset | *ns* | S.D. | CA (%) | S.D. | *ns* | S.D. | CA (%) | S.D. |
|---|---|---|---|---|---|---|---|---|
| Cancer | 13.00 | 0.00 | 97.80 | 0.89 | 3.80 | 1.80 | 98.74 | 0.46 |
| Glass | 13.00 | 0.00 | 73.86 | 2.81 | 4.45 | 1.71 | 81.69 | 2.31 |

**Table 16.** Performances of ACOFS for noisy datasets. The left block of columns refers to using all features; the right block refers to using the selected features. Results were averaged over 20 independent runs.

Table 16 shows the average performance of ACOFS on the cancer and glass datasets mixed with noisy features over 20 independent runs. The ability of ACOFS to perform FS on the original real-valued datasets can be found in Table 3. Comparing Tables 3 and 16, the following observations can be made. For the glass dataset, the average CAs with and without noisy features were 81.69% and 82.54%, respectively; in terms of *ns*, the corresponding average values were 4.45 and 3.30. A similar scenario can be found for the cancer dataset. Thus, it is clear that ACOFS has a strong ability to select the salient features from real-valued datasets even when they are mixed with noisy features. We can observe that ACOFS selected a slightly higher average number of salient features from the glass dataset with noisy features. The reason is that adding the noisy features created confusion in the feature space, which may lead ACOFS to select a greater number of noiseless features in order to resolve that confusion.
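The noisy-feature generation process described in this section can be sketched as follows. The chapter does not spell out how the "random basis" for *fn3* and *fn4* is built from the two selected real features; treating it as a shuffle of their values (so the marginal distributions match the data but carry no class information) is an assumption of this sketch, and `add_noisy_features` is a hypothetical helper.

```python
import random

def add_noisy_features(data, rng=random):
    """Append four synthetic noisy features to each sample, following the
    process described above: fn1 ~ U[0, 1], fn2 ~ U[-1, +1], while fn3 and
    fn4 reuse (here: shuffle) the values of two randomly chosen real
    features. The shuffle interpretation of "random basis" is an assumption."""
    n_features = len(data[0])
    i3, i4 = rng.sample(range(n_features), 2)
    col3 = [row[i3] for row in data]
    col4 = [row[i4] for row in data]
    rng.shuffle(col3)
    rng.shuffle(col4)
    return [row + [rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0), c3, c4]
            for row, c3, c4 in zip(data, col3, col4)]

random.seed(0)
data = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
noisy = add_noisy_features(data)
```

A 3-feature toy dataset thus grows to 7 columns per sample, mirroring how cancer and glass (9 features each) grow to the 13 features reported in the left block of Table 16.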

## **7.7. Comparisons**

The results obtained by ACOFS on nine real-world benchmark classification datasets are compared here with the results of various existing FS algorithms (both ACO-based and non-ACO-based), as well as with a normal ACO-based FS algorithm, as reported in Tables 17-19. The FS algorithms are as follows: ACO-based hybrid FS (ACOFSS [42]), ACO-based attribute reduction (ACOAR [31]), genetic programming for FS (GPFS [32]), a hybrid genetic algorithm for FS (HGAFS [23]), an MLP-based FS method (MLPFS [4]), a constructive approach for feature selection (CAFS [47]), and artificial neural net input gain measurement approximation (ANNIGMA [26]). The results reported in these tables are over 20 independent runs. In comparing these algorithms, we mainly used two parameters: the classification accuracy (CA) and the number of selected features (*ns*).

## *7.7.1. Comparison with other works*

The comparison of the eight FS algorithms covers a wide range of FS techniques. Five of them, namely ACOFS, ACOFSS, ACOAR, GPFS, and HGAFS, use global search strategies for FS. Among these, ACOFS, ACOFSS, and ACOAR use the ant colony optimization algorithm; HGAFS uses a GA to find salient features, and GPFS uses genetic programming, a variant of the GA. Of the remaining three FS techniques, MLPFS and ANNIGMA use a backward selection strategy for finding salient features, while CAFS uses a forward selection strategy. For evaluating the feature subsets, ACOFS, ACOFSS, MLPFS, CAFS, and ANNIGMA use a NN classifier, while GPFS and HGAFS use a decision tree and a support vector machine, respectively, and ACOAR uses rough set theory by calculating a dependency degree. ACOFS and CAFS use a training set, a validation set, and a testing set, while ACOFSS and ANNIGMA use only a training set and a testing set. MLPFS and GPFS use 10-fold cross-validation. A similar method, *k*-fold cross-validation, is used in HGAFS, where *k* ranges from 2 to 10 depending on the scale of the given dataset. These algorithms not only use different data partitions but also employ different numbers of independent runs in measuring average performance. For example, ANNIGMA and CAFS use 30 runs, ACOFS uses 20 runs, and MLPFS and GPFS use 10 runs. It is important to note that no information regarding the number of runs is given in the literature for ACOFSS and HGAFS.


Ant Colony Optimization Toward Feature Selection

http://dx.doi.org/10.5772/51707

| Dataset | ACOFS *ns* (S.D.) | ACOFS CA % (S.D.) | NACOFS *ns* (S.D.) | NACOFS CA % (S.D.) |
|---|---|---|---|---|
| Cancer | 3.50 (1.36) | 98.91 (0.40) | 4.50 (0.97) | 98.77 (0.37) |
| Glass | 3.30 (1.14) | 82.54 (1.44) | 4.60 (1.01) | 80.66 (1.44) |
| Ionosphere | 4.15 (2.53) | 99.88 (0.34) | 11.45 (6.17) | 99.88 (0.34) |
| Credit card | 5.85 (1.76) | 87.99 (0.38) | 22.85 (6.01) | 88.19 (0.45) |

**Table 19.** Comparisons between ACOFS and NACOFS. Here, NACOFS refers to the normal ACO-based FS algorithm.

In short, FS improves the performance of classifiers by discarding irrelevant features from the original feature set. An important task in this process is to capture the information needed to select salient features; otherwise, classifier performance may degrade. For example, on the cancer dataset, GPFS selected the smallest feature subset, averaging 2.23 features, but achieved a lower CA, whereas ACOFS selected a slightly larger subset that provided a better CA than the other algorithms. Indeed, the results for the other algorithms in Table 18 indicate that selecting the smallest or largest feature subset guarantees neither the best nor the worst CA.
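The wrapper view of FS used throughout the chapter, in which a candidate subset is judged by the accuracy of a classifier trained on it, can be sketched as follows. This is only an illustration: the toy nearest-centroid classifier, the synthetic data, and the name `wrapper_ca` are our own assumptions, standing in for the NN classifier used in ACOFS.

```python
import random
from collections import defaultdict

def nearest_centroid_accuracy(rows, labels):
    """Fit per-class centroids and report resubstitution accuracy
    (a toy stand-in for the chapter's NN classifier)."""
    sums, counts = defaultdict(list), defaultdict(int)
    for x, y in zip(rows, labels):
        if not sums[y]:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    cents = {y: [s / counts[y] for s in sums[y]] for y in sums}
    predict = lambda x: min(
        cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cents[y])))
    return sum(predict(x) == y for x, y in zip(rows, labels)) / len(rows)

def wrapper_ca(rows, labels, subset):
    """Wrapper-style FS evaluation: project the data onto a candidate
    feature subset, then train and score the classifier on it."""
    return nearest_centroid_accuracy(
        [[r[i] for i in subset] for r in rows], labels)

rng = random.Random(0)
labels = [0, 1] * 50
# feature 0 carries the class signal; feature 1 is pure noise
rows = [[y + rng.gauss(0, 0.1), rng.gauss(0, 5.0)] for y in labels]

assert wrapper_ca(rows, labels, [0]) >= 0.95   # salient feature kept
assert wrapper_ca(rows, labels, [1]) <= 0.85   # irrelevant feature alone
```

Dropping the noise feature leaves the classifier's accuracy intact, which is the effect the paragraph above describes.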

## *7.7.2. Comparison with a normal ACO-based FS algorithm*

Ant Colony Optimization - Techniques and Applications

| Dataset | Comparison | ACOFS | ACOFSS | ACOAR |
|---|---|---|---|---|
| Cancer | *ns* | 3.50 | 12.00 | - |
| | CA (%) | 98.91 | 95.57 | - |
| Thyroid | *ns* | 3.00 | 14.00 | - |
| | CA (%) | 99.08 | 94.50 | - |
| Credit card | *ns* | 5.85 | - | 8.00 |
| | CA (%) | 87.99 | - | - |
| Colon cancer | *ns* | 5.25 | - | 8.00 |
| | CA (%) | 84.06 | - | 59.5 |

**Table 17.** Comparisons between ACOFS, ACOFSS [42], and ACOAR [31]. Here, "-" means not available.

We can see in Table 17 that, for all four datasets, ACOFS produced the best solutions in terms of a reduced number of selected features and the best CA in comparison with the two other ACO-based FS algorithms, ACOFSS and ACOAR. Furthermore, Table 18 shows that ACOFS achieved the best CA among the compared algorithms for four of the seven datasets; note that ACOFS and ANNIGMA jointly achieved the best CA on the credit card dataset. Of the remaining three datasets, HGAFS achieved the best CA on two and GPFS on one. In terms of *ns*, ACOFS selected the smallest number of features for four of the seven datasets and the second smallest (behind CAFS and HGAFS, respectively) for two others. On close observation, ACOFS achieved the smallest *ns* together with the best CA for the glass and ionosphere datasets in comparison with the other five algorithms (see Table 18).

| Dataset | Comparison | ACOFS | GPFS | HGAFS | MLPFS | CAFS | ANNIGMA |
|---|---|---|---|---|---|---|---|
| Cancer | *ns* | 3.50 | 2.23 | 3.00 | 8.00 | 6.33 | 5.80 |
| | CA (%) | 98.91 | 96.84 | 94.24 | 89.40 | 98.76 | 96.50 |
| Glass | *ns* | 3.30 | - | 5.00 | 8.00 | 4.73 | - |
| | CA (%) | 82.54 | - | 65.51 | 44.10 | 76.91 | - |
| Vehicle | *ns* | 2.90 | 5.37 | 11.00 | 13.00 | 2.70 | - |
| | CA (%) | 75.90 | 78.45 | 76.36 | 74.60 | 74.56 | - |
| Ionosphere | *ns* | 4.15 | - | 6.00 | 32.00 | 6.73 | 9.00 |
| | CA (%) | 99.88 | - | 92.76 | 90.60 | 96.55 | 90.20 |
| Credit card | *ns* | 5.85 | - | 1.00 | - | - | 6.70 |
| | CA (%) | 87.99 | - | 86.43 | - | - | 88.00 |
| Sonar | *ns* | 6.25 | 9.45 | 15.00 | 29.00 | - | - |
| | CA (%) | 86.05 | 86.26 | 87.02 | 59.10 | - | - |
| Colon cancer | *ns* | 5.25 | - | 6.00 | - | - | - |
| | CA (%) | 84.06 | - | 86.77 | - | - | - |

**Table 18.** Comparisons between ACOFS, GPFS [32], HGAFS [23], MLPFS [4], CAFS [47], and ANNIGMA [26]. Here, "-" means not available.

In this context, a normal ACO algorithm for solving FS, which we call "NACOFS", was used; it follows similar steps to ACOFS, with a number of differences. In NACOFS, guiding the ants and forcing the ants during subset construction (SC) were not considered. Instead, the ants followed an SC process in which the subset size was fixed within each iteration and increased at a fixed rate in the following iterations. Furthermore, the hybrid search was not used; that is, the combination of random and probabilistic behavior was omitted, as was the incorporation of information gain in the pheromone update rule and the heuristic information measurement rule.
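The NACOFS construction scheme described above, a subset size fixed within an iteration and grown at a fixed rate across iterations with no pheromone guidance, might be sketched as follows. The starting size, growth rate, and uniform random sampling are illustrative assumptions, not the exact procedure.

```python
import random

def nacofs_subset_sizes(n_features, start=2, rate=2, iterations=5):
    """Fixed-size schedule: each iteration uses a single subset size,
    which is increased at a fixed rate for the next iteration."""
    sizes, size = [], start
    for _ in range(iterations):
        sizes.append(min(size, n_features))
        size += rate
    return sizes

def construct_subset(features, size, rng=random):
    """A NACOFS-style ant: the subset is drawn uniformly at random,
    without the guiding/forcing that ACOFS applies during SC."""
    return rng.sample(features, size)

features = list(range(9))            # feature indices of a 9-feature dataset
for size in nacofs_subset_sizes(len(features)):
    assert len(construct_subset(features, size)) == size
```

With no bound on the schedule, later iterations inevitably explore bulky subsets, which matches the larger *ns* values reported for NACOFS in Table 19.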



Table 19 shows that ACOFS achieved a better CA than NACOFS on three of the four datasets; NACOFS achieved the best result on the remaining one. In terms of *ns*, ACOFS selected the smallest number of features on all four datasets, while NACOFS selected subsets of bulky size. Thus, while the CAs of the two algorithms were similar, the numbers of selected features differed greatly. The performance of ACOFS was also very consistent, exhibiting a low standard deviation (S.D.) under different experimental setups.

## **7.8. Discussions**

This section briefly explains why ACOFS performed better than the other ACO-based FS algorithms compared in Table 17. Three major differences are likely to contribute to this better performance.


The first reason is that ACOFS uses a bounded scheme to determine the subset size, whereas ACOFSS, ACOAR, and other ACO-based FS algorithms (e.g., [11,49-52]) do not. Without such a bound, ants are free to construct subsets of bulky size, and there is consequently a high chance of including irrelevant features in the constructed subsets. Using the bounded scheme with assistance from its other techniques, ACOFS includes the most salient features in a reduced number, even when operating on a wide range of feature spaces. As shown in Table 17, ACOFS selected, on average, 3.00 salient features from the thyroid dataset, while ACOFSS selected 14.00. For the remaining three datasets, ACOFS likewise selected a very small number of salient features. The benefit of the bounded scheme can also be seen in the subsets selected by ACOFS.
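A minimal sketch of such a bounded scheme, assuming a uniform draw below a cap of μ·n features (the exact ACOFS rule is more elaborate):

```python
import random

def bounded_subset_size(n_features, mu=0.5, rng=random):
    """Bound the subset size so ants cannot construct bulky subsets:
    at most mu * n_features features may be selected. The uniform draw
    within the bound is an illustrative choice, not the exact ACOFS rule."""
    upper = max(1, int(mu * n_features))
    return rng.randint(1, upper)

sizes = [bounded_subset_size(21, mu=0.5) for _ in range(1000)]
assert 1 <= min(sizes) and max(sizes) <= 10   # never more than half the features
```

Whatever distribution is used inside the bound, the cap alone already rules out the bulky subsets that ACOFSS and ACOAR are free to build.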

The second reason is the new hybrid search technique integrated into ACOFS; ACOFSS, ACOAR, and the other algorithms do not use such a technique in performing the pheromone update and heuristic information measurement. The benefit of the hybrid search is clearly visible in Figures 12 and 13, which show that ACOFS searched the feature space for salient features both more powerfully and faster. The same advantage appears in Tables 17 and 18: ACOFS showed a remarkable ability to produce significant classification performance on different datasets using a reduced number of salient features.
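The random-versus-probabilistic mix of the hybrid search can be illustrated with the standard ACO transition rule, where a feature is chosen with probability proportional to pheromone raised to α times heuristic information raised to β. The parameter names (`q0`, `alpha`, `beta`) and values here are assumptions for illustration, not the chapter's.

```python
import random

def choose_feature(candidates, tau, eta, alpha=1.0, beta=1.0, q0=0.3,
                   rng=random):
    """Hybrid selection step: with probability q0 behave purely randomly
    (exploration); otherwise choose probabilistically using the standard
    ACO rule p_i proportional to tau_i**alpha * eta_i**beta."""
    if rng.random() < q0:
        return rng.choice(candidates)
    weights = [tau[i] ** alpha * eta[i] ** beta for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

tau = {0: 1.0, 1: 5.0, 2: 1.0}   # pheromone per feature
eta = {0: 0.2, 1: 0.9, 2: 0.1}   # heuristic info, e.g. information gain
assert choose_feature([0, 1, 2], tau, eta) in (0, 1, 2)
```

In the probabilistic branch, feature 1 dominates because it has both the strongest pheromone trail and the highest heuristic value; the random branch keeps the search from collapsing onto it prematurely.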

The third reason is that ACOFS used a constructive approach to determine an appropriate architecture, that is, an appropriate hidden-layer size, for the NN classifiers; the NN then evaluated the subsets constructed by the ants in each iteration during training. Existing ACO-based FS approaches (e.g., [42]) often ignored this issue, and a number of other approaches (e.g., [49,50]) ignored the classifier portion altogether, providing no heuristic methodology by which the classifier's ability to evaluate subsets could be improved. Moreover, most ACO-based FS approaches based their pheromone update rule on the classifier's performance in evaluating the subsets, making the evaluation function one of the most crucial parts of these approaches. Yet the most common practice was to choose the number of hidden neurons in the NN randomly, which affected the generalization performance of the NNs and, in turn, the entire FS process, resulting in ineffective solutions. Since the performance of any NN depends greatly on its architecture [17,57], automatic determination of the number of hidden neurons led to better solutions for FS in ACOFS.
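A constructive determination of the hidden-layer size, as described above, might look like the following sketch. Here `train_eval`, `max_hidden`, and the patience rule are illustrative assumptions; the callback stands in for training an NN with a given number of hidden neurons and returning its validation accuracy.

```python
def constructive_hidden_size(train_eval, max_hidden=20, patience=2):
    """Constructive approach: grow the hidden layer one neuron at a time
    and keep the size after which validation accuracy stops improving.
    `train_eval(h)` is an assumed callback that trains an NN with h
    hidden neurons and returns its validation accuracy."""
    best_h, best_acc, stall = 1, train_eval(1), 0
    for h in range(2, max_hidden + 1):
        acc = train_eval(h)
        if acc > best_acc:
            best_h, best_acc, stall = h, acc, 0
        else:
            stall += 1
            if stall >= patience:
                break                      # no improvement for a while
    return best_h

# toy accuracy curve that peaks at 4 hidden neurons
curve = {1: 0.80, 2: 0.85, 3: 0.88, 4: 0.90, 5: 0.89, 6: 0.89}
assert constructive_hidden_size(lambda h: curve[h]) == 4
```

The point is simply that the architecture is decided by the data rather than drawn at random, avoiding the generalization problem described above.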

## **8. Conclusions**


In this chapter, an efficient hybrid ACO-based FS algorithm has been reported. Since ants are the foremost strength of an ACO algorithm, guiding the ants in the correct directions is essential for high-quality solutions. Accordingly, ACOFS guides ants during SC by determining the subset size. Furthermore, new sets of pheromone update and heuristic information measurement rules for individual features bring out the potential of the global search capability of ACOFS.

Extensive experiments have been carried out in this chapter to evaluate how well ACOFS finds salient features on different datasets (see Table 3). A set of high-quality FS solutions was found on small, medium, large, and very large dimensional datasets. The low standard deviations of both the average classification accuracy and the average number of selected features show the robustness of the algorithm. Moreover, in comparison with seven prominent FS algorithms (see Tables 17 and 18), ACOFS outperformed the others, with only a few exceptions, in terms of a reduced number of selected features and the best classification performance. Finally, the estimated computational complexity of the algorithm shows that incorporating these several techniques did not increase the computational cost of FS relative to other ACO-based FS algorithms (see Section 6.5).

There remain a number of areas where ACOFS failed to improve performance in terms of the number of selected features and classification accuracy; more suitable heuristic schemes are therefore needed to guide the ants appropriately. In the current implementation, ACOFS has a number of user-specified parameters, given in Table 2, which are common in ACO-based algorithms using NNs for FS. Further tuning of the ACO-related parameters offers scope for future investigation. Among these parameters, μ, used in determining the subset size, was sensitive even to moderate change, according to our observations. Future improvements to ACOFS could therefore reduce the number of parameters or render them adaptive.

## **Acknowledgements**

Supported by grants to K.M. from the Japanese Society for Promotion of Sciences, the Yazaki Memorial Foundation for Science and Technology, and the University of Fukui.

## **Author details**

Monirul Kabir1, Md Shahjahan2 and Kazuyuki Murase3\*

\*Address all correspondence to: murase@u-fukui.ac.jp

1 Department of Electrical and Electronic Engineering, Dhaka University of Engineering and Technology (DUET), Bangladesh

2 Department of Electrical and Electronic Engineering, Khulna University of Engineering and Technology (KUET), Bangladesh

3 Department of Human and Artificial Intelligence Systems and Research and Education Program for Life Science, University of Fukui, Japan

## **References**

[1] Abraham, A., Grosan, C., & Ramos, V. (2006). Swarm Intelligence in Data Mining. *Springer-Verlag Press*.

[2] Liu, H., & Lei, Tu. (2004). Toward Integrating Feature Selection Algorithms for Classification and Clustering. *IEEE Transactions on Knowledge and Data Engineering*, 17(4), 491-502.

[3] Pudil, P., Novovicova, J., & Kittler, J. (1994). Floating Search Methods in Feature Selection. *Pattern Recognition Letters*, 15(11), 1119-1125.

[4] Gasca, E., Sanchez, J. S., & Alonso, R. (2006). Eliminating Redundancy and Irrelevance using a New MLP-based Feature Selection Method. *Pattern Recognition*, 39, 313-315.

[5] Setiono, R., & Liu, H. (1997). Neural Network Feature Selector. *IEEE Trans. on Neural Networks*, 8.

[6] Verikas, A., & Bacauskiene, M. (2002). Feature Selection with Neural Networks. *Pattern Recognition Letters*, 23, 1323-1335.

[7] Guyon, I., & Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. *Journal of Machine Learning Research*, 3, 1157-1182.

[8] Photo by Aksoy, S. (2012). http://retina.cs.bilkent.edu.tr/papers/patrec\_tutorial1.pdf, Accessed 02 July.

[9] Floreano, D., Kato, T., Marocco, D., & Sauser, E. (2004). Coevolution and Active Vision and Feature Selection. *Biological Cybernetics*, 90(3), 218-228.

[10] Dash, M., Kollipakkam, D., & Liu, H. (2006). Automatic View Selection: An Application to Image Mining. *Proceedings of the International Conference PAKDD*, 107-113.

[11] Robbins, K. R., Zhang, W., & Bertrand, J. K. (2008). The Ant Colony Algorithm for Feature Selection in High-Dimension Gene Expression Data for Disease Classification. *Journal of Mathematical Medicine and Biology*, 1-14.

[12] Ooi, C. H., & Tan, P. (2003). Genetic Algorithm Applied to Multi-class Prediction for the Analysis of Gene Expression Data. *Bioinformatics*, 19(1), 37-44.

[13] Chen, J., Huang, H., Tian, S., & Qu, Y. (2009). Feature Selection for Text Classification with Naïve Bayes. *Expert Systems with Applications*, 36, 5432-5435.

[14] Fayyad, U. M., Piatesky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in Knowledge Discovery and Data Mining. *AAAI: MIT Press*.

[15] Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI Repository of Machine Learning Databases. *University of California, Irvine*, http://www.ics.uci.edu/~mlearn/MLRepository.html, Accessed 02 July 2012.

[16] Prechelt, L. (1994). PROBEN1-A Set of Neural Network Benchmark Problems and Benchmarking Rules. *Technical Report 21/94, Faculty of Informatics, University of Karlsruhe*.

[17] Yao, X., & Liu, Y. (1997). A New Evolutionary System for Evolving Artificial Neural Networks. *IEEE Trans. on Neural Networks*, 8(3), 694-713.

[18] Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. *Proceedings of International Academic Science, USA*, 96, 6745-6750.

[19] Alizadeh, A. A., et al. (2000). Distinct Types of Diffuse Large B-cell Lymphoma Identified by Gene Expression Profiling. *Nature*, 403-503.

[20] Golub, T., et al. (1999). Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression. *Science*, 286(5439), 531-537.

[21] Guyon, I., & Elisseeff, A. An Introduction to Variable and Feature Selection. *Journal of Machine Learning Research*, 3, 1157-1182.

[22] Dash, M., & Liu, H. (1997). Feature Selection for Classification. *Intelligent Data Analysis*, 1, 131-156.

[23] Huang, J., Cai, Y., & Xu, X. (2007). A Hybrid Genetic Algorithm for Feature Selection Wrapper based on Mutual Information. *Pattern Recognition Letters*, 28, 1825-1844.

[24] Guan, S., Liu, J., & Qi, Y. (2004). An Incremental Approach to Contribution-based Feature Selection. *Journal of Intelligence Systems*, 13(1).

[25] Peng, H., Long, F., & Ding, C. (2003). Overfitting in Making Comparisons between Variable Selection Methods. *Journal of Machine Learning Research*, 3, 1371-1382.

[26] Hsu, C., Huang, H., & Schuschel, D. (2002). The ANNIGMA-Wrapper Approach to Fast Feature Selection for Neural Nets. *IEEE Trans. on Systems, Man, and Cybernetics-Part B: Cybernetics*, 32(2), 207-212.

[27] Caruana, R., & Freitag, D. (1994). Greedy Attribute Selection. *Proceedings of the 11th International Conference of Machine Learning, USA, Morgan Kaufmann*.

[28] Lai, C., Reinders, M. J. T., & Wessels, L. (2006). Random Subspace Method for Multivariate Feature Selection. *Pattern Recognition Letters*, 27, 1067-1076.


[29] Straceezzi, D. J., & Utgoff, P. E. (2004). Randomized Variable Elimination. *Journal of Machine Learning Research*, 5, 1331-1362.

[30] Abe, S. (2005). Modified Backward Feature Selection by Cross Validation. *Proceedings of the European Symposium on Artificial Neural Networks*, 163-168.

[31] Ke, L., Feng, Z., & Ren, Z. (2008). An Efficient Ant Colony Optimization Approach to Attribute Reduction in Rough Set Theory. *Pattern Recognition Letters*, 29, 1351-1357.

[32] Muni, D. P., Pal, N. R., & Das, J. (2006). Genetic Programming for Simultaneous Feature Selection and Classifier Design. *IEEE Trans. on Systems, Man, and Cybernetics-Part B: Cybernetics*, 36(1), 106-117.

[33] Oh, I., Lee, J., & Moon, B. (2004). Hybrid Genetic Algorithms for Feature Selection. *IEEE Trans. on Pattern Analysis and Machine Intelligence*, 26(11), 1424-1437.

[34] Wang, X., Yang, J., Teng, X., Xia, W., & Jensen, R. (2007). Feature Selection based on Rough Sets and Particle Swarm Optimization. *Pattern Recognition Letters*, 28(4), 459-471.

[35] Yang, J. H., & Honavar, V. (1998). Feature Subset Selection using a Genetic Algorithm. *IEEE Intelligent Systems*, 13(2), 44-49.

[36] Pal, N. R., & Chintalapudi, K. (1997). A Connectionist System for Feature Selection. *International Journal of Neural, Parallel and Scientific Computation*, 5, 359-361.

[37] Rakotomamonjy, A. (2003). Variable Selection using SVM-based Criteria. *Journal of Machine Learning Research*, 3, 1357-1370.

[38] Wang, L., Zhou, N., & Chu, F. (2008). A General Wrapper Approach to Selection of Class-dependent Features. *IEEE Trans. on Neural Networks*, 19(7), 1267-1278.

[39] Chow, T. W. S., & Huang, D. (2005). Estimating Optimal Feature Subsets using Efficient Estimation of High-dimensional Mutual Information. *IEEE Trans. Neural Network*, 16(1), 213-224.

[40] Hall, M. A. (2000). Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. *Proceedings of the 17th International Conference on Machine Learning*.

[41] Sindhwani, V., Rakshit, S., Deodhare, D., Erdogmus, D., Principe, J. C., & Niyogi, P. (2004). Feature Selection in MLPs and SVMs based on Maximum Output Information. *IEEE Trans. on Neural Networks*, 15(4), 937-948.

[42] Sivagaminathan, R. K., & Ramakrishnan, S. (2007). A Hybrid Approach for Feature Subset Selection using Neural Networks and Ant Colony Optimization. *Expert Systems with Applications*, 33-49.

[43] Kambhatla, N., & Leen, T. K. (1997). Dimension Reduction by Local Principal Component Analysis. *Neural Computation*, 9(7), 1493-1516.

[44] Back, A. D., & Trappenberg, T. P. (2001). Selecting Inputs for Modeling using Normalized Higher Order Statistics and Independent Component Analysis. *IEEE Trans. Neural Network*, 12(3), 612-617.

[45] Mao, K. Z. (2002). Fast Orthogonal Forward Selection Algorithm for Feature Subset Selection. *IEEE Trans. Neural Network*, 13(5), 1218-1224.

[46] Caruana, R., & De Sa, V. (2003). Benefitting from the Variables that Variable Selection Discards. *Journal of Machine Learning Research*, 3, 1245-1264.

[47] Kabir, M. M., Islam, M. M., & Murase, K. (2010). A New Wrapper Feature Selection Approach using Neural Network. *Neurocomputing*, 73, 3273-3283.

[48] Chakraborty, D., & Pal, N. R. (2004). A Neuro-fuzzy Scheme for Simultaneous Feature Selection and Fuzzy Rule-based Classification. *IEEE Trans. on Neural Networks*, 15(1), 110-123.

[49] Aghdam, M. H., Aghaee, N. G., & Basiri, M. E. (2009). Text Feature Selection using Ant Colony Optimization. *Expert Systems with Applications*, 36, 6843-6853.

[50] Ani, A. (2005). Feature Subset Selection using Ant Colony Optimization. *International Journal of Computational Intelligence*, 2, 53-58.

[51] Kanan, H. R., Faez, K., & Taheri, S. M. (2007). Feature Selection using Ant Colony Optimization (ACO): A New Method and Comparative Study in the Application of Face Recognition System. *Proceedings of the International Conference on Data Mining*, 63-76.

[52] Khushaba, R. N., Alsukker, A., Ani, A. A., & Jumaily, A. A. (2008). Enhanced Feature Selection Algorithm using Ant Colony Optimization and Fuzzy Memberships. *Proceedings of the Sixth International Conference on Biomedical Engineering, IASTED*, 34-39.

[53] Dorigo, M., & Stutzle, T. (2004). Ant Colony Optimization. *MIT Press*.

[54] Filippone, M., Masulli, F., & Rovetta, S. (2006). Supervised Classification and Gene Selection using Simulated Annealing. *Proceedings of the International Joint Conference on Neural Networks*, 3566-3571.

[55] Goldberg, D. E. (2004). Genetic Algorithms in Search, Optimization and Machine Learning. *Addison-Wesley Press*.

[56] Rumelhart, D. E., & McClelland, J. (1986). Parallel Distributed Processing. *MIT Press*.

[57] Reed, R. (1993). Pruning Algorithms-a Survey. *IEEE Trans. on Neural Networks*, 4(5), 740-747.

[58] Girosi, F., Jones, M., & Poggio, T. (1995). Regularization Theory and Neural Networks Architectures. *Neural Computation*, 7(2), 219-269.

[59] Kwok, T. Y., & Yeung, D. Y. (1997). Constructive Algorithms for Structure Learning in Feed-forward Neural Networks for Regression Problems. *IEEE Trans. on Neural Networks*, 8, 630-645.

[60] Lehtokangas, M. (2000). Modified Cascade-correlation Learning for Classification. *IEEE Transactions on Neural Networks*, 11, 795-798.

**Chapter 2**

**Provisional chapter**

**Parallel Ant Colony Optimization: Algorithmic Models**

The Ant Colony Optimization (ACO) metaheuristic [1] is a constructive population-based approach based on the social behavior of ants. As it is acknowledged as a powerful method to solve academic and industrial combinatorial optimization problems, a considerable amount of research is dedicated to improving its performance. Among the proposed solutions, we find the use of parallel computing to reduce computation time, improve solution quality or

## **Parallel Ant Colony Optimization: Algorithmic Models and Hardware Implementations**

Pierre Delisle

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/54252

© 2013 Delisle; licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **1. Introduction**

The Ant Colony Optimization (ACO) metaheuristic [1] is a constructive, population-based approach inspired by the social behavior of ants. Since it is acknowledged as a powerful method for solving academic and industrial combinatorial optimization problems, a considerable amount of research is dedicated to improving its performance. Among the proposed solutions is the use of parallel computing to reduce computation time, to improve solution quality, or both.
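To fix notation for the parallelizations discussed below, the sequential skeleton of an ACO algorithm can be sketched as follows. This is a minimal Ant System for a TSP-like problem, not the chapter's own implementation; all names and parameter values (`alpha`, `beta`, `rho`, `q`) are illustrative.

```python
import random

def construct_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    """One ant builds a tour with the ACO state transition rule: from city i,
    city j is chosen with probability proportional to tau[i][j]^alpha * (1/dist[i][j])^beta."""
    n = len(dist)
    tour = [rng.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        cand = list(unvisited)
        w = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
        tour.append(rng.choices(cand, weights=w)[0])
        unvisited.remove(tour[-1])
    return tour

def tour_length(dist, tour):
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def ant_system(dist, n_ants=8, n_iters=50, rho=0.5, q=1.0, seed=0):
    """Sequential Ant System: each iteration, all ants build tours, then the
    pheromone evaporates and every ant deposits q / tour_length on its edges."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = [construct_tour(dist, tau, rng=rng) for _ in range(n_ants)]
        for i in range(n):              # evaporation
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for t in tours:                 # deposit and best-tour bookkeeping
            length = tour_length(dist, t)
            if length < best_len:
                best, best_len = t, length
            for k in range(n):
                a, b = t[k], t[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best, best_len
```

The two phases inside each iteration, tour construction and pheromone update, are the natural decomposition points that the parallel strategies surveyed below exploit.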

Most parallel ACO implementations can be classified into two general approaches. The first one is the parallel execution of the ants' construction phase within a single colony. Initiated by Bullnheimer *et al.* [2], it aims to accelerate computations by distributing ants to computing elements. The second one, introduced by Stützle [3], is the execution of multiple ant colonies. In this case, entire ant colonies are assigned to processors in order to speed up computations as well as to potentially improve solution quality by introducing cooperation schemes between colonies.

Recently, a more detailed classification was proposed by Pedemonte *et al.* [4]. It shows that most existing works design parallel ACO algorithms at a relatively high level of abstraction, which may be suitable for conventional parallel computers. However, as research on parallel architectures is rapidly evolving, new types of hardware have recently become available for high performance computing. Among them, we find multicore processors and graphics processing units (GPUs), which provide great computing power at an affordable cost but are more difficult to program. In fact, it is not clear that conventional high-level abstraction models can express parallelism in a way that is efficiently implementable and reproducible on these architectures. As academic and industrial combinatorial optimization problems steadily increase in size and complexity, the field of parallel metaheuristics has to follow this evolution of high performance computing.


The main purpose of this chapter is to complement existing parallel ACO models with a computational design that relates more closely to high performance computing architectures. Emerging from several years of work by the authors on the parallelization of ACO in various computing environments, including clusters, symmetric multiprocessors (SMPs), multicore processors and graphics processing units (GPUs) [5–10], it is based on the concepts of computing entities and memory structures. It provides a conceptual vision of parallel ACO that we believe is better balanced between theory and practice. We revisit the existing literature and present various implementations from this viewpoint. Extensive experimental results are presented to validate the proposed approaches across a broad range of computing environments. Key algorithmic, technical and programming issues are also addressed in this context.


## **2. Literature review on Parallel Ant Colony Optimization**

During the past 20 years, the ACO metaheuristic has improved significantly to become one of the most effective combinatorial optimization methods. For about a decade, following this trend, a number of parallelization techniques have been proposed to further enhance its search process. Works on traditional CPU-based parallel ACO can be classified into two general approaches: *parallel ants* and *multiple ant colonies*. These approaches are briefly explained in Sections 2.1 and 2.2. On the other hand, few authors have proposed parallel implementations dedicated to specific architectures. Section 2.3 is dedicated to these *hardware-oriented* approaches. In all cases, a survey of related works is also provided.

## **2.1. Parallel ants**

Works related to the parallel ants approach, which aims to execute the ants tour construction phase on many processing elements, were initiated by Bullnheimer *et al.* [2]. They proposed two parallelization strategies for the Ant System on a message passing and distributed-memory architecture. The first one is a low-level and synchronous strategy that aims to accelerate computations by distributing ants to processors in a master-slave fashion. At each iteration, the master broadcasts the pheromone structure to slaves, which then compute their tours in parallel and send them back to the master. The time needed for these global communications and synchronizations implies a considerable overhead. The second strategy aims to reduce it by letting the algorithm perform a given number of iterations without exchanging information. The authors conclude that this partially asynchronous strategy is preferable due to the considerable reduction of the communication overhead.
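The synchronous master-slave scheme just described can be sketched as follows. This is an illustrative Python rendering, not the MPI code of the surveyed works: a thread pool stands in for the slave processors, submitting the ants plays the role of broadcasting the pheromone structure, and collecting the results is the global synchronization point. The tour builder is reduced to a seeded shuffle to keep the sketch short; a real slave applies the ACO state transition rule.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_tour(tau, n, seed):
    # Slave-side work: one ant builds one tour from the broadcast pheromone.
    # (Stand-in: a seeded shuffle instead of the real state transition rule.)
    rng = random.Random(seed)
    tour = list(range(n))
    rng.shuffle(tour)
    return tour

def master(dist, n_ants=8, n_iters=10, rho=0.5, q=1.0, n_slaves=4):
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        for it in range(n_iters):
            # "Broadcast" tau and distribute the ants to the slaves ...
            futures = [pool.submit(build_tour, tau, n, it * n_ants + ant)
                       for ant in range(n_ants)]
            # ... then wait for every tour: the global synchronization point.
            tours = [f.result() for f in futures]
            for i in range(n):                  # evaporation (master side)
                for j in range(n):
                    tau[i][j] *= (1.0 - rho)
            for t in tours:                     # deposit (master side)
                length = sum(dist[t[k]][t[(k + 1) % n]] for k in range(n))
                for k in range(n):
                    a, b = t[k], t[(k + 1) % n]
                    tau[a][b] += q / length
                    tau[b][a] += q / length
    return tau
```

The partially asynchronous variant keeps the same structure but lets each slave iterate locally, exchanging pheromone information only every few iterations instead of at every one.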

The works of Talbi *et al.* [11], Randall and Lewis [12], Islam *et al.* [13], Craus and Rudeanu [14], Stützle [3] and Doerner *et al.* [15] are based on a similar parallelization approach and a distributed memory architecture. Delisle *et al.* [5, 6] implemented this scheme on shared-memory architectures like SMP computers and multi-core processors. They also compared performance between the two types of architectures [7].

## **2.2. Multiple ant colonies**

The multiple ant colonies approach, also based on a message-passing and distributed memory architecture, aims to execute whole ant colonies on available processing elements. It was introduced by Stützle [3] with the parallel execution of multiple independent copies of the same algorithm. Middendorf *et al.* [16] extended this approach by introducing four information exchange strategies between ant colonies: exchange of the globally best solution, circular exchange of locally best solutions, exchange of migrants, or locally best solutions plus migrants. It is shown that it can be advantageous for ant colonies to avoid communicating too much information and too often. Giving up on the idea of sharing whole pheromone information, they based their strategy on the exchange of a single solution at each step.
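A hedged sketch of the circular exchange of locally best solutions, assuming colonies arranged in a ring; the colonies are simulated sequentially here (each would run on its own processor in a real implementation), and an abstract random objective to be minimized stands in for the actual ACO search.

```python
import random

def circular_exchange(local_bests):
    """Ring topology: colony c receives the locally best solution of
    colony (c - 1) mod k, as in the circular exchange strategy."""
    k = len(local_bests)
    return [local_bests[(c - 1) % k] for c in range(k)]

def multi_colony_search(k=4, epochs=5, iters_per_epoch=20, seed=0):
    """k colonies search independently between exchange steps and only trade
    a single best solution per epoch, never the whole pheromone matrix."""
    rng = random.Random(seed)
    bests = [float("inf")] * k
    for _ in range(epochs):
        for c in range(k):                  # independent search phase
            for _ in range(iters_per_epoch):
                bests[c] = min(bests[c], rng.random())
        received = circular_exchange(bests) # one message per colony
        bests = [min(own, got) for own, got in zip(bests, received)]
    return bests
```

Only one solution crosses each link per epoch, in line with the observation above that exchanging too much information too often can be counterproductive.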

Chu *et al.* [17], Manfrin *et al.* [18], Ellabib *et al.* [19] and Alba *et al.* [20] have also proposed different information exchange strategies for the multiple ant colony approach. Many parameters are studied, such as the topology of the links between processors as well as the nature and frequency of information exchanges. These strategies are implemented using MPI on distributed memory architectures. On the other hand, Delisle *et al.* [8] adapted some of them to shared-memory architectures.

## **2.3. Hardware-oriented parallel ACO**


Even though they mostly follow the parallel ants and multiple ant colonies approaches, hardware-oriented approaches are dedicated to specific and nontraditional parallel architectures. Scheuermann *et al.* [21, 22] designed parallel implementations of ACO on Field Programmable Gate Arrays (FPGAs). Considerable changes to the algorithmic structure of the metaheuristic were needed to take advantage of this particular architecture.

Few authors have tackled the problem of parallelizing ACO on GPUs, mostly in the form of preliminary work. Catala *et al.* [23] propose an implementation of ACO to solve the Orienteering Problem. Instances of up to a few thousand nodes are solved by building solutions on the GPU. Wang *et al.* [24] propose an implementation of the MMAS where the tour construction phase is executed on a GPU to solve a 30-city TSP. Similar implementations are reported by You [25], Zhu and Curry [26], Li *et al.* [27], Cecilia *et al.* [28] and Delévacq *et al.* [9]. Following these works, Delévacq *et al.* [10] proposed various parallelization strategies for ACO on GPU, as well as a comparative study showing the influence of various parameters on search efficiency.

Finally, concerning grid applications, Weis and Lewis [29] implemented an ACO algorithm on an ad-hoc grid for the design of a radio frequency antenna structure. Mocholi *et al.* [30] also proposed a medium grain master-slave algorithm to solve the Orienteering Problem.

In addition to a complete survey, Pedemonte *et al.* [4] proposed a taxonomy for Parallel ACO which is illustrated in Fig. 1. Although it provides a comprehensive view of the field, its relatively high level of abstraction does not capture some important features that are crucial for obtaining efficient implementations on modern high performance computing architectures.

The present work does not seek to replace this taxonomy but rather provides a conceptual view of parallel ACO that relates more closely to real parallel architectures. By bringing together the high-level concepts of parallel ACO and the lower-level parallel computing models, it aims to serve as a methodological framework for the design of efficient ACO implementations.


**Figure 1.** Taxonomy for parallel ACO [4].

## **3. A new architecture-oriented taxonomy for parallel ACO**

The efficient implementation of a parallel metaheuristic in optimization software generally requires the consideration of the underlying architecture. Inspired by Talbi [31], we distinguish the following main parallel architectures: clusters/networks of workstations, symmetric multiprocessors / multicore processors, grids and graphics processing units.

Clusters and Networks of Workstations (COWs/NOWs) are distributed-memory architectures where each processor has its own memory (Fig. 2(a)). Information exchanges between processors require explicit message passing, which implies programming effort and communication costs. NOWs may be seen as a heterogeneous group of computers, whereas COWs are homogeneous, unified computing devices.

**Figure 2.** Shared-memory and distributed-memory parallel architectures [31].

Symmetric multiprocessors (SMPs) and multicore processors are shared-memory architectures where the processors are connected to a common memory (Fig. 2(b)). Information exchanges between processors are facilitated by the single address space but synchronizations still have to be managed. SMPs consist of many processors that are linked to a bus network and multicore processors contain many processors on a single chip.

Grids may be considered as pools of heterogeneous and dynamic computing resources geographically distributed across multiple administrative domains and owned by different organizations [32]. These resources are usually high performance computing platforms connected by a dedicated high-speed network, or workstations linked by a nondedicated network such as the Internet. In such volatile systems, security, fault tolerance and resource discovery are important issues to address. Fortunately, middleware usually frees the grid application programmer from most of these issues.

Finally, graphics processing units (GPUs) are devices used in computers to render graphics. As GPU technology has evolved drastically in the last few years, GPUs have been increasingly used to accelerate general-purpose scientific and engineering applications. As shown in Figure 3, a typical NVIDIA GPU [33] includes many multiprocessors whose processors execute multiple coordinated threads. Several memories are distinguished on this hardware, differing in size, latency and access type.

**Figure 3.** NVIDIA GPU architecture [33].


Considering the variety of architectures currently available in the world of high performance computing, the successful design and implementation of a parallel ACO algorithm on one platform or another may be a significant challenge. Moreover, most computers fall into several of these categories: a computational cluster may be composed of many distributed nodes which include multicore processors and GPUs. The challenge then becomes twofold: identifying a suitable combination of parallel strategies and implementing it on the target system. In order to make this process simpler, we propose a taxonomy for parallel ACO which takes implementation details into account. It distinguishes three criteria: the ACO granularity level, the "computational entity" associated to that level and the memory structure available at that level.
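The three criteria can be written down as a small classification scheme. The enums below mirror the categories defined in Sections 3.1 to 3.3; the two example triples at the end are our own illustrative guesses at how classic strategies might be classified, not assignments made by the chapter.

```python
from enum import Enum

class Granularity(Enum):
    COLONY = "colony"
    ITERATION = "iteration"
    ANT = "ant"
    SOLUTION_ELEMENT = "solution element"

class Entity(Enum):
    SYSTEM = "system"
    NODE = "node"
    PROCESS = "process"
    BLOCK = "block"
    THREAD = "thread"

class Memory(Enum):
    LOCAL = "local"
    GLOBAL = "global"
    REMOTE = "remote"

# A parallel strategy is described by one (granularity, entity, memory)
# triple per level of the hardware hierarchy it uses. Illustrative guesses:
parallel_ants_on_multicore = (Granularity.ANT, Entity.THREAD, Memory.LOCAL)
multi_colony_on_cluster = (Granularity.COLONY, Entity.NODE, Memory.REMOTE)
```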

## **3.1. ACO granularity level**

The decomposition of an ACO algorithm into tasks to be executed by different processors may be performed according to several granularities. One of the main goals of the parallelization process is to find an equitable compromise between the number of tasks and the cost associated with the management of these tasks. Based on the algorithmic structure of ACO, the proposed classification distinguishes four granularity levels, from coarsest to finest: colony, iteration, ant and solution element.

http://dx.doi.org/10.5772/CHAPTERDOI

http://dx.doi.org/10.5772/54252

51

to be implemented according to at least a part of this hierarchy. The proposed classification distinguishes each level of this hierarchy from the parallel programming perspective. This translates into the definition of five computational entities: system, node, process, block and

Parallel Ant Colony Optimization: Algorithmic Models and Hardware Implementations

A **system** defines a parallel computer as a unified computational resource which may be a standard workstation or a cluster. A distinction is made between these single systems and

A **node** is a discernable part of a system to which tasks can be assigned. A system may then be composed of a single node which is the case of the standard workstation, or of multiple

A **process** is a computational entity that manages and executes sequential and parallel programs. As this concept refers to the typical process in operating systems, it can hold one or many threads which may be grouped together or not. When a process executes only sequential code, it is considered as the smallest indivisible entity of an implementation.

A **block** is an intermediate entity between process and thread. This notion comes from the field of GPU computing in which a block is composed of many threads. The standard processor may be seen as a particular case where a single block is executed. A sequential processor then holds one block and one thread whereas a multicore processor holds one

Finally, a **thread** is a sequential flow of instructions that is part of a block. It represents an indivisible entity and the smallest one in the model: it is always sequential and executes instructions on a processor at a given time. Therefore, even though in practice there may be more threads than processors (some threads will be executed while some others will be idle), in this model we consider that these threads may be merged into a smaller number of

Complementary to the notion of computational entity, we add the concept of memory that

Memory is an important aspect of ACO algorithms. It serves as a container for pheromone information, problem data and various parameters. It also serves as a channel for information exchange in many parallel implementations. Therefore, as accessibility and access speed will have a significant impact on the feasibility and performance of the parallel implementation,

**Local** memory refers to a memory space that is directly accessible by the computational entities of a given level and fast in access time relatively to this particular level. For example, the shared memory of one multiprocessor of a GPU (see Figure 3) is considered as local memory for all the threads that are executed by a block on this multiprocessor. The registers of a processor could also be considered as local memory if they were managed directly,

**Global** memory is a memory space that can also be accessed directly by the computational entities of a given level, but relatively slow in access time. For example, the device memory of

threads corresponding to the number of available processors.

three categories are distinguished: local, global and remote.

although it is usually not the case.

may be relevant to all five levels previously defined.

thread.

grids which are considered multiple systems.

nodes which is the case of clusters.

block and several threads.

**3.3. Memory**

Parallelization at the **colony** level consists in defining the execution of a whole ACO algorithm as a task and assigning it to a processor. The multiple independent colonies and the multiple cooperating colonies approaches, as defined respectively by Stützle [3] and Middendorf *et al.* [16], may be associated to this level. A single colony is typically assigned to a processor but it is possible to assign many with some form of scheduling. At this level, the main factors to consider in the parallelization process are the homogeneity of the colonies as well as their interactions.

Depending on design choices, parallelization at the **iteration** level may be considered as a particular case of either the colony level or the ant level parallelizations. In fact, it may be seen as a hybrid between these two levels instead of a full level. The idea is then to share the iterations of the algorithm between available processors. A first way to implement this strategy is to divide the ants of a single colony into groups and to let each group evolve independently during the algorithm. A second way is to let these groups share their pheromone information after a given number of iterations in a way similar to the partially asynchronous implementation of Bullnheimer *et al.* [2]. At this level, the way the iterations are coordinated between groups will effect the global parallel performance.

Parallelization at the **ant** level implies the distribution of the tasks included in an iteration to available processors. It is mainly the ants construction phase but also operations associated to pheromone update and solution management. This level is related to the typical parallel ants strategy where one or many ants are assigned to each processing element. In that case, special care must be taken to ensure that pheromone updates and general management operations like the identification and update of the best ant do not significantly degrade the performance of the implementation.

Until a few years ago, parallelization at the ant level was generally the finest granularity considered for most optimization problems. However, the emergence of massively parallel architectures like the GPU have resulted in the need for finer approaches. At the **solution element** level, the main operations that are considered for parallelization are the state transition rule and solution evaluation. In the first case, one possible strategy is to evaluate several candidates in parallel to speedup the choice of the next move by an ant. In the second case, the evaluation of the objective function of a particular ant is decomposed among several processors.

The approach proposed in this section sought to determine a parallelization framework taking into account both the main ACO components and the multiple possible granularities. In the next section, it is augmented by considering the underlying computational architecture.

## **3.2. Computational entity**

Nowadays, the typical high performance parallel computer is composed of a hierarchy of several different architectures. For example, it is common to find a computational cluster with multiple distributed SMP nodes, each one of them being composed of multicore processors and GPU cards. Moreover, this type of machine is often found in computational grids. In order to obtain the best possible performance on these platforms, an algorithm has to be implemented according to at least a part of this hierarchy. The proposed classification distinguishes each level of this hierarchy from the parallel programming perspective. This translates into the definition of five computational entities: system, node, process, block and thread.

A **system** defines a parallel computer as a unified computational resource which may be a standard workstation or a cluster. A distinction is made between these single systems and grids which are considered multiple systems.

A **node** is a discernible part of a system to which tasks can be assigned. A system may then be composed of a single node, as in a standard workstation, or of multiple nodes, as in a cluster.

A **process** is a computational entity that manages and executes sequential and parallel programs. As this concept refers to the typical process in operating systems, it can hold one or many threads, which may or may not be grouped together. When a process executes only sequential code, it is considered the smallest indivisible entity of an implementation.

A **block** is an intermediate entity between process and thread. This notion comes from the field of GPU computing in which a block is composed of many threads. The standard processor may be seen as a particular case where a single block is executed. A sequential processor then holds one block and one thread whereas a multicore processor holds one block and several threads.

Finally, a **thread** is a sequential flow of instructions that is part of a block. It represents an indivisible entity and the smallest one in the model: it is always sequential and executes instructions on a processor at a given time. Therefore, even though in practice there may be more threads than processors (some threads will be executed while some others will be idle), in this model we consider that these threads may be merged into a smaller number of threads corresponding to the number of available processors.

Complementary to the notion of computational entity, we add the concept of memory that may be relevant to all five levels previously defined.

## **3.3. Memory**


Memory is an important aspect of ACO algorithms. It serves as a container for pheromone information, problem data and various parameters. It also serves as a channel for information exchange in many parallel implementations. Therefore, as accessibility and access speed will have a significant impact on the feasibility and performance of the parallel implementation, three categories are distinguished: local, global and remote.

**Local** memory refers to a memory space that is directly accessible by the computational entities of a given level and fast in access time relative to that level. For example, the shared memory of one multiprocessor of a GPU (see Figure 3) is considered as local memory for all the threads executed by a block on this multiprocessor. The registers of a processor could also be considered as local memory if they were managed directly, though this is usually not the case.

**Global** memory is a memory space that can also be accessed directly by the computational entities of a given level, but is relatively slow in access time. For example, the device memory of a GPU is considered as global memory for the threads of a given block. The shared memory of an SMP node is also considered as global memory for the processors or cores of that node.



**Remote** memory is a memory space that cannot be directly accessed by the entities, but whose information can be made available by an explicit operation between entities. Obviously, remote memory access is considered to be slower than global memory access. For example, the memory available to a processor located in a specific node of a cluster will be considered as remote for the processors on other nodes.

Table 1 summarizes the proposed taxonomy. According to it, designing a parallel ACO implementation implies linking a computational entity and a memory structure to each ACO granularity level. In the next section, two case studies, extracted from the author's previous works, are proposed and expressed according to this taxonomy. In each case, the parallelization strategy and experimental results are synthesized and discussed in order to illustrate various features of the classification.

| ACO granularity  | Computational entity | Memory |
|------------------|----------------------|--------|
| Colony           | System               | Local  |
| Iteration        | Node                 | Global |
| Ant              | Process              | Remote |
| Solution element | Block                |        |
|                  | Thread               |        |

**Table 1.** Architecture-based taxonomy for parallel ACO.

## **4. Case studies**

Two case studies are presented to illustrate how the proposed framework relates to real implementations. In order to cover the two main general parallelization strategies for ACO, both parallel ants and multicolony approaches are proposed. In the first case, SMP and multicore processors are considered as underlying architectures. In the second case, a GPU is used as a coprocessor of a sequential processor. This section is then concluded with a more general discussion about how this taxonomy applies to most other combinations of ACO algorithms and parallel architectures.

## **4.1. Multi-Colony parallel ACO on a SMP and multicore architecture**

This approach deals with the management of multiple colonies which use a global shared memory to exchange information. The whole algorithm executes on a single system and a single node, so there is no parallelism at these levels. The colonies are executed in parallel and spawn multiple parallel ants. Therefore, colonies are associated to processes and ants to threads. At the programming level, this can be implemented either with multiple operating system processes and multiple threads or with multiple nested threads. In this implementation, we choose the latter, as the available SMP node supports nested threads with a shared memory available to all processors. Therefore, this implementation is defined as *COLONY*<sub>global</sub><sup>process</sup>-*ITERATION*<sub>global</sub><sup>process</sup>-*ANT*<sub>global</sub><sup>thread</sup>. There is no additional parallelism at the solution element level, so it is not specified here.

The proposed implementation assumes a shared-memory model based on threads, in which algorithm execution begins with a single thread, called the master thread, that executes sequentially. To execute a part of the algorithm in parallel, a parallel region is defined where many threads are created, each one of them executing that part of the algorithm concurrently. All threads have access to the whole shared memory, but we can define private data, which is data that will be accessible only by a single thread. Inside a parallel region, we can define a parallel loop, which is a loop whose cycles are divided among existing threads in a work-sharing manner. To manage synchronizations between threads, some form of explicit control must be used. A barrier, as the name implies, is a point in the execution of the algorithm beyond which no thread may execute until all threads have reached that point. Also, a critical region is a part of a parallel region which can be executed only by one thread at a time. It is usually used to avoid concurrent writes to shared data. We can now describe the shared-memory parallelization strategy for ACO.
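These primitives map directly onto OpenMP directives, the toolchain used in the experiments of Section 4.1.1. The following minimal sketch, with illustrative names and data of our own, exercises a parallel region, private data, a work-shared loop and a critical section:

```cpp
#include <cassert>
#include <vector>

double sum_in_parallel(const std::vector<double>& data) {
    double total = 0.0;
    #pragma omp parallel                  // parallel region: threads created
    {
        double partial = 0.0;             // private data: one copy per thread
        #pragma omp for                   // parallel loop: cycles shared out
        for (int i = 0; i < (int)data.size(); ++i)
            partial += data[i];
        #pragma omp critical              // one thread at a time writes total
        total += partial;
    }                                     // implicit barrier ends the region
    return total;
}
```

Compiled without OpenMP support the pragmas are simply ignored and the function runs sequentially with the same result.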


Two versions of the multicolony strategy are proposed which are related to the author's previous work ([6, 8]). The first one, related to parallel independent runs as defined by Stützle [3], implies multiple threads each executing their own copy of the sequential metaheuristic. For the second strategy, we let the colonies cooperate by using a common global best known solution in the shared memory. In both cases, ants are executed in parallel by many nested threads.

In the first implementation, search processes are independent. There are as many copies of the data structures as there are colonies. In particular, even if they all reside in the shared memory, pheromone structures are private and exclusive to each thread. ACO parameters are also private, which means that they could differ between colonies, although this possibility is not explored in this study. In a theoretical context, this kind of parallelization should imply minimal communication and synchronization overheads, hence maximal efficiency. However, this is not the case in a practical context. Even if the data structures are private, colonies need to simultaneously access them through common system resources. At this point, it is up to the computer system to efficiently manage this concurrency.

Parallelizing ACO in multiple search processes is quite simple: we only need to create a parallel region at the beginning of the sequential algorithm. This way, we can create as many threads as we have colonies. A memory location dedicated to store the global best solution known by all processors is reserved in the shared memory and is accessible by all threads. At the end of the parallel region, a critical section lets each thread verify if the best solution it has found qualifies for replacing the global best one and update the data structure accordingly. The best solution of the parallel independent runs can then be identified after the parallel region as the result of the parallel algorithm.
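A sketch of this independent-runs scheme, with identifiers of our own (the toy `run_colony` stands in for a complete sequential ACO run): one colony per loop cycle, private results per colony, and the overall best identified after the parallel region.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for one colony's full sequential ACO search; colony 0
// happens to return the lowest (best) value here.
double run_colony(int colony_id) {
    return 1000.0 + 10.0 * colony_id;
}

double parallel_independent_runs(int num_colonies) {
    std::vector<double> best(num_colonies);
    #pragma omp parallel for          // each colony searches independently
    for (int c = 0; c < num_colonies; ++c)
        best[c] = run_colony(c);      // private structures, no communication
    // After the parallel region: elect the best solution among colonies.
    return *std::min_element(best.begin(), best.end());
}
```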

To illustrate the scheme of multiple interacting colonies in a shared-memory model, the simple case of a common best global solution located in the shared memory is implemented. This relates to the first strategy defined by Middendorf [16], that is, exchange of the globally best solution. The exchange rule of this strategy implies that in each information exchange step, the globally best known solution is broadcast to all colonies, where it becomes the locally best solution. Information exchanges are performed every given number of cycles.

In a shared-memory context, there is no such thing as an explicit broadcast communication step. It is replaced by the use of the global best solution as a dedicated structure in the shared memory. However, it is now used differently and more frequently. At each information exchange step, each thread compares its local value of the best solution with the global best solution. If its solution has a lower cost, it becomes the new global best known solution. The use of a critical region lets threads do their comparison without risking concurrent writes to the data structure. At this point, the new global best known solution is used by all colonies for the upcoming pheromone update. Since all threads need to have done their comparisons for the new global best solution to be effectively known globally, a synchronization barrier needs to be placed before the pheromone update procedure.
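One exchange step can be sketched as below (our own illustration, not the authors' code): each colony's best is compared against the shared global best inside a critical region; in the real algorithm a barrier would then separate this step from the pheromone update, as described above.

```cpp
#include <cassert>
#include <vector>

// colony_best[c]: cost of the best solution found so far by colony c.
double exchange_step(const std::vector<double>& colony_best) {
    double global_best = colony_best[0];  // shared structure in memory
    #pragma omp parallel for
    for (int c = 0; c < (int)colony_best.size(); ++c) {
        #pragma omp critical              // no concurrent writes to the best
        { if (colony_best[c] < global_best) global_best = colony_best[c]; }
        // Real algorithm: a barrier here, then each colony performs its
        // pheromone update using the agreed global best solution.
    }
    return global_best;
}
```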


| Problem  | Nb. of cores | Speedup | Avg. tour length | Best tour length | Closeness (%) |
|----------|--------------|---------|------------------|------------------|---------------|
| rat783   | 1            | -       | 8,824            | 8,810            | 99.80         |
|          | 2            | 1.98    | 8,823            | 8,806            | 99.81         |
|          | 4            | 3.69    | 8,820            | 8,815            | 99.84         |
|          | 8            | 5.93    | 8,829            | 8,822            | 99.74         |
| d2103    | 1            | -       | 80,511           | 80,466           | 99.92         |
|          | 2            | 1.97    | 80,573           | 80,466           | 99.85         |
|          | 4            | 4.00    | 80,508           | 80,477           | 99.93         |
|          | 8            | 6.92    | 80,501           | 80,463           | 99.94         |
| pla7397  | 1            | -       | 23,365,444       | 23,353,738       | 99.55         |
|          | 2            | 1.99    | 23,352,192       | 23,332,663       | 99.61         |
|          | 4            | 3.80    | 23,380,613       | 23,350,736       | 99.48         |
|          | 8            | 7.80    | 23,425,288       | 23,396,612       | 99.29         |
| usa13509 | 1            | -       | 20,465,969       | 20,414,755       | 97.58         |
|          | 2            | 1.89    | 20,376,567       | 20,250,719       | 98.03         |
|          | 4            | 3.65    | 20,443,190       | 20,423,250       | 97.70         |
|          | 8            | 7.30    | 20,441,068       | 20,410,519       | 97.71         |

**Table 2.** Multiple independent colonies: number of cores, speedup, average tour length, best tour length and relative closeness of the average tour length to the optimal solution.

| Problem  | Nb. of cores | Speedup | Avg. tour length | Best tour length | Closeness (%) |
|----------|--------------|---------|------------------|------------------|---------------|
| rat783   | 1            | -       | 8,824            | 8,810            | 99.80         |
|          | 2            | 1.95    | 8,822            | 8,810            | 99.82         |
|          | 4            | 3.69    | 8,819            | 8,815            | 99.86         |
|          | 8            | 5.72    | 8,816            | 8,812            | 99.89         |
| d2103    | 1            | -       | 80,511           | 80,466           | 99.92         |
|          | 2            | 1.95    | 80,475           | 80,450           | 99.97         |
|          | 4            | 3.81    | 80,489           | 80,450           | 99.95         |
|          | 8            | 6.85    | 80,484           | 80,454           | 99.96         |
| pla7397  | 1            | -       | 23,365,444       | 23,353,738       | 99.55         |
|          | 2            | 2.00    | 23,348,946       | 23,322,729       | 99.62         |
|          | 4            | 3.89    | 23,358,733       | 23,334,364       | 99.58         |
|          | 8            | 7.75    | 23,356,251       | 23,350,596       | 99.59         |
| usa13509 | 1            | -       | 20,465,969       | 20,414,755       | 97.58         |
|          | 2            | 2.02    | 20,456,702       | 20,392,284       | 97.63         |
|          | 4            | 3.20    | 20,450,581       | 20,414,972       | 97.66         |
|          | 8            | 5.55    | 20,434,287       | 20,375,145       | 97.74         |

**Table 3.** Multiple cooperating colonies - Global best exchange each 10 cycles: number of cores, speedup, average tour length, best tour length and relative closeness of the average tour length to the optimal solution.

Each colony executes its own ants in parallel by creating a nested group of threads with an additional parallel region. Ants are then distributed to the available processor cores and update the global shared pheromone structure of the colony. Therefore, these updates must be carried out within some form of critical zone to guarantee that unmanaged concurrent writes are avoided. The next subsection shows how these strategies translate into a real computing environment.
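The two-level scheme can be sketched with nested OpenMP regions (an illustration of ours, with hypothetical names): an outer parallel loop over colonies and, inside each colony, an inner parallel loop over its ants, with pheromone writes guarded by a critical section.

```cpp
#include <cassert>

double run_two_level(int colonies, int ants_per_colony) {
    double work_done = 0.0;                   // stand-in for shared state
    #pragma omp parallel for                  // colony level (outer region)
    for (int c = 0; c < colonies; ++c) {
        #pragma omp parallel for              // ant level (nested region)
        for (int a = 0; a < ants_per_colony; ++a) {
            #pragma omp critical              // guarded pheromone update
            work_done += 1.0;                 // stand-in for an ant's work
        }
    }
    return work_done;
}
```

If nested parallelism is disabled (the OpenMP default), the inner loop simply runs sequentially within each colony thread; the result is unchanged.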

## *4.1.1. Experimental results*

The experiments are based on the Ant Colony System (ACS) applied to the Travelling Salesman Problem ([34]). Both implementations were tested on ROMEO II at the Centre de Calcul de Champagne-Ardenne. ROMEO II is a parallel supercomputer of the cluster type, consisting of 8 Novascale SMP nodes dedicated to computations. Each node includes 4 Intel Itanium II dual-core processors running at 1.6 GHz with 8 MB of cache memory, for a total of 8 cores, as well as from 16 GB to 128 GB of memory. Each execution is performed on a single node using from 1 to 8 cores. Application code is written in C++ with OpenMP directives for parallelization. The chosen TSP instances range in size from 783 cities to 13,509 cities. For a more detailed version of the experimental setup and results, the reader may consult Delisle *et al.* [8].

Table 2 summarizes the experiments with 1 to 8 independent colonies, each colony residing on a separate core. For each problem and number of cores, the four columns provide respectively the speedup, the average tour length, the best tour length and the relative closeness of the average tour length to the optimal solution. For each execution, the computation time is taken from the last colony that finishes its search, and the tour length from the colony that found the best solution.
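As a reading aid (our own sketch, not the authors' code), the two derived columns relate to the raw measurements as follows: speedup is sequential time over parallel time, and closeness appears to be the optimal tour length divided by the average tour length, in percent. For instance, assuming the TSPLIB optimum of 8,806 for rat783, the 1-core row gives 8,806 / 8,824 ≈ 99.80.

```cpp
#include <cassert>
#include <cmath>

// Speedup on p cores: ratio of sequential to parallel wall-clock time.
double speedup(double t_sequential, double t_parallel) {
    return t_sequential / t_parallel;
}

// Relative closeness of the average tour length to the optimum, in percent.
double closeness_percent(double optimal_length, double avg_length) {
    return 100.0 * optimal_length / avg_length;
}
```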

We first notice that this implementation is quite scalable: speedups are relatively close to the number of cores in all configurations. Obviously, there are still some system costs associated with parallel execution in a shared-memory environment, and these tend to grow slightly as the number of processors/cores increases. Also, as each core performs the computations associated with a whole ant colony, the workload in the parallel region is considerably large. The ratio between parallelism costs and total execution time per core is then greatly reduced.

Table 3 provides results obtained with multiple cooperating colonies. Every 10 iterations, the global best solution is used for the global pheromone update. For the remaining iterations, each colony uses its own best known solution to update its pheromone structure. We first note that the exchange strategy does not significantly hurt execution time, as speedups remain excellent with up to 8 processors. Still, when 4 and 8 processors are used, most efficiency measures are slightly inferior to those obtained with independent colonies. This was expected, as the information exchange steps imply a synchronization cost that grows with the number of colonies used.


GPUs are programmable through different Application Programming Interfaces like CUDA, OpenCL or DirectX. However, as current general-purpose APIs are still closely tied to specific GPU models, we choose CUDA to fully exploit the available state-of-the-art NVIDIA Fermi architecture. In the CUDA programming model [33], the GPU works as a SIMT coprocessor of a conventional CPU. It is based on the concept of kernels, which are functions (written in C) executed in parallel by a given number of CUDA threads. These threads are grouped into *blocks* that are distributed on the GPU SMs to be executed independently of each other. However, the number of blocks that an SM can process at the same time (*active blocks*) is restricted and depends on the quantity of registers and shared memory used by the threads of each block. Threads within a block can cooperate by sharing data through the shared memory and by synchronizing their execution to coordinate memory accesses. Within a block, the system groups threads into *warps* (typically of 32 threads) which are executed simultaneously on successive clock cycles. The number of threads per block should be a multiple of the warp size to maximize efficiency. Much of the global memory latency can then be hidden by the thread scheduler if there are sufficient independent arithmetic instructions that can be issued while waiting for the global memory access to complete. Consequently, the more active blocks and active warps there are per SM, the more the latency can be hidden.

It is important to note that in the context of GPU execution, flow control instructions (if, switch, do, for, while) can affect the efficiency of an algorithm. In fact, depending on the provided data, these instructions may force threads of a same warp to diverge, in other words, to take different paths in the program. In that case, execution paths must be serialized, increasing the total number of instructions executed by this warp.

In the parallel ants general strategy, ants of a single colony are distributed to processing elements in order to execute tour constructions in parallel. On a conventional CPU architecture, the concept of processing element is usually associated to a single-core processor or to one of the cores of a multi-core processor. On a GPU architecture, the main choices are to associate this concept either to an SP or to an SM. As this case study is concerned with the latter, each ant is associated to a CUDA block and runs its tour construction phase in parallel on a specific SM of the GPU. A dedicated thread of a given block is then in charge of managing the tour construction of an ant, but an additional level of parallelism, the solution element level, may be exploited in the computation of the state transition rule. In fact, an ant evaluates several candidates before selecting the one to add to its current solution. As these evaluations can be done in parallel, they are assigned to the remaining threads of the block. A simple implementation would then imply keeping ant's private data structures in the global memory. However, as only one ant is assigned to a block and so to an SM, taking advantage of the shared-memory is possible. Data needed to compute the ant state transition rule is then stored in this memory that is faster and accessible by all threads that participate in the computation. Most remaining issues encountered in the GPU implementation of the parallel ants general strategy are related to memory management. More particularly, data transfers between CPU and GPU as well as global memory accesses require considerable time. As it was mentioned before, these accesses may be reduced by storing the related data structures in shared memory. 
However, in the case of ACO, the three central data structures are the pheromone matrix, the penalty matrix (typically the transition cost between all pairs of solution elements) and the candidates lists, which are needed by all ants of the colony while being too large (typically ranging from *O*(*n*) to *O*(*n*2) in size) to fit in shared memory. They are then kept in global memory. On the other hand, as they are not modified during

Concerning solution quality, the reader may observe that in all cases the average tour length obtained with multiple cooperating colonies is closer to the optimal solution than with independent colonies or sequential execution. In most cases, the minimum solution found is also better. This shows that the information exchange scheme, while simple, is beneficial to solution quality. Overall, results show that a *COLONY*<sup>global</sup><sub>process</sub>-*ITERATION*<sup>global</sup><sub>process</sub>-*ANT*<sup>global</sup><sub>thread</sub> strategy can be implemented efficiently on an SMP and multi-core computer node containing up to 8 processors.
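As a concrete illustration of the cooperating-colonies scheme, the following Python sketch (an assumption-level model, not the chapter's actual code; the `Colony` class and the tour-length range are invented for illustration) runs several colonies and broadcasts the global best solution every 10 cycles:

```python
import random

class Colony:
    """Toy stand-in for one ACO colony (illustrative only)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.best = float("inf")  # best tour length found so far

    def iterate(self):
        # Pretend each cycle may discover a better tour.
        candidate = self.rng.uniform(400, 500)
        self.best = min(self.best, candidate)

def run(colonies, cycles, exchange_interval=10):
    for cycle in range(1, cycles + 1):
        for c in colonies:
            c.iterate()
        if cycle % exchange_interval == 0:
            # Cooperative scheme: broadcast the global best to every colony.
            global_best = min(c.best for c in colonies)
            for c in colonies:
                c.best = global_best
    return min(c.best for c in colonies)

colonies = [Colony(seed) for seed in range(8)]
best = run(colonies, cycles=100)
```

The synchronization point at each exchange is exactly what introduces the cost mentioned above: every colony must reach the exchange cycle before the broadcast can take place.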

## **4.2. Parallel ants on Graphics Processing Units**

This approach deals with the execution of a single ant colony on a GPU architecture as defined in the authors' previous work [10]. Ants are associated with blocks and solution elements with threads. As shown below, ants may communicate through the relatively slow device memory of the GPU, while solution elements may do so through the faster shared memory of a multiprocessor. As the ACO is not parallelized at the colony and iteration levels, their execution remains sequential and no memory structure is specified for them. This implementation is then defined as *COLONY*<sup>−</sup><sub>process</sub>-*ITERATION*<sup>−</sup><sub>process</sub>-*ANT*<sup>global</sup><sub>block</sub>-*SOLUTION\_ELEMENT*<sup>local</sup><sub>thread</sub>. Before providing more details about this implementation, a brief description of the underlying GPU architecture and computational model is given.

As may be seen in Figure 3, a conventional NVIDIA GPU [33] includes many *Streaming Multiprocessors* (SMs), each of them composed of *Streaming Processors* (SPs). Several memories are distinguished on this hardware, differing in size, latency and access type (read-only or read/write). *Device memory* is relatively large in size but slow in access time. The *global* and *local* memory spaces are specific regions of the device memory that can be accessed in read and write modes. Data structures of a program to be executed on the GPU must be created on the CPU and transferred to global memory, which is accessible to all SPs of the GPU. Local memory, on the other hand, stores automatic data structures that consume more registers than are available.

Each SM employs an architecture model called *SIMT* (*Single Instruction, Multiple Thread*), which allows the execution of many coordinated threads in a data-parallel fashion. It is composed of a *constant memory cache*, a *texture memory cache*, a *shared memory* and *registers*. The constant and texture caches are linked to the constant and texture memories, which are physically located in the device memory. They are accessible in read-only mode by the SPs and faster in access time than the rest of the device memory. The constant memory is very limited in size, whereas the texture memory size can be adjusted to occupy the available device memory. All SPs can read and write in their local shared memory, which is fast in access time but small in size. It is divided into memory banks of 32-bit words that can be accessed simultaneously. This implies that parallel requests for memory addresses that fall into the same memory bank cause the serialization of accesses [33]. Registers are the fastest memories available on a GPU but involve the use of slow local memory when too many are used. Moreover, accesses may be delayed due to register read-after-write dependencies and register memory bank conflicts.
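To make the bank rule concrete, the following sketch (assuming the Fermi layout of 32 banks mapped to successive 32-bit words; the function name is ours) computes the serialization factor of a warp's shared-memory request:

```python
from collections import Counter

N_BANKS = 32  # Fermi shared memory: successive 32-bit words map to successive banks

def conflict_degree(word_indices):
    """Maximum number of requests falling into the same bank.
    1 means the access is conflict-free; k means it is serialized
    into k separate transactions."""
    hits = Counter(i % N_BANKS for i in word_indices)
    return max(hits.values())
```

A stride-1 access by 32 threads touches 32 distinct banks (degree 1), while a stride-32 access, e.g. reading a column of a 32-wide row-major array, sends all requests to the same bank (degree 32).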

GPUs are programmable through different Application Programming Interfaces like CUDA, OpenCL or DirectX. However, as current general-purpose APIs are still closely tied to specific GPU models, we chose CUDA to fully exploit the available state-of-the-art NVIDIA Fermi architecture. In the CUDA programming model [33], the GPU works as a SIMT co-processor of a conventional CPU. It is based on the concept of kernels, which are functions (written in C) executed in parallel by a given number of CUDA threads. These threads are grouped into *blocks* that are distributed on the GPU SMs to be executed independently of each other. However, the number of blocks that an SM can process at the same time (*active blocks*) is restricted and depends on the quantity of registers and shared memory used by the threads of each block. Threads within a block can cooperate by sharing data through the shared memory and by synchronizing their execution to coordinate memory accesses. Within a block, the system groups threads into *warps* (typically of 32 threads) which are executed simultaneously on successive clock cycles. The number of threads per block should be a multiple of the warp size to maximize efficiency. Much of the global memory latency can then be hidden by the thread scheduler if there are sufficient independent arithmetic instructions that can be issued while waiting for the global memory access to complete. Consequently, the more active blocks and active warps there are per SM, the more this latency can be hidden.
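The recommendation that the number of threads per block be a multiple of the warp size can be expressed as a small rounding rule (a sketch; the function name is ours):

```python
WARP_SIZE = 32

def pad_to_warp(n_threads, warp_size=WARP_SIZE):
    """Round a requested block size up to the next multiple of the warp size,
    which is the granularity the hardware effectively schedules."""
    return ((n_threads + warp_size - 1) // warp_size) * warp_size
```

For instance, a block of 20 threads (the candidate-list size used later in this case study) still occupies a full warp of 32 hardware lanes.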


It is important to note that in the context of GPU execution, flow control instructions (if, switch, do, for, while) can affect the efficiency of an algorithm. Depending on the data provided, these instructions may force threads of the same warp to diverge, in other words, to take different paths in the program. In that case, the execution paths must be serialized, increasing the total number of instructions executed by the warp.
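A toy model of this serialization cost (ours, not a hardware-accurate one): when the threads of a warp branch on their data, the warp executes one serialized pass per distinct path taken.

```python
def warp_passes(branch_per_thread):
    """Number of serialized execution passes for one warp:
    one pass per distinct path taken by its threads."""
    return len(set(branch_per_thread))
```

A warp whose 32 threads all take the same side of an `if` needs a single pass, while a data-dependent branch that splits the warp in two roughly doubles the instruction count.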

In the parallel ants general strategy, the ants of a single colony are distributed to processing elements in order to execute tour constructions in parallel. On a conventional CPU architecture, the concept of processing element is usually associated with a single-core processor or with one of the cores of a multi-core processor. On a GPU architecture, the main choices are to associate this concept either with an SP or with an SM. As this case study is concerned with the latter, each ant is associated with a CUDA block and runs its tour construction phase in parallel on a specific SM of the GPU. A dedicated thread of a given block is then in charge of managing the tour construction of an ant, but an additional level of parallelism, the solution element level, may be exploited in the computation of the state transition rule. In fact, an ant evaluates several candidates before selecting the one to add to its current solution. As these evaluations can be done in parallel, they are assigned to the remaining threads of the block.
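The per-candidate evaluation shared among the block's threads corresponds to the usual ACO random-proportional rule, sketched sequentially below (a hedged illustration: `alpha`, `beta` and the dictionary-based matrices are our assumptions, not the chapter's data layout):

```python
import random

def select_next(current, candidates, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Random-proportional rule: candidate j is weighted by
    tau[current][j]**alpha * eta[current][j]**beta.
    On the GPU, each of these weights is computed by a separate
    thread of the ant's block before a roulette-wheel draw."""
    weights = [(tau[current][j] ** alpha) * (eta[current][j] ** beta)
               for j in candidates]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return j
    return candidates[-1]
```

The weight computations are independent, which is why they map naturally onto the remaining threads of the block; only the sum and the draw require coordination.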

A simple implementation would then imply keeping each ant's private data structures in global memory. However, as only one ant is assigned to a block and thus to an SM, it is possible to take advantage of the shared memory: the data needed to compute the ant's state transition rule is stored in this memory, which is faster and accessible by all threads that participate in the computation. Most remaining issues encountered in the GPU implementation of the parallel ants general strategy are related to memory management. More particularly, data transfers between CPU and GPU as well as global memory accesses require considerable time. As mentioned before, these accesses may be reduced by storing the related data structures in shared memory. However, in the case of ACO, the three central data structures are the pheromone matrix, the penalty matrix (typically the transition cost between all pairs of solution elements) and the candidate lists, which are needed by all ants of the colony while being too large (typically ranging from *O*(*n*) to *O*(*n*<sup>2</sup>) in size) to fit in shared memory. They are therefore kept in global memory. On the other hand, as they are not modified during the tour construction phase, it is possible to benefit from the texture cache to reduce their access times.
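Candidate lists are typically precomputed as each city's k nearest neighbours; a minimal sketch (the Euclidean TSP setting and the choice of k are illustrative assumptions):

```python
import math

def build_candidate_lists(coords, k):
    """For each city, the k closest other cities, sorted by distance.
    Like the pheromone and penalty matrices, these lists are read-only
    during tour construction, which is what makes the texture cache usable."""
    n = len(coords)

    def dist(a, b):
        return math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])

    return [sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
            for i in range(n)]
```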


| Problem | | Speedup | Stützle and Hoos | Avg. tour length | Best tour length | Closeness |
|---------|---|---------|------------------|------------------|------------------|-----------|
| eil51 | Sequential | - | 427.80 | 427.32 | 426 | 99.69 |
| | Parallel | 6.84 | - | 427.20 | 426 | 99.72 |
| kroA100 | Sequential | - | 21,336.90 | 21,314.36 | 21,282 | 99.85 |
| | Parallel | 8.12 | - | 21,317.32 | 21,282 | 99.83 |
| d198 | Sequential | - | 15,952.30 | 15,973.84 | 15,913 | 98.77 |
| | Parallel | 11.13 | - | 15,961.64 | 15,851 | 98.85 |
| lin318 | Sequential | - | 42,346.60 | 42,341.72 | 42,107 | 99.26 |
| | Parallel | 11.03 | - | 42,325.32 | 42,147 | 99.29 |
| rat783 | Sequential | - | - | 9,042.44 | 8,923 | 97.32 |
| | Parallel | 15.58 | - | 9,002.32 | 8,899 | 97.77 |
| fl1577 | Sequential | - | - | 24,490.30 | 24,201 | 89.83 |
| | Parallel | 19.47 | - | 24,287.80 | 23,938 | 90.84 |
| d2103 | Sequential | - | - | 82,754.30 | 82,378 | 97.14 |
| | Parallel | 17.64 | - | 82,756.00 | 82,547 | 97.13 |

**Table 4.** GPU implementation: speedup, average tour length from Stützle and Hoos' original MMAS implementation [35], average tour length, best tour length and relative closeness of the average tour length to the optimal solution.

## *4.2.1. Experimental results*

The proposed GPU strategy is implemented in an MMAS algorithm [35] and evaluated on various TSP instances with sizes varying from 51 to 2103 cities. Minimums and averages are computed from 25 trials for problems with fewer than 1000 cities and from 10 trials for larger instances. An effort is made to keep the algorithm and parameters as close as possible to the original MMAS. Following the guidelines of Barr and Hickman [36] and Alba [37], the *relative speedup* metric is computed on *mean execution times* to evaluate the performance of the proposed implementation. Speedups are calculated by dividing the sequential CPU time by the parallel time, the latter being obtained with the same CPU and the GPU acting as a co-processor.
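The relative speedup metric just described is simply the ratio of mean execution times; as a sketch (the sample times below are invented):

```python
def relative_speedup(sequential_times, parallel_times):
    """Relative speedup computed on mean execution times: T_seq / T_par,
    the sequential time being measured on the same CPU that later
    drives the GPU as a co-processor."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(sequential_times) / mean(parallel_times)
```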

Experiments were made on one GPU of an NVIDIA Fermi C2050 server available at the Centre de Calcul de Champagne-Ardenne. It contains 14 SMs, 32 SPs per SM, 48 KB of shared memory per SM and a warp size of 32. The CPU code runs on one core of a 4-core Xeon E5640 CPU running at 2.67 GHz with 24 GB of DDR3 memory. Application code was written in the "C for CUDA V3.1" programming environment.

The implementation uses a number of blocks equal to the number of ants, each of them composed of a number of threads equal to the size of the candidate lists, in this case 20. Also, the number of iterations is set with the intent of keeping the same global number of tour constructions for each experiment. For more details on the experimental setup, the reader may consult Delévacq *et al.* [10].

A first step in our experiments is to compare the solution quality obtained by the sequential and parallel versions of the algorithm. Table 4 presents the average tour length, best tour length and closeness to the optimal solution for each problem. The reader may note the similarity between the results obtained by our sequential implementation and those provided by the authors of the original MMAS [35], as well as their closeness to the optimal solutions.
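The closeness column of Table 4 appears to be the ratio of the optimal tour length to the average tour length, expressed as a percentage; the following sketch (our reading of the table, not a formula stated explicitly in the text) reproduces the eil51 entries:

```python
def closeness(avg_tour_length, optimal_length):
    """Relative closeness of the average tour length to the optimum, in percent."""
    return round(100.0 * optimal_length / avg_tour_length, 2)
```

For eil51, `closeness(427.32, 426)` gives 99.69 (sequential) and `closeness(427.20, 426)` gives 99.72 (parallel), matching Table 4.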

A second step is to evaluate the reduction of execution time obtained with the GPU parallelization strategy. Table 4 shows the speedups obtained for each problem, ranging from 6.84 to 19.47. This shows that distributing ants to blocks and sharing the computation of the state transition rule between several threads of a block is efficient. Also, speedup generally increases with problem size, indicating the good scalability of the strategy. However, a slight decrease is encountered on the 2103-city problem. In that case, the large workload and data structures imply memory access latencies and bank-conflict costs that grow faster than the benefits of parallelizing the available work. Combined with the effect of the increasing number of blocks required to perform the computations and the limited number of active blocks per SM, performance gains become less significant. Overall, results show that a *COLONY*<sup>−</sup><sub>process</sub>-*ITERATION*<sup>−</sup><sub>process</sub>-*ANT*<sup>global</sup><sub>block</sub>-*SOLUTION\_ELEMENT*<sup>local</sup><sub>thread</sub> strategy can be implemented efficiently on a state-of-the-art GPU.



## **5. Conclusion**


The main objective of this chapter was to provide a new algorithmic model to formalize the implementation of Ant Colony Optimization on high-performance computing platforms. The proposed taxonomy captures important features related to both the algorithmic structure of ACO and the architecture of parallel computers. Case studies were also presented to illustrate how this classification translates into real applications. Finally, with its synthesized literature review and experimental study, this chapter serves as an overview of current work on parallel ACO.

Still, as is the case in the field of parallel metaheuristics in general, much remains to be done for the effective use of state-of-the-art parallel computing platforms. For example, maximal exploitation of computing resources often requires algorithmic configurations that do not let ACO perform an effective exploration and exploitation of the search space. On the other hand, parallel performance is strongly influenced by the combined effects of parameters related to the metaheuristic, the hardware architecture and the granularity of the parallelization. As it becomes clear that the future of computing no longer relies on increasing the performance of a single computing core but on using many of them in hybrid systems, it becomes desirable to adapt optimization tools for parallel execution on many kinds of architectures. We believe that the global acceptance of parallel computing in optimization systems requires algorithms and software that are not only effective, but also usable by a wide range of academics and practitioners.

## **Acknowledgements**

This work is supported by the Agence Nationale de la Recherche (ANR) under grant no. ANR-2010-COSI-003-03 and by the Centre de Calcul de Champagne-Ardenne ROMEO which provides the computational resources used for experiments.

## **Author details**

Pierre Delisle

CReSTIC, Université de Reims Champagne-Ardenne, Reims, France

## **6. References**

[1] M. Dorigo and T. Stützle. *Ant Colony Optimization*. MIT Press/Bradford Books, 2004.


[2] B. Bullnheimer, G. Kotsis, and C. Strauss. Parallelization strategies for the ant system. In R. De Leone, A. Murli, P. Pardalos, and G. Toraldo, editors, *High Performance Algorithms and Software in Nonlinear Optimization*, volume 24 of *Applied Optimization*, pages 87–100. Kluwer, Dordrecht, 1997.

[3] T. Stützle. Parallelisation strategies for ant colony optimization. In A.E. Eiben, T. Bäck, H.-P. Schwefel, and M. Schoenauer, editors, *Proceedings of the Fifth International Conference on Parallel Problem Solving from Nature (PPSN V)*, volume 1498, pages 722–731. Springer-Verlag, New York, 1998.

[4] M. Pedemonte, S. Nesmachnow, and H. Cancela. A survey on parallel ant colony optimization. *Applied Soft Computing*, 11:5181–5197, 2011.

[5] P. Delisle, M. Krajecki, M. Gravel, and C. Gagné. Parallel implementation of an ant colony optimization metaheuristic with OpenMP. In *Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 3rd European Workshop on OpenMP (EWOMP'01)*, pages 8–12, Barcelona, Spain, 2001.

[6] P. Delisle, M. Gravel, M. Krajecki, C. Gagné, and W. L. Price. A shared memory parallel implementation of ant colony optimization. In *Proceedings of the 6th Metaheuristics International Conference (MIC'2005)*, pages 257–264, Vienna, Austria, 2005.

[7] P. Delisle, M. Gravel, M. Krajecki, C. Gagné, and W. L. Price. Comparing parallelization of an ACO: Message passing vs. shared-memory. In M.J. Blesa, C. Blum, A. Roli, and M. Sampels, editors, *Proceedings of the 2nd International Conference on Hybrid Metaheuristics*, volume 3636 of *Lecture Notes in Computer Science*, pages 1–11. Springer-Verlag Berlin Heidelberg, 2005.

[8] P. Delisle, M. Gravel, and M. Krajecki. Multi-colony parallel ant colony optimization on SMP and multi-core computers. In *Proceedings of the World Congress on Nature and Biologically Inspired Computing (NaBIC 2009)*, pages 318–323. IEEE, 2009.

[9] A. Delévacq, P. Delisle, M. Gravel, and M. Krajecki. Parallel ant colony optimization on graphics processing units. In H. R. Arabnia, S. C. Chiu, G. A. Gravvanis, M. Ito, K. Joe, H. Nishikawa, and A. M. G. Solo, editors, *Proceedings of the 16th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'10)*, pages 196–202. CSREA Press, 2010.

[10] A. Delévacq, P. Delisle, M. Gravel, and M. Krajecki. Parallel ant colony optimization on graphics processing units. *Journal of Parallel and Distributed Computing*, doi:10.1016/j.jpdc.2012.01.003, 2012.

[11] E. Talbi, O. Roux, C. Fonlupt, and D. Robillard. Parallel ant colonies for the quadratic assignment problem. *Future Generation Computer Systems*, 17(4):441–449, 2001.

[12] M. Randall and A. Lewis. A parallel implementation of ant colony optimization. *Journal of Parallel and Distributed Computing*, 62(9):1421–1432, 2002.

[13] M. T. Islam, P. Thulasiraman, and R. K. Thulasiram. A parallel ant colony optimization algorithm for all-pair routing in MANETs. In *Proceedings of the 17th International Symposium on Parallel and Distributed Processing*. IEEE Computer Society, 2003.

[14] M. Craus and L. Rudeanu. Parallel framework for ant-like algorithms. In *Proceedings of the Third International Symposium on Parallel and Distributed Computing (ISPDC/HeteroPar'04)*, pages 36–41, 2004.

[15] K. Doerner, R. Hartl, S. Benker, and M. Lucka. Parallel cooperative savings based ant colony optimization - multiple search and decomposition approaches. *Parallel Processing Letters*, 16(3):351–370, 2006.

[16] M. Middendorf, F. Reischle, and H. Schmeck. Multi colony ant algorithms. *Journal of Heuristics*, 8(3):305–320, 2002.

[17] D. Chu and A. Y. Zomaya. Parallel ant colony optimization for 3D protein structure prediction using the HP lattice model. In N. Nedjah, L. de Macedo, and E. Alba, editors, *Parallel Evolutionary Computations*, volume 22 of *Studies in Computational Intelligence*, chapter 9, pages 177–198. Springer, 2006.

[18] M. Manfrin, M. Birattari, T. Stützle, and M. Dorigo. Parallel ant colony optimization for the traveling salesman problem. In *Proceedings of the 5th International Workshop on Ant Colony Optimization and Swarm Intelligence*, volume 4150 of *Lecture Notes in Computer Science*, pages 224–234, 2006.

[19] I. Ellabib, P. Calamai, and O. Basir. Exchange strategies for multiple ant colony system. *Information Sciences*, 177(5):1248–1264, 2007.

[20] E. Alba, G. Leguizamon, and G. Ordonez. Two models of parallel ACO algorithms for the minimum tardy task problem. *International Journal of High Performance Systems Architecture*, 1(1):50–59, 2007.

[21] B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, and H. Schmeck. FPGA implementation of population-based ant colony optimization. *Applied Soft Computing*, 4:303–322, 2004.

[22] B. Scheuermann, S. Janson, and M. Middendorf. Hardware-oriented ant colony optimization. *Journal of Systems Architecture*, 53:386–402, 2007.

[23] A. Catala, J. Jaen, and J. Mocholi. Strategies for accelerating ant colony optimization algorithms on graphical processing units. In *Proceedings of the IEEE Congress on Evolutionary Computation*, pages 492–500. IEEE Press, 2007.

[24] J. Wang, J. Dong, and C. Zhang. Implementation of ant colony algorithm based on GPU. In E. Banissi, M. Sarfraz, J. Zhang, A. Ursyn, W. C. Jeng, M. W. Bannatyne, J. J. Zhang, L. H. San, and M. L. Huang, editors, *Proceedings of the Sixth International Conference on Computer Graphics, Imaging and Visualization: New Advances and Trends*, pages 50–53. IEEE Computer Society, 2009.

*Computer Graphics, Imaging and Visualization: New Advances and Trends*, pages 50–53. IEEE Computer Society, 2009.

[25] Y. You. Parallel ant system for traveling salesman problem on GPUs. In *Proceedings of GECCO 2009 - Genetic and Evolutionary Computation*, pages 1–2, 2009.

[26] W. Zhu and J. Curry. Parallel ant colony for nonlinear function optimization with graphics hardware acceleration. In *Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics*, pages 1803–1808. IEEE Press, 2009.

[27] J. Li, X. Hu, Z. Pang, and K. Qian. A parallel ant colony optimization algorithm based on fine-grained model with GPU-acceleration. *International Journal of Innovative Computing, Information and Control*, 5(11(A)):3707–3716, 2009.

[28] J. M. Cecilia, J. M. Garcia, A. Nisbet, M. Amos, and M. Ujaldon.

[29] G. Weis and A. Lewis. Using XMPP for ad-hoc grid computing - an application example using parallel ant colony optimisation. In *Proceedings of the International Symposium on Parallel and Distributed Processing*, pages 1–4, 2009.

[30] J. Mocholí, J. Martínez, and J. Canós. A grid ant colony algorithm for the orienteering problem. In *Proceedings of the IEEE Congress on Evolutionary Computation*, pages 942–949. IEEE Press, 2005.

[31] E. Talbi. *Metaheuristics: From Design to Implementation*. Wiley Publishing, 2009.

[32] I. Foster and C. Kesselman. *The Grid: Blueprint for a New Computing Infrastructure*. Morgan Kaufmann, 1999.

[33] *CUDA: Compute Unified Device Architecture Programming Guide 3.1*, 2010.

[34] M. Dorigo and L. M. Gambardella. Ant colony system: a cooperative learning approach to the traveling salesman problem. *IEEE Transactions on Evolutionary Computation*, 1(1):53–66, 1997.

[35] T. Stützle and H. Hoos. Max-min ant system. *Future Generation Computer Systems*, 16(8):889–914, 2000.

[36] R. S. Barr and B. L. Hickman. Reporting computational experiments with parallel algorithms: issues, measures and experts' opinions. *ORSA Journal on Computing*, 5(1):2–18, 1993.

[37] E. Alba. Parallel evolutionary algorithms can achieve super-linear performance. *Information Processing Letters*, 82(1):7–13, 2002.

**Chapter 3**

**Strategies for Parallel Ant Colony Optimization on Graphics Processing Units**

Jaqueline S. Angelo, Douglas A. Augusto and Helio J. C. Barbosa

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51679

> © 2013 Angelo et al.; licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

Ant colony optimization (ACO) is a population-based metaheuristic inspired by the collective behavior of ants, used for solving optimization problems in general and, in particular, those that can be reduced to finding good paths through graphs. In ACO, a set of agents (artificial ants) cooperates in trying to find good solutions to the problem at hand [1].

Ant colony algorithms are known for their ability to find high-quality solutions in a reasonable time [2]. However, the computational time of these methods is seriously compromised when the problem instance at hand has a high dimension and/or is hard to solve. Accordingly, a significant amount of research has been devoted to reducing the computation time and improving the solution quality of ACO algorithms by means of parallel computing. Due to the independence of the artificial ants, which are guided by an indirect communication via their environment (pheromone trail and heuristic information), ACO algorithms are naturally suitable for parallel implementation.

Parallel computing has become attractive during the last decade as an instrument to improve the efficiency of population-based methods. One can highlight different reasons to parallelize an algorithm: (i) to reduce the execution time, (ii) to allow larger problem instances to be tackled, (iii) to expand the class of problems that are computationally treatable, and so on. The literature offers many possibilities for exploiting parallelism, and the final performance strongly depends on both the problem the algorithm is applied to and the hardware available [3].

In recent years, several works have been devoted to the implementation of parallel ACO algorithms [4]. Most of them use clusters of PCs, where the workload is distributed over multiple computers [5]. More recently, the emergence of parallel architectures such as multi-core processors and graphics processing units (GPUs) has allowed new implementations of parallel ACO algorithms that speed up the computational performance.

GPU devices have traditionally been used for graphics processing, which requires high computational power to process a large number of pixels in a short time frame. The massively parallel architecture of GPUs makes them more efficient than general-purpose CPUs when large amounts of independent data need to be processed in parallel.


The main type of parallelism in ACO algorithms is the parallel-ant approach, i.e. parallelism at the level of individual ants. Other steps of ACO algorithms are also considered for speeding up performance, such as the tour construction, the evaluation of the solutions and the pheromone update procedure.
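The parallel-ant idea can be sketched in a few lines. The snippet below is our own minimal illustration in Python, used here only as executable pseudocode (the chapter itself targets OpenCL): each ant builds and evaluates its tour as an independent task, which is exactly what makes the approach easy to map onto parallel hardware. For brevity the "ants" build random tours instead of pheromone-guided ones.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_tour(seed, n_cities):
    """One artificial ant: build a random permutation of the cities.
    A real ACO ant would instead choose each step probabilistically
    from the pheromone trail and heuristic information."""
    rng = random.Random(seed)
    tour = list(range(n_cities))
    rng.shuffle(tour)
    return tour

def tour_length(tour, dist):
    """Length of the closed tour (returns to the starting city)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def parallel_ants(n_ants, dist):
    """Evaluate all ants concurrently, one independent task per ant."""
    n = len(dist)
    with ThreadPoolExecutor() as pool:
        tours = list(pool.map(lambda s: build_tour(s, n), range(n_ants)))
    return min(tours, key=lambda t: tour_length(t, dist))
```

Since the ants do not communicate during tour construction (only through the pheromone matrix between iterations), each one can be assigned to its own execution unit.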

The purpose of this chapter is to present a survey of the recent developments for parallel ant colony algorithms on GPU devices, highlighting and detailing parallelism strategies for each step of an ACO algorithm.

## **1.1. Ant Colony Optimization**

Ant Colony Optimization is a metaheuristic inspired by the observation of real ants' behavior, applied with great success to a large number of difficult optimization problems.

Ant colonies, like other insects that live in colonies, present interesting characteristics from the viewpoint of collective behavior. Some characteristics of social groups in swarm intelligence are widely discussed in [6]. Among them, ant colonies in particular present a highly structured social organization, which makes them capable of self-organizing, without a centralized controller, in order to accomplish complex tasks required for the survival of the entire colony [2]. Those capabilities, such as division of labor, foraging behavior, brood sorting and cooperative transportation, have inspired different kinds of ant colony algorithms. The first ACO algorithm was inspired by the ability of ants to find the shortest path between a food source and their nest.

In all those examples, ants coordinate their activities via *stigmergy* [7], an indirect communication mediated by modifications of the environment. While moving, ants deposit pheromone (a chemical substance) on the ground to mark paths that may be followed by other members of the colony, which in turn reinforce the pheromone on that path. This behavior leads to a self-reinforcing process that results in paths marked by a high concentration of pheromone, while less-used paths tend to have a decreasing pheromone level due to evaporation. However, real ants may also choose a path that does not have the highest pheromone concentration, so that new food sources and/or shorter paths can be found.
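This stigmergic mechanism is commonly modeled as a roulette-wheel choice biased by pheromone, followed by evaporation and reinforcement. A minimal sketch, with function names of our own choosing and the usual ACO symbols (pheromone level τ, evaporation rate ρ):

```python
import random

def choose_next(pheromone, rng):
    """Pick an option with probability proportional to its pheromone level
    (roulette-wheel selection). Weakly marked paths keep a non-zero chance
    of being chosen, which preserves exploration."""
    total = sum(pheromone)
    r = rng.random() * total
    acc = 0.0
    for i, tau in enumerate(pheromone):
        acc += tau
        if r < acc:
            return i
    return len(pheromone) - 1

def update(pheromone, chosen, deposit=1.0, rho=0.1):
    """Evaporate all trails by a factor rho, then reinforce the chosen one."""
    return [(1.0 - rho) * tau + (deposit if i == chosen else 0.0)
            for i, tau in enumerate(pheromone)]
```

Repeated application concentrates pheromone on frequently chosen paths, while evaporation gradually erodes the others.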

## **1.2. Combinatorial problems**

In combinatorial optimization problems one wants to find discrete values for the solution variables that lead to the optimal solution with respect to a given objective function. An interesting characteristic of combinatorial problems is that they are easy to understand but very difficult to solve [2].

One of the most extensively studied combinatorial problems is the Traveling Salesman Problem (TSP) [8], which was also the first problem approached by the ACO metaheuristic. The first ACO algorithm, called Ant System [1, 9], was initially applied to the TSP and later improved and applied to many other kinds of optimization problems [10].

In the Traveling Salesman Problem (TSP), a salesman, starting from an initial city, wants to travel the shortest path to serve his customers in the neighboring towns, eventually returning to the city he originally came from, visiting each city exactly once. The TSP can be represented by a fully connected graph *G* = (*N*, *A*), with *N* being the set of nodes, representing cities, and *A* the set of edges fully connecting the nodes. Each arc (*i*, *j*) ∈ *A* is assigned a value *dij*, which may be a distance, time, price, or other factor of interest. The TSP can be symmetric or asymmetric. Using distances (associated with each arc) as cost values, in the symmetric TSP the distance between cities *i* and *j* is the same in both directions, i.e. *dij* = *dji*; in the asymmetric TSP the direction used for crossing an arc is taken into consideration, so there is at least one arc for which *dij* ≠ *dji*. The objective is to find a minimum Hamiltonian cycle, where a Hamiltonian cycle is a closed tour visiting each of the *n* = |*N*| nodes (cities) of *G* exactly once.
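For concreteness, the definitions above can be encoded directly: a tour is a permutation of the cities and its cost is the length of the corresponding Hamiltonian cycle. The tiny symmetric 4-city instance below is made up for illustration.

```python
def tour_length(tour, d):
    """Cost of the Hamiltonian cycle: visit every city exactly once and
    return to the starting city."""
    n = len(tour)
    assert sorted(tour) == list(range(n))  # each city appears exactly once
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

# A symmetric instance: d[i][j] == d[j][i] for every arc (i, j).
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 8],
     [10, 4, 8, 0]]
```

Here `tour_length([0, 1, 3, 2], d)` gives 2 + 4 + 8 + 9 = 23; in the asymmetric variant `d[i][j]` may differ from `d[j][i]`.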

## **2. Graphics Processing Unit**


Until recently the only viable choice as a platform for parallel programming was the conventional CPU processor, be it single- or multi-core. Usually many of them were arranged either tightly as multiprocessors, sharing a single memory space, or loosely as multicomputers, with the communication among them done indirectly due to the isolated memory spaces.

The parallelism provided by the CPU is reasonably efficient and still very attractive, particularly for tasks with low degree of parallelism, but a new trendy platform for parallel computing has emerged in the past few years, the *graphics processing unit*, or simply the GPU architecture.

The beginning of the GPU architecture dates back a couple of decades, when primitive devices were developed to offload certain basic graphics operations from the CPU. Graphics operations, which are essentially the task of determining the right color of each individual pixel per frame, are in general both independent and specialized, allowing a high degree of parallelism to be exploited. However, performing such operations on conventional CPU processors, which are general-purpose and back then were exclusively sequential, is slow and inefficient. The advantage of parallel devices designed for this particular purpose then became progressively evident, enabling and inviting a new world of graphics applications.

One of those applications was the computer game, which played an important role in the entire development history of the GPU. As with other graphics applications, games involve computing and displaying—possibly in parallel—numerous pixels at a time. But differently from other graphics applications, computer games have always been popular across the whole range of computer users, and thus very attractive from a business perspective. Better and visually appealing games sell more, but they require more computational power. This demand, as a consequence, has been pushing GPU development forward since the early days, which in turn has been enabling the creation of more and more complex games.

Of course, in the meantime CPU development had also been advancing, with processors becoming progressively more complex, particularly due to the addition of cache memory hierarchies and many specific-purpose control units (such as branch prediction, speculative and out-of-order execution, and so on) [11]. Another source of development has been the technological advance in the manufacturing process, which has allowed manufacturers to systematically increase the transistor density on a microchip. However, this progress has recently begun to decline, with Moore's Law [12] being threatened as the physical limits of the technology on transistor density and operating frequency approach. The industry's response, in order to continually raise the computational power, was to migrate from the sequential single-core to the parallel multi-core design.

Although today's multi-core CPU processors perform fairly well, decades of accumulated architectural optimizations toward sequential tasks have led to big and complex CPU cores, restricting the number of them that can be packed on a single processor to no more than a few. As a consequence, the current CPU design cannot take advantage of workloads with a high degree of parallelism; in other words, it is inefficient for massive parallelism.

Contrary to the development philosophy of the CPU, the GPU, because of the requirements of graphics operations, took massive parallelism as a design goal since its infancy. Filling the processor with numerous ALUs<sup>1</sup> means that there is not much die area left for anything else, such as cache memory and control units. The benefit of this design choice is two-fold: (i) it simplifies the architecture due to its uniformity; and (ii) since a high portion of the transistors is dedicated to actual computation (spread over many ALUs), the theoretical computational power is proportionally high. As one would expect, the GPU reaches its peak efficiency when the device is fully occupied, that is, when there are enough parallel tasks to utilize each one of the thousands of ALUs commonly found on a modern GPU.

<sup>1</sup> *Arithmetic and Logic Unit*, the most basic form of computational unit.

Being highly parallel, however, would not by itself be enough to establish the GPU architecture as a compelling platform for mainstream high-performance computation. In the early days, graphics operations were mainly primitive and thus could be easily and efficiently implemented in hardware through fixed, i.e. specialized, functional units. But such operations became increasingly complex, particularly in visually rich computer games, and the GPU was forced to switch to a programmable architecture, where it was possible to execute not only strict graphics operations but also arbitrary instructions. The union of an efficient massively parallel architecture with general-purpose capability has created one of the most exciting processors, the modern GPU, outstanding in performance with respect to power consumption, price and space occupied.

The following section will introduce the increasingly adopted open standard for heterogeneous programming, including of course the GPU, known as OpenCL.

## **2.1. Open Computing Language – OpenCL**

An interesting fact about the CPU and GPU architectures is that while the CPU started as a general-purpose processor and gained more and more parallelism through the multi-core design, the GPU took the opposite path: it started as a highly specialized parallel processor and was increasingly endowed with general-purpose capabilities. In other words, these architectures have been slowly converging toward a common design, although each one still has—and probably always will have, due to fundamental architectural differences—divergent strengths: the CPU is optimized for achieving low latency in sequential tasks, whereas the GPU is optimized for maximizing throughput in highly parallel tasks [13].

It is in this convergence that OpenCL is situated. These days most processors are, to some extent, both parallel and general purpose; therefore, it should be possible to come up with a uniform programming interface targeting such different but fundamentally related architectures. This is the main idea behind OpenCL, a platform for uniform parallel programming of heterogeneous systems [14].

OpenCL is an open standard managed by a non-profit organization, the Khronos Group [14], that is architecture- and vendor-independent, so it is designed to work across multiple devices from different manufacturers. The two main goals of OpenCL are *portability* and *efficiency*. Portability is achieved by the guarantee that every supported device conforms to a common set of functionality defined by the OpenCL specification [15].<sup>2</sup> As for efficiency, it is obtained through the flexible multi-device programming model and a rich set of relatively low-level instructions that allow the programmer to greatly optimize the parallel implementation (possibly targeting a specific architecture if so desired) without loss of portability.<sup>3</sup>

<sup>2</sup> In fact, all the parallel strategies described in Section 4 can be readily applied on a CPU device (or any other OpenCL-supported device, such as DSPs and FPGAs) without modification.

<sup>3</sup> Of course, although OpenCL guarantees functional portability, i.e. that the code will run on any other supported device, optimizations aimed at getting the most out of a specific device or architecture may lead to the loss of what is known as *performance portability*.

## *2.1.1. Fundamental Concepts and Terminology*

An OpenCL program comprises two distinct types of code: the *host*, which runs sequentially on the CPU, and the *kernel*, which runs in parallel on one or more devices, including CPUs and GPUs. The host code is responsible for managing the OpenCL devices and setting up and controlling the execution of kernels on them, whereas the actual parallel processing is programmed in the kernel code.

## 2.1.1.1. Host code

The tasks performed by the host portion usually involve: (1) discovering and enumerating the available compute devices; (2) loading and compiling the kernels' source code; (3) loading domain-specific data, such as the algorithm's parameters and the problem's data; (4) setting up the kernels' parameters; (5) launching and coordinating kernel executions; and finally (6) outputting the results. The host code can be written in the C/C++ programming language.<sup>4</sup>

<sup>4</sup> C and C++ are the only languages officially supported by the OpenCL specification, but there exist many other third-party languages that could also be used.

## 2.1.1.2. Kernel code

Since it implements the parallel decomposition of a given problem—a *parallel strategy*—the kernel is usually the most critical aspect of an OpenCL program, and so care should be taken in its design.

The OpenCL kernel is similar to the concept of a procedure in a programming language: it takes a set of input arguments, performs computation on them, and writes back the result. The main difference is that when a kernel is launched, multiple instances of it are spawned simultaneously, each one assigned to an individual execution unit of a parallel device.

heterogeneous programming, including of course the GPU, known as OpenCL.

GPU is optimized for maximizing the throughput in highly parallel tasks [13].

**2.1. Open Computing Language – OpenCL**

<sup>1</sup> *Arithmetic and Logic Unit*, the most basic form of computational unit.

utilize each one of the thousands of ALUs, as commonly found on a modern GPU.

to migrate from the sequential single-core to the parallel multi-core design.

The tasks performed by the host portion usually involve: (1) discovering and enumeration of the available compute devices; (2) loading and compilation of the kernels' source code; (3) loading of domain-specific data, such as algorithm's parameters and problem's data; (4) setting up kernels' parameters; (5) launching and coordinating kernel executions; and finally (6) outputting the results. The host code can be written in the C/C++ programming language.<sup>4</sup>

## 2.1.1.2. Kernel code

Since it implements the parallel decomposition of a given problem (a *parallel strategy*), the kernel is usually the most critical part of an OpenCL program, and so care should be taken in its design.

The OpenCL kernel is similar to a procedure in a programming language: it takes a set of input arguments, performs computation on them, and writes back the result. The main difference is that when an OpenCL kernel is launched, multiple instances of it are spawned simultaneously, each one assigned to an individual execution unit of a parallel device.

<sup>2</sup> In fact, all the parallel strategies described in Section 4 can be readily applied on a CPU device (or any other OpenCL-supported device, such as DSPs and FPGAs) without modification.

<sup>3</sup> Of course, although OpenCL guarantees the functional portability, i.e. that the code will run on any other supported device, doing optimizations aimed at getting the most out of a specific device or architecture may lead to the loss of what is known as *performance portability*.

<sup>4</sup> C and C++ are the only languages officially supported by the OpenCL specification, but there exist many other third-party languages that could also be used.

An instance of a kernel is formally called a *work-item*. The total number of work-items is referred to as the *global size*, and defines the level of decomposition of the problem: the larger the global size, the finer the granularity. A finer granularity is always preferred over a coarser one when targeting a GPU device, in order to maximize its utilization, provided that it does not imply a substantial increase in the communication overhead.

Strategies for Parallel Ant Colony Optimization on Graphics Processing Units

http://dx.doi.org/10.5772/51679

The mapping between a work-item and the problem's data is set up through the concept known as the *N-dimensional domain range*, or just *N*-D domain, where *N* denotes a one-, two-, or three-dimensional domain. In practice, this is the mechanism that connects the work-items' execution ("compute domain") with the problem's data ("data domain"). More specifically, the OpenCL runtime assigns to each work-item a unique identifier, a *globalid*, which makes it possible for an individual work-item to operate on a subset of the problem's data by indexing those elements through the identifier.

Figure 1 illustrates the concept of a mapping between the compute and data domains. Suppose one is interested in computing in parallel a certain operation over an array of four elements (*n* = 4), e.g. computing the square root of each element. A trivial strategy would be to dedicate a work-item to each element, but let us assume one wants to limit the number of work-items to just two, that is, *globalsize* = 2. This means that a single work-item will have to handle two data elements, hence the granularity *g* = 2. So, how could one connect the compute and data domains? There are different ways of doing that; one of them is to index, from within the work-item, the elements of the *input* and *output* arrays by the expression *g* × *t* + *globalid*, where *t* ∈ {0, 1} is the time step (iteration).

**Figure 1.** Example of a mapping between the compute and data domains.

A pseudo-OpenCL kernel implementing such a strategy is presented in Algorithm 1.<sup>5</sup> At step *t*<sub>0</sub>, the first and second work-items access, respectively, indices 0 and 1, and at *t*<sub>1</sub> they access indices 2 and 3.

**Algorithm 1:** Example of a pseudo-OpenCL kernel

**for** *t* ← 0 **to** *n* / *globalsize* − 1 **do**
    output[*g* × *t* + *globalid*] ← input[*g* × *t* + *globalid*];

<sup>5</sup> An actual OpenCL kernel is implemented in OpenCL C, which is almost indistinguishable from the C language, but adds a few extensions and also some restrictions [15].
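To make the index arithmetic of Algorithm 1 concrete, the strategy can be simulated sequentially in plain C. This is only an illustrative sketch under the example's assumptions (*n* = 4, *globalsize* = 2, hence *g* = 2); on an actual device the work-item instances run simultaneously rather than in a loop:

```c
#include <stddef.h>

/* Sequential simulation of the pseudo-OpenCL kernel of Algorithm 1.
 * n           : number of data elements
 * global_size : number of (simulated) work-items
 * Each work-item with identifier global_id handles g = n/global_size
 * elements, accessing index g * t + global_id at time step t.        */
void run_pseudo_kernel(const int *input, int *output,
                       size_t n, size_t global_size)
{
    size_t g = n / global_size; /* granularity: elements per work-item */

    /* On a GPU these instances would execute simultaneously; here they
       are simulated one after the other.                              */
    for (size_t global_id = 0; global_id < global_size; global_id++)
        for (size_t t = 0; t < n / global_size; t++) /* t = 0 .. g-1 */
            output[g * t + global_id] = input[g * t + global_id];
}
```

With the example's values, work-items 0 and 1 access indices 0 and 1 at *t*<sub>0</sub> and indices 2 and 3 at *t*<sub>1</sub>, exactly as described above. Note that the expression *g* × *t* + *globalid* covers the whole array here because *g* happens to equal *globalsize* in this example.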

The *N*-D domain range can also be extended to higher dimensions. For instance, in a 2-D domain a work-item would have two identifiers, *globalid*<sup>0</sup> and *globalid*<sup>1</sup>, where the first could be mapped to index the *row* and the second the *column* of a matrix. The reasoning is analogous for a 3-D domain range.
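The 2-D mapping can be sketched as follows. The helper names are ours; in an actual kernel the two identifiers come from `get_global_id(0)` and `get_global_id(1)` instead of explicit loops:

```c
#include <stddef.h>

/* Row-major flattening: the first identifier indexes the row and the
 * second the column, as described in the text.                       */
size_t matrix_index(size_t global_id0, size_t global_id1, size_t cols)
{
    return global_id0 * cols + global_id1;
}

/* Simulated 2-D kernel: the work-item at (i, j) squares element (i, j).
 * On a device, i and j would be the work-item's two global ids.        */
void square_matrix(const int *in, int *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)       /* global id, dimension 0 */
        for (size_t j = 0; j < cols; j++) { /* global id, dimension 1 */
            size_t idx = matrix_index(i, j, cols);
            out[idx] = in[idx] * in[idx];
        }
}
```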

## 2.1.1.3. Communication and Synchronization


There are situations in which it is desirable or required to allow work-items to *communicate* and *synchronize* among themselves. For efficiency reasons, such operations are not arbitrarily allowed among work-items across the whole *N*-D domain.<sup>6</sup> For that purpose one can resort to the notion of a *work-group*, which in a nutshell is just a collection of work-items. All the work-items within a work-group are free to communicate and synchronize with each other. The number of work-items per work-group is given by the parameter *local size*, which in practice determines how the global domain is partitioned. For example, if *globalsize* is 256 and *localsize* is 64, then the computational domain is partitioned into 4 work-groups (256/64), each work-group having 64 work-items. Again, the OpenCL runtime provides means for each work-group and work-item to identify themselves: a work-group is identified with respect to the global *N*-D domain through its *groupid*, and a work-item is identified locally within its work-group via its *localid*.
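The partitioning arithmetic of the 256/64 example above can be written down explicitly. The helper names below are ours; inside a kernel these values are obtained from `get_num_groups()`, `get_group_id()` and `get_local_id()`:

```c
#include <stddef.h>

/* Work-group partitioning of a 1-D domain, assuming work-items are
 * enumerated in order and global_size is a multiple of local_size.  */
size_t num_groups(size_t global_size, size_t local_size)
{
    return global_size / local_size;   /* how many work-groups       */
}

size_t group_id(size_t global_id, size_t local_size)
{
    return global_id / local_size;     /* which work-group           */
}

size_t local_id(size_t global_id, size_t local_size)
{
    return global_id % local_size;     /* position inside the group  */
}
```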

## *2.1.2. Compute Device Abstraction*

In order to provide a uniform programming interface, OpenCL abstracts the architecture of a parallel compute device, as shown in Figure 2. There are two fundamental concepts in this abstraction, the *compute* and *memory* hierarchies.

**Figure 2.** Abstraction of a parallel compute device architecture [16].

OpenCL defines two levels of compute hardware organization: the *compute units* (CU) and the *processing elements* (PE). Not coincidentally, this partitioning matches the software abstraction of work-groups and work-items. In fact, OpenCL guarantees that a work-group is entirely executed on a single compute unit, whereas work-items are executed by processing elements. Nowadays GPUs usually have thousands of processing elements clustered in a dozen or so compute units. Therefore, to fully utilize such devices, at least this same number of work-items must be in flight; the optimal number of work-items in execution, however, should be substantially larger, so that the device has enough room to hide latencies [17, 18].

<sup>6</sup> There are two main reasons why those operations are restricted: (i) to encourage the better programming practice of avoiding dependence on communication as much as possible; and, most importantly, (ii) to allow OpenCL to support even those rather limited devices that cannot keep (at least not efficiently) the state of all the running work-items, as would be needed to implement global synchronization.


As for the memories, OpenCL exposes three memory spaces, from the more general to the more specific: (i) the *global/constant* memory, the main memory of the device, accessible from all the work-items (the *constant* space is a slightly optimized global memory for read-only access); (ii) the *local* memory, a very fast low-latency memory shared only across the work-items within a work-group, normally used as a programmable cache or as a means to share data (communicate); and (iii) the *private* memory, also a very fast memory, but visible only to the corresponding work-item.

## **3. Review of the literature**

In the last few years, many works have been devoted to parallel implementations of ACO algorithms in GPU devices, motivated by the powerful massively parallel architecture provided by the GPU.

In reference [19], the authors proposed two parallel ACO implementations to solve the Orienteering Problem (OP). The strategies applied to the GPU were based on the intrinsic data parallelism provided by the vertex processor and the fragment processor. The first experiment compared a grid implementation running on 32 workstations equipped with CPUs Intel Pentium IV at 2.4GHz against one workstation with a GPU NVIDIA GeForce 6600 GT. Both strategies performed similarly with respect to the quality of the obtained solutions. The second experiment compared the two proposed GPU parallel strategies, showing that the strategy applied to the fragment processor performed about 35% faster than the one applied to the vertex processor.

In [20], the authors implemented a parallel MMAS using multiple colonies, where each colony is associated with a work-group and ants are associated with work-items within each work-group. The experiments compared a parallel version of MMAS on the GPU with three serial CPU versions. In the parallel implementation the CPU initializes the pheromone trails and parameters and also controls the iteration process, while the GPU is responsible for running the main steps of the algorithm: solution construction, choice of the best solution, and pheromone evaporation and updating. Six instances from the Travelling Salesman Problem library (TSPLIB), containing up to 400 cities, were solved using a workstation with a CPU AMD Athlon X2 3600+ running at 1.9GHz and a GPU NVIDIA GeForce GTX 8800 at 1.35GHz with 128 processing elements. The parallel GPU version was 2 to 32 times faster than the sequential version, and the solution quality of the parallel version surpassed that of all three MMAS serial versions. In order to accelerate the choice of the iteration-best solution, the authors used a parallel reduction technique that "hangs up" the execution of certain work-items; this technique requires barrier synchronization to ensure memory consistency.
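The reduction technique mentioned above can be sketched in C. This is a sequential rendering of the tree-shaped pattern: on the GPU, the inner loop is executed by parallel work-items (half of them going idle, i.e. "hanging up", at each step), with a barrier between steps; the function name and the power-of-two restriction are ours:

```c
#include <stddef.h>

/* Tree-shaped reduction selecting the minimum cost, as used to pick
 * the iteration-best solution.
 * cost : array of n candidate-solution costs, modified in place
 * n    : number of candidates (a power of two, for simplicity)
 * On a GPU, each iteration of the inner loop would be one work-item,
 * and the commented barrier a real barrier synchronization.          */
double reduce_min(double *cost, size_t n)
{
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; i++)  /* work-items 0..stride-1 */
            if (cost[i + stride] < cost[i])
                cost[i] = cost[i + stride];
        /* barrier(); -- all comparisons must finish before next step */
    }
    return cost[0]; /* the iteration-best cost */
}
```

The reduction takes log₂(*n*) steps instead of the *n* − 1 comparisons of a sequential scan, which is why it pays off despite leaving some work-items idle.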

In the work described in [21], the authors implemented a parallel ACO algorithm with a pattern search procedure to solve continuous functions with bound constraints. The parallel method was compared with a serial CPU implementation. Each work-item is responsible for evaluating the solutions' costs and constraints, constructing solutions and improving them via a local search procedure, while the CPU controls the initialization process, pheromone evaporation and updating, the sorting of the generated solutions, and the updating of the probability vectors. The experiments were executed on a workstation equipped with a CPU Intel Xeon E5420 at 2.5GHz and a GPU NVIDIA GeForce GTX 280 at 1296MHz with 240 processing elements. The computational experiments showed acceleration values between 128 and almost 404 for the parallel GPU implementation, while both the parallel and serial versions obtained satisfactory results. Moreover, regarding the solution quality under a time limit of one second, the parallel version outperformed the sequential one in most of the test problems. As a side note, the results could have been even better had the authors generated the random numbers directly on the GPU instead of precomputing them on the CPU.
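That closing remark about on-GPU random numbers deserves a sketch. A generator of the xorshift family (a generic illustration, not the generator of [21]) is cheap enough to run per work-item, with each work-item keeping its own nonzero state in private memory, typically seeded from its *globalid*:

```c
#include <stdint.h>

/* Minimal xorshift32 step: fast, stateful, and easily replicated per
 * work-item, avoiding the pre-computation of random numbers on the
 * CPU and their transfer to the device. The state must never be 0.   */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform deviate in [0, 1), as needed by the probabilistic rules. */
double rng_uniform(uint32_t *state)
{
    return xorshift32(state) / 4294967296.0; /* divide by 2^32 */
}
```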


A parallel MMAS under a MATLAB environment was presented in [22]. The authors proposed an algorithm implementation which arranges the data into large-scale matrices, taking advantage of the fact that the integration of MATLAB with the Jacket accelerator handles matrices on the GPU more naturally and efficiently than other data types. Therefore, auxiliary matrices were created besides the usual matrices (*τ* and *η*) of a standard ACO algorithm. Instances from the TSPLIB were solved using a workstation with a CPU Intel i7 at 3.3GHz and a GPU NVIDIA Tesla C1060 at 1.3GHz with 240 processing elements. Given a fixed number of iterations, the experimental evaluation showed that the CPU and GPU implementations obtained similar results, with the parallel GPU version much faster than the CPU. The speedup grew with the number of TSP nodes, but when the number of nodes reached 439 the growth could no longer be sustained and slowed down drastically, due to the frequent data-transfer operations between the CPU and GPU.

In [23], the authors make use of the GPU's parallel computing power to solve pathfinding in games. The proposed ACO algorithm was implemented on a GPU device, with a parallelism strategy similar to the one presented in [19]: ants work in parallel to obtain a solution to the problem. The authors intended to study the algorithm's scalability when large problems are solved, against a corresponding implementation on a CPU. The hardware configuration was not reported, but the computational experiments showed that the GPU version was 15 times faster than its corresponding CPU implementation.

In [24] an ACO algorithm was proposed for epistasis<sup>7</sup> analysis. In order to tackle large-scale problems, the authors proposed a multi-GPU parallel implementation running on one, three and six devices. The experiments show that the results generated by the GPU implementation outperformed two other sequential versions in almost all trials and that, as the dataset grew, the GPU performed faster than the other implementations.

The Quadratic Assignment Problem (QAP) was solved in [25] by a parallel ACO-based algorithm. Except for the initialization process, all the algorithm steps are performed on the GPU, and all data (pheromone matrix, set of solutions, etc.) reside in the global memory of the GPU. Therefore, no data needed to be transferred between the CPU and GPU, except for the best-so-far solution, which is used to check whether the termination condition is satisfied. The authors focused on a parallelism strategy for the 2-opt local search procedure since, from previous experiments, this was the most costly step. The experiments were done on a workstation with a CPU Intel i7 965 at 3.2GHz and a GPU NVIDIA GeForce GTX 480 at 1401MHz with 480 processing elements. Instances from the Quadratic Assignment Problem library (QAPLIB) were solved, with problem sizes ranging from 50 to 150. The GPU computation performed 24 times faster than the CPU.

<sup>7</sup> Phenomenon where the effects of one gene are modified by one or several other genes.
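The 2-opt neighborhood that [25] parallelizes is easy to state: a move deletes two edges of the tour and reconnects the two resulting paths by reversing the segment between them. A minimal sequential sketch follows (the function name is ours; [25] evaluates many candidate moves in parallel before applying one):

```c
#include <stddef.h>

/* One 2-opt move: reversing tour[i..j] replaces the edges
 * (tour[i-1], tour[i]) and (tour[j], tour[j+1]) by
 * (tour[i-1], tour[j]) and (tour[i], tour[j+1]).
 * Requires 0 < i <= j < tour length - 1.                   */
void two_opt_move(int *tour, size_t i, size_t j)
{
    while (i < j) {           /* in-place segment reversal */
        int tmp = tour[i];
        tour[i] = tour[j];
        tour[j] = tmp;
        i++;
        j--;
    }
}
```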

**4. Parallelization strategies**

continuously updated by the ants.

**Algorithm 2:** Pseudo-code of Ant System.

// Initialization phase

Return the best solution;

generated.

**while** *termination criteria not met* **do** Ants' solutions construction; Ants' solutions evaluation; Pheromone trails updating;

parallel computing provided by the GPU.

Pheromone trails *τ*; Heuristic information *η*; // Iterative phase

In ACO algorithms, artificial ants cooperate while exploring the search space, searching good solutions for the problem through a communication mediated by artificial pheromone trails. The construction solution process is incremental, where a solution is built by adding solution components to an initially empty solution under construction. The ant's heuristic rule probabilistically decides the next solution component guided by (i) the heuristic information (*η*), representing a priori information about the problem instance to be solved; and (ii) the pheromone trail (*τ*), which encodes a memory about the ant colony search process that is

Strategies for Parallel Ant Colony Optimization on Graphics Processing Units

http://dx.doi.org/10.5772/51679

73


In another study, computational experiments were carried out on a workstation with a CPU Intel i7 965 at 3.2GHz and a GPU NVIDIA GeForce GTX 480 at 1401MHz with 480 processing elements. Instances from the Quadratic Assignment Problem library (QAPLIB) were solved, with problem sizes ranging from 50 to 150. The GPU computing performed 24 times faster than the CPU.

An ACO-based parallel algorithm was proposed for the design validation of circuits [26]. The method differs from the standard ACO implementation in that it does not use pheromone trails to guide the search process. The proposed method exploits the maximum occupancy of the GPU, defining the global size as the number of work-groups times the number of work-items per work-group. A workstation with a CPU Intel i7 at 3.33GHz and a GPU NVIDIA GeForce GTX 285 with 240 processing elements was used for the computational experiments. The results showed average speedup values between 7 and 11 across all test problems, reaching a peak speedup of 228 on a specific test problem when compared with two other methods.

In [27], the MMAS with a 3-opt local search was implemented in parallel on the GPU. The authors proposed four parallel strategies, two based on parallel ants and two based on multiple ant colonies. In the first parallel-ants strategy, ants are assigned to work-items, each one responsible for all the computation needed in the tour construction process. The second parallel-ants proposal assigned each ant to a work-group, making it possible to extract an additional level of parallelism in the computation of the state transition rule. In the multiple-colony strategy, a single GPU and multiple GPUs—each one associated with a colony—were used, applying the same parallel-ants strategies. TSP instances varying from 51 to 2103 cities were used as test problems. The experiments were done using two 4-core CPUs Xeon E5640 at 2.67GHz and two GPUs NVIDIA Fermi C2050 with 448 processing elements. Evaluating the parallel-ants strategies against the sequential version of the MMAS, the overall experiments showed that the solution quality was similar when no local search was used. However, speedup values ranging from 6.84 to 19.47 could be achieved when the ants were associated with work-groups. For the multiple-colony strategies the speedup varied between 16.24 and 23.60.

The authors in [28] proposed parallel strategies for the tour construction and the pheromone updating phases. In the tour construction phase, three different aspects were reworked in order to increase parallelism: (i) the choice-info matrix calculation, which combines pheromone and heuristic information; (ii) the *roulette wheel* selection procedure; and (iii) the decomposition granularity, which was switched to the parallel processing of both ants and tours. Regarding the pheromone trails updating, the authors applied a *scatter to gather* based design to avoid the atomic instructions required for properly updating the pheromone matrix. The hardware used for the computational experiments was composed of a CPU Intel Xeon E5620 running at 2.4GHz and a GPU NVIDIA Tesla C2050 at 1.15GHz with 448 processing elements. For the solution construction phase, the parallel version performed up to 21 times faster than the sequential version, while for the pheromone updating the scatter-to-gather technique performed poorly. However, considering a data-based parallelism with atomic instructions, the authors presented a strategy that was up to 20 times faster than a sequential execution.

The next section presents strategies for parallelizing each step of the ACO algorithm on the GPU.

## **4. Parallelization strategies**

In ACO algorithms, artificial ants cooperate while exploring the search space, seeking good solutions to the problem through communication mediated by artificial pheromone trails. The solution construction process is incremental: a solution is built by adding solution components to an initially empty partial solution. The ant's decision rule probabilistically chooses the next solution component, guided by (i) the heuristic information (*η*), which represents a priori information about the problem instance to be solved; and (ii) the pheromone trail (*τ*), which encodes a memory of the ant colony's search process and is continuously updated by the ants.

The main steps of the Ant System (AS) algorithm [1, 9] are: the initialization phase, the ants' solutions construction, the ants' solutions evaluation and the pheromone trails updating. In Algorithm 2 a pseudo-code of AS is given. As opposed to the following parallel strategies, this algorithm is meant to be implemented and run as host code, preparing and transferring data to/from the GPU, setting the kernels' arguments and managing their executions.

**Algorithm 2:** Pseudo-code of Ant System.

```
// Initialization phase
Pheromone trails τ;
Heuristic information η;
// Iterative phase
while termination criteria not met do
    Ants' solutions construction;
    Ants' solutions evaluation;
    Pheromone trails updating;
Return the best solution;
```

After setting the parameters, the first step of the algorithm is the initialization procedure, which initializes the heuristic information and the pheromone trails. In the ants' solution construction, each ant starts at a randomly chosen node (city) and incrementally builds a solution according to the decision policy of choosing an unvisited node *j* while at node *i*, guided by the pheromone trail (*τij*) and the heuristic information (*ηij*) associated with that arc. When all ants have constructed a complete path (feasible solution), the solutions are evaluated. Then, the pheromone trails are updated according to the quality of the candidate solutions found; a certain level of evaporation is also applied. When the iterative phase is complete, that is, when the termination criterion is met, the algorithm returns the best solution generated.
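The whole loop can be sketched serially in plain Python. This is an illustrative CPU sketch, not the chapter's GPU code: the tiny instance, the parameter values and the use of *η* = 1/distance in the decision rule (the customary AS choice; the chapter stores distances and uses them for evaluation) are assumptions.

```python
import math
import random

# Serial sketch of the Ant System loop (Algorithm 2) for a tiny TSP instance.
def ant_system(coords, n_ants=8, n_iters=30, alpha=1.0, beta=2.0,
               rho=0.5, tau0=1.0, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    tau = [[tau0] * n for _ in range(n)]             # initialization phase
    best_tour, best_cost = None, float("inf")
    for _ in range(n_iters):
        # weight of arc (i, j): tau^alpha * eta^beta, zero on the diagonal
        w = [[0.0 if i == j else tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
              for j in range(n)] for i in range(n)]
        tours = []
        for _ in range(n_ants):                      # ants' solutions construction
            tour = [rng.randrange(n)]
            visited = {tour[0]}
            while len(tour) < n:
                c = tour[-1]
                probs = [0.0 if j in visited else w[c][j] for j in range(n)]
                r = rng.uniform(0.0, sum(probs))
                acc, nxt = 0.0, None
                for j, p in enumerate(probs):        # roulette-wheel selection
                    if p > 0.0:
                        nxt, acc = j, acc + p
                        if acc >= r:
                            break
                tour.append(nxt)
                visited.add(nxt)
            tours.append(tour)
        for i in range(n):                           # pheromone evaporation
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for tour in tours:                           # evaluation + deposit
            cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if cost < best_cost:
                best_tour, best_cost = tour, cost
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / cost
                tau[b][a] += 1.0 / cost
    return best_tour, best_cost
```

Each inner loop here corresponds to one of the kernels parallelized in the following subsections.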

As shown in the previous section, different parallel techniques for ACO algorithms have been proposed, each one adapted to the optimization problem considered and to the GPU architecture available. In all cases, the researchers tried to extract the maximum efficiency from the parallel computing capability of the GPU.

This section describes, in a pseudo-OpenCL form, parallelization strategies for the ACO algorithm described in Algorithm 2, taking the TSP as an illustrative reference problem.<sup>8</sup> Those strategies, however, should be readily applicable, with minor or no adaptations, to all problems that belong to the same class as the TSP.<sup>9</sup>

## **4.1. Data initialization**

This phase is responsible for defining the stopping criteria, initializing the parameters and allocating all data structures of the algorithm. The list of parameters is: *α* and *β*, which regulate the relative importance of the pheromone trails and the heuristic information, respectively; *ρ*, the pheromone evaporation rate; *τ*0, the initial pheromone value; the number of ants (*numberants*); and the number of nodes (*numbernodes*). The parameter setting is done on the host and the values are then passed as kernel arguments.

In the following kernels all the data structures, in particular the matrices, are actually allocated and accessed as linear arrays, since OpenCL does not provide an abstraction for higher-dimensional data structures. Therefore, the element *aij* ∈ *A* is indexed in its linear form as *A*[*i* × *n* + *j*], where *n* is the number of columns of matrix *A*.
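In plain Python, the row-major linearization amounts to (names are illustrative):

```python
def idx(i, j, n):
    # linear index of element a_ij in a row-major matrix with n columns
    return i * n + j

A = list(range(12))     # a 3x4 matrix flattened into a linear array
n_cols = 4
value = A[idx(2, 1, n_cols)]   # element a_21, stored at position 2*4 + 1
```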

## *4.1.1. Pheromone Trails and Heuristic Information*

To initialize the pheromone trails, all connections (*i*, *j*) must be set to the same initial value (*τ*0), whereas in the heuristic information each connection (*i*, *j*) is set to the distance between the nodes *i* and *j* of the TSP instance being solved. Since the initialization operation is inherently independent, it can be trivially parallelized. Algorithm 3 presents the kernel implementation, in which a 2-D domain range<sup>10</sup> is used and defined as

$$\begin{aligned} \text{global}^{0}_{\text{size}} & \leftarrow \text{number}_{\text{nodes}} \\ \text{global}^{1}_{\text{size}} & \leftarrow \text{number}_{\text{nodes}} \end{aligned} \tag{1}$$

**Algorithm 3:** OpenCL kernel for initializing *τ* and *η*

```
τ[global0id × global1size + global1id] ← τ0;
η[global0id × global1size + global1id] ← Distance(x[global0id], y[global1id]);
```

In the kernel, the helper function Distance(*i*, *j*) returns the distance between nodes *i* and *j*. The input data are two arrays with the coordinates *x* and *y* of each node. This function should implement the Euclidean, Manhattan or other distance function defined by the problem. The input coordinates must be set on the CPU, by reading the TSP instance, then transferred to the GPU prior to the kernel launch.

<sup>8</sup> In this chapter only the key components for the understanding of the parallel strategies—the OpenCL kernels and the corresponding setup of the *N*-dimensional domains—are presented. For specific details regarding secondary elements, such as the host code and the actual OpenCL kernel syntax, please refer to the appropriate OpenCL literature.

<sup>9</sup> Some adaptations might be necessary concerning the algorithmic structure (data initialization, evaluation of costs, etc.), which may have particular needs with respect to the underlying strategy of parallelism.

<sup>10</sup> The OpenCL kernels presented throughout this chapter are either in a one- or two-dimensional domain range, depending on which one fits more naturally the particular mapping between the data and compute domains.
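The initialization step of Section 4.1.1 can be emulated serially in Python, with each loop iteration playing the role of one (*i*, *j*) work-item (Euclidean distance is assumed here; the function name is illustrative):

```python
import math

def initialize(coords, tau0):
    # tau: every arc starts with the same value tau0;
    # eta: the distance between nodes i and j (Euclidean here)
    n = len(coords)
    tau = [[tau0] * n for _ in range(n)]
    eta = [[math.dist(coords[i], coords[j]) for j in range(n)]
           for i in range(n)]
    return tau, eta
```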

## **4.2. Solution construction**


For the TSP, this phase is the most costly of the ACO algorithm and needs special attention regarding the parallel strategy.

In this section, a parallel implementation for the solution construction will be presented—the *ant-based* parallelism—which associates an ant with a work-item.

## *4.2.1. Caching the Pheromone and Heuristic Information*

The probability of choosing a node *j* while at node *i* is associated with [*τij*]<sup>α</sup>[*ηij*]<sup>β</sup>. Each of those values needs to be computed by all ants; hence, in order to reduce the computation time [2], an additional matrix, choice*info*[·][·], is utilized to cache them. For this caching computation, a 2-D domain range is employed and defined as

$$\begin{aligned} \text{global}^{0}\_{\text{size}} & \leftarrow \text{number}\_{\text{nodes}} \\ \text{global}^{1}\_{\text{size}} & \leftarrow \text{number}\_{\text{nodes}} \end{aligned} \tag{2}$$

with the corresponding kernel described in Algorithm 4.

**Algorithm 4:** OpenCL kernel for calculating the *choice-info* cache

```
choiceinfo[global0id × global1size + global1id] ←
    τ[global0id × global1size + global1id]^α × η[global0id × global1size + global1id]^β;
```
Whenever the pheromone trail matrix *τ* is modified (4.1 and 4.4), the matrix choice*info* also needs to be updated, since it depends on the former. In other words, the cached data is recalculated at each iteration, just before the actual construction of the solutions.
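A plain-Python equivalent of the caching step, with one (*i*, *j*) pair standing in for each work-item of the 2-D range (the function name is illustrative):

```python
def choice_info(tau, eta, alpha, beta):
    # cache tau_ij^alpha * eta_ij^beta for every arc (i, j);
    # must be recomputed after every pheromone update
    n = len(tau)
    return [[(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in range(n)]
            for i in range(n)]
```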

## *4.2.2. Ant-based Parallelism (AP)*

In this strategy, each ant is associated with a work-item, each one responsible for constructing a complete solution and managing all the data required for this phase (list of visited cities, probability calculations, and so on). Algorithm 5 presents a kernel which implements the AS decision rule, where the 1-D domain range is set as

$$
\text{global}_{\text{size}} \leftarrow \text{number}_{\text{ants}} \tag{3}
$$

The matrix of candidate solutions (solution[·][·]) stores the ants' paths, with each row representing a complete ant's solution. The set of visited nodes, visited[·], keeps track of the currently visited nodes of each ant, preventing the duplicate selection forbidden by the TSP: the *i*-th element is set to *true* when the *i*-th node is chosen to be part of the ant's solution (initially all elements are set to *false*). At a current node *c*, selection*prob*[*i*] stores the probability of each node *i* being selected, which is based on the pheromone trails and the heuristic information—such data is cached in choice*info*[*c*][*i*].

```
Algorithm 5: OpenCL kernel for the ant-based solution construction
```

```
// Initialization
visited[·] ← false;
// Selection of the initial node
Initialnode ← Random(0, numbernodes − 1);
solution[globalid × numbernodes + 0] ← Initialnode;
visited[globalid × numbernodes + Initialnode] ← true;
for step ← 1 to numbernodes − 1 do
    sumprob ← 0.0;
    currentnode ← solution[globalid × numbernodes + (step − 1)];
    // Calculation of the nodes' probabilities
    for i ← 0 to numbernodes − 1 do
        if visited[globalid × numbernodes + i] then
            selectionprob[globalid × numbernodes + i] ← 0.0;
        else
            selectionprob[globalid × numbernodes + i] ← choiceinfo[currentnode × numbernodes + i];
            sumprob ← sumprob + selectionprob[globalid × numbernodes + i];
    // Node selection via roulette wheel
    r ← Random(0, sumprob);
    i ← 0;
    p ← selectionprob[globalid × numbernodes + 0];
    while p < r do
        i ← i + 1;
        p ← p + selectionprob[globalid × numbernodes + i];
    solution[globalid × numbernodes + step] ← i;
    visited[globalid × numbernodes + i] ← true;
```
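The kernel above can be emulated on the CPU as follows, with one function call per ant/work-item. Like the kernel, the sketch relies on visited nodes carrying zero probability, so the wheel stops on an unvisited node whenever the draw *r* is strictly positive (the function name is illustrative):

```python
import random

def construct_solution(choiceinfo, n, rng):
    # one ant = one work-item: builds a complete tour over n nodes
    start = rng.randrange(n)
    tour, visited = [start], [False] * n
    visited[start] = True
    for _ in range(n - 1):
        c = tour[-1]
        probs = [0.0 if visited[i] else choiceinfo[c][i] for i in range(n)]
        r = rng.uniform(0.0, sum(probs))
        i, p = 0, probs[0]
        while p < r:             # roulette wheel, as in Algorithm 5
            i += 1
            p += probs[i]
        tour.append(i)
        visited[i] = True
    return tour
```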

The function Random(*a*, *b*) returns a uniformly distributed pseudo-random real number between *a* and *b*. The random number generator could easily be implemented on the GPU through the simple linear congruential method [29]; the only requirement would be to keep in the device's global memory a *state* (an integral number) for each work-item that persists across kernel executions.
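A minimal sketch of such a generator follows. The constants are the common Numerical Recipes LCG parameters, chosen here purely for illustration; on the GPU each work-item would own one such state word in global memory:

```python
class LCG:
    # x_{k+1} = (A * x_k + C) mod M, returned as a float in [0, 1)
    A, C, M = 1664525, 1013904223, 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.M

    def next_uniform(self):
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M
```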

There exist data-based parallel strategies for the construction of the solutions, where usually a work-group takes care of an ant and its work-items compute in parallel some portion of the construction procedure—for instance, the ANT*block* strategy in [27], which evaluates and chooses in parallel the next node (city) from all the possible candidates. However, those strategies are considerably more complex than the ant-based parallelism, and for large-scale problems in which the number of ants is reasonably high—i.e. the class of problems for which one would make use of GPUs—the ant-based strategy is enough to saturate the GPU.

## **4.3. Solution evaluation**

When all solutions are constructed, they must be evaluated. The direct approach is to parallelize this step by the number of ants, dedicating a work-item per solution. However, in many problems it is possible to decompose the evaluation of the solution itself, leading to a second level of parallelism: each work-group takes care of an ant, with each work-item within this group in charge of a subset of the solution.

## *4.3.1. Ant-based Evaluation (AE)*


The simplest strategy for evaluating the solutions is to parallelize by the number of ants, assigning each solution evaluation to a work-item. In this case, the kernel could be written as in Algorithm 6, with the 1-D domain range as

$$
\text{global}_{\text{size}} \leftarrow \text{number}_{\text{ants}} \tag{4}
$$

**Algorithm 6:** OpenCL kernel for the ant-based evaluation

```
solutionvalue[globalid] ← 0.0;
for i ← 0 to numbernodes − 2 do
    j ← solution[globalid × numbernodes + i];
    h ← solution[globalid × numbernodes + (i + 1)];
    solutionvalue[globalid] ← solutionvalue[globalid] + η[j × numbernodes + h];
// Close the tour: add the arc from the last node back to the first
j ← solution[globalid × numbernodes + (numbernodes − 1)];
h ← solution[globalid × numbernodes + 0];
solutionvalue[globalid] ← solutionvalue[globalid] + η[j × numbernodes + h];
```

The cost resulting from the evaluation of the complete solution of ant *k*, which in the kernel is denoted by *globalid*, is put into the array solution*value*[*k*] of dimension *numberants*.
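In Python, the per-ant evaluation reduces to a single loop over consecutive arcs that closes the cycle at the end (the name `tour_cost` and the explicit `dist` matrix, playing the role of the distances stored in *η*, are illustrative):

```python
def tour_cost(tour, dist):
    # one work-item per ant: sum consecutive arcs, closing the cycle
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
```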

## *4.3.2. Data-based Evaluation (DE)*

This second strategy adds one more level of parallelism compared to the one previously presented. In the case of the TSP, the costs of traveling from node *i* to *j*, from *j* to *k* and so on can be summed up in parallel. To this end, the parallel primitive known as *prefix sum* is employed [30]. Its idea is illustrated in Figure 3, where *w*<sup>0</sup> ... *wN*−<sup>1</sup> correspond to the work-items within a work-group. The computational step complexity of the parallel prefix sum is *O*(*log*2*N*), meaning that, for instance, the sum of an array of 8 nodes is computed in just 3 iterations.

In order to apply this primitive to a TSP solution, a preparatory step is required: the cost of each adjacent pair of nodes must be obtained from the distance matrix and put into an array, let us call it *δ*.<sup>11</sup> This preprocessing is done in parallel, as shown in Algorithm 7, which also describes the subsequent prefix sum procedure. In the kernel, the helper function Distance(*k*, *i*) returns the distance between the nodes at positions *i* and *i* + 1 in the tour of ant *k*; when *i* is the last node, the function returns the distance from it back to the first node. One can notice the use of the function Barrier(). In OpenCL, a barrier is a synchronization point that ensures that a memory region written by other work-items is consistent at that point. The first barrier is necessary because *δ*[*localid* − *s*] references a memory region that was written by the *s*-th previous work-item. As for the second barrier, it is needed to prevent *δ*[*localid*] from being updated before the *s*-th next work-item reads it. Finally, the total sum, which ends up at the last element of *δ*, is stored in the solution*value* vector for the ant indexed by *groupid*.

<sup>11</sup> To improve efficiency, the array *δ* could be, and frequently is, allocated directly in the *local* memory (cf. 2.1).


Strategies for Parallel Ant Colony Optimization on Graphics Processing Units

http://dx.doi.org/10.5772/51679



**Figure 3.** Parallel *prefix sum*: each element of the final array is the sum of all the previous elements, i.e. the partial cost; the last element is the total cost.

**Algorithm 7:** OpenCL kernel for the data-based evaluation

```
// Preparatory step
δ[localid] ← Distance(groupid, localid);
// Prefix sum
tmp ← δ[localid];
s ← 1;
while s < localsize do
    Barrier();
    if localid ≥ s then
        tmp ← δ[localid] + δ[localid − s];
    Barrier();
    δ[localid] ← tmp;
    s ← s × 2;
if localid = localsize − 1 then
    solutionvalue[groupid] ← δ[localsize − 1];
```
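Algorithm 7's behaviour can be checked with a small host-side Python sketch (the `dist` matrix and tours are invented; the barrier-separated rounds are modelled by taking a snapshot of *δ* before each round):

```python
def tour_cost_prefix(tour, dist):
    """Mimic Algorithm 7 for one work-group (one ant): the preparatory
    step loads each adjacent-arc cost into delta, wrapping from the last
    node back to the first, and a prefix sum leaves the total at the end."""
    n = len(tour)
    # Preparatory step: delta[i] = Distance(k, i)
    delta = [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]
    s = 1
    while s < n:                 # rounds separated by Barrier()
        snap = list(delta)
        for i in range(s, n):
            delta[i] = snap[i] + snap[i - s]
        s *= 2
    return delta[-1]             # what goes into solution_value[groupid]

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
# tour 0 -> 1 -> 3 -> 2 -> 0 costs 2 + 4 + 12 + 15 = 33
```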
Regarding the *N*-D domain definition, since there are *numberants* ants and for each ant (solution) there are *numbernodes* distances, the global size is given by

$$
global_{size} \leftarrow number_{ants} \times number_{nodes}\tag{5}
$$

and the local size, i.e. the amount of work-items devoted to compute the total cost per solution, simply by

$$local_{size} \leftarrow number_{nodes},\tag{6}$$

resulting in *numberants* work-groups (one per ant).<sup>12</sup>

<sup>12</sup> For the sake of simplicity, it is assumed that the number of nodes (cities) is such that the resulting local size is less than the device's maximum supported local size, a hardware limit. If this is not the case, then Algorithm 7 should be modified in such a way that each work-item would compute more than just one partial sum.

## *4.3.3. Finding the Best Solution*


It is important at each iteration to keep track of the best-so-far solution. This could be achieved naively by iterating over all the evaluated solutions sequentially. There is, though, a parallel alternative that utilizes a primitive analogous to the previous one, called *reduction* [30]. The idea of the parallel reduction is visualized in Figure 4. It starts by comparing the elements of an array (here, *solutionvalue*) in pairs to find the smaller element of each pair. The next iteration reduces the previously reduced values, and the process continues until a single value remains; this is the smallest element, i.e. the smallest cost, of the entire array. The implementation is quite similar to the prefix sum and will not be detailed here. The global and local sizes should both be set to *numberants*, meaning that the reduction occurs within a single work-group, since synchronization is required. The actual implementation will also need a mapping between the cost values (the *solutionvalue* array) and the corresponding solutions in order to link the smallest cost found with the respective solution.

**Figure 4.** *O*(log<sub>2</sub> *N*) parallel reduction: the remaining element is the smallest of the array.
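The reduction, including the index bookkeeping just mentioned, can be sketched host-side in Python (the cost array below is invented):

```python
def parallel_min_reduce(values):
    """Simulate the O(log2 N) min-reduction, carrying each element's
    index along so the smallest cost stays linked to its ant."""
    pairs = list(enumerate(values))          # (ant index, cost)
    while len(pairs) > 1:
        half = (len(pairs) + 1) // 2
        nxt = []
        for i in range(half):                # one round of pairwise mins
            if i + half < len(pairs):
                a, b = pairs[i], pairs[i + half]
                nxt.append(a if a[1] <= b[1] else b)
            else:
                nxt.append(pairs[i])         # odd leftover passes through
        pairs = nxt
    return pairs[0]                          # (best ant, best cost)

solution_value = [42.0, 17.5, 23.1, 11.9, 30.2, 56.0]
best_ant, best_cost = parallel_min_reduce(solution_value)
# ant 3, with cost 11.9, is the best of this iteration
```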

## **4.4. Pheromone Trails Updating**

After all ants have constructed their tours (solutions), the pheromone trails are updated. In AS, the pheromone update step starts by evaporating the pheromone on *all* arcs by a constant factor, followed by a reinforcement of the arcs visited by the ants in their tours.

## *4.4.1. Pheromone Evaporation*

In the pheromone evaporation, each element of the pheromone matrix has its value decreased by a constant factor *ρ* ∈ (0, 1]. Hence, the parallel implementation can exploit parallelism in the order of *numbernodes* × *numbernodes*. For this step, the kernel can be described as in Algorithm 8, with the 2-D domain range given by

$$\begin{aligned} global^{0}_{size} &\leftarrow number_{nodes} \\ global^{1}_{size} &\leftarrow number_{nodes} \end{aligned}\tag{7}$$

**Algorithm 8:** OpenCL kernel for computing the pheromone evaporation

> *τ*[*global*<sup>0</sup><sub>*id*</sub> × *global*<sup>1</sup><sub>*size*</sub> + *global*<sup>1</sup><sub>*id*</sub>] ← (1 − *ρ*) × *τ*[*global*<sup>0</sup><sub>*id*</sub> × *global*<sup>1</sup><sub>*size*</sub> + *global*<sup>1</sup><sub>*id*</sub>];
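A sketch of the evaporation step in plain Python, standing in for the 2-D kernel (the example matrix is invented):

```python
def evaporate(tau, rho):
    """Each (i, j) entry of the pheromone matrix is updated
    independently, so all n*n updates could run as one 2-D kernel
    launch; here they are just a nested comprehension."""
    n = len(tau)
    return [[(1.0 - rho) * tau[i][j] for j in range(n)] for i in range(n)]

tau = [[1.0, 2.0],
       [4.0, 8.0]]
# with rho = 0.5 every entry is simply halved
```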


## *4.4.2. Pheromone Updating*

After evaporation, ants deposit different quantities of pheromone on the arcs that they crossed. Therefore, in an ant-based parallel implementation each element of the pheromone matrix may potentially be updated by many ants at the same time, leading of course to memory inconsistency. An alternative is to parallelize on the ant's solution, taking advantage of the fact that in the TSP there is no duplicate node in a given solution. This strategy works on one ant *k* at a time, but all edges (*i*, *j*) are processed in parallel. Hence, the 1-D domain range is given by

$$
global_{size} \leftarrow number_{nodes} - 1,\tag{8}
$$

with the corresponding kernel described in Algorithm 9. The kernel should be launched *numberants* times from the host code, each time passing a different *k* ∈ [0, *numberants*) as a kernel argument; the completion of a kernel is the only point at which OpenCL guarantees global memory consistency (synchronism), and this is necessary to prevent two or more ants from being processed simultaneously.


**Algorithm 9:** OpenCL kernel for updating the pheromone for ant *k*

```
i ← solution[k × numbernodes + globalid];
j ← solution[k × numbernodes + globalid + 1];
τ[i × numbernodes + j] ← τ[i × numbernodes + j] + 1.0/solutionvalue[k];
τ[j × numbernodes + i] ← τ[i × numbernodes + j];
```
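Algorithm 9's effect can be reproduced with a host-side Python sketch (the outer loop over ants mirrors the per-ant kernel launches; the 3-node example is invented, and a symmetric TSP is assumed, as in the kernel's mirrored write):

```python
def deposit(tau, solutions, solution_value):
    """For each ant k (sequential, mirroring the per-ant kernel launches)
    reinforce the arcs of its tour; a tour visits each node once, so no
    two edges of one tour ever touch the same tau entry in parallel."""
    n = len(tau)
    for k, tour in enumerate(solutions):
        amount = 1.0 / solution_value[k]     # better tours deposit more
        for g in range(len(tour) - 1):       # globalid = 0..numbernodes-2
            i, j = tour[g], tour[g + 1]
            tau[i][j] += amount
            tau[j][i] = tau[i][j]            # keep the matrix symmetric
    return tau

tau = [[0.0] * 3 for _ in range(3)]
deposit(tau, [[0, 1, 2]], [4.0])
# arcs (0,1) and (1,2) each gain 1/4 = 0.25, mirrored across the diagonal
```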

## **5. Conclusions**

This chapter has presented and discussed different parallelization strategies for implementing an Ant Colony Optimization algorithm on a Graphics Processing Unit, along with a list of references to previous work in this area.

The chapter also provided a straightforward explanation of the GPU architecture and gave special attention to the Open Computing Language (OpenCL), explaining in detail the concepts behind these two topics, which are often only mentioned by reference in the literature.

It was shown that each step of an ACO algorithm, from the initialization phase through the return of the final solution, can be parallelized to some degree, at least at the granularity of the number of ants. For complex or large-scale problems, in which numerous ants would be desired, the ant-based parallel strategies should suffice to fully exploit the computational power of the GPUs.

Although the chapter has focused on a particular computing architecture, the GPU, all the described kernels can be promptly executed on any other OpenCL-capable parallel device, such as multi-core CPUs.

Finally, it is expected that this chapter will provide the readers with an extensive view of the existing ACO parallel strategies on the GPU and will assist them in developing new or derived parallel strategies to suit their particular needs.

## **Acknowledgments**


The authors thank the support from the Brazilian agencies CNPq (grants 141519/2010-0 and 308317/2009-2) and FAPERJ (grant E-26/102.025/2009).

## **Author details**

Jaqueline S. Angelo<sup>1,⋆</sup>, Douglas A. Augusto<sup>1</sup> and Helio J. C. Barbosa<sup>1,2</sup>

<sup>⋆</sup> Address all correspondence to: jsangelo@lncc.br; douglas@lncc.br; hcbm@lncc.br

1 Laboratório Nacional de Computação Científica (LNCC/MCTI), Petrópolis, RJ, Brazil

2 Universidade Federal de Juiz de Fora (UFJF), MG, Brazil

## **References**


[1] Marco Dorigo. *Optimization, Learning and Natural Algorithms*. PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Milan, 1992.

[2] Marco Dorigo and Thomas Stutzle. *Ant Colony Optimization*. The MIT Press, 2004.

[3] Thomas Stutzle. Parallelization strategies for ant colony optimization. In *Proc. of PPSN-V, Fifth International Conference on Parallel Problem Solving from Nature*, pages 722–731. Springer-Verlag, 1998.

[4] Martín Pedemonte, Sergio Nesmachnow, and Héctor Cancela. A survey on parallel ant colony optimization. *Appl. Soft Comput.*, 11(8):5181–5197, December 2011.

[5] Stefan Janson, Daniel Merkle, and Martin Middendorf. *Parallel Ant Colony Algorithms*, pages 171–201. John Wiley and Sons, Inc., 2005.

[6] Marco Dorigo, Eric Bonabeau, and Guy Theraulaz. *Swarm Intelligence*. Oxford University Press, Oxford, New York, 1999.

[7] Marco Dorigo, Eric Bonabeau, and Guy Theraulaz. Ant algorithms and stigmergy. *Future Gener. Comput. Syst.*, 16(9):851–871, 2000.

[8] Marco Dorigo, Gianni Di Caro, and Luca M. Gambardella. Ant algorithms for discrete optimization. *Artificial Life*, 5:137–172, 1999.

[9] Marco Dorigo, Vittorio Maniezzo, and Alberto Colorni. Ant system: Optimization by a colony of cooperating agents. *IEEE Trans. on Systems, Man, and Cybernetics–Part B*, 26(1):29–41, 1996.

[10] R.J. Mullen, D. Monekosso, S. Barman, and P. Remagnino. A review of ant algorithms. *Expert Systems with Applications*, 36(6):9608–9617, 2009.

[11] J.L. Hennessy and D.A. Patterson. *Computer Architecture: A Quantitative Approach*. The Morgan Kaufmann Series in Computer Architecture and Design. Elsevier Science, 2011.

[12] Ethan Mollick. Establishing Moore's law. *IEEE Ann. Hist. Comput.*, 28:62–75, July 2006.

[13] Michael Garland and David B. Kirk. Understanding throughput-oriented architectures. *Commun. ACM*, 53:58–66, November 2010.

[14] Khronos Group. OpenCL - the open standard for parallel programming of heterogeneous systems.

[15] Khronos OpenCL Working Group. *The OpenCL Specification, version 1.2*, November 2011.

[16] Douglas A. Augusto and Helio J.C. Barbosa. Accelerated parallel genetic programming tree evaluation with OpenCL. *Journal of Parallel and Distributed Computing*, 2012.

[17] Advanced Micro Devices. *AMD Accelerated Parallel Processing Programming Guide - OpenCL*, December 2010.

[18] NVIDIA Corporation. *OpenCL Best Practices Guide*, 2010.

[19] A. Catala, J. Jaen, and J.A. Modioli. Strategies for accelerating ant colony optimization algorithms on graphical processing units. In *Evolutionary Computation, 2007. CEC 2007. IEEE Congress on*, pages 492–500, 2007.

[20] Hongtao Bai, Dantong OuYanga, Ximing Li, Lili He, and Haihong Yu. MAX-MIN ant system on GPU with CUDA. In *Fourth International Conference on Innovative Computing, Information and Control*, pages 801–804, 2009.

[21] Weihang Zhu and James Curry. Parallel ant colony for nonlinear function optimization with graphics hardware acceleration. In *Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics*, SMC'09, pages 1803–1808. IEEE Press, 2009.

[22] Jie Fu, Lin Lei, and Guohua Zhou. A parallel ant colony optimization algorithm with GPU-acceleration based on all-in-roulette selection. In *Advanced Computational Intelligence (IWACI), 2010 Third International Workshop on*, pages 260–264, 2010.

[23] Jose A. Mocholi, Javier Jaen, Alejandro Catala, and Elena Navarro. An emotionally biased ant colony algorithm for pathfinding in games. *Expert Systems with Applications*, 37:4921–4927, 2010.

[24] Nicholas A. Sinnott-Armstrong, Casey S. Greene, and Jason H. Moore. Fast genome-wide epistasis analysis using ant colony optimization for multifactor dimensionality reduction analysis on graphics processing units. In *Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation*, GECCO 2010, pages 215–216, New York, NY, USA, 2010. ACM.

[25] S. Tsutsui and N. Fujimoto. Fast QAP solving by ACO with 2-opt local search on a GPU. pages 812–819, June 2011.

[26] Min Li, Kelson Gent, and Michael S. Hsiao. Utilizing GPGPUs for design validation with a modified ant colony optimization. *High-Level Design, Validation, and Test Workshop, IEEE International*, 0:128–135, 2011.

[27] A. Delévacq, P. Delisle, M. Gravel, and M. Krajecki. Parallel ant colony optimization on graphics processing units. *J. Parallel Distrib. Comput.*, 2012.

[28] José M. Cecilia, José M. García, Andy Nisbet, Martyn Amos, and Manuel Ujaldón. Enhancing data parallelism for ant colony optimization on GPUs. *Journal of Parallel and Distributed Computing*, 2012.

[29] Donald E. Knuth. *The Art of Computer Programming, Volume 2: Seminumerical Algorithms*. Addison-Wesley Professional, 3rd edition, November 1997.

[30] W. Daniel Hillis and Guy L. Steele, Jr. Data parallel algorithms. *Commun. ACM*, 29(12):1170–1183, December 1986.


**Section 2**

**Applications**


**Chapter 4**


## **An Ant Colony Optimization Algorithm for Area Traffic Control**

Soner Haldenbilen, Ozgur Baskan and Cenk Ozan

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51695

## **1. Introduction**

The optimization of traffic signal control is at the heart of urban traffic control. Traffic signal control, which encompasses delay, queuing, pollution, and fuel consumption, is a multi-objective optimization problem. For a signal-controlled road network, the use of optimization techniques to determine signal timings has been discussed for decades. Due to the complexity of the Area Traffic Control (ATC) problem, new methods and approaches are needed to improve the efficiency of signal control in a signalized road network. In urban networks, traffic signals are used to control vehicle movements so as to reduce congestion, improve safety, and enable specific strategies such as minimizing delays, reducing environmental pollution, etc. [1]. Signal systems that control road junctions are operated according to the type of junction. Although the optimization of signal timings for an isolated junction is relatively easy, the optimization of signal timings in coordinated road networks requires further research due to the "*offset*" term. Early methods such as that of [2] only considered an isolated signalized junction. Later, fixed-time strategies were developed that optimize a group of signalized junctions using historical flow data [3]. For the ATC, TRANSYT-7F is one of the most useful network study software tools for optimizing signal timing and also the most widely used program of its type. It consists of two main parts: a traffic flow model and a signal timing optimizer. The traffic model utilizes a platoon dispersion algorithm that simulates the normal dispersion of platoons as they travel downstream. It simulates traffic in a network of signalized intersections to produce a cyclic flow profile of arrivals at each intersection, which is used to compute a Performance Index (*PI*) for a given signal timing and staging plan. The *PI* in TRANSYT-7F may be defined in a number of ways. One of TRANSYT-7F's *PI*s is the Disutility Index (*DI*). The *DI* is a measure of disadvantageous operation, that is, stops, delay, fuel consumption, etc. Optimization in TRANSYT-7F consists of a series of trial simulation

© 2013 Haldenbilen et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

runs, using the TRANSYT-7F simulation engine. Each simulation run is assigned a unique signal timing plan by the optimization processor. The optimizer applies the Hill-Climbing (HC) or Genetic Algorithm (GA) searching strategies. The trial simulation run resulting in the best performance is reported as optimal. Although the GA is mathematically better suited than HC optimization for determining the absolute (global) optimal solution, it generally requires longer program running times.

lem under different traffic demands. A hybrid optimization algorithm for simultaneously solving delay-minimizing and capacity-maximizing ATC was presented by [14]. Numerical computations and comparisons were conducted on a variety of road networks. Numerical tests showed that the effectiveness and robustness of this hybrid heuristic algorithm. Similar‐ ly, Chiou (2007) presented a computation algorithm based on the projected Quasi-Newton method to effectively solve the ATC problem. The proposed method combining the locally optimal search and global search heuristic achieved substantially better performance than did

An Ant Colony Optimization Algorithm for Area Traffic Control

http://dx.doi.org/10.5772/51695

89

traditional approaches in solving the ATC problem with expansions of link capacity.

is about the conclusions.

**2. Ant Colony Optimization**

Dan and Xiaohong (2008) developed a real-coded improved GA with microscopic traffic simulation model to find optimal signal plans for ATC problem, which takes the coordina‐ tion of signals timing for all signal-controlled junction into account. The results showed that the method based on GA could minimize delay time and improve capacity of network. Li (2011) presented an arterial signal optimization model that consider queue blockage among intersection lane groups under oversaturated conditions. The proposed model captures traf‐ fic dynamics with the cell transmission concept, which takes into account complex flow in‐ teractions among different lane groups. Through comparisons with signal-timing plans from TRANSYT-7F, the model was successful for signal-timing optimization particularly under congested conditions. The optimization of signal timings on coordinated signalized road network, which includes a set of non-linear mathematical formulations, is very difficult. Therefore, new methods and approaches are needed to improve efficiency of signal control in a road network due to complexity of the ATC problem. Although there are many studies in literature with different heuristic methods to optimize traffic signal timings, there is no application of ACO to this area. Thus, this study proposes Ant Colony Optimization TRANSYT-7F (ACOTRANS) model in which ACO and TRANSYT-7F are combined for solv‐ ing the ATC problem. The remaining content of this chapter is organized as follows. ACO algorithm and its solution process are given in Section 2, and definition of the ACOTRANS model is provided in Section 3. Numerical application is presented in Section 4. Last section

Ant algorithms were inspired by the observation of real ant colonies. Ants are social insects that live in colonies and whose behaviour is directed more to the survival of the colony as a whole than to that of a single individual component of the colony. Social insects have cap‐ tured the attention of many scientists because of the high structuration level their colonies can achieve, especially when compared to the relative simplicity of the colony's individuals. An important and interesting behaviour of ant colonies is their foraging behaviour, and, in particular, how ants can find shortest paths between food sources and their nest [18]. Ants are capable of finding the shortest path from food source to their nest or vice versa by smell‐ ing pheromones which are chemical substances they leave on the ground while walking. Each ant probabilistically prefers to follow a direction rich in pheromone. This behaviour of real ants can be used to explain how they can find a shortest path [19]. The main idea is that

This chapter proposes Ant Colony Optimization (ACO) based algorithm called ACORSES proposed by [5] for finding optimum signal parameters in coordinated signalized networks for given fixed set of link flows. The ACO is the one of the most recent techniques for ap‐ proximate optimization methods. The main idea is that it is indirect local communication among the individuals of a population of artificial ants. The core of ant's behaviour is the communication between the ants by means of chemical pheromone trails, which enables them to find shortest paths between their nest and food sources. This behaviour of real ant colonies is exploited to solve optimization problems. The proposed algorithm is based on each ant searches only around the best solution of the previous iteration with reduced search space. It is proposed for improving ACO's solution performance to reach global opti‐ mum fairly quickly. In this study, for solving the ATC problem, Ant Colony Optimization TRANSYT (ACOTRANS) model is developed. TRANSYT-7F traffic model is used to esti‐ mate total network *PI*.

Wong (1995) proposed group-based optimization of signal timings for area traffic control. In addition, the optimization of signal timings for ATC using group-based control variables was proposed by [7]. However, it was reported that obtaining the derivations of the *PI* for each of the control variable was mathematically difficult. Heydecker (1996) decomposed the optimization of traffic signal timings into two levels; first, optimizing the signal timings at the individual junction level using the group-based approach, and second, combining the re‐ sults from individual junction level with network level decision variables such as offset and common cycle time. Wong et al. (2000) developed a time-dependent TRANSYT traffic model for the evaluation of *PI*. It was found that the time-dependent model produces a reasonable estimate of *PI* for under saturated to moderately oversaturated conditions. Wong et al. (2002) developed a time-dependent TRANSYT traffic model which is a weighted combina‐ tion of the estimated delay and number of stops. A remarkable improvement over the aver‐ age flow scenario was obtained and when compared with the signal plans from independent analyses, a good improvement was found. Girianna and Benekohal (2002) pre‐ sented two different GA techniques which are applied on signal coordination for oversatu‐ rated networks. Signal coordination was formulated as a dynamic optimization problem and is solved using GA for the entire duration of congestion.

Similarly, Ceylan (2006) developed a GA with the TRANSYT-HC optimization routine and proposed a method for decreasing the search space to solve the ATC problem. The proposed approach outperforms TRANSYT signal timing optimization with respect to the optimal values of the timings and the *PI*. Chen and Xu (2006) investigated the application of the Particle Swarm Optimization (PSO) algorithm to the signal timing optimization problem. Their results showed that PSO can be applied to this problem under different traffic demands. A hybrid optimization algorithm for simultaneously solving the delay-minimizing and capacity-maximizing ATC problem was presented by [14]. Numerical computations and comparisons conducted on a variety of road networks showed the effectiveness and robustness of this hybrid heuristic algorithm. Similarly, Chiou (2007) presented a computation algorithm based on the projected Quasi-Newton method to effectively solve the ATC problem. The proposed method, combining locally optimal search with a global search heuristic, achieved substantially better performance than traditional approaches in solving the ATC problem with expansions of link capacity.

Dan and Xiaohong (2008) developed a real-coded improved GA with a microscopic traffic simulation model to find optimal signal plans for the ATC problem, taking the coordination of signal timings across all signal-controlled junctions into account. The results showed that the GA-based method could minimize delay time and improve network capacity. Li (2011) presented an arterial signal optimization model that considers queue blockage among intersection lane groups under oversaturated conditions. The proposed model captures traffic dynamics with the cell transmission concept, which takes into account complex flow interactions among different lane groups. Through comparisons with signal-timing plans from TRANSYT-7F, the model was successful for signal-timing optimization, particularly under congested conditions. The optimization of signal timings on a coordinated signalized road network, which involves a set of non-linear mathematical formulations, is very difficult. Therefore, due to the complexity of the ATC problem, new methods and approaches are needed to improve the efficiency of signal control in a road network. Although many studies in the literature apply different heuristic methods to optimize traffic signal timings, there is no application of ACO in this area. Thus, this study proposes the Ant Colony Optimization TRANSYT-7F (ACOTRANS) model, in which ACO and TRANSYT-7F are combined to solve the ATC problem. The remainder of this chapter is organized as follows. The ACO algorithm and its solution process are given in Section 2, the ACOTRANS model is defined in Section 3, and a numerical application is presented in Section 4. The last section presents the conclusions.

TRANSYT-7F optimization is based on a series of trial simulation runs, using the TRANSYT-7F simulation engine. Each simulation run is assigned a unique signal timing plan by the optimization processor. The optimizer applies the Hill-Climbing (HC) or Genetic Algorithm (GA) searching strategies, and the trial simulation run resulting in the best performance is reported as optimal. Although the GA is mathematically better suited for determining the absolute or global optimal solution, it generally requires longer program running times than HC optimization [4].

## **2. Ant Colony Optimization**

Ant algorithms were inspired by the observation of real ant colonies. Ants are social insects that live in colonies and whose behaviour is directed more to the survival of the colony as a whole than to that of any single member. Social insects have captured the attention of many scientists because of the high level of structure their colonies can achieve, especially when compared to the relative simplicity of the colony's individuals. An important and interesting behaviour of ant colonies is their foraging behaviour and, in particular, how ants can find shortest paths between food sources and their nest [18]. Ants are capable of finding the shortest path from a food source to their nest, or vice versa, by smelling pheromones, chemical substances they leave on the ground while walking. Each ant probabilistically prefers to follow a direction rich in pheromone, and this behaviour of real ants explains how they can find a shortest path [19]. Artificial ant colonies exploit the same mechanism: indirect local communication among the individuals of a population of artificial ants, mediated by artificial pheromone trails, is used to solve optimization problems [20]. The general ACO algorithm is illustrated in Fig. 1. The first step consists mainly of the initialization of the pheromone trail. Then, in each iteration, each ant builds a complete solution to the problem according to a probabilistic state transition rule, which depends mainly on the state of the pheromone.


**Figure 1.** A generic ant algorithm.

Once all ants have generated a solution, a global pheromone updating rule is applied in two phases: an evaporation phase, where a fraction of the pheromone evaporates, and a reinforcement phase, where each ant deposits an amount of pheromone proportional to the fitness of its solution. This process is repeated until a stopping criterion is met. The ACORSES algorithm proposed by [5] consists of three main phases: initialization, pheromone update and solution. Together, these phases make up a complete search for the global optimum, as can be seen in Fig. 2.
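For illustration, the generic ant algorithm of Fig. 1 can be sketched as follows. The discrete candidate options, the fitness function, the evaporation rate and the deposit constant below are illustrative assumptions of this sketch, not part of the chapter's model:

```python
import random

def generic_aco(options, fitness, n_ants=10, n_iter=50, rho=0.1, seed=1):
    """Minimal generic ant algorithm (cf. Fig. 1): each ant builds a complete
    solution by a pheromone-driven probabilistic state-transition rule, then a
    global update evaporates and reinforces the trails."""
    rng = random.Random(seed)
    # one pheromone value per (decision, option), initialised uniformly
    tau = [[1.0] * len(opts) for opts in options]
    best, best_f = None, float("-inf")
    for _ in range(n_iter):
        colony = []
        for _ in range(n_ants):
            sol = []
            for d, opts in enumerate(options):
                # probabilistic state-transition rule:
                # pheromone-proportional roulette-wheel choice
                w = tau[d]
                r = rng.uniform(0, sum(w))
                acc = 0.0
                for j, wj in enumerate(w):
                    acc += wj
                    if r <= acc:
                        sol.append(opts[j])
                        break
                else:
                    sol.append(opts[-1])
            colony.append(sol)
        # evaporation phase: a fraction rho of the pheromone evaporates
        tau = [[(1 - rho) * t for t in row] for row in tau]
        # reinforcement phase: each ant deposits pheromone proportional
        # to the fitness of its solution
        for sol in colony:
            f = fitness(sol)
            if f > best_f:
                best, best_f = sol, f
            for d, v in enumerate(sol):
                tau[d][options[d].index(v)] += 0.01 * f
    return best, best_f
```

For instance, maximising the sum of five binary decisions, `generic_aco([[0, 1]] * 5, fitness=sum)` concentrates pheromone on the all-ones solution.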

**Figure 2.** Steps of ACORSES [5].


As shown in Figure 2, the pheromone update phase follows the initialization phase, which means that the quantity of pheromone is intensified at each iteration within the reduced search space. Thus, the global optimum is searched for within the reduced search space, using the best values obtained from the new ant colony in the previous iteration. The main advantage of ACORSES is that the Feasible Search Space (FSS) is reduced by the factor β and the algorithm reuses the information taken from the previous iteration.

At the beginning of the first cycle, all ants search randomly for the best solution of the given problem within the FSS, and the old ant colony is created in the initialization phase. After that, the quantity of pheromone is updated. In the solution phase, a new ant colony is created based on the best solution from the old ant colony, using Equations (1) and (2). Then, the best solutions of the two colonies are compared. At the end of the first cycle, the FSS is reduced by β and the best solution obtained from the previous iteration is kept. The global or near-global optimum solution is then searched for in the reduced search space as the solution progresses. ACORSES reaches the global or near-global optimum as the ants find their routes in this limited space [5].


Let *m* ants be associated with *m* random initial vectors $x^k$, *k* = 1, 2, 3, …, *m*. The solution vector of each ant is updated using the following expression:

$$x_t^{k(new)} = x_t^{k(old)} \pm \alpha, \qquad t = 1, 2, \dots, I \tag{1}$$

where $x_t^{k(new)}$ is the solution vector of the *k*th ant at cycle *t*, $x_t^{k(old)}$ is the solution obtained in the previous step of cycle *t*, and *α* is a randomly generated vector that determines the length of the jump. *α* steers the global optimum search so that it is not trapped at a bad local optimum. The ant vector $x_t^{k(new)}$ obtained at the *t*th cycle in (1) is determined using the value obtained by the same ant in the previous step. Furthermore, in expression (1), the (+) sign is used when the point $x_t^k$ is on the left of the best solution on the *x* coordinate axis, and the (−) sign is used when the point $x_t^k$ is on the right of the best solution on the same axis. The direction of search is defined by expression (2).

$$\bar{x}_t^{best} = x_t^{best} + \left( x_t^{best} \cdot 0.01 \right) \tag{2}$$


If $f(\bar{x}_t^{best}) \le f(x_t^{best})$, the (+) sign is used in (1); otherwise, the (−) sign is used. The (±) sign thus defines the search direction towards the global optimum. The value of *α* defines the length of the jump, and it is gradually decreased so that the search does not pass over the global optimum, as shown in Fig. 2. At the end of each cycle, a new ant colony is generated with the same number of ants as the old colony. The quantity of pheromone ($\tau_t$) is reduced using (3) in the pheromone update phase, to simulate the evaporation process of real ant colonies, and is then updated using (4). The quantity of pheromone only intensifies around the best objective function value. This process is repeated until the given number of cycles, *I*, is completed. The initial pheromone intensity is set to the value of 100.

$$\tau_t = \tau_{t-1} - 0.1 \cdot \tau_{t-1} \tag{3}$$

$$\tau_t = \tau_{t-1} + 0.01 \cdot f\left( x_{t-1}^{best} \right) \tag{4}$$
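One ACORSES solution-phase cycle can be sketched by combining expressions (1)-(4). The per-component sampling of the jump length up to *α*, the scalar pheromone variable, and the assumption of a 10% evaporation are simplifications of this sketch, not a reproduction of [5]:

```python
import random

def acorses_cycle(f, old_colony, best, alpha, tau, rng):
    """Sketch of one ACORSES cycle: choose the search direction via the probe
    point of expression (2), move each ant from its previous position via
    expression (1), then apply the pheromone update of (3) and (4).
    `f` is the objective (minimised), `tau` the scalar pheromone quantity."""
    # expression (2): probe point slightly beyond the current best solution
    x_bar = [xb + 0.01 * xb for xb in best]
    # the (+) sign is used in (1) when f(x_bar) <= f(best), otherwise (-)
    sign = 1.0 if f(x_bar) <= f(best) else -1.0
    # expression (1): each component jumps by a random length up to alpha
    new_colony = [[x + sign * rng.uniform(0, alpha) for x in ant]
                  for ant in old_colony]
    # pheromone update: 10% evaporation is assumed here, then reinforcement
    # proportional to the best objective value
    tau = tau - 0.1 * tau
    tau = tau + 0.01 * f(best)
    return new_colony, tau
```

In a full run, `alpha` would be gradually decreased and the search space reduced by β after each cycle, as described above.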

ACO uses real numbers to optimise a given objective function, instead of coding the variables as in a GA. This is one of the main advantages of ACO: it allows the signal timings to be optimised in a less mathematically lengthy way. Moreover, the ACORSES algorithm is able to reach the global optimum quickly without being trapped at a bad local optimum, because it uses a reduced search space: the optimum signal timings are searched for within the reduced search space as the algorithm progresses. ACORSES reaches the global optimum, or a near-global optimum, as the ants find their routes in the limited space. For a better understanding, consider a problem in which the colony consists of five ants.

**Figure 3.** Main idea of the ACORSES [5].

As shown in Fig. 3, five ants are associated with five random initial vectors. At the beginning of the first cycle (Fig. 3a), the old ant colony is randomly created within the feasible search space of the given problem. After the pheromone update phase, the new ant colony is created in the last phase of the first cycle from the old ant colony, using Equations (1) and (2). After that, the best values of the two colonies are compared. According to the best value obtained so far by comparing the old and new colonies, and to β, the FSS is reduced at the beginning of the second cycle and the old ant colony is created once again, as can be seen in Fig. 3b. The new ant colony is created in the last phase of the second cycle according to the randomly generated *α* value, using Equation (1). Any of the newly created solution vectors may lie outside the reduced search space created at the beginning of the second cycle. Therefore, the newly created ant colony prevents the search from being trapped at a bad local optimum [5].
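The search-space reduction step can be sketched as follows. The chapter does not give the exact reduction rule of ACORSES [5], so the re-centring-and-shrinking scheme below, with a shrink factor `beta`, is only an illustrative assumption:

```python
def reduce_search_space(bounds, best, beta):
    """Shrink the feasible search space (FSS) around the best solution found
    so far: each variable's range is narrowed by the factor beta (0 < beta < 1)
    and re-centred on the best value, clipped to the old bounds. Illustrative
    variant; not the exact ACORSES rule."""
    new_bounds = []
    for (lo, hi), b in zip(bounds, best):
        half = beta * (hi - lo) / 2.0
        new_lo, new_hi = max(lo, b - half), min(hi, b + half)
        new_bounds.append((new_lo, new_hi))
    return new_bounds
```

The clipping step mirrors the remark above: a newly created solution vector may fall outside the reduced space, and the bounds of the original FSS are never exceeded.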

## **3. ACOTRANS for area traffic control**


ACOTRANS consists of two main parts, namely the ACO based algorithm and the TRANSYT-7F traffic model. The ACO algorithm optimizes the traffic signal timings under a fixed set of link flows, while the TRANSYT-7F traffic model is used to compute the *PI*, which serves as the objective function, for a given signal timing and staging plan in the network. The network Disutility Index (*DI*), one of TRANSYT-7F's *PI* options, is used as the objective function. The *DI* is a measure of disadvantageous operation, that is, stops, delay, fuel consumption, etc. The standard TRANSYT-7F *DI* is a linear combination of delay and stops. The objective function and the corresponding constraints are given in Eq. (5).

$$PI = \min_{\psi,\; \mathbf{q}\ \text{fixed}} DI = \sum_{a \in L} \left[ w_{d_a} \cdot d_a(\psi) + K \cdot w_{s_a} \cdot S_a(\psi) \right] \tag{5}$$

Subject to $\psi(c, \theta, \varphi) \in \Omega_0$:

$$c_{\min} \le c \le c_{\max} \;\; \text{(cycle time)}, \quad 0 \le \theta \le c \;\; \text{(offsets)}, \quad \varphi_{\min} \le \varphi \le c \;\; \text{(green times)}, \quad \sum_{i=1}^{z} (\varphi + I)_i = c$$

where $d_a$ is the delay on link *a* (with *L* the set of links), $w_{d_a}$ is the link-specific weighting factor for the delay *d*, *K* is the stop penalty factor expressing the importance of stops relative to delay, $S_a$ is the number of stops on link *a* per second, $w_{s_a}$ is the link-specific weighting factor for the stops *S* on link *a*, **q** is the fixed set of link flows, *ψ* is the vector of signal setting parameters, *c* is the common cycle time (sec), *θ* is the offset time (sec), *φ* is the green time (sec), $\Omega_0$ is the feasible region for the signal timings, *I* is the intergreen time (sec), and *z* is the number of stages at each signalized intersection in the road network.
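As a concrete reading of Eq. (5), for a fixed timing plan the *DI* is simply a weighted sum over links. The link data and the stop penalty factor *K* below are made-up illustrative values, not TRANSYT-7F output:

```python
def disutility_index(links, K=2.0):
    """Network Disutility Index per Eq. (5): for each link a, add the weighted
    delay w_da * d_a plus the penalised weighted stops K * w_sa * S_a."""
    return sum(w_d * d + K * w_s * s for (w_d, d, w_s, s) in links)

# two hypothetical links: (w_da, delay d_a, w_sa, stops S_a)
links = [(1.0, 3.0, 1.0, 0.5), (2.0, 1.0, 1.0, 0.25)]
print(disutility_index(links, K=2.0))  # 6.5
```

In ACOTRANS itself, the delays and stops come from a TRANSYT-7F simulation of the candidate timing plan rather than from fixed numbers.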

The green times can be distributed to all signal stages in the road network according to Eq. (6), in order to satisfy the cycle time constraint [21].

$$\varphi_i = \varphi_{\min,i} + \frac{p_i}{\sum_{k=1}^{z} p_k} \left( c - \sum_{k=1}^{z} I_k - \sum_{k=1}^{z} \varphi_{\min,k} \right), \qquad i = 1, 2, \dots, z \tag{6}$$


where $\varphi_i$ is the green time (sec) for stage *i*, $\varphi_{\min,i}$ is the minimum green time (sec) for stage *i*, $p_i$ is the randomly generated green time (sec) for stage *i*, *z* is the number of stages, *I* is the intergreen time (sec) between signal stages, and *c* is the common cycle time of the network (sec).
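Eq. (6) can be implemented directly: the slack remaining after all intergreens and minimum greens is shared among the stages in proportion to the random weights $p_i$, which guarantees $\sum_i (\varphi_i + I_i) = c$. The stage data below (a 76 s cycle, two stages, 5 s intergreens, 7 s minimum greens) are illustrative:

```python
def distribute_green_times(c, p, phi_min, I):
    """Distribute green times over z stages via Eq. (6): phi_i = phi_min_i +
    (p_i / sum(p)) * (c - sum(I) - sum(phi_min))."""
    slack = c - sum(I) - sum(phi_min)
    total_p = sum(p)
    return [pm + (pi / total_p) * slack for pm, pi in zip(phi_min, p)]

# 76 s cycle, two stages: the greens sum to c - sum(I) by construction
greens = distribute_green_times(c=76.0, p=[3.0, 1.0], phi_min=[7.0, 7.0], I=[5.0, 5.0])
print(greens)  # [46.0, 20.0]
```

Whatever random weights are drawn, the resulting greens plus intergreens always reproduce the common cycle time, which is exactly the role of Eq. (6) in Step 3 below.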

In the ACOTRANS, optimization steps can be given in the following way:

*Step 0*: Initialization. Define the user-specified parameters: the number of decision variables (*n*) (the sum of the number of green times, i.e. the number of stages at each intersection, the number of offset times, i.e. the number of intersections, and the common cycle time), the constraints for each decision variable, the size of the ant colony (*m*), and the search space reduction value (*β*) for each decision variable.

*Step 1:* Set *t* = 1.

*Step 2:* Generate the random initial signal timings *ψ*(*c*, *θ*, *φ*) within the constraints of the decision variables.

*Step 3:* Distribute the initial green timings to the stages according to the distribution rule given above. At this step, the green timings randomly generated at Step 2 are distributed to the stages according to the cycle time generated at the same step, the minimum green times and the intergreen times.

*Step 4:* Get the network data and fixed set of link flows for TRANSYT-7F traffic model.

*Step 5:* Run TRANSYT-7F.

**Figure 4.** The flowchart of the ACOTRANS.

*Step 6:* Get the network *PI*. At this step, the *PI* is determined using TRANSYT-7F traffic model.

*Step 7:* If *t* = *t*max then terminate the algorithm; otherwise, set *t* = *t* + 1 and go to Step 2.

The flowchart of the ACOTRANS can be seen in Fig. (4).
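The steps above can be sketched as a loop. Because TRANSYT-7F is an external program, its run (Steps 4-6) is replaced here by a synthetic surrogate *PI* function, and the ACO bookkeeping (pheromone update, search-space reduction) is elided, so this is only a structural sketch under those assumptions:

```python
import random

def run_transyt_7f(plan):
    """Stand-in for Steps 4-6: TRANSYT-7F would simulate the network and
    return the PI for the timing plan. A synthetic quadratic surrogate is
    used so the loop runs; it is NOT the TRANSYT-7F traffic model."""
    c, thetas, greens = plan
    return (c - 70.0) ** 2 / 100.0 + sum((g - 30.0) ** 2 for g in greens) / 100.0

def acotrans(t_max=75, m=20, c_bounds=(36.0, 90.0), n_greens=4, seed=1):
    """Structural sketch of the ACOTRANS loop (Steps 0-7): Step 0 is the
    parameter list; each cycle draws m candidate timing plans (Step 2),
    evaluates them (Steps 4-6) and keeps the best, with Step 7 controlling
    the loop. The ACORSES pheromone and search-space machinery is elided."""
    rng = random.Random(seed)
    best_plan, best_pi = None, float("inf")
    for t in range(1, t_max + 1):               # Step 1 sets t = 1; Step 7 loops
        for _ in range(m):
            c = rng.uniform(*c_bounds)          # Step 2: random signal timings
            thetas = [0.0, rng.uniform(0.0, c)]
            # Step 3 (distributing greens to satisfy the cycle constraint)
            # is omitted here; greens are drawn directly within their bounds
            greens = [rng.uniform(7.0, c) for _ in range(n_greens)]
            pi = run_transyt_7f((c, thetas, greens))
            if pi < best_pi:
                best_plan, best_pi = (c, thetas, greens), pi
    return best_plan, best_pi
```

In the real model, the candidate plans of Step 2 are produced by the ACORSES update and search-space reduction rather than drawn independently at random.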

## **4. Numerical Application**

ACOTRANS is tested on two example networks taken from the literature. First, it is applied to a two-junction road network. The network contains one origin-destination pair, eight links and six signal setting variables. The network and the representation of its signal stages can be seen in Fig. (5a) and (5b). The fixed set of link flows, taken from [22], is given in Table 1.

**Figure 5.** a) Two junction network; b) Representation of signal stages of two-junction network.


| Link number | Link flow (veh/h) | Saturation flow (veh/h) | Free-flow travel time (sec) |
|---|---|---|---|
| 1 | 615 | 1800 | 20 |
| 2 | 45 | 1800 | 20 |
| 3 | 225 | 1800 | 20 |
| 4 | 615 | 1800 | 20 |
| 5 | 225 | 1800 | 20 |
| 6 | 45 | 1800 | 20 |

**Table 1.** Fixed set of link flows on two junction network.

The constraints on the signal timings are set as follows:

36 ≤ *c* ≤ 90 cycle time constraint

0 ≤ *θ* ≤ *c* offsets

7 ≤ *φ* ≤ *c* green split

*I*1−2 = *I*2−1 = 5 seconds intergreen time

The ACOTRANS model was coded in MATLAB. It is run with the following user-specified parameters: the colony size is 20 and the maximum number of cycles (*t*max) is 75. The convergence of the model can be seen in Fig. (6).

**Figure 6.** The convergence of the ACOTRANS for the small sized network.

At the 75th cycle, ACOTRANS reached a *PI* value of 8.16, with a common network cycle time of 76 sec. In addition, the two-junction road network was optimized using TRANSYT-7F, which includes GA and HC optimization tools. For the GA parameters, the population size and the maximum number of cycles are chosen as 20 and 300, respectively. In the HC optimization tool in TRANSYT-7F, the default optimization parameters are effective and the system is simulated for every integer cycle length between the minimum and maximum cycle length; therefore, the HC optimization parameters are not manipulated. For the two-junction road network, the results of the ACOTRANS model and of the TRANSYT-7F optimizers are given in Table 2.

| Method | Performance Index | Cycle time *c* (s) | Junction *i* | Stage 1 (s) | Stage 2 (s) | Offset *θi* (s) |
|---|---|---|---|---|---|---|
| ACOTRANS | 8.16 | 76 | 1 | 55 | 21 | 0 |
| | | | 2 | 66 | 10 | 36 |
| TRANSYT-7F with HC | 8.18 | 78 | 1 | 55 | 23 | 0 |
| | | | 2 | 68 | 10 | 0 |
| TRANSYT-7F with GA | 8.17 | 79 | 1 | 58 | 21 | 0 |
| | | | 2 | 69 | 10 | 6 |

**Table 2.** The best *PI* and signal timings for the two junction road network (stage durations include the intergreen times, *I*1−2 = *I*2−1 = 5 s).

While the best *PI* is 8.18 for TRANSYT-7F with HC, the best *PI* is 8.17 for TRANSYT-7F with GA, with common network cycle times of 78 sec and 79 sec, respectively. As can be seen in Table 2, the *PI* obtained from the ACOTRANS model is slightly better than the values obtained from TRANSYT-7F with GA and HC. These results indicate that ACOTRANS produces results comparable to those of TRANSYT-7F with GA and HC. Hence, the proposed ACOTRANS model provides an alternative to the HC and GA optimization algorithms in TRANSYT-7F that can produce better results in terms of the *PI* for this small sized network.

In order to test the ACOTRANS model's effectiveness and robustness, it is also applied to a medium sized road network, based upon the one used by [23]. Basic layouts of the network and the stage configurations are given in Fig. (7) and (8). This network includes 23 links and 21 signal setting variables at six signal-controlled junctions.

> In order to test the ACOTRANS model's effectiveness and robustness, it is also ap‐ plied to medium sized road network. The network is illustrated based upon the one used by [23]. Basic layouts of the network and stage configurations are given in Fig. (7) and (8). This network includes 23 links and 21 signal setting variables at six signalcontrolled junctions.
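The signal-timing constraints stated above for the two-junction network translate directly into a feasibility check on each candidate plan. The chapter's model was coded in MATLAB; the Python fragment below is only an illustrative sketch (all function names are ours, not part of ACOTRANS) of drawing a random candidate plan and verifying it against the stated bounds:

```python
import random

# Constraints stated in the text for the two-junction network:
# 36 <= c <= 90 (cycle time), phi >= 7 (green split), 0 <= theta <= c
# (offset), and an intergreen of I = 5 s after each stage.
C_MIN, C_MAX = 36, 90
PHI_MIN = 7
INTERGREEN = 5

def random_feasible_plan(n_junctions=2, n_stages=2, seed=0):
    """Draw one candidate plan whose stage durations plus intergreens
    sum to the common cycle time c at every junction."""
    rng = random.Random(seed)
    c = rng.randint(C_MIN, C_MAX)
    green_total = c - n_stages * INTERGREEN  # green time to distribute
    junctions = []
    for _ in range(n_junctions):
        cut = rng.randint(PHI_MIN, green_total - PHI_MIN)
        junctions.append({"stages": [cut, green_total - cut],
                          "offset": rng.randint(0, c)})
    return {"c": c, "junctions": junctions}

def is_feasible(plan):
    """Check a plan against the cycle, offset, green-split and
    stage-sum constraints above."""
    c = plan["c"]
    if not C_MIN <= c <= C_MAX:
        return False
    for junc in plan["junctions"]:
        if not 0 <= junc["offset"] <= c:
            return False
        if any(phi < PHI_MIN for phi in junc["stages"]):
            return False
        if sum(junc["stages"]) + len(junc["stages"]) * INTERGREEN != c:
            return False
    return True

print(is_feasible(random_feasible_plan()))  # True
```

Within ACOTRANS itself, each such feasible plan would then be scored by the TRANSYT-7F performance index rather than a mere feasibility flag.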

**Figure 7.** Layout for the medium-sized network

**Figure 8.** Stage configurations for the medium-sized network

The fixed set of link flows, taken from [22], is given in Table 3.

| **Link flow (veh/h)** | **Saturation flow (veh/h)** | **Free-flow travel time (sec)** |
|---|---|---|
| 716 | 2000 | 1 |
| 463 | 1600 | 1 |
| 716 | 3200 | 10 |
| 569 | 3200 | 15 |
| 636 | 1800 | 20 |
| 173 | 1850 | 20 |
| 462 | 1800 | 10 |

**Table 3.** Fixed set of link flows on the medium-sized network
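A link-flow table of this kind already supports a quick sanity check: the ratio of link flow to saturation flow shows how heavily each link is loaded before green times are taken into account. A small sketch using the Table 1 values (the comparison logic is ours, not part of the chapter's model):

```python
# Flow-to-saturation-flow ratio for the Table 1 links (both in veh/h).
# This is the loading of each link before green time is allocated;
# a ratio near 1.0 marks a link close to capacity.
links = {
    1: (615, 1800), 2: (45, 1800), 3: (225, 1800),
    4: (615, 1800), 5: (225, 1800), 6: (45, 1800),
}

ratio = {k: flow / sat for k, (flow, sat) in links.items()}
worst = max(ratio, key=ratio.get)  # most heavily loaded link
print(f"link {worst}: x = {ratio[worst]:.3f}")  # link 1: x = 0.342
```

The proper degree of saturation also divides by the green-time ratio a signal plan assigns to the link, which is exactly what the optimization of *c*, *θ* and *φ* controls.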





The constraints on signal timings are set as follows:

36 ≤ *c* ≤ 140 (cycle time constraint)

0 ≤ *θ* ≤ *c* (offsets)

7 ≤ *φ* ≤ *c* (green split)

*I*<sub>1−2</sub> = *I*<sub>2−1</sub> = 5 seconds (intergreen time)

In Fig. 9, the convergence of the ACOTRANS for the medium-sized network can be seen. The best signal timings obtained in the previous cycle are stored so that the search is not trapped in a bad local optimum. By means of the newly generated ant colony, the global optimum is searched for around the best signal-setting parameters, using the reduced search space, as the algorithm progresses. As shown in Fig. 9, the ACORSES starts the solution process from randomly generated signal timings, for which the value of the *PI* is about 551. The ACORSES keeps the best solution and then uses it to guide the search toward the optimum within the reduced search space. The significant improvement in the objective function takes place in the first few cycles because the ACORSES starts with randomly generated ants in a large colony. After that, only small improvements in the objective function occur, since the pheromone-updating rule and the newly created ant colony provide new solution vectors in different search directions. Finally, the *PI* reached a minimum value of about 362 after 150 cycles.

This numerical test shows that the ACORSES is able to avoid being trapped in a bad local optimum when solving the ATC problem. In order to overcome non-convexity, the ACORSES starts with a large base of solutions and also uses the reduced-search-space technique. In the ACORSES, a new ant colony is created according to a randomly generated α value; for this reason, some of the newly created solution vectors may lie outside the reduced search space, and the newly created colony therefore helps the search escape bad local optima. The ACORSES is able to reach the global optimum, or a near-global optimum, of the signal timings because it uses the reduced-search-space technique and the orientation of all ants toward the best-known solution concurrently.

**Figure 9.** The convergence of the ACOTRANS for the medium-sized network
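The reduced-search-space mechanism described above can be sketched schematically. The loop below is our toy reading of the idea, not the ACORSES of [5] itself: it keeps a best-so-far solution that is never discarded, shrinks a search box around it, lets a random α scale each ant's step, and uses a simple quadratic function as a stand-in for the TRANSYT-7F *PI*:

```python
import random

def reduced_space_search(f, lo, hi, n_ants=20, n_cycles=75,
                         shrink=0.9, seed=1):
    """Schematic reduced-search-space loop: sample each new colony in a
    box that shrinks around the best-so-far solution; a random alpha
    scales every ant's step, so some ants can leave the reduced box."""
    rng = random.Random(seed)
    dim = len(lo)
    best = [rng.uniform(lo[i], hi[i]) for i in range(dim)]
    best_f = f(best)
    width = [hi[i] - lo[i] for i in range(dim)]
    for _ in range(n_cycles):
        for _ in range(n_ants):
            alpha = rng.random()  # random step-size factor for this ant
            cand = [min(hi[i], max(lo[i],
                        best[i] + alpha * width[i] * rng.uniform(-1, 1)))
                    for i in range(dim)]
            fc = f(cand)
            if fc < best_f:  # the best-so-far solution is never lost
                best, best_f = cand, fc
        width = [w * shrink for w in width]  # reduce the search space
    return best, best_f

# Toy objective with its minimum 0 at (3, 5) inside the box [0, 10]^2.
f = lambda u: (u[0] - 3) ** 2 + (u[1] - 5) ** 2
x, fx = reduced_space_search(f, [0, 0], [10, 10])
print(round(fx, 4))
```

As in the convergence history of Fig. 9, almost all of the improvement happens in the first few cycles, while the shrinking box then refines the solution in ever smaller steps.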


The common network cycle time obtained from the ACOTRANS is 106 sec. Moreover, the medium-sized network was optimized using the GA and HC optimization tools of TRANSYT-7F. For the studied network, the results of the ACOTRANS and of the TRANSYT-7F optimizers are given in Table 4. The best *PI* is 410.0 in TRANSYT-7F with GA, while it is 420.5 in TRANSYT-7F with HC. The common network cycle time is 114 sec and 120 sec in TRANSYT-7F with HC and GA, respectively. The ACOTRANS improves the network's *PI* by 11.7% and 13.9% compared with TRANSYT-7F with GA and HC, respectively. It also decreases the common cycle time by 11.5% and 7% compared with the cycle times produced by TRANSYT-7F with GA and HC. These results show that the ACOTRANS model performs well for optimizing traffic signal timings in coordinated networks with a fixed set of link flows. Hence, the ACOTRANS provides an alternative to the HC and GA optimization tools in TRANSYT-7F that can produce better results in terms of the *PI*.
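The quoted *PI* improvements follow directly from the Table 4 values (a smaller *PI* is better):

```python
# Relative PI improvement of ACOTRANS over the TRANSYT-7F optimizers,
# computed from the Table 4 performance-index values.
pi = {"ACOTRANS": 361.9, "GA": 410.0, "HC": 420.5}

for rival in ("GA", "HC"):
    gain = 100.0 * (pi[rival] - pi["ACOTRANS"]) / pi[rival]
    print(f"vs {rival}: {gain:.1f}%")  # 11.7% and 13.9%
```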


| Method | *PI* | **Cycle time** *c* **(s)** | **Junction number** *i* | **Stage 1** *φ*<sub>*i*,1</sub> **(s)** | **Stage 2** *φ*<sub>*i*,2</sub> **(s)** | **Stage 3** *φ*<sub>*i*,3</sub> **(s)** | **Offset** *θ*<sub>*i*</sub> **(s)** |
|---|---|---|---|---|---|---|---|
| ACOTRANS | 361.9 | 106 | 1 | 46 | 60 | – | 0 |
| | | | 2 | 64 | 42 | – | 96 |
| | | | 3 | 62 | 44 | – | 10 |
| | | | 4 | 38 | 34 | 34 | 36 |
| | | | 5 | 15 | 33 | 58 | 38 |
| | | | 6 | 34 | 72 | – | 74 |
| TRANSYT-7F with HC | 420.5 | 114 | 1 | 44 | 70 | – | 0 |
| | | | 2 | 56 | 58 | – | 98 |
| | | | 3 | 69 | 45 | – | 98 |
| | | | 4 | 43 | 36 | 35 | 98 |
| | | | 5 | 15 | 36 | 63 | 98 |
| | | | 6 | 39 | 75 | – | 98 |
| TRANSYT-7F with GA | 410.0 | 120 | 1 | 60 | 60 | – | 0 |
| | | | 2 | 74 | 46 | – | 89 |
| | | | 3 | 71 | 49 | – | 37 |
| | | | 4 | 44 | 38 | 38 | 106 |
| | | | 5 | 15 | 38 | 67 | 75 |
| | | | 6 | 60 | 60 | – | 55 |

**Table 4.** The results for the medium-sized network (*I*<sub>1−2</sub> = *I*<sub>2−1</sub> = 5 s)

## **5. Conclusions**

This study deals with the area traffic control (ATC) problem using the ACOTRANS. For this purpose, the ACO-based algorithm called ACORSES was used. The ACORSES algorithm for solving the ATC problem differs from other approaches in that a new ant colony is generated at each cycle with the assistance of the best solution of the previous cycle. Moreover, the best solution obtained from the previous evaluation is saved to prevent the search from being trapped in a bad local optimum. The ACOTRANS is introduced to optimize traffic signal timings in coordinated signalized networks. TRANSYT-7F is used to compute the *PI* for a given set of signal timings and a staging plan in the network. The ACOTRANS was tested on two road networks in order to show its robustness and effectiveness. For the first test network, which contains two junctions, the results showed that the ACOTRANS produces slightly better results than TRANSYT-7F with GA and HC. The proposed algorithm was also applied to a medium-sized network which contains six junctions. The results showed that the ACOTRANS improves the network's *PI* by 11.7% and 13.9% relative to TRANSYT-7F with GA and HC, respectively. The ACOTRANS provides an alternative to the HC and GA optimization tools in TRANSYT-7F that can produce better results in terms of the *PI*. As a result, the ACOTRANS may be used to optimize traffic signal timings in coordinated signalized networks. In future work, the ACOTRANS will be applied to a real-sized network in order to demonstrate the applicability and the effectiveness of the proposed model.

## **Author details**

Soner Haldenbilen\*, Ozgur Baskan and Cenk Ozan

\*Address all correspondence to: shaldenbilen@pau.edu.tr

Pamukkale University, Engineering Faculty, Department of Civil Engineering, Transportation Division, Turkey

## **References**

[1] Teklu, F., Sumalee, A., & Watling, D. (2007). A genetic algorithm approach for optimizing traffic control signals considering routing. *Computer-Aided Civil and Infrastructure Engineering*, 22, 31-43.

[2] Webster, F. V. (1958). Traffic Signal Settings. Road Research Technical Paper No. 39, HMSO, London.

[3] Robertson, D. I. (1969). 'TRANSYT' method for area traffic control. *Traffic Engineering and Control*, 10, 276-81.

[4] TRANSYT-7F Release 11.3 Users Guide. (2008). McTrans Center, University of Florida, Gainesville, Florida.

[5] Baskan, O., Haldenbilen, S., Ceylan, H., & Ceylan, H. (2009). A new solution algorithm for improving performance of ant colony optimization. *Applied Mathematics and Computation*, 211(1), 75-84.

[6] Wong, S. C. (1995). Derivatives of the performance index for the traffic model from TRANSYT. *Transportation Research Part B*, 29(5), 303-327.

[7] Wong, S. C. (1996). Group-based optimisation of signal timings using the TRANSYT traffic model. *Transportation Research Part B*, 30(3), 217-244.

[8] Heydecker, B. G. (1996). A decomposed approach for signal optimization in road networks. *Transportation Research Part B*, 30(2), 99-114.


[9] Wong, S. C., Wong, W. T., Xu, J., & Tong, C. O. (2000). A Time-dependent TRANSYT Traffic Model for Area Traffic Control. *Proceedings of the Second International Conference on Transportation and Traffic Studies, ICTTS*, 578-585.

[10] Wong, S. C., Wong, W. T., Leung, C. M., & Tong, C. O. (2002). Group-based optimization of a time-dependent TRANSYT traffic model for area traffic control. *Transportation Research Part B*, 36, 291-312.

[11] Girianna, M., & Benekohal, R. F. (2002). Application of Genetic Algorithms to Generate Optimum Signal Coordination for Congested Networks. *Proceedings of the Seventh International Conference on Applications of Advanced Technologies in Transportation*, 762-769.

[12] Ceylan, H. (2006). Developing Combined Genetic Algorithm-Hill-Climbing Optimization Method for Area Traffic Control. *Journal of Transportation Engineering*, 132(8), 663-671.

[13] Chen, J., & Xu, L. (2006). Road-Junction Traffic Signal Timing Optimization by an Adaptive Particle Swarm Algorithm. *9th International Conference on Control, Automation, Robotics and Vision*, 1-5, 1103-1109.

[14] Chiou, S.-W. (2007). A hybrid optimization algorithm for area traffic control problem. *Journal of the Operational Research Society*, 58, 816-823.

[15] Chiou, S.-W. (2007). An efficient computation algorithm for area traffic control problem with link capacity expansions. *Applied Mathematics and Computation*, 188, 1094-1102.

[16] Dan, C., & Xiaohong, G. (2008). Study on Intelligent Control of Traffic Signal of Urban Area and Microscopic Simulation. *Proceedings of the Eighth International Conference of Chinese Logistics and Transportation Professionals*, Logistics: The Emerging Frontiers of Transportation and Development in China, 4597-4604.

[17] Li, Z. (2011). Modeling Arterial Signal Optimization with Enhanced Cell Transmission Formulations. *Journal of Transportation Engineering*, 137(7), 445-454.

[18] Dorigo, M., Di Caro, G., & Gambardella, L. M. (1999). Ant Algorithms for Discrete Optimization. *Artificial Life*, MIT Press.

[19] Eshghi, K., & Kazemi, M. (1999). Ant colony algorithm for the shortest loop design problem. *Computers & Industrial Engineering*, 50, 358-366.

[20] Baskan, O., & Haldenbilen, S. (2011). Ant Colony Optimization Approach for Optimizing Traffic Signal Timings. In *Ant Colony Optimization - Methods and Applications*, InTech, 205-220.

[21] Ceylan, H., & Bell, M. G. H. (2004). Traffic signal timing optimisation based on genetic algorithm approach, including drivers' routing. *Transportation Research Part B*, 38(4), 329-342.

[22] Ceylan, H. (2002). A genetic algorithm approach to the equilibrium network design problem. Ph.D. Thesis, University of Newcastle upon Tyne, UK.

[23] Allsop, R. E., & Charlesworth, J. A. (1977). Traffic in a signal-controlled road network: an example of different signal timings including different routings. *Traffic Engineering and Control*, 18(5), 262-264.

**Chapter 5**

**ANGEL: A Simplified Hybrid Metaheuristic for Structural Optimization**

Anikó Csébfalvi

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/52188

© 2013 Csébfalvi; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**1. Introduction**

The weight minimization of shallow truss structures is a challenging but sometimes frustrating engineering optimization problem. Theoretically, the optimal-design search process can be formulated as an implicit nonlinear mixed-integer optimization problem with a huge number of variables. The flexibility of shallow truss structures may cause different types of structural instability. Owing to the nonlinear behavior of the resulting lightweight truss structures, special treatment is required in order to tackle the "hidden" global stability problems during the optimization process. Therefore, we have to replace the traditional "design variables → response variables" approach with a more time-consuming "design variables → response functions" approach, where the response functions describe the structural response history of the loading process up to the maximal load intensity without constraint violation.

In this study, a higher-order path-following method [1] is embedded into a hybrid heuristic optimization method in order to tackle the structural stability constraints within the truss optimization. The proposed path-following method is based on the perturbation technique of stability theory and a nonlinear modification of the classical linear homotopy method.

The nonlinear function of the total potential energy for conservative systems can be expressed in terms of the nodal displacements and the load parameter. The equilibrium equations follow from the principle of the stationary value of the total potential energy. The stability investigation is based on the eigenvalue computation of the Hessian matrix. In each step of the path-following process, we obtain information about the displacements, the stresses, and the local and global stability of the structure.
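The stability test just described — checking the signs of the eigenvalues of the Hessian of the total potential energy at an equilibrium state — can be illustrated on a deliberately simple two-degree-of-freedom system. The spring-and-load potential below is our toy example, not the chapter's truss model or its path-following method:

```python
def hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of a scalar function of two
    variables, evaluated at the point x."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            xpp = x[:]; xpp[i] += h; xpp[j] += h
            xpm = x[:]; xpm[i] += h; xpm[j] -= h
            xmp = x[:]; xmp[i] -= h; xmp[j] += h
            xmm = x[:]; xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

def eig2(H):
    """Eigenvalues of a symmetric 2x2 matrix (closed form)."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mid = (a + d) / 2
    disc = (((a - d) / 2) ** 2 + b * b) ** 0.5
    return mid - disc, mid + disc

# Toy total potential energy: two unit springs minus the work done by a
# load of intensity lam acting on the first coordinate (our example).
def potential(lam):
    return lambda u: 0.5 * (u[0] ** 2 + u[1] ** 2) - lam * u[0]

# At the equilibrium u = (lam, 0) the Hessian is the identity matrix:
# both eigenvalues are positive, so the equilibrium state is stable.
lmin, lmax = eig2(hessian(potential(2.0), [2.0, 0.0]))
print(lmin > 0)  # True
```

In the chapter's setting the same test is applied along the equilibrium path: a vanishing smallest eigenvalue signals a critical (limit or bifurcation) point, which is exactly the "hidden" global stability information the path-following process supplies.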
