**A Splicing/Decomposable Binary Encoding and Its Novel Operators for Genetic and Evolutionary Algorithms**

Yong Liang, *Macau University of Science and Technology, China*

#### **1. Introduction**


82 Bio-Inspired Computational Algorithms and Their Applications


Most real-world problems can be encoded by different representations, but genetic and evolutionary algorithms (GEAs) may not be able to solve a problem successfully on its phenotypic representation unless problem-specific genetic operators are used. Therefore, a proper genetic representation is necessary when applying GEAs to real-world problems (Goldberg, 1989; Liepins, 1990; Whitley, 2000; Liang, 2011).

A large number of theoretical and empirical investigations of genetic representations have been carried out over the last decades. Earlier work (Goldberg, 1989c; Liepins & Vose, 1990) showed that the behavior and performance of GEAs are strongly influenced by the representation used. As a result, many genotypic representations have been designed for effective GEA search. Among them, the binary, integer, real-valued, messy and tree-structure representations are the most important and the most widely used.

To investigate the performance of genetic representations, Holland (1975) originally proposed the schema theorem to model how GEAs process similarities between binary bitstrings. Defining building blocks (BBs) as highly fit solutions to the sub-problems into which the overall problem is decomposed, the building block hypothesis (Goldberg, 1989c) states that GEAs work mainly due to their ability to propagate short, low-order and highly fit BBs. During the last decade, several studies (Thierens, 1995; Miller, 1996; Harik, 1997; Sendhoff, 1997; Rothlauf, 2002) developed three important elements towards a general theory of genetic representations. They identified redundancy, the scaling of building blocks (BBs) and distance distortion as the major factors that influence the performance of GEAs under different genetic representations.

A genetic representation is said to be redundant if the number of genotypes is higher than the number of phenotypes. Investigations of redundant representations reveal that giving more copies to high-quality solutions in the initial population results in a higher performance of GEAs, whereas encodings in which high-quality solutions are underrepresented make a problem more difficult to solve. Uniform redundancy, however, has no influence on the performance of GEAs.

The order of scaling of a representation describes the different contributions of the BBs to an individual's fitness. It is well known that if the BBs are uniformly scaled, GEAs solve all BBs implicitly in parallel. In contrast, for non-uniformly scaled BBs, domino convergence occurs and the BBs are solved sequentially, starting with the most salient BB (Thierens, 1995). As a result, the convergence time increases and the performance decreases due to the noise from the competing BBs.

The distance distortion of a representation measures how much the distances between individuals change when mapping the phenotypes to the genotypes, and the locality of a representation describes whether similar genotypes correspond to similar phenotypes. Theoretical analysis shows that representations whose distance distortion and locality are both zero, i.e., which preserve the distances between individuals, do not modify the difficulty of the problems they are used for, and guarantee that problems of bounded complexity are solved reliably and predictably.

The importance of choosing a proper representation for the performance of GAs is well recognized, but developing a general theory of representations is a formidable challenge. Up to now, there has been no well-established theory regarding the influence of representations on the performance of GAs. To help users find good representations for different tasks, over the last few years some researchers have made recommendations based on the existing theories. For example, Goldberg (1989) proposed two basic design principles for encodings:


• Principle of minimal alphabets: The alphabet of the encoding should be as small as possible while still allowing a natural representation of solutions.

• Principle of meaningful building blocks: The schemata should be short, of low order, and relatively unrelated to schemata over other fixed positions.

The principle of minimal alphabets advises us to use a bit-string representation. Combining this with the principle of meaningful building blocks (BBs), we construct uniform-salient BBs, which include equally scaled and splicing/decomposable alleles.

The purpose of this chapter is to introduce our novel genetic representation, a splicing/decomposable (S/D) binary encoding, which was proposed based on theoretical guidance and existing recommendations for designing efficient genetic representations. The S/D binary representation can be spliced and decomposed to describe potential solutions of a problem at different precisions by different numbers of uniform-salient BBs. Exploiting this property, GEAs can be applied from the high-scaled to the low-scaled BBs sequentially, avoiding the noise from competing BBs and improving GEAs' performance. Our theoretical and empirical investigations reveal that the S/D binary representation is better suited to GEA search than other existing binary encodings. Moreover, a new genotypic distance *dg* on the S/D binary space Φ*g* is proposed, which is equivalent to the Euclidean distance *dp* on the real-valued space Φ*p* during GEA convergence. Based on the new genotypic distance *dg*, GEAs can reliably and predictably solve problems of bounded complexity, and methods that depend on the phenotypic distance *dp* for solving different kinds of optimization problems can be used directly on the S/D binary space Φ*g*.

This chapter is organized as follows. Section 2 describes the three most commonly used binary representations (the binary, gray and unary encodings) and the theoretical analysis of their effect on the performance of GEAs. Section 3 introduces our proposed splicing/decomposable (S/D) binary representation and its genotypic distance. Section 4 proposes the new genetic algorithm based on the S/D binary representation, the splicing/decomposable genetic algorithm (SDGA). Section 5 discusses the performance of the SDGA and compares the S/D binary representation with other existing binary encodings in empirical studies. Conclusions are drawn in Section 6.

#### **2. Background**


Binary encodings are the most commonly used and nature-inspired representations for GEAs, especially for genetic algorithms (GAs) (Goldberg, 1989). When encoding real-valued problems by binary representations, different types of binary representations assign the real values to the binary strings in different ways. The most common binary representations are the binary, gray and unary encodings. According to the three aspects of representation theory (redundancy, scaled building blocks and distance distortion), Rothlauf (2002) studied the performance differences of GAs under different binary representations of real-valued problems.

#### **2.1 The unary encoding and redundancy**

In the unary encoding, a string of length *l* = *s* − 1 is necessary to represent *s* different phenotypic values. The *i*-th phenotypic value is encoded by the number of ones, *i* − 1, in the corresponding genotypic string. Thus, 2<sup>*s*−1</sup> different genotypes encode only *s* different phenotypes. Analysis of the unary encoding by representation theory reveals that the encoding is redundant and does not represent phenotypes uniformly. Therefore, the performance of GAs with the unary encoding depends on the structure of the optimal solution. Unary GAs fail to solve the integer one-max, deceptive trap and BinInt problems (Rothlauf, 2002) unless larger population sizes are used, because the optimal solutions are strongly underrepresented for these three types of problems. Thus, unary GAs perform much worse than GAs using the non-redundant binary or gray encoding (Julstrom, 1999; Rothlauf, 2002).
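To make the redundancy concrete, the following sketch (ours, not from the chapter; `unary_decode` is an illustrative name) enumerates all 2<sup>*s*−1</sup> unary genotypes for *s* = 4 and counts how many genotypes map to each phenotype. The non-uniform counts show why phenotypes encoded by all-zeros or all-ones strings are strongly underrepresented.

```python
# Sketch (not from the chapter): redundancy of the unary encoding for s = 4.
from itertools import product

S = 4          # number of phenotypic values
L = S - 1      # unary string length l = s - 1

def unary_decode(genotype):
    # the i-th phenotypic value is encoded by i - 1 ones in the string
    return sum(genotype) + 1

# count how many of the 2^(s-1) genotypes encode each phenotype
copies = {i: 0 for i in range(1, S + 1)}
for g in product((0, 1), repeat=L):
    copies[unary_decode(g)] += 1

print(copies)  # {1: 1, 2: 3, 3: 3, 4: 1}: 8 genotypes, only 4 phenotypes
```

Eight genotypes collapse onto four phenotypes, and the extreme phenotypes receive a single copy each, matching the underrepresentation argument above.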

#### **2.2 The binary encoding, scaled building blocks and the Hamming cliff**

The binary encoding uses exponentially scaled bits to represent phenotypes. Each phenotypic value *xp* ∈ Φ*p* = {*x*1, *x*2, ..., *xs*} is represented by a binary string *xg* of length *l* = log<sub>2</sub>(*s*). Therefore, the genotype-phenotype mapping of the binary encoding is a one-to-one mapping, and it encodes phenotypes redundancy-free.

However, for non-uniformly scaled binary strings and competing Building Blocks (BBs) in a high-dimensional phenotype space, the noise from the competing BBs leads to a reduction in the performance of GAs. The performance of GAs using the binary encoding is affected not only by the non-uniform scaling of BBs, but also by problems associated with the *Hamming cliff* (Schaffer, 1989b). The binary encoding has the effect that the genotypes of some phenotypic neighbors are completely different. For example, when we choose the phenotypes *xp* = 7 and *yp* = 8, the two individuals have a phenotypic distance of one, but the resulting genotypes *xg* = 0111 and *yg* = 1000 have the largest possible genotypic distance ‖*x* − *y*‖*g* = 4. As a result, the locality of the binary representation is partially low. In distance distortion theory, an encoding preserves the difficulty of a problem if it has perfect locality and does not modify the distances between individuals. The analysis reveals that the binary encoding changes the distances between individuals and therefore changes the complexity of the optimization problem. Thus, easy problems can become difficult, and vice versa. Binary GAs are therefore not able to reliably solve problems when mapping the phenotypes to the genotypes.
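The Hamming cliff at *xp* = 7, *yp* = 8 can be reproduced in a few lines. This is an illustrative sketch; `binary_encode` and `hamming` are our own helper names, not code from the chapter.

```python
# Sketch: the Hamming cliff of the plain binary encoding.
def binary_encode(value, bits):
    # exponentially scaled bits, most significant first
    return [int(c) for c in format(value, f"0{bits}b")]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

x = binary_encode(7, 4)   # [0, 1, 1, 1]
y = binary_encode(8, 4)   # [1, 0, 0, 0]
print(hamming(x, y))      # 4: phenotypic neighbours, maximally distant genotypes
```

By contrast, the phenotypic neighbours 6 and 7 (0110 and 0111) differ in only one genotypic bit, which is exactly the non-uniform locality described above.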

#### **2.3 The gray encoding and modification of problem difficulty**

The non-redundant gray encoding (Schaffer, 1989a) was designed to overcome the problems with the *Hamming cliff* of the binary encoding (Schaffer, 1989b). In the gray encoding, every neighbor of a phenotype is also a neighbor of the corresponding genotype. Therefore, the difficulty of a problem remains unchanged when using mutation-based search operators that only perform small steps in the search space. As a result, easy problems and problems of bounded difficulty are easier to solve with mutation-based search under the gray encoding than under the binary encoding. Although the gray encoding has high locality, it still changes the distance correspondence between individuals that differ in more than one bit. For crossover-based search methods, the analysis of the average fitness of the schemata reveals that the gray encoding preserves building-block complexity less well than the binary encoding. Thus, a decrease in the performance of gray-encoded GAs is unavoidable for some kinds of problems (Whitley, 2000).
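The neighborhood-preserving property of the gray encoding is easy to verify with the standard reflected-binary construction *v* ⊕ (*v* ≫ 1); this is a sketch of the usual conversion, not code from the chapter.

```python
# Sketch: the standard reflected gray code preserves phenotypic neighbourhoods.
def gray_encode(v):
    return v ^ (v >> 1)

# every pair of adjacent phenotypes differs in exactly one genotypic bit
for v in range(15):
    assert bin(gray_encode(v) ^ gray_encode(v + 1)).count("1") == 1

# the binary-encoding Hamming cliff at 7/8 disappears:
print(format(gray_encode(7), "04b"), format(gray_encode(8), "04b"))  # 0100 1100
```

The 7/8 pair now differs in a single bit, while pairs that are far apart phenotypically can still have small genotypic distance, which is the residual distance distortion mentioned above.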

#### **3. A novel splicing/decomposable binary genetic representation**

The descriptions in the previous section show that the existing binary genetic representations are not well suited to GA search and cannot guarantee that GAs solve problems of bounded complexity reliably and predictably. According to the theoretical analysis and the recommendations for the design of an efficient representation, there are some important points that a genetic representation should try to respect. Common representations for GAs encode the phenotypes by a sequence of alleles. The alleles can be separated (decomposed) into building blocks (BBs) which do not interact with each other and which each determine one specific phenotypic property of the solution. The purpose of the genetic operators is to decompose the whole sequence of alleles by detecting which BBs influence each other. GAs perform well because they can identify the best alleles of each BB and combine them to form a high-quality overall solution to the problem.

Based on the above investigation results and recommendations, we have proposed a new genetic representation that is well suited to GA search. In this section, we first introduce the novel splicing/decomposable (S/D) binary encoding, then define the new genotypic distance for the S/D encoding, and finally give a theoretical analysis of the S/D encoding based on the three elements of genetic representation theory (redundancy, scaled BBs and distance distortion).

#### **3.1 A splicing/decomposable binary encoding**

In (Leung, 2002; Xu, 2003a), we proposed a novel S/D binary encoding for real-valued problems. Assume that the phenotypic domain Φ*p* of an *n*-dimensional problem can be specified by

$$\Phi\_p = [\alpha\_1, \beta\_1] \times [\alpha\_2, \beta\_2] \times \cdots \times [\alpha\_n, \beta\_n].$$

Fig. 1. A graphical illustration of the splicing/decomposable representation scheme, where (b) is the refined bisection of the gray cell (10) in (a) (with mesh size *O*(1/2)), (c) is the refined bisection of the dark cell (1001) in (b) (with mesh size *O*(1/2<sup>2</sup>)), and so forth.

Given a binary string of length *l*, the genotypic precision is *hi*(*l*) = (*βi* − *αi*)/2<sup>*l*/*n*</sup>, *i* = 1, 2, ··· , *n*. Any real-valued variable *x* = (*x*1, *x*2, ..., *xn*) ∈ Φ*p* can be represented by a splicing/decomposable (S/D) binary string *b* = (*b*1, *b*2, ..., *bl*); the genotype-phenotype mapping *fg* is defined as

$$x = (x\_1, x\_2, \dots, x\_n) = f\_g(b) = \left( \sum\_{j=0}^{l/n-1} 2^{(l/n-1-j)} \times b\_{j \times n + 1},\ \sum\_{j=0}^{l/n-1} 2^{(l/n-1-j)} \times b\_{j \times n + 2},\ \dots,\ \sum\_{j=0}^{l/n-1} 2^{(l/n-1-j)} \times b\_{j \times n + n} \right),$$

where


$$\sum\_{j=0}^{l/n-1} 2^{(l/n-1-j)} \times b\_{j \times n + i} \le \frac{x\_i - \alpha\_i}{h\_i(l)} < \sum\_{j=0}^{l/n-1} 2^{(l/n-1-j)} \times b\_{j \times n + i} + 1.$$
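A minimal decoder for this mapping can be sketched as follows. This is our illustrative implementation, not the chapter's code: the function name `sd_decode` and the choice of returning the lower corner of the selected cell are assumptions.

```python
# Sketch: decoding an S/D binary string into the lower corner of the
# phenotypic cell it selects (bit b_k is 1-indexed in the text, 0-indexed here).
def sd_decode(b, n, lo, hi):
    l = len(b)
    m = l // n                                # l/n bits per coordinate
    x = []
    for i in range(n):                        # coordinate i + 1
        # integer cell index: sum_j 2^(l/n - 1 - j) * b_(j*n + i)
        idx = sum(2 ** (m - 1 - j) * b[j * n + i] for j in range(m))
        h = (hi[i] - lo[i]) / 2 ** m          # genotypic precision h_i(l)
        x.append(lo[i] + idx * h)
    return x

# the worked example below: b = 100101 on [0, 1] x [0, 1]
print(sd_decode([1, 0, 0, 1, 0, 1], 2, [0, 0], [1, 1]))  # [0.5, 0.375]
```

For *b* = 100101 the decoder selects the cell [0.5, 0.625) × [0.375, 0.5), i.e., the 1/8-precision location identified in the worked example.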

That is, the significance of each bit of the encoding can be clearly and uniquely interpreted (hence, each BB of the encoded S/D binary string has a specific meaning). As shown in Figure 1, take Φ*p* = [0, 1] × [0, 1] and the S/D binary string *b* = 100101 as an example (in this case, *l* = 6, *n* = 2, and the genotypic precisions *h*1(*l*) = *h*2(*l*) = 1/8). Let us look at how to identify the S/D binary string *b* and see what each bit value of *b* means. In Figure 1-(a), the phenotypic domain Φ*p* is bisected into four subregions of uniform size 1/2. According to the *left*-0 and *right*-1 correspondence rule in each coordinate direction, these four subregions can be identified with (00), (01), (10) and (11). As the phenotype *x* lies in the

1010 (0.75)

 *n* ∑ *i*=1 ( *l*/*n*−1 ∑ *j*=0

genotypic distances phenotypic distances

<sup>89</sup> A Splicing/Decomposable Binary Encoding

For any two S/D binary strings *a*, *b* ∈ Φ*g*, we can define the Euclidean distance of their

as the phenotypic distance between the S/D binary strings *a* and *b*. The phenotypic distance �·�*<sup>p</sup>* and the genotypic distance �·�*<sup>g</sup>* are equivalents in the S/D binary space Φ*<sup>g</sup>* when we

**Theorem 1:** The phenotypic distance �·�*<sup>p</sup>* and the genotypic distance �·�*<sup>g</sup>* are equivalents

�·�*<sup>p</sup>* ≤�·�*<sup>g</sup>* <sup>≤</sup> <sup>√</sup>*<sup>n</sup>* ×�·�*<sup>p</sup>*

is satisfied in the the S/D binary space Φ*g*, where *n* is the dimensions of the real-encoding

*aj*×*n*+*<sup>i</sup>* − *bj*×*n*+*<sup>i</sup>* <sup>2</sup>*j*+<sup>1</sup> |

*i*=1(∑*l*/*n*−<sup>1</sup> *j*=0

+ ∑1≤*i*1,*i*2≤*<sup>n</sup>*

×| <sup>∑</sup>*l*/*n*−<sup>1</sup> *j*=0

*aj*×*n*+*<sup>i</sup>* − *bj*×*n*+*<sup>i</sup>* <sup>2</sup>*j*+<sup>1</sup> <sup>|</sup>)<sup>2</sup>

> <sup>−</sup>*bj*×*n*+*<sup>i</sup>* <sup>2</sup>*j*+<sup>1</sup> )<sup>2</sup>

> > *j*=0

2 <sup>2</sup>*j*+<sup>1</sup> |) *aj*×*n*+*<sup>i</sup>*

<sup>−</sup>*bj*×*n*+*<sup>i</sup>* 1 <sup>2</sup>*j*+<sup>1</sup> |

*aj*×*n*+*<sup>i</sup>*

*aj*×*n*+*i*−*bj*×*n*+*<sup>i</sup>*

*<sup>i</sup>*1�=*i*<sup>2</sup> (<sup>2</sup> × | <sup>∑</sup>*l*/*n*−<sup>1</sup>

consider the convergence process of GAs. We state this as the following theorem.

*aj*×*n*+*<sup>i</sup>* <sup>2</sup>*j*+<sup>1</sup> − *l*/*n*−1 ∑ *j*=0

*bj*×*n*+*<sup>i</sup>* <sup>2</sup>*j*+<sup>1</sup> )2,

Fig. 2. The genotypic and phenotypic distances between ∗ ∗ ∗∗ and 0000 in the S/D binary

1010 (0.75)

1011 (0.79)

1110 (0.9)

1111 (1.1)

1000 (0.5)

1001 (0.56)

1100 (0.71)

1101 (0.9)

0010 (0.25)

0011 (0.35)

0110 (0.56)

0111 (0.79)

0000 (0.0)

0001 (0.25)

0100 (0.5)

0101 (0.75)

1011 (1.0)

1110 (1.25)

1111 (1.5)

1000 (0.5)

�*a* − *b*�*<sup>p</sup>* =

in the S/D binary space Φ*g* because the inequation:

�*a* − *b*�*<sup>g</sup>* =

*n* ∑ *i*=1 | *l*/*n*−1 ∑ *j*=0

= ( *n* ∑ *i*=1 | *l*/*n*−1 ∑ *j*=0

= ∑*n*

1001 (0.75)

1100 (1.0)

1101 (1.25)

and Its Novel Operators for Genetic and Evolutionary Algorithms

0010 (0.25)

0011 (0.5)

0110 (0.75)

0111 (1.0)

0000 (0.0)

representation.

correspond phenotypes:

phenotypic space Φ*p*. *Proof* : For ∀*a*, *b* ∈ Φ*g*:

0001 (0.25)

0100 (0.5)

0101 (0.75)

subregion (10) (the gray square), its first building block (BB) should be *BB*<sup>1</sup> = 10. This leads to the first two bits of the S/D binary string *b*. Likewise, in Figure 1-(b), Φ*p* is partitioned into 22×<sup>2</sup> <sup>Φ</sup><sup>1</sup> 4 *<sup>p</sup>* , which are obtained through further bisecting each <sup>Φ</sup><sup>1</sup> 2 *<sup>p</sup>* along each direction. Particularly this further divides <sup>Φ</sup><sup>1</sup> 2 *<sup>p</sup>* = (*BB*1) into four <sup>Φ</sup><sup>1</sup> 4 *<sup>p</sup>* that can be respectively labelled by (*BB*1, 00),(*BB*1, 01),(*BB*1, 10) and (*BB*1, 11). The phenotype *x* is in (*BB*1, 01)-subregion (the dark square), so its second BB should be *BB*<sup>2</sup> = 01 and the first four positions of its corresponding S/D binary string *b* is 1001.

In the same way, <sup>Φ</sup>*<sup>p</sup>* is partitioned into 22×<sup>3</sup> <sup>Φ</sup><sup>1</sup> 8 *<sup>p</sup>* as shown in Figure 1-(c), with <sup>Φ</sup><sup>1</sup> 4 *<sup>p</sup>* = (*BB*1, *BB*2) particularly partitioned into four <sup>Φ</sup><sup>1</sup> 8 *<sup>p</sup>* labelled by (*BB*1, *BB*2, 00), (*BB*1, *BB*2, 01), (*BB*1, *BB*2, 10) and (*BB*1, *BB*2, 11). The phenotype *x* is found to be (*BB*1, *BB*2, 01), that is, identical with S/D binary string *b*. This shows that for any three region partitions, *b* = (*b*1, *b*2, *b*3, *b*4, *b*5, *b*6), each bit value *bi* can be interpreted geometrically as follows: *b*<sup>1</sup> = 0 (*b*<sup>2</sup> = 0) means the phenotype *x* is in the left half along the *x*-coordinate direction (the *y*-coordinate direction) in Φ*<sup>p</sup>* partition with <sup>1</sup> <sup>2</sup> -precision, and *b*<sup>1</sup> = 1 (*b*<sup>2</sup> = 1) means *x* is in the right half. Therefore, the first *BB*<sup>1</sup> = (*b*1, *b*2) determine the <sup>1</sup> <sup>2</sup> -precision location of *x*. If *<sup>b</sup>*<sup>3</sup> <sup>=</sup> <sup>0</sup> (*b*<sup>4</sup> <sup>=</sup> <sup>0</sup>), it then further indicates that when <sup>Φ</sup><sup>1</sup> 2 *<sup>p</sup>* is refined into <sup>Φ</sup><sup>1</sup> 4 *<sup>p</sup>* , the *x* lies in the left half of <sup>Φ</sup><sup>1</sup> 2 *<sup>p</sup>* in the *x*-direction (*y*-direction), and it lies in the right half if *b*<sup>3</sup> = 1 (*b*<sup>4</sup> = 1). Thus a more accurate geometric location (i.e., the <sup>1</sup> <sup>4</sup> -precision location) and a more refined *BB*<sup>2</sup> of *x* is obtained. Similarly we can explain *b*<sup>5</sup> and *b*<sup>6</sup> and identify *BB*3, which determine the 1 <sup>8</sup> -precision location of *x*. This interpretation holds for any high-resolution *l* bits S/D binary encoding.

#### **3.2 A new genotypic distance on the splicing/decomposable binary representation**

For measuring the similarity of binary strings, the Hamming distance (Hamming, 1980) is widely used on the binary space. The Hamming distance counts how many bits differ between two binary strings, but it ignores the scaling of the bits in non-uniformly scaled binary representations. The resulting distortion between genotypic and phenotypic distances can make a phenotypically easy problem difficult. Therefore, to ensure that GAs reliably solve easy problems and problems of bounded complexity, the use of equivalent distances is recommended. For this purpose, we have defined a new genotypic distance on the S/D binary space to measure the similarity of S/D binary strings.
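A one-line example of this distortion, under the standard (non-S/D) binary code rather than the encoding of this chapter: the integers 7 and 8 are phenotypic neighbours, yet their 4-bit encodings differ in every position, so their Hamming distance is maximal.

```python
def hamming(a, b):
    """Number of differing bit positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

seven = [0, 1, 1, 1]   # standard binary encoding of 7
eight = [1, 0, 0, 0]   # standard binary encoding of 8
print(hamming(seven, eight))   # 4: maximal genotypic distance
```
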

**Definition 1:** Suppose the binary strings *a* and *b* belong to the S/D binary space Φ*g*; the genotypic distance ‖*a* − *b*‖*g* is defined as

$$\|a - b\|_g = \sum_{i=1}^{n} \Big| \sum_{j=0}^{l/n-1} \frac{a_{j \times n + i} - b_{j \times n + i}}{2^{j+1}} \Big|,$$

where *l* and *n* denote the length of the S/D binary strings and the dimensions of the real-encoding phenotypic space Φ*p* respectively.
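Definition 1 translates directly into code. The sketch below uses 0-based bit lists, so the bit *a<sub>j×n+i</sub>* of the definition becomes `a[j * n + i]` with `i` running from 0 to *n* − 1; the function name is ours.

```python
def genotypic_distance(a, b, n):
    """||a - b||_g of Definition 1 for 0-based bit lists a, b of length l."""
    l = len(a)
    return sum(
        abs(sum((a[j * n + i] - b[j * n + i]) / 2 ** (j + 1)
                for j in range(l // n)))   # signed gap in dimension i
        for i in range(n)                  # summed over all dimensions
    )
```

For example, with *n* = 2 the strings 1000 and 0010 differ only in the first coordinate, by 1/2 − 1/4 = 1/4.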


Fig. 2. The genotypic and phenotypic distances between ∗∗∗∗ and 0000 in the S/D binary representation.

For any two S/D binary strings *a*, *b* ∈ Φ*g*, we can define the Euclidean distance between their corresponding phenotypes:

$$\|a - b\|_p = \sqrt{\sum_{i=1}^{n} \Big(\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i}}{2^{j+1}} - \sum_{j=0}^{l/n-1} \frac{b_{j \times n+i}}{2^{j+1}}\Big)^2},$$

as the phenotypic distance between the S/D binary strings *a* and *b*. The phenotypic distance ‖·‖*p* and the genotypic distance ‖·‖*g* are equivalent on the S/D binary space Φ*g* when we consider the convergence process of GAs. We state this as the following theorem.

**Theorem 1:** The phenotypic distance ‖·‖*p* and the genotypic distance ‖·‖*g* are equivalent on the S/D binary space Φ*g*, since the inequality

$$\|\cdot\|_p \le \|\cdot\|_g \le \sqrt{n} \times \|\cdot\|_p$$

holds on the S/D binary space Φ*g*, where *n* is the dimension of the real-encoded phenotypic space Φ*p*.

*Proof*: For all *a*, *b* ∈ Φ*g*:

$$\begin{aligned} \|a - b\|_g &= \sum_{i=1}^{n} \Big| \sum_{j=0}^{l/n-1} \frac{a_{j \times n+i} - b_{j \times n+i}}{2^{j+1}} \Big| = \sqrt{\Big(\sum_{i=1}^{n} \Big| \sum_{j=0}^{l/n-1} \frac{a_{j \times n+i} - b_{j \times n+i}}{2^{j+1}} \Big|\Big)^2} \\ &= \sqrt{\sum_{i=1}^{n} \Big(\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i} - b_{j \times n+i}}{2^{j+1}}\Big)^2 + \sum_{1 \le i_1 < i_2 \le n} 2\, \Big|\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i_1} - b_{j \times n+i_1}}{2^{j+1}}\Big| \Big|\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i_2} - b_{j \times n+i_2}}{2^{j+1}}\Big|} \end{aligned}$$


because

$$0 \le \sum_{1 \le i_1 < i_2 \le n} 2\, \Big|\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i_1} - b_{j \times n+i_1}}{2^{j+1}}\Big| \Big|\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i_2} - b_{j \times n+i_2}}{2^{j+1}}\Big| \le (n-1) \sum_{i=1}^{n} \Big(\sum_{j=0}^{l/n-1} \frac{a_{j \times n+i} - b_{j \times n+i}}{2^{j+1}}\Big)^2,$$

then

$$\|a - b\|_p \le \|a - b\|_g \le \sqrt{n} \times \|a - b\|_p.$$

Figure 2 compares the genotypic distance ‖·‖*g* and the phenotypic distance ‖·‖*p* between S/D binary strings and 0000 in a 2-dimensional phenotypic space, where the length of the S/D binary string is *l* = 4. For any two S/D binary strings *a* and *b*, if ‖*a* − 0‖*p* > ‖*b* − 0‖*p*, then ‖*a* − 0‖*g* > ‖*b* − 0‖*g* also holds. This means that ‖·‖*p* and ‖·‖*g* are equivalent for judging whether a sequence of points converges to 0. The search process of a GA can be viewed as exploring a sequence of points that converges to the optimum of the problem. So we can use the new genotypic distance to measure the similarity and convergence of individuals on the S/D binary space.
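Theorem 1 is easy to spot-check numerically. The sketch below draws random string pairs and verifies ‖·‖*p* ≤ ‖·‖*g* ≤ √*n* ‖·‖*p*; the helper names are ours.

```python
import math
import random

def geno(a, b, n):
    """||a - b||_g: L1 norm of the per-dimension signed gaps."""
    return sum(abs(sum((a[j * n + i] - b[j * n + i]) / 2 ** (j + 1)
                       for j in range(len(a) // n))) for i in range(n))

def pheno(a, b, n):
    """||a - b||_p: L2 norm of the same per-dimension gaps."""
    return math.sqrt(sum(
        sum((a[j * n + i] - b[j * n + i]) / 2 ** (j + 1)
            for j in range(len(a) // n)) ** 2 for i in range(n)))

rng = random.Random(0)
n, l = 3, 12
for _ in range(1000):
    a = [rng.randint(0, 1) for _ in range(l)]
    b = [rng.randint(0, 1) for _ in range(l)]
    dp, dg = pheno(a, b, n), geno(a, b, n)
    assert dp <= dg + 1e-12
    assert dg <= math.sqrt(n) * dp + 1e-12
```

This is just the standard equivalence of the L1 and L2 norms on an *n*-vector, applied to the vector of per-dimension gaps.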

The other advantage of the new genotypic distance ‖·‖*g* is that its computational complexity is *O*(*l*), much lower than the *O*(*l*<sup>2</sup>) complexity of the phenotypic distance ‖·‖*p*. So using the new genotypic distance ‖·‖*g* can guarantee that GAs reliably and predictably solve problems of bounded complexity, and it improves their performance when the similarity of individuals is considered.

#### **3.3 Theoretical analysis of the splicing/decomposable binary encoding**

The above interpretation reveals an important fact: in the new genetic representation, the significance of a BB's contribution to the fitness of a whole S/D binary string varies with its position, and in particular, the nearer the front a BB lies, the more significantly it contributes to the fitness of the whole string. We refer to this feature of the new representation as the *BB-significance-variable property*. Indeed, the above interpretation shows that the first *n* bits of an encoding locate the *n*-dimensional phenotype *x* in a global way (with *O*(1/2)-precision); the next group of *n* bits locates *x* in a less global (one might say 'local') way, with *O*(1/4)-precision, and so forth; the last group of *n* bits locates *x* in an extremely local (one might say 'microcosmic') way, with *O*(1/2<sup>*l*/*n*</sup>)-precision. Thus, as the encoding length *l* increases, the representation

$$(b_1, b_2, \dots, b_n,\ b_{n+1}, b_{n+2}, \dots, b_{2n},\ \dots,\ b_{l-n+1}, \dots, b_l) = (BB_1, BB_2, \dots, BB_{l/n})$$

can provide a successive refinement (from global, to local, to microcosmic) and an increasingly accurate representation of the problem variables.
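The successive refinement can be demonstrated by decoding only the first *k* building blocks of a string: each extra block halves the width of the subregion containing the phenotype. A small sketch, with names of our own and the phenotype space assumed to be [0, 1)<sup>*n*</sup>:

```python
def splice_approx(b, n, k):
    """Phenotype approximation using only the first k USBBs of b."""
    return [sum(b[j * n + i] / 2 ** (j + 1) for j in range(k))
            for i in range(n)]

b = [1, 0, 0, 1, 0, 1]         # BB1 = 10, BB2 = 01, BB3 = 01 (Figure 1)
print(splice_approx(b, 2, 1))  # global:      [0.5, 0.0]
print(splice_approx(b, 2, 2))  # local:       [0.5, 0.25]
print(splice_approx(b, 2, 3))  # microcosmic: [0.5, 0.375]
```
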


Fig. 3. Domino convergence on the S/D encoding: the already converged USBBs, the convergence-window USBB, and the not yet converged USBBs of an S/D binary string.

In each BB*<sup>i</sup>* of the S/D binary string, which consists of the bits (*b<sub>i×n+1</sub>*, *b<sub>i×n+2</sub>*, ··· , *b*<sub>(*i*+1)×*n*</sub>), *i* = 0, ··· , *l*/*n* − 1, the bits are uniformly scaled and independent of each other. We refer to this feature of BB*<sup>i</sup>* as the uniform-salient BB (USBB). Furthermore, splicing different numbers of USBBs can describe potential solutions of the problem with different precisions. Consequently, both the intra-BB difficulty (within a building block) and the inter-BB difficulty (between building blocks) (Goldberg, 2002) of a USBB are low. Theoretical analysis reveals that a GA searching on a USBB can find the high-quality bits faster than a GA on a non-uniformly scaled BB.

The S/D binary encoding is a redundancy-free representation, because the mapping from S/D binary strings to real values is a one-to-one genotype-phenotype mapping. The whole S/D binary string is a non-uniformly scaled sequence of USBBs. Domino convergence of GAs therefore occurs, and the USBBs are solved sequentially from the most to the least salient.

The BB-significance-variable and uniform-salient BB properties of the S/D binary representation carry important information that is useful for GA search. We will exploit this information to design a new GA based on the S/D binary representation in the subsequent sections.

### **4. A new S/D binary Genetic Algorithm (SDGA)**

The existing exponentially scaled representations, including the binary and gray encodings, consist of non-uniformly scaled BBs. For non-uniformly scaled and competing BBs in a high dimensional phenotype space, the noise from the competing BBs leads to a reduction in GA performance. Moreover, as the string length increases, more and more of the less salient BBs are randomly fixed by this noise, causing GA performance to decline further. Using a large population can reduce the influence of the noise from the competing BBs. However, in real-world problems a long binary string is needed to encode a large search space with high precision, so we cannot use an arbitrarily large population to overcome the noise. As a result, GAs converge prematurely and cannot reach the optimum of the problem.

To avoid the noise from the competing BBs, we have proposed a new splicing/decomposable GA (SDGA) based on the delicate properties of the S/D binary representation. The whole S/D binary string can be decomposed into a non-uniformly scaled sequence of USBBs. Thus, in the search process of a GA on the S/D binary encoding, domino convergence occurs and the length of the convergence window equals *n*, the length of a USBB. As shown in Figure 3 for the 4-dimensional case, the high scaled USBBs have already fully converged while the low scaled USBBs have not yet started to converge, and the length of the convergence window is 4.

In the SDGA, the genetic operators are applied sequentially from the high scaled to the low scaled USBBs. The crossover and selection process of the SDGA is shown in Figure 4. For two individuals *x*1 and *x*2 randomly selected from the current population, the crossover point is randomly set within the convergence-window USBB, and the crossover operator produces two children *c*1, *c*2. The parents *x*1, *x*2 and their children *c*1, *c*2 can be divided into two pairs {*x*1, *c*1} and {*x*2, *c*2}; in each pair {*x<sub>i</sub>*, *c<sub>i</sub>*} (*i* = 1, 2), the parent and child have the same low scaled USBBs. The selection operator conserves the better member of each pair into the next generation, according to the fitness calculated from the whole S/D binary string for high accuracy. Thus, the bits contributing to high fitness in the convergence-window USBB are preserved, and the diversity on the low scaled USBBs' side is maintained. Mutation operates on the convergence window and the not yet converged USBBs, according to the mutation probability, to increase the diversity of the population. These low salient USBBs then converge through GA search, avoiding the noise from the competing BBs. The implementation pseudocode of the SDGA is shown in Figure 5.

Fig. 4. The genetic crossover and selection in SDGA.

Identifying high-quality bits in the convergence-window USBB is faster than GA search on a non-uniform BB, and no noise from the competing BBs occurs. Thus the population can efficiently converge to the high-quality BB at the position of the convergence-window USBB, which is a component of the overrepresented optimum of the problem. According to the theoretical results of Thierens (Thierens, 1995), the overall convergence time of the new GA with the S/D binary representation is approximately of order *O*(*l*/√*n*), where *l* is the length of the S/D binary string and *n* is the dimension of the problem. This is much faster than working on the binary strings as a whole, where GAs have an approximate convergence time of order *O*(*l*). The gain is especially significant for high dimensional problems.

#### **5. Empirical verification**

In this section we present an empirical verification of the performance differences between the different genetic representations and operators we described in the previous sections.


```
Input: N — population size, m — number of USBBs,
       g — number of generations to run;
Termination condition: population fully converged;
begin
  g ← 0; m ← 1;
  Initialize P_g; Evaluate P_g;
  while (not termination condition) do
    for t ← 1 to N/2
      randomly select two individuals x1_t and x2_t from P_g;
      crossover and selection of x1_t, x2_t into P_{g+1};
    end for
    mutation operation on P_{g+1};
    Evaluate P_{g+1};
    if (USBB_m fully converged) then m ← m + 1;
  end while
end
```
Fig. 5. Pseudocode of the SDGA algorithm.
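As a concrete, runnable companion to Fig. 5, here is a compact Python sketch of the SDGA loop under our own simplifications (generational replacement, parent-versus-child competition within each pair, a generation cap in place of open-ended running); all names and parameter values are ours and merely illustrative.

```python
import random

def sdga(fitness, n, l, pop_size=40, p_mut=0.01, max_gen=200, seed=1):
    """Minimal SDGA sketch: crossover inside the convergence-window USBB,
    parent/child pairs that share low-salient bits compete for survival,
    and the window advances once its USBB has converged."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(pop_size)]
    m = 0                                     # index of the window USBB
    for _ in range(max_gen):
        if len({tuple(ind) for ind in pop}) == 1:
            break                             # population fully converged
        nxt = []
        for _ in range(pop_size // 2):
            x1, x2 = rng.sample(pop, 2)
            cut = m * n + rng.randrange(1, n + 1)   # cut in the window
            c1, c2 = x1[:cut] + x2[cut:], x2[:cut] + x1[cut:]
            # pair each parent with the child that kept its low-salient tail
            nxt.append(list(max((x1, c2), key=fitness)))
            nxt.append(list(max((x2, c1), key=fitness)))
        for ind in nxt:                       # mutate window + lower USBBs
            for k in range(m * n, l):
                if rng.random() < p_mut:
                    ind[k] ^= 1
        pop = nxt
        if len({tuple(ind[m*n:(m+1)*n]) for ind in pop}) == 1 and m < l//n - 1:
            m += 1                            # window USBB converged: advance
    return max(pop, key=fitness)
```

For instance, `sdga(sum, n=3, l=12)` uses a plain bit-count fitness as a stand-in objective and returns the best string found.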

#### **5.1 Two integer benchmark optimization problems**

In our experimentation, we use integer-specific variations of the one-max and the fully-deceptive trap problems for a comparison of different genetic representations defined on binary strings.

The integer one-max problem is defined as

$$f_1(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} x_i,$$

and the integer deceptive trap is

$$f_2(x_1, x_2, \dots, x_n) = \begin{cases} \sum_{i=1}^{n} x_i & \text{if } x_i = x_{i,\max} \text{ for each } i, \\ \sum_{i=1}^{n} x_{i,\max} - \sum_{i=1}^{n} x_i - 1 & \text{otherwise,} \end{cases}$$

where *x* ∈ Φ*p* and *n* is the dimension of the problem. In our implementation, we set *n* = 30. For the binary representation, the integer one-max problem is equivalent to the BinInt problem (Rudnick, 1992). These two problems have an exponentially scaled salience or fitness structure for binary strings. The integer one-max problem is fully easy, whereas the integer deceptive trap should be fully difficult for GAs to solve.
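Stated in code (our own sketch, with a parameter `x_max` holding the per-dimension maxima, e.g. *x<sub>i,max</sub>* = 2<sup>*k*</sup> − 1 for an order-*k* problem):

```python
def int_one_max(x):
    """f1: the sum of the integer phenotype values."""
    return sum(x)

def int_deceptive_trap(x, x_max):
    """f2: optimal only at the all-maximum string; everywhere else the
    fitness increases as the phenotype moves AWAY from that optimum."""
    if all(xi == mi for xi, mi in zip(x, x_max)):
        return sum(x)
    return sum(x_max) - sum(x) - 1

# order-2 trap in 2 dimensions: x_max = [3, 3]
print(int_deceptive_trap([3, 3], [3, 3]))  # 6, the global optimum
print(int_deceptive_trap([0, 0], [3, 3]))  # 5, the deceptive attractor
```
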

#### **5.2 Comparison of the performance of GAs with different representations**

In the first set of experiments we applied a standard GA (SGA) using binary, gray, unary, S/D encodings and SDGA on the integer one-max and deceptive trap problems to compare their performance. We performed 50 runs and each run was stopped after the population was fully converged. That means that all individuals in the population are the same. For fairness of

(b)

20 60 100 140 180 220 260 300

<sup>20</sup> <sup>60</sup> <sup>100</sup> <sup>140</sup> <sup>180</sup> <sup>220</sup> <sup>260</sup> <sup>300</sup> <sup>0</sup>

(b)

population size

(b)

20 60 100 140 180 220 260 300

population size

200

0

500

1000

generation


Fig. 6. Integer one-max problem of order 3.


Fig. 7. Integer one-max problem of order 5.

For comparison, we implemented SGA with the different binary encodings and SDGA with the same parameter settings and the same initial population. For SGA, we used the one-point crossover operator (crossover probability = 1) and tournament selection without replacement of size two. We used no mutation, as we wanted to focus on the influence of the genetic representations on selectorecombinative GAs.
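The selectorecombinative setup described above can be sketched as follows. This is a minimal illustration, not the chapter's exact implementation; the function names are ours, and the population is a list of equal-length bit strings of even count:

```python
import random

def tournament_without_replacement(pop, fitness):
    # Tournament selection without replacement, size two: the population is
    # shuffled twice and adjacent individuals compete, so every individual
    # takes part in exactly two tournaments per generation.
    selected = []
    for _ in range(2):
        order = random.sample(pop, len(pop))
        for a, b in zip(order[::2], order[1::2]):
            selected.append(a if fitness(a) >= fitness(b) else b)
    return selected

def one_point_crossover(p1, p2):
    # One-point crossover, applied with probability 1 as in the experiments.
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def generation(pop, fitness):
    # One selectorecombinative generation: selection then crossover, no mutation.
    pool = tournament_without_replacement(pop, fitness)
    random.shuffle(pool)
    children = []
    for p1, p2 in zip(pool[::2], pool[1::2]):
        children.extend(one_point_crossover(p1, p2))
    return children
```

Because each tournament winner is at least as fit as its opponent and one-point crossover only recombines existing genes, the average fitness of the offspring never falls below that of the parents under this scheme.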

For the one-max problem, we used 30-dimensional problems of order 2 (in each dimension, the number of different phenotypes is *s* = 2^2 = 4), order 3 (*s* = 2^3 = 8), order 4 (*s* = 2^4 = 16) and order 5 (*s* = 2^5 = 32). In our implementation, the global optima of the deceptive trap problems cannot be explored by any of the GAs we used even at low orders, and the deceptive trap problems with higher orders are more difficult still and are not solvable by GAs. We therefore only present results for the 30-dimensional deceptive trap problems of order 2 (*s* = 2^2 = 4) and order 3 (*s* = 2^3 = 8). Using the binary, gray and S/D encodings results in a string length *l* = 60 for the order-2 problems, *l* = 90 for order 3, *l* = 120 for order 4, and *l* = 150 for order 5. When using the unary encoding we need 30 × 3 = 90 bits for order 2, 30 × 7 = 210 bits for order 3, 30 × 15 = 450 bits for order 4 and 30 × 31 = 930 bits for order 5.
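The string lengths above follow from a small calculation: the fixed-length codes (binary, gray, S/D) need *order* bits per dimension, while the unary code needs *s* − 1 = 2^order − 1 bits. A quick sketch (the helper name is ours):

```python
def string_length(dims, order, encoding):
    # Total genotype length for a 'dims'-dimensional integer problem in which
    # each dimension has s = 2**order distinct phenotypes.
    s = 2 ** order
    if encoding in ("binary", "gray", "sd"):
        return dims * order        # fixed-length codes: 'order' bits per dimension
    if encoding == "unary":
        return dims * (s - 1)      # unary code: s - 1 bits per dimension
    raise ValueError("unknown encoding: " + encoding)
```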

Figures 6-7 present the results for the integer one-max problems of orders 3 and 5, respectively, and Figures 8-9 show the results for the integer deceptive trap problems of orders 2 and 3, respectively. The plots show, for SGA with the different representations and for SDGA, the best fitness at the end of the run (left) and the run duration, i.e. the generation at which the population fully converged (right), with respect to the population size *N*.

Fig. 8. Deceptive trap problem of order 2.


Fig. 9. Deceptive trap problem of order 3.

SGA with the differently scaled binary representations, including the binary, gray and S/D encodings, suffers from the noise from the competing BBs. For small population sizes, this noise occurs strongly and many bits in the binary strings are randomly fixed, so SGA fully converges faster but the best fitness is poor; that is, SGA converges prematurely with small population sizes. For larger population sizes, SGA can explore better solutions, but its run duration increases significantly due to the noise from the competing BBs. Furthermore, for these high-dimensional problems, even increasing the population size to 300 is not enough to avoid this noise, so SGA cannot converge to the optima of the problems, which are overrepresented by BBs.

Due to the redundancy of the unary encoding, which results in an underrepresentation of the optimal solution, SGA using the unary encoding performs increasingly badly with increasing problem order. Therefore, for one-max and deceptive trap problems of order greater than three, the performance of SGA using the unary encoding is significantly worse than when using the binary, gray and S/D encodings. SGA with the gray encoding performs worse than with the binary encoding for the one-max problems, and better for the deceptive trap problems.
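For reference, the integer one-max fitness simply sums the gene values, while a deceptive trap rewards the all-(s−1) optimum but guides the search toward a deceptive optimum at 0. A minimal sketch (one common formulation, written by us for illustration; the chapter's own trap definition is the authoritative one):

```python
def one_max_int(pheno):
    # Integer one-max: fitness is the sum of the integer gene values; the
    # optimum sets every gene to its maximum value s - 1.
    return sum(pheno)

def deceptive_trap(pheno, s):
    # A fully deceptive trap applied dimension-wise: the isolated global
    # optimum is x = s - 1, while every other value pulls the search toward
    # the deceptive local optimum at x = 0.
    return sum(s - 1 if x == s - 1 else s - 2 - x for x in pheno)
```

In 30 dimensions with s = 4, both functions have a global optimum of 30 × 3 = 90, consistent with the best-fitness values reported in Table 1.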

As expected, SGA using the S/D encoding performs better than SGA using the binary and gray encodings for both the one-max and the deceptive trap problems, because in the S/D encoding the more salient bits are contiguous and form short, highly fit BBs, which are easily identified by SGA. This reveals that the S/D encoding is well suited to GA search. However, the lower salient bits in the S/D binary string are still randomly fixed by the noise from the competing BBs, so the performance of SGA with the S/D encoding is not significantly better than with the binary and gray encodings.



As shown in Figures 6-9, the performance of SDGA is significantly better than that of SGA with any of the encodings. Using a small population size, the solutions explored by the time SDGA has fully converged are much better than those of SGA, because each bit is identified by the search process of SDGA rather than randomly fixed by the noise from the competing BBs. For the same reason, the run duration of SDGA is longer than that of SGA, which means that neither premature convergence nor drift occurs. For larger population sizes, the performance of SDGA is much better than that of SGA in terms of both solution quality and run duration, because GAs search faster on the USBBs of the S/D binary encoding than on non-uniformly scaled BBs, and the domino convergence, which occurs only over the non-uniform sequence of USBBs, is very weak.


Table 1. Comparison of results of SGA with different binary representations and SDGA for the one-max and deceptive problems.

Table 1 summarizes the experimental results for the one-max and the deceptive trap problems. The best fitness (run duration) for each problem is calculated as the average of the fitness values (generations) at which the GAs fully converged, over the different population sizes.

The average fitness of SDGA is much better than that of SGA with any encoding. The standard deviations of the best fitness and the run duration of SDGA for the different problems are significantly smaller than those of SGA. This reveals that the population size is an important parameter for SGA search, but not a significant parameter for SDGA search. The run durations of SDGA for the one-max problems with orders 4 and 5 are longer than those of SGA because SGA, with its long binary strings and small population sizes, converges strongly prematurely.


| Encoding | one-max (order 2) best fit. | run dur. | one-max (order 3) best fit. | run dur. | one-max (order 4) best fit. | run dur. |
|---|---|---|---|---|---|---|
| SDGA | 89.6 (1.24) | 383.1 (43.6) | 209.2 (2.9) | 577.3 (77.4) | 448.1 (6.8) | 768.7 (107.2) |
| S/D coding | 81.1 (9.8) | 446.1 (187.4) | 180.9 (21.16) | 597 (287) | 375.9 (54.3) | 694.9 (377.2) |
| Binary | 80.1 (10.3) | 473.7 (192.7) | 177.7 (21.9) | 651 (316.8) | 370.5 (42.2) | 748.8 (398) |
| Gray | 78.3 (9.6) | 496.9 (196.3) | 173.1 (20.5) | 691.2 (328.5) | 365.2 (42.2) | 803.6 (434.8) |
| Unary | 76.1 (10.6) | 536.8 (218.5) | 150.5 (21.3) | 844.2 (416.7) | 281.5 (26.6) | 1006 (558.4) |

| Encoding | one-max (order 5) best fit. | run dur. | decep. (order 2) best fit. | run dur. | decep. (order 3) best fit. | run dur. |
|---|---|---|---|---|---|---|
| SDGA | 926.6 (9.8) | 952.9 (118.2) | 88.74 (0.78) | 380 (48) | 208.1 (2.8) | 573.1 (75.6) |
| S/D coding | 777.1 (101) | 761.8 (422.4) | 80.02 (9.7) | 428 (173) | 182.9 (21.6) | 602.9 (285.4) |
| Binary | 752.6 (91) | 838.6 (481.6) | 77.16 (9.1) | 482 (192) | 172.8 (21.1) | 690.1 (334.8) |
| Gray | 719.8 (87.9) | 909.5 (502) | 78.76 (9.4) | 453 (183) | 177.9 (21.8) | 647 (309.5) |
| Unary | 560.8 (72.4) | 1216 (726.9) | 74.18 (10.5) | 549 (221) | 150.7 (20.6) | 882.7 (451.9) |


Fig. 10. Convergence process of SDGA without the noise from the competing BBs.


As described in Table 1, for the one-max and deceptive trap problems, all GAs converge toward the side of the optima that are overrepresented by BBs, but SGA with the different binary representations cannot explore the optima of the problems. The ability of SDGA to explore optima that are overrepresented by BBs is significantly better than that of SGA. To explore the global optimum of the deceptive trap problems, we need to use additional niche methods to divide the whole population into sub-populations. In each sub-population the global optimum is overrepresented by BBs, and thus SDGA can efficiently explore the global optimum of the deceptive trap problems.

#### **5.3 Avoid the noise from the competing BBs**

To validate the predictions about avoiding the noise from the competing BBs, we implemented our SDGA to solve the 30-dimensional integer one-max problem of order 3, and counted the number of generations it takes before each bit fully converges. Results are averaged over 50 independent runs. Figure 10 shows the bit convergence for a string of length *l* = 90, with population sizes 20, 100, 200 and 300, respectively. The experimental results are summarized in Table 2. The run duration of each USBB*i* (*i* = 1, 2, 3) is the average of the fully converged generations of the bits that belong to USBB*i*.
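The per-bit convergence statistic used here can be computed as follows (a sketch with our own helper name; since the GA uses no mutation, agreement on a bit is absorbing, so the first generation of full population agreement is that bit's convergence generation):

```python
def convergence_generation(history):
    # 'history' is a list of populations, one per generation; each population
    # is a list of equal-length bit strings. For every bit position, return
    # the first generation at which the whole population agrees on that bit
    # and never disagrees afterwards (None if it never converges).
    n_bits = len(history[0][0])
    result = []
    for j in range(n_bits):
        gen = None
        for t, pop in enumerate(history):
            if len({ind[j] for ind in pop}) == 1:  # all individuals agree
                if gen is None:
                    gen = t
            else:
                gen = None                         # diverged again: reset
        result.append(gen)
    return result
```

Averaging these values over the bits of each USBB gives the per-USBB run durations reported in Table 2.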

As shown in Figure 10 and Table 2, the whole S/D binary string comprises three USBBs. Within each USBB, the bits converge uniformly at almost the same generation. Over the non-uniformly scaled sequence of USBBs, domino convergence occurs sequentially from the high-scaled to the low-scaled USBBs. Thus, no less salient bit converges at the same generation as a more salient bit, and no noise from the competing BBs occurs.

| *Pm* | *N* = 20 best fit. | run dur. | *N* = 40 best fit. | run dur. | *N* = 60 best fit. | run dur. |
|---|---|---|---|---|---|---|
| 0 | 198.6 (5.7) | 393 (72) | 208.9 (1.2) | 470 (55) | 210 (0) | 488 (54) |
| 0.001 | 201.7 (100) | 411 (49) | 209.4 (1.2) | 472 (43) | 210 (0) | 517 (54) |
| 0.005 | 202.7 (2.9) | 422 (55) | 208.9 (1.3) | 492 (82) | 210 (0) | 535 (89) |
| 0.01 | 203.8 (2.2) | 415 (59) | 209.1 (1.2) | 504 (76) | 210 (0) | 545 (80) |
| 0.05 | 209.3 (1) | 534 (158) | 209.9 (0.3) | 739 (202) | 210 (0) | 1202 (317) |
| 0.1 | 209.8 (0.6) | 688 (133) | 210 (0) | 5629 (1857) | 210 (0) | 66514 (21328) |
| 0.2 | 209.8 (0.4) | 10981 (7668) | − (−) | − (−) | − (−) | − (−) |

Table 3. Comparison of results of SDGA with different mutation probabilities for the one-max problem of order 3 ("−": cannot fully converge within 10^6 generations; standard deviations in parentheses).

| Population size | Run duration of USBB1 | Run duration of USBB2 | Run duration of USBB3 |
|---|---|---|---|
| 20 | 47.3 (8.2) | 193.7 (12.7) | 365.6 (13.8) |
| 100 | 116.6 (6.8) | 263.2 (7.8) | 470.8 (12.1) |
| 200 | 167.4 (7.7) | 366.5 (6.7) | 559.6 (13.9) |
| 300 | 220.3 (7.0) | 430.8 (6.6) | 633.6 (7.8) |

Table 2. Comparison of the run durations at which the USBBs fully converged, for different population sizes (standard deviations in parentheses).


On the other hand, we know that the noise from the competing BBs occurs strongly when GAs use a small population size. In our implementation, even when the population size of SDGA is as small as 20, the convergence process of the bits is the same as with a large population size: the low-scaled USBBs converge over long generations and no noise from the competing BBs occurs.

It is clear from Figure 10 and Table 2 that the predictions and the experimental results coincide very well.

#### **5.4 SDGA with the mutation operator**

In this subsection we consider the effect of the mutation operator on SDGA search. We implemented our SDGA with different mutation probabilities to solve the 30-dimensional integer one-max problem of order 3. Results are averaged over 50 independent runs. Figure 11 presents the experimental results for mutation probabilities 0.001, 0.005, 0.01, 0.05 and 0.1, respectively. The plots show, for SDGA, the run duration (fully converged generation) with respect to the population size *N*.

As shown in Figure 11, when the mutation probability is smaller than 0.01, SDGA can fully converge with both small and large population sizes, and the run durations do not increase much. When the mutation probability increases beyond 0.01, SDGA with large population sizes has difficulty fully converging; only with small population sizes can SDGA still fully converge, and even then the run durations increase significantly.

Table 3 summarizes the experimental results with population sizes 20, 40 and 60. For small population sizes (20 and 40), the mutation operator can improve the performance of SDGA, because it can find some high-quality bits that are not included in the current population. For large population sizes (≥ 60), all high-quality bits are included in the initial population, so the mutation operator cannot improve the best fitness once SDGA has fully converged. Furthermore, when the mutation probability is larger than 0.01, SDGA cannot fully converge in a reasonable time (here we set the upper bound of the run duration to 10^6 generations).

#### **5.5 Genotypic distance on the S/D binary representation**

To validate the prediction that methods depending on the distance in the real-valued space can be directly used on the S/D binary space with our newly defined genotypic distance, we combined SGA with the S/D binary encoding and the dynamic niche sharing method [Miller] for multimodal function optimization, to solve the 4 benchmark multimodal optimization problems listed in Table 4. To assess the effectiveness of the new genotypic distance on the S/D binary space, its performance is compared with the combination of SGA with the S/D binary representation and the dynamic niche sharing method based on the Hamming distance. In applying SGA, we set the initial population size *N* = 100, the maximal number of generations *gmx* = 1000, the length of the S/D binary string for each dimension *l*/*n* = 32, the crossover probability *pc* = 0.8 and the mutation probability *pm* = 0.005.
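Dynamic niche sharing builds on the classic sharing rule, in which an individual's raw fitness is divided by its niche count. A minimal sketch of that underlying rule (the parameter names `sigma_share` and `alpha` are ours; the dynamic variant of [Miller] additionally identifies niche peaks on the fly):

```python
def shared_fitness(pop, fitness, distance, sigma_share, alpha=1.0):
    # Fitness sharing: each individual's raw fitness is divided by its niche
    # count m_i = sum_j sh(d(i, j)), where sh(d) = 1 - (d / sigma)**alpha for
    # d < sigma and 0 otherwise. Crowded individuals are thereby penalized.
    def sh(d):
        return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0
    return [fitness(x) / sum(sh(distance(x, y)) for y in pop) for x in pop]
```

Any distance function can be plugged in, which is exactly why the choice between the S/D genotypic distance and the Hamming distance matters in the experiments below.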




Fig. 11. SDGA with the mutation operator by different mutation probabilities for one-max problem of order 3.




Fig. 12. Comparison of results of the dynamic niche sharing methods with S/D genotypic distance and Hamming distance for *f*3(*x*). (key: "o" — the optima in the final population)


Two-peak trap function (2 peaks):

$$f_3(x) = \begin{cases} \frac{200}{2}\,(2-x), & 0 \le x < 2;\\[4pt] \frac{190}{18}\,(x-2), & 2 \le x \le 20; \end{cases}$$

Deb's function (5 peaks):

$$f_4(x) = \sin^6(5\pi x), \quad x \in [0,1];$$

Deb's decreasing function (5 peaks):

$$f_5(x) = 2^{-2\left((x-0.1)/0.9\right)^2}\,\sin^6(5\pi x), \quad x \in [0,1];$$

Roots function (6 peaks):

$$f_6(x) = \frac{1}{1+\left|x^6-1\right|}, \quad x = x_1 + i\,x_2 \in \mathbb{C},\ x_1, x_2 \in [-2,2];$$

Table 4. The test suite of multimodal functions used in our experiments.
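The four test functions can be implemented directly (our sketch; note that `f6` takes a complex argument):

```python
import math

def f3(x):
    # Two-peak trap: global peak f3(0) = 200, local peak f3(20) = 190.
    return (200.0 / 2.0) * (2.0 - x) if x < 2.0 else (190.0 / 18.0) * (x - 2.0)

def f4(x):
    # Five equal peaks of height 1 at x = 0.1, 0.3, 0.5, 0.7, 0.9.
    return math.sin(5.0 * math.pi * x) ** 6

def f5(x):
    # Five peaks of decreasing height; the global peak is at x = 0.1.
    return 2.0 ** (-2.0 * ((x - 0.1) / 0.9) ** 2) * math.sin(5.0 * math.pi * x) ** 6

def f6(x):
    # Six peaks of height 1 at the sixth roots of unity in the complex plane.
    return 1.0 / (1.0 + abs(x ** 6 - 1.0))
```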

Figures 12-15 show the comparison results of the dynamic niche sharing method with the S/D genotypic distance and with the Hamming distance for *f*3(*x*) − *f*6(*x*), respectively. Table 5 lists the solution-quality comparison in terms of the number of multiple optima maintained. We ran each algorithm 10 times.

Fig. 13. Comparison of results of the dynamic niche sharing methods with S/D genotypic distance and Hamming distance for *f*4(*x*). (key: "o" — the optima in the final population)

The dynamic niche sharing method with the S/D genotypic distance can explore all optima of *f*3(*x*) − *f*6(*x*) in every run. In contrast, with the Hamming distance the final population converges to a single optimum of the multimodal problem and cannot find multiple optima. This means the niche method cannot work when using the Hamming distance, due to the distance distortion between the genotypic space (S/D binary space) and the phenotypic space (real-valued space).

These experimental investigations reveal that methods which depend on the Euclidean distance on the real-valued space can be directly used on the S/D binary space with our newly defined genotypic distance.
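The distortion is easy to see in the standard binary code, where phenotypic neighbours can be maximally distant genotypically:

```python
def hamming(a, b):
    # Hamming distance between two equal-length bit strings.
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def to_binary(v, bits):
    # Standard (non-gray) fixed-width binary code of integer v.
    return format(v, "0{}b".format(bits))

# The phenotypic neighbours 7 ("0111") and 8 ("1000") differ in every bit of
# a 4-bit standard binary code, so a niche radius tuned in phenotype space is
# meaningless under the Hamming distance.
d = hamming(to_binary(7, 4), to_binary(8, 4))
```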


| Function | Distance threshold | S/D genotypic distance: Optima No. | Success rate | Hamming distance: Optima No. | Success rate |
|---|---|---|---|---|---|
| *f*3 | 2.0 | 2 | 100% | 1 | 0% |
| *f*4 | 0.16 | 5 | 100% | 1 | 0% |
| *f*5 | 0.16 | 5 | 100% | 1 | 0% |
| *f*6 | 0.8 | 6 | 100% | 1 | 0% |

Table 5. Comparison of results of the dynamic niche sharing methods with the S/D genotypic distance and the Hamming distance.

#### **6. Discussion**


This paper has given for the first time a uniform-salient building block (USBB) of the S/D binary representation, which includes uniformly scaled bits. This assumes that the phenotypic space Φ*p* is uniformly scaled in each dimension. If the assumption is not satisfied, we need to normalize the phenotypic space Φ*p* first and then encode the normalized phenotypic space Φ̃*p* into the S/D binary space Φ*g*, to guarantee that the bits in each USBB have the same scale.

SDGA applied to the S/D binary representation converges sequentially from the high-scaled to the low-scaled USBBs. However, when the convergence-window USBB cannot converge to a single high-quality BB, there may be several high-quality BBs that describe different optima of the problem. In this case, we need to use some other methods (e.g. the niche methods) to divide the whole population into several sub-populations, with each sub-population focusing on one optimum. Thus, each optimum will be overrepresented by BBs in its sub-population, and SDGA can efficiently explore all the optima using these sub-populations.

GEAs using the S/D binary representation can reliably and predictably solve problems of bounded complexity, and the methods that depend on the Euclidean distance for solving different kinds of optimization problems can be directly used on the S/D binary space.


#### **8. Acknowledgment**

This research was supported by the Macau Science and Technology Development Funds (Grant No. 021/2008/A and Grant No. 017/2010/A2) of the Macau Special Administrative Region of the People's Republic of China.

#### **9. References**

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

Hamming, R. (1980). Coding and information theory. Prentice-Hall.

Han, K. H. & Kim, J. H. (2000). Genetic quantum algorithm and its application to combinatorial optimization problem, Proceedings of the Congress on Evolutionary Computation 2000: Volume 1, pp. 1354-1360, La Jolla, CA.

Harik, G. R., Cantu-Paz, E., Goldberg, D. E. & Miller, B. L. (1997). The gambler's ruin problem, genetic algorithms and the size of populations. In Back, T. (Ed.), Proceedings of the Fourth International Conference on Evolutionary Computation, pp. 7-12, New York.

Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

Julstrom, B. A. (1999). Redundant genetic encodings may not be harmful. Proceedings of the Genetic and Evolutionary Computation Conference 1999: Volume 1. San Francisco, CA: Morgan Kaufmann Publishers.

Leung, K. S., Sun, J. Y. & Xu, Z. B. (2002). Efficiency speed-up strategies for evolutionary computation: an adaptive implementation. Engineering Computations, 19(3), pp. 272-304.

Leung, K. S. & Liang, Y. (2003). Adaptive elitist-population based genetic algorithm for multimodal function optimization. Proceedings of the Genetic and Evolutionary Computation Conference 2003: Volume 1, pp. 1160-1171, Chicago, USA.

Liang, Y. & Leung, K. S. (2006). Evolution strategies with exclusion-based selection operators and a Fourier series auxiliary function. Applied Mathematics and Computation, Volume 174, pp. 1080-1109.

Liepins, G. E. & Vose, M. D. (1990). Representational issues in genetic optimization. Journal of Experimental and Theoretical Artificial Intelligence, 2, pp. 101-115.

Lobo, F. G., Goldberg, D. E. & Pelikan, M. (2000). Time complexity of genetic algorithms on exponentially scaled problems. Proceedings of the Genetic and Evolutionary Computation Conference 2000: Volume 1. San Francisco, CA: Morgan Kaufmann Publishers.

Mahfoud, S. W. (1996). Niching methods for genetic algorithms. Doctoral thesis, University of Illinois at Urbana-Champaign.

Miller, B. L. & Goldberg, D. E. (1996). Optimal sampling for genetic algorithms (IlliGAL Report No. 96005). Urbana, IL: University of Illinois at Urbana-Champaign.

Rothlauf, F. (2002). Representations for genetic and evolutionary algorithms. Heidelberg; New York: Physica-Verlag.

Schaffer, J. D. (Ed.) (1989a). Proceedings of the Third International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann.

and Its Novel Operators for Genetic and Evolutionary Algorithms

**8. Acknowledgment**

**9. References**

Fig. 14. Comparison of results of the dynamic niche sharing methods with S/D genotypic distance and Hamming distance for *f*5(*x*). (key: "o" — the optima in the final population)

Fig. 15. Comparison of results of the dynamic niche sharing methods with S/D genotypic distance and Hamming distance for *f*6(*x*). (key: "o" — the optima in the final population)

BBs in its sub-population and SDGA can efficiently explore all the optima using these sub-populations.

#### **7. Conclusions**

In this paper, we introduce a new genetic representation — a splicing/decomposable (S/D) binary encoding, proposed on the basis of theoretical guidance and existing recommendations for designing efficient genetic representations. The S/D binary representation can be spliced and decomposed to describe potential solutions of the problem at different precisions using different numbers of uniform-salient building blocks (USBBs). Exploiting this property, genetic and evolutionary algorithms (GEAs) can be applied from the high-scaled to the low-scaled BBs sequentially, which avoids the noise from competing BBs and improves GEAs' performance. Our theoretical and empirical investigations reveal that the S/D binary representation is better suited to GEA search than other existing binary encodings. Moreover, we define a new genotypic distance on the S/D binary space that is equivalent to the Euclidean distance on the real-valued space during GEA convergence. Based on this genotypic distance, GEAs can reliably and predictably solve problems of bounded complexity, and methods that depend on the Euclidean distance for solving different kinds of optimization problems can be used directly on the S/D binary space.
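The equivalence between genotypic and Euclidean distance is exactly what the standard binary code lacks: under plain binary encoding, Hamming distance can disagree sharply with phenotypic distance (the well-known Hamming cliff). The S/D distance itself is defined earlier in the chapter; the decoder below is a generic fixed-point binary decoder used only to illustrate the mismatch, not the S/D scheme.

```python
def decode(bits):
    """Decode a standard (non-Gray) binary string to a real in [0, 1)."""
    return int(bits, 2) / (1 << len(bits))

def hamming(a, b):
    """Number of bit positions in which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

# Hamming cliff: phenotypic neighbours can be genotypic strangers.
a, b = "0111", "1000"            # decode to 7/16 and 8/16: adjacent values
print(decode(b) - decode(a))     # 0.0625 (minimal phenotypic step)
print(hamming(a, b))             # 4      (maximal Hamming distance)

c = "1111"                       # decodes to 15/16: phenotypically far from a
print(hamming(a, c))             # 1      (yet minimal Hamming distance)
```

A genotypic distance that tracks Euclidean distance removes this mismatch, so Euclidean-based tools (e.g. the niche sharing used in Figs. 14 and 15) transfer directly to the encoded space.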

#### **8. Acknowledgment**

This research was supported by the Macau Science and Technology Development Fund (Grants No. 021/2008/A and No. 017/2010/A2) of the Macau Special Administrative Region of the People's Republic of China.

#### **9. References**

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning, Reading, MA: Addison-Wesley.

Hamming, R. (1980). Coding and information theory, Prentice-Hall.

Han, K. H. & Kim, J. H. (2000). Genetic quantum algorithm and its application to combinatorial optimization problem, *Proceedings of the Congress on Evolutionary Computation 2000*: Volume 1, pp. 1354–1360, La Jolla, CA.

Harik, G. R., Cantu-Paz, E., Goldberg, D. E. & Miller, B. L. (1997). The gambler's ruin problem, genetic algorithms and the size of populations, in Back, T. (Ed.), *Proceedings of the Fourth International Conference on Evolutionary Computation*, pp. 7–12, New York.

Holland, J. H. (1975). Adaptation in natural and artificial systems, Ann Arbor, MI: University of Michigan Press.

Julstrom, B. A. (1999). Redundant genetic encodings may not be harmful, *Proceedings of the Genetic and Evolutionary Computation Conference 1999*: Volume 1, San Francisco, CA: Morgan Kaufmann Publishers.

Leung, K. S., Sun, J. Y. & Xu, Z. B. (2002). Efficiency speed-up strategies for evolutionary computation: an adaptive implementation, *Engineering Computations* 19(3): 272–304.

Leung, K. S. & Liang, Y. (2003). Adaptive elitist-population based genetic algorithm for multimodal function optimization, *Proceedings of the Genetic and Evolutionary Computation Conference 2003*: Volume 1, pp. 1160–1171, Chicago, USA.

Liang, Y. & Leung, K. S. (2006). Evolution strategies with exclusion-based selection operators and a Fourier series auxiliary function, *Applied Mathematics and Computation* 174: 1080–1109.

Liepins, G. E. & Vose, M. D. (1990). Representational issues in genetic optimization, *Journal of Experimental and Theoretical Artificial Intelligence* 2: 101–115.

Lobo, F. G., Goldberg, D. E. & Pelikan, M. (2000). Time complexity of genetic algorithms on exponentially scaled problems, *Proceedings of the Genetic and Evolutionary Computation Conference 2000*: Volume 1, San Francisco, CA: Morgan Kaufmann Publishers.

Mahfoud, S. W. (1996). Niching methods for genetic algorithms, Doctoral thesis, University of Illinois at Urbana-Champaign.

Miller, B. L. & Goldberg, D. E. (1996). Optimal sampling for genetic algorithms (IlliGAL Report No. 96005), Urbana, IL: University of Illinois at Urbana-Champaign.

Rothlauf, F. (2002). Representations for genetic and evolutionary algorithms, Heidelberg; New York: Physica-Verlag.

Schaffer, J. D. (Ed.) (1989a). *Proceedings of the Third International Conference on Genetic Algorithms*, San Francisco, CA: Morgan Kaufmann Publishers.



**6**

**Genetic Algorithms: An Overview with Applications in Evolvable Hardware**

Popa Rustem *"Dunarea de Jos" University of Galati Romania*

#### **1. Introduction**

The genetic algorithm (GA) is an optimization and search technique based on the principles of genetics and natural selection. A GA allows a population composed of many individuals to evolve under specified selection rules to a state that maximizes the "fitness" (i.e., minimizes the cost function). Charles Darwin formulated natural selection as the main evolutionary principle without any knowledge of the genetic mechanism. After many years of research, he assumed that the parents' qualities blend together in the offspring organism. Favorable variations are preserved, while unfavorable ones are rejected. More individuals are born than can survive, so there is a continuous struggle for life. Individuals with an advantage have a greater chance of survival, i.e., the "survival of the fittest". This theory raised serious objections in its time, even after the discovery of Mendel's laws, and only in the 1920s "was it proved that Mendel's genetics and Darwin's theory of natural selection are in no way conflicting and that their happy marriage yields modern evolutionary theory" (Michalewicz, 1996).

The dynamical principles underlying Darwin's concept of evolution have been used to provide the basis for a new class of algorithms that are able to solve some difficult problems in computation. These "computational equivalents of natural selection, called evolutionary algorithms, act by successively improving a set or generation of candidate solutions to a given problem, using as a criterion how fit or adept they are at solving the problem" (Forbes, 2005). Evolutionary algorithms (EAs) are highly parallel, which makes solving these difficult problems more tractable, although the computational effort is usually huge.

In this chapter we focus on some applications of GAs in digital electronic design, using the concept of extrinsic Evolvable Hardware (EHW). But first of all, we present the genesis of the main research directions in Evolutionary Computation, the structure of a Simple Genetic Algorithm (SGA), and a classification of GAs, taking into account the state of the art in this field of research.

#### **2. A brief history of evolutionary computation**

In the 1950s and the 1960s several computer scientists independently studied evolutionary systems with the idea that evolution could be used as an optimization tool for engineering

