The chapter is organized in the following way. The first section introduces the Differential Evolution and CMA Evolution Strategy schemes, focusing on their similarities and main differences. We then present our continuous schemes, LDEP and CMA-LEP, respectively based on DE and CMA-ES. We show that these schemes are easily implementable as plug-ins for DE and CMA-ES. In Section 4, we compare the performance of these two schemes, and also traditional GP, over a range of benchmarks.

**2. Continuous evolutionary schemes**

In this section we present DE and CMA-ES, which form the main components of the evolutionary algorithms used in our experiments.

**2.1. Previous works on evolving programs with DE**

To our knowledge, O'Neill and Brabazon were the first to use DE to evolve programs, within the well-known framework of Grammatical Evolution (GE) [13]. In GE, a population of variable-length binary strings is decoded into a syntactically correct program using a Backus Naur Form (BNF) formal grammar definition. The genotype-to-phenotype mapping process makes it possible to use almost any BNF grammar, and thus to evolve programs in many different languages. GE has been applied to various problems, ranging from symbolic regression and robot control [14] to physics-based animal animations [15], neural network evolution, and financial applications [16]. In [13], Grammatical Differential Evolution is defined by retaining the GE grammar decoding process for generating phenotypes, with genotypes being evolved with DE. A diverse selection of benchmarks from the GP literature was tackled with four different flavors of GE. Even if the experimental results indicated that the grammatical differential evolution approach was outperformed by standard GP on three of the four problems, the results were somewhat encouraging.

More recently, Veenhuis also introduced a successful application of DE to automatic programming in [17], mapping a continuous genotype to trees, the so-called Tree based Differential Evolution (TreeDE). TreeDE improved somewhat on the performance of grammatical differential evolution, but it requires an additional low-level parameter, the tree depth of solutions, that has to be set beforehand. Moreover, evolved programs do not include random constants.

Another recent proposal for program evolution based on DE, called Geometric Differential Evolution, was presented in [18]. Its authors introduced a formal generalization of DE that keeps the same geometric interpretation of the search dynamics across diverse representations, either for continuous or combinatorial spaces. This scheme is interesting, although it has some limitations: for example, it cannot model the search space of Koza-style subtree crossover. Nonetheless, experiments on four standard benchmarks against Langdon's homologous crossover GP were promising.

Our proposal differs from these previous works by being based on Banzhaf's Linear GP representation of solutions. This allows us to implement real-valued constant management: a program is represented by a float vector that is translated to a linear sequence of imperative instructions, *a la* LGP.

**2.2. Differential evolution**

This section only introduces the main Differential Evolution (DE) concepts; the interested reader may refer to [11] for a full presentation. DE is a population-based search algorithm that draws inspiration from the field of evolutionary computation, even if it is not usually viewed as a typical evolutionary algorithm.

DE is a real-valued, vector-based heuristic for minimizing possibly non-differentiable and non-linear continuous functions. Like most evolutionary schemes, DE can be viewed as a stochastic directed search method. But instead of randomly mating two individuals (like crossover in Genetic Algorithms), or generating random offspring from an evolved probability distribution (like PBIL [19] or CMA-ES [20]), DE uses the difference vector of two randomly chosen population vectors to perturb an existing vector. This perturbation is applied to every individual (vector) in the population. A newly perturbed vector is kept in the population only if it has a better fitness than its previous version.

#### *2.2.1. Principles*

DE is a search method working on a set or population *X* = (*X*1, *X*2,..., *XN*) of *N* solutions that are *d*-dimensional float vectors, trying to optimize a fitness (or objective) function *f* : **R**^*d* → **R**.
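As a toy illustration (not from the chapter), such a population and objective can be written down directly; the sphere function and the initialization bounds below are our own placeholder choices:

```python
import random

def sphere(x):
    """A classic toy objective f : R^d -> R, used here only as a stand-in."""
    return sum(xi * xi for xi in x)

d, N = 5, 20           # dimension and population size (illustrative values)
low, high = -5.0, 5.0  # bounds for random initialization (our assumption)

# Population X = (X_1, ..., X_N) of d-dimensional float vectors.
X = [[random.uniform(low, high) for _ in range(d)] for _ in range(N)]
fitness = [sphere(x) for x in X]
```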

DE can be roughly decomposed into an initialization phase and three very simple steps (mutation, crossover, and selection) that are iterated.


At the beginning of the algorithm, the initial population is randomly initialized and evaluated using the fitness function *f*. Then new potential individuals are created: a trial solution is built for every vector *Xj*, in two steps called mutation and crossover. Finally, a selection process determines whether or not the trial solution replaces the vector *Xj* in the population.
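The loop just described can be sketched in a few lines of pure Python. This is a minimal illustration under our own naming and parameter choices, not the authors' implementation; mutation follows Eq. 1, and crossover and selection follow the steps detailed in the next subsections:

```python
import random

def de_minimize(f, d, N=30, F=0.8, CR=0.9, iters=200, bounds=(-5.0, 5.0)):
    """Minimal DE sketch: Eq. 1 mutation, binomial crossover, greedy selection."""
    lo, hi = bounds
    # Initialization: random population, evaluated with the fitness function f.
    X = [[random.uniform(lo, hi) for _ in range(d)] for _ in range(N)]
    fit = [f(x) for x in X]
    for _ in range(iters):
        for j in range(N):
            # Mutation (Eq. 1): three mutually distinct indices, all != j.
            r1, r2, r3 = random.sample([k for k in range(N) if k != j], 3)
            V = [X[r1][i] + F * (X[r2][i] - X[r3][i]) for i in range(d)]
            # Binomial crossover: at least one component comes from V.
            rnbr = random.randrange(d)
            U = [V[i] if (random.random() <= CR or i == rnbr) else X[j][i]
                 for i in range(d)]
            # Greedy selection: keep the trial only if it improves fitness.
            fU = f(U)
            if fU < fit[j]:
                X[j], fit[j] = U, fU
    best = min(range(N), key=lambda k: fit[k])
    return X[best], fit[best]
```

On a smooth 5-dimensional objective such as the sphere function, this sketch typically converges close to the optimum within a few hundred iterations.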

#### *2.2.2. Mutation*

Let *t* indicate the number of the current iteration (or generation). For each vector *Xj*(*t*) of the population, a variant vector *Vj*(*t* + 1)=(*vj*1, *vj*2,..., *vjd*) is generated according to Eq. 1:


$$V\_j(t+1) = X\_{r\_1}(t) + F \times (X\_{r\_2}(t) - X\_{r\_3}(t)) \tag{1}$$

where:

• *r*1, *r*2 and *r*3 are three mutually *different* randomly selected indices in the population, which are also different from the current index *j*;

• the scaling factor *F* is a real constant which controls the amplification of the differential variation and helps avoid stagnation in the search process; typical values for *F* are in the range [0, 2];

• the expression (*Xr*2(*t*) − *Xr*3(*t*)) is referred to as the difference vector.


Many variants of Equation 1 have been proposed, including the use of more than three individuals. According to [17, 21], the mutation method that is the most robust over a set of experiments is DE/best/2/bin, defined by Eq. 2:

$$V\_j(t+1) = X\_{best}(t) + F \times \left(X\_{r\_1}(t) + X\_{r\_2}(t) - X\_{r\_3}(t) - X\_{r\_4}(t)\right) \tag{2}$$

where *X*best(*t*) is the best individual in the population at the current generation. This method DE/best/2/bin is used throughout the chapter.
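As a sketch, the DE/best/2 variant vector of Eq. 2 could be computed as follows; the function name and default scaling factor are our own illustrative choices, assuming a minimization setting:

```python
import random

def mutate_best_2(X, fitness, j, F=0.8):
    """DE/best/2 variant vector (Eq. 2), sketched for minimization.

    V_j = X_best + F * (X_r1 + X_r2 - X_r3 - X_r4), with r1..r4 mutually
    distinct random indices, all different from the current index j.
    """
    N, d = len(X), len(X[0])
    best = min(range(N), key=lambda k: fitness[k])
    r1, r2, r3, r4 = random.sample([k for k in range(N) if k != j], 4)
    return [X[best][i] + F * (X[r1][i] + X[r2][i] - X[r3][i] - X[r4][i])
            for i in range(d)]
```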

#### *2.2.3. Crossover*

As explained in [11], the crossover step serves to increase, or at least to maintain, the diversity of the population. Each trial vector is partly crossed with the variant vector, and the crossover scheme ensures that at least one vector component will be crossed over.

The trial vector *Uj*(*t* + 1)=(*uj*1, *uj*2,..., *ujd*) is generated using Eq. 3:

$$u\_{ji}(t+1) = \begin{cases} v\_{ji}(t+1) & \text{if} \quad (rand \le CR) \quad \text{or} \quad i = rnbr(j) \\ x\_{ji}(t) & \text{if} \quad (rand > CR) \quad \text{and} \quad i \ne rnbr(j) \end{cases} \tag{3}$$

where:

• *rand* is a random float drawn uniformly in the range [0, 1[;

• *CR* is the crossover rate in the range [0, 1], which has to be determined by the user;

• *vji*(*t* + 1) is the *i*th component of the current variant vector *Vj*(*t* + 1) (see Eq. 1 and 2 above);

• *xji*(*t*) is the *i*th component of vector *Xj*(*t*);

• *rnbr*(*j*) is a randomly chosen index drawn in the range [1, *d*] independently for each vector *Xj*(*t*), which ensures that *Uj*(*t* + 1) gets at least one component from the variant vector *Vj*(*t* + 1).


#### *2.2.4. Selection*

The selection step decides whether the trial solution *Uj*(*t* + 1) replaces the vector *Xj*(*t*) or not. The trial solution is compared to the target vector *Xj*(*t*) using a greedy criterion. Here we assume a minimization framework: if *f*(*Uj*(*t* + 1)) < *f*(*Xj*(*t*)), then *Xj*(*t* + 1) = *Uj*(*t* + 1); otherwise the old value is kept: *Xj*(*t* + 1) = *Xj*(*t*).
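Crossover (Eq. 3) and greedy selection can be sketched as two small helpers; the names and the default crossover rate are our own illustrative choices:

```python
import random

def crossover(Xj, Vj, CR=0.9):
    """Binomial crossover (Eq. 3): build trial U_j from target X_j and variant V_j.

    The rnbr index guarantees at least one component is taken from V_j.
    """
    d = len(Xj)
    rnbr = random.randrange(d)
    return [Vj[i] if (random.random() <= CR or i == rnbr) else Xj[i]
            for i in range(d)]

def select(f, Xj, Uj):
    """Greedy selection for minimization: keep the better of target and trial."""
    return Uj if f(Uj) < f(Xj) else Xj
```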

#### *2.2.5. Iteration and stop criterion*

These three steps (mutation, crossover, selection) are looped over until a stop criterion is triggered: typically a maximum number of evaluations or iterations is reached, or a given fitness value is attained. Overall, DE is quite simple, needing only three parameters: the population size *N*, the crossover rate *CR*, and the scaling factor *F*.

**2.3. Covariance matrix adaptation evolution strategy**

Among continuous optimization methods, DE has often been compared (e.g. in [22, 23]) to the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), initially proposed in [12]. The CMA Evolution Strategy is an evolutionary algorithm for difficult non-linear, non-convex optimization problems in continuous domains. It is typically applied to optimization problems with search space dimensions between three and one hundred. CMA-ES was designed to exhibit several invariances: (a) invariance against order-preserving (i.e. strictly monotonic) transformations of the objective function value; (b) invariance against angle-preserving transformations of the search space (e.g. rotation, reflection); (c) scale invariance. Invariances are highly desirable as they usually imply a good behavior of the search strategy on ill-conditioned and on non-separable problems.

In this section we only introduce the main CMA-ES concepts, and refer the interested reader to the original paper for a full presentation of this heuristic. An abundant literature has brought several refinements to this algorithm (e.g. [24] and [25]), and has shown its strong interest as a continuous optimization method.

#### *2.3.1. Principles*

The basic CMA-ES idea is to sample search points using a normal distribution that is centered on an updated model of the ideal solution. This ideal solution can be seen as a weighted mean of a best subset of the current search points. The distribution is also shaped by the covariance matrix of the best solutions sampled in the current iteration. This fundamental scheme was refined mainly on two points:

• extracting more information from the history of the optimization run; this is done through the so-called accumulation path, whose idea is akin to the momentum term used in artificial neural network training;

• allocating an increasing computational effort, via an increasing population size, in a classic algorithm restart scheme.

The main steps can be summed up as:

1. sample points are drawn according to the current distribution;
2. the sample points are evaluated;
3. the probability distribution is updated according to a best subset of the evaluated points;
4. iterate from step 1, until the stop criterion is reached.
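The sample/evaluate/update loop can be sketched in a drastically simplified form. The code below only adapts the mean of an isotropic normal distribution, with a naive step-size decay as a stand-in for real step-size control; it omits the covariance matrix adaptation and the accumulation paths that define genuine CMA-ES, so it illustrates the loop structure rather than the actual algorithm (all names and constants are our own choices):

```python
import math
import random

def simplified_es(f, x0, sigma=1.0, lam=12, iters=100):
    """Very simplified sketch of the CMA-ES-style sample/evaluate/update loop.

    Only the distribution mean is adapted here; genuine CMA-ES also adapts
    a full covariance matrix, a global step size, and evolution paths.
    """
    d = len(x0)
    mean = list(x0)
    mu = lam // 2  # size of the selected best subset
    # Log-rank recombination weights (a common choice, assumed here).
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    s = sum(w)
    w = [wi / s for wi in w]
    for _ in range(iters):
        # 1. sample points from the current distribution
        pop = [[m + sigma * random.gauss(0.0, 1.0) for m in mean]
               for _ in range(lam)]
        # 2. evaluate and rank the sample points
        pop.sort(key=f)
        # 3. update the distribution: weighted mean of the best mu points
        mean = [sum(w[i] * pop[i][k] for i in range(mu)) for k in range(d)]
        sigma *= 0.97  # naive decay, standing in for step-size adaptation
    return mean
```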
