**2.3. Covariance matrix adaptation evolution strategy**

Among continuous optimization methods, DE has often been compared (e.g. in [22, 23]) to the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), initially proposed in [12]. The CMA Evolution Strategy is an evolutionary algorithm for difficult non-linear, non-convex optimization problems in continuous domains. It is typically applied to optimization problems with search-space dimensions between three and one hundred. CMA-ES was designed to exhibit several invariances: (a) invariance under order-preserving (i.e. strictly monotonic) transformations of the objective function value; (b) invariance under angle-preserving transformations of the search space (e.g. rotation, reflection); (c) scale invariance. Invariances are highly desirable as they usually imply good behavior of the search strategy on ill-conditioned and non-separable problems.
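Invariance (a) stems from the fact that CMA-ES uses only the *ranking* of objective values, never their magnitudes. A quick check (with arbitrary example values) confirms that ranks survive a strictly monotonic transformation, so selection is unchanged:

```python
fitness = [3.2, -1.0, 7.5, 0.4]          # arbitrary objective values
g = lambda y: y ** 3 + 2.0               # a strictly increasing transformation

# indices of the values in ascending order
rank = lambda vals: sorted(range(len(vals)), key=lambda i: vals[i])

# the ranking is identical before and after the transformation
same = rank(fitness) == rank([g(y) for y in fitness])
```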

In this section we only introduce the main CMA-ES concepts, and refer the interested reader to the original paper for a full presentation of this heuristic. An abundant literature has brought several refinements to this algorithm (e.g. [24] and [25]) and has demonstrated its strength as a continuous optimization method.

#### *2.3.1. Principles*

The basic idea of CMA-ES is to sample search points using a normal distribution centered on a continuously updated model of the ideal solution. This ideal solution can be seen as a weighted mean of the best subset of the current search points. The distribution is also shaped by the covariance matrix of the best solutions sampled in the current iteration. This fundamental scheme was refined mainly on two points:

• the adaptation of the covariance matrix, which combines information from the current population with information accumulated along past generations (the evolution path);
• the control of a global step-size, which adapts the overall scale of the distribution.
The main steps can be summed up as:

1. sample new search points from the current multivariate normal distribution (Section 2.3.2);
2. evaluate the sampled points and select the *μ* best of them (Section 2.3.3);
3. update the mean, the covariance matrix and the step-size of the distribution (Section 2.3.4).
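This sample–select–update loop can be sketched with a deliberately stripped-down (*μ*, *λ*)-ES in plain Python. Isotropic sampling, equal recombination weights and a crude geometric step-size decay stand in for the real covariance matrix and evolution-path adaptations; all function and parameter names are illustrative only.

```python
import random

def sphere(x):
    # simple benchmark: f(x) = sum(x_i^2), minimum at the origin
    return sum(xi * xi for xi in x)

def simplified_es(f, mean, sigma=1.0, lam=20, mu=5, generations=80, seed=1):
    """Heavily simplified (mu, lambda)-ES in the spirit of CMA-ES:
    no covariance adaptation, no evolution path."""
    rng = random.Random(seed)
    n = len(mean)
    weights = [1.0 / mu] * mu            # equal weights, w_i = 1/mu
    for _ in range(generations):
        # 1. sampling step: x_k = m + sigma * N(0, I)
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(lam)]
        # 2. evaluation and selection: keep the mu best points (minimization)
        pop.sort(key=f)
        best = pop[:mu]
        # 3. update step: new mean is the weighted average of the mu best
        mean = [sum(w * x[i] for w, x in zip(weights, best)) for i in range(n)]
        sigma *= 0.93                    # crude stand-in for step-size control
    return mean

result = simplified_es(sphere, [2.0, 2.0, 2.0])
```

Even this skeleton reliably minimizes the sphere function; the full CMA-ES replaces the identity covariance and the fixed decay factor with the adaptive mechanisms described in the following subsections.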



#### *2.3.2. Sampling step*

More formally, the basic equation for sampling the search points (step 1) is:

$$\mathbf{x}_k^{(g+1)} \leftarrow \boldsymbol{m}^{(g)} + \sigma^{(g)} \mathbf{N}(\mathbf{0}, \mathbf{C}^{(g)}) \tag{4}$$

where *m*(*g*) is the mean of the search distribution at generation *g*, *σ*(*g*) > 0 is the step-size, *C*(*g*) is the covariance matrix, and **N**(**0**, *C*(*g*)) denotes a sample drawn from the multivariate normal distribution with zero mean and covariance matrix *C*(*g*).
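As an illustration, Eq. 4 can be realized by factoring the covariance matrix as *C* = *LL*ᵀ and transforming standard normal draws. The sketch below uses a hand-rolled Cholesky factorization and illustrative values for *m*, *σ* and *C*:

```python
import random, math

def cholesky(C):
    # lower-triangular L with C = L L^T (C symmetric positive definite)
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def sample_point(m, sigma, C, rng):
    # Eq. 4: x = m + sigma * N(0, C), realized as m + sigma * (L z), z ~ N(0, I)
    L = cholesky(C)
    z = [rng.gauss(0.0, 1.0) for _ in m]
    return [m[i] + sigma * sum(L[i][k] * z[k] for k in range(len(m)))
            for i in range(len(m))]

rng = random.Random(42)
m = [1.0, 2.0]
C = [[2.0, 0.6], [0.6, 1.0]]
points = [sample_point(m, 0.5, C, rng) for _ in range(5000)]
```

Averaged over many draws, the sampled points are centered on *m*, and their spread reflects *σ*²*C*.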

#### *2.3.3. Evaluation and selection step*

Once the sampled solutions are evaluated, we can select the current best *μ* solutions, where *μ* is the traditional parameter of Evolution Strategies. Then the new mean *m*(*g*+1), the new covariance matrix *C*(*g*+1) and the new step-size *σ*(*g*+1) can be computed in order to prepare the next iteration, as explained in the following section.

#### *2.3.4. Update step*

The probability distribution for sampling the next generation again follows a normal distribution. The new mean *m*(*g*+1) of the search distribution is a weighted average of the *μ* selected best points from the sample *x*<sub>1</sub><sup>(*g*+1)</sup>, ..., *x*<sub>*N*</sub><sup>(*g*+1)</sup>, as shown in Eq. 5:

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i \, x_{i:N}^{(g+1)} \tag{5}$$

where *x*<sub>*i*:*N*</sub><sup>(*g*+1)</sup> denotes the *i*-th best of the *N* sampled points, and the weights *w<sub>i</sub>* are positive and sum to one.

Thus the calculation of the mean can also be interpreted as a recombination step (typically by setting the weights *w<sub>i</sub>* = 1/*μ*). Notice that the best *μ* points are taken from the new current generation, so there is no elitism.

Adapting the covariance matrix of the distribution is a complex step that consists of three sub-procedures: the rank-*μ*-update, the rank-one-update and accumulation. They are similar to a Principal Component Analysis of steps, sequentially in time and space. The goal of the adaptation mechanism is to increase the probability of successful consecutive steps.

In addition to the covariance matrix adaptation rule, a step-size control is introduced that adapts the overall scale of the distribution based on information obtained from the evolution path. If the evolution path is long and single steps point more or less in the same direction, the step-size should be increased. On the other hand, if the evolution path is short and single steps cancel each other out, then the search is probably oscillating around an optimum, so the step-size should be decreased.

For the sake of simplicity, the details of the update of the covariance matrix *C* and of the step-size control are beyond the scope of this chapter.

**2.4. Main differences between DE and CMA-ES**

The Differential Evolution method and the CMA Evolution Strategy are often compared, since they are both population-based continuous optimization heuristics. Unlike DE, CMA-ES is built on strong theoretical foundations that allow it to exhibit several invariances, making it a robust local search strategy, see [12]. Indeed it was shown to achieve superior performance versus state-of-the-art global search strategies (e.g. see [26]). On the other hand, and in comparison with most search algorithms, DE is very simple and straightforward both to implement and to understand. This simplicity is a key factor in its popularity, especially for practitioners from other fields.

Despite, or maybe thanks to, its simplicity, DE also exhibits very good performance when compared to state-of-the-art search methods. Furthermore, the number of control parameters in DE remains surprisingly small for an evolutionary scheme (*Cr*, *F* and *N*), and a large amount of work has been devoted to selecting the best equation for the construction of the variant vector. As explained in [27], the space complexity of DE is low when compared to the most competitive optimizers like CMA-ES. Although CMA-ES remains very competitive on problems with up to 100 variables, it is difficult to extend it to higher-dimensional problems, mainly due to the cost of computing and updating the covariance matrix.

**3. Linear programs with continuous representation**

Evolving programs, which typically mix discrete and continuous features (e.g. regression problems), is an interesting challenge for these heuristics, since they were not designed for this kind of task.

We propose to use Differential Evolution and the CMA Evolution Strategy to evolve float vectors, which will be mapped to sequences of imperative instructions in order to form linear programs, similar to the LGP scheme from [6]. For the sake of simplicity, these schemes are respectively denoted:

• LDEP, for Linear Differential Evolutionary Programming, when DE is used as the evolutionary engine;
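To make the idea of mapping float vectors to imperative instructions concrete, here is one *possible* decoding scheme in Python. The instruction set, the triple-based layout (opcode, destination register, source register) and all names are hypothetical illustrations, not the encoding actually used in this chapter:

```python
# hypothetical instruction set for a small register machine
OPS = ["add", "sub", "mul", "div"]

def decode(vector, n_regs=4):
    """Map a float vector with entries in [0, 1) to a linear program:
    each consecutive triple becomes one (opcode, dst, src) instruction."""
    prog = []
    for i in range(0, len(vector) - 2, 3):
        op = OPS[min(int(vector[i] * len(OPS)), len(OPS) - 1)]
        dst = min(int(vector[i + 1] * n_regs), n_regs - 1)
        src = min(int(vector[i + 2] * n_regs), n_regs - 1)
        prog.append((op, dst, src))
    return prog

def run(prog, inputs, n_regs=4):
    """Execute the linear program; inputs are loaded into the first
    registers and register 0 holds the result."""
    regs = list(inputs) + [0.0] * (n_regs - len(inputs))
    for op, dst, src in prog:
        a, b = regs[dst], regs[src]
        if op == "add":
            regs[dst] = a + b
        elif op == "sub":
            regs[dst] = a - b
        elif op == "mul":
            regs[dst] = a * b
        elif op == "div":
            regs[dst] = a / b if b != 0 else 1.0  # protected division
    return regs[0]
```

Under such a decoding, any float vector is a valid program, so DE or CMA-ES can operate on the continuous genotype while fitness is measured on the decoded phenotype.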
