*2.3.2. Sampling step*

More formally, the basic equation for sampling the search points (step 1) is:

$$x_k^{(g+1)} \leftarrow m^{(g)} + \sigma^{(g)}\, \mathcal{N}\big(0, C^{(g)}\big) \qquad (4)$$

where:

• $x_k^{(g+1)}$ is the *k*-th offspring drawn at generation *g* + 1
• *m*(*g*) is the mean value of the search distribution at generation *g*
• *σ*(*g*) is the "overall" standard deviation (or step-size) at generation *g*
• *N*(0, *C*(*g*)) is a multivariate normal distribution with zero mean and covariance matrix *C*(*g*) at generation *g*
• *k* ∈ 1, ..., *N* is an index over the population size
• *g* is the generation number

*2.3.3. Evaluation and selection step*

Once the sample solutions are evaluated, we can select the current best *μ* solutions, where *μ* is the traditional parameter of Evolution Strategies. Then the new mean *m*(*g*+1), the new covariance matrix *C*(*g*+1) and the new step size control *σ*(*g*+1) can be computed in order to prepare the next iteration, as explained in the following section.

*2.3.4. Update step*

The probability distribution for sampling the next generation follows a normal distribution. The new mean *m*(*g*+1) of the search distribution is a weighted average of the *μ* selected best points from the sample $x_1^{(g+1)}, \ldots, x_N^{(g+1)}$, as shown in Eq. 5:

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i\, x_{i:N}^{(g+1)} \qquad (5)$$

where:

• $x_{i:N}^{(g+1)}$ is the *i*-th best individual out of $x_1^{(g+1)}, \ldots, x_N^{(g+1)}$ from Eq. 4
• *μ* ≤ *N*, *μ* best points are selected in the parent population of size *N*
• $w_1 \geq \ldots \geq w_\mu$ are the weight coefficients with $\sum_{i=1}^{\mu} w_i = 1$

Thus the calculation of the mean can also be interpreted as a recombination step (typically by setting the weights $w_i = 1/\mu$). Notice that the best *μ* points are taken from the new current generation, so there is no elitism.

Adapting the covariance matrix of the distribution is a complex step that consists of three sub-procedures: the rank-*μ*-update, the rank-one-update and accumulation. They are similar
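To make the loop concrete, the sampling step (Eq. 4), the selection step and the mean update (Eq. 5) can be sketched in Python with NumPy. The toy objective `sphere`, the starting point and all parameter values below are illustrative assumptions, and the log-decreasing weights are just one common choice; a full CMA-ES would also adapt *C*(*g*) and *σ*(*g*) at every iteration, which is omitted here:

```python
import numpy as np

def sphere(x):
    """Toy objective used only for illustration: f(x) = sum(x_i^2)."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)

n = 5                    # problem dimension
N = 12                   # population size (number of sampled offspring)
mu = 6                   # number of selected best points, mu <= N
m = np.full(n, 3.0)      # initial mean m^(0) (arbitrary starting point)
sigma = 0.5              # step-size sigma (kept fixed in this sketch)
C = np.eye(n)            # covariance matrix C (kept fixed in this sketch)

# Weights w_1 >= ... >= w_mu with sum(w_i) = 1
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w /= w.sum()

f_initial = sphere(m)

for g in range(50):
    # Sampling step (Eq. 4): x_k <- m + sigma * N(0, C)
    X = np.array([m + sigma * rng.multivariate_normal(np.zeros(n), C)
                  for _ in range(N)])

    # Evaluation and selection step: keep the mu best of the N offspring
    order = np.argsort([sphere(x) for x in X])
    best = X[order[:mu]]

    # Update step (Eq. 5): the new mean is the weighted average of the
    # mu best points of the *new* generation (no elitism)
    m = w @ best

print(sphere(m))  # typically far below the starting value f_initial
```

Even with *C* and *σ* frozen, the weighted-recombination mean drifts toward the optimum; the covariance and step-size adaptations described next are what make the full method efficient on ill-conditioned problems.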

**2.4. Main differences between DE and CMA-ES**

The Differential Evolution method and the CMA Evolution Strategy are often compared, since both are population-based continuous optimization heuristics. Unlike DE, CMA-ES rests on strong theoretical foundations that give it several invariances, making it a robust local search strategy, see [12]. Indeed, it has been shown to achieve superior performance over state-of-the-art global search strategies (e.g. see [26]). On the other hand, in comparison with most search algorithms, DE is very simple and straightforward both to implement and to understand. This simplicity is a key factor in its popularity, especially among practitioners from other fields.

Despite, or maybe thanks to, its simplicity, DE also exhibits very good performance when compared to state-of-the-art search methods. Furthermore, the number of control parameters in DE remains surprisingly small for an evolutionary scheme (*Cr*, *F* and *N*), and a large body of work has been devoted to selecting the best equation for the construction of the variant vector.
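As an illustration of this simplicity, the classic DE/rand/1/bin scheme (the variant, or donor, vector built from three distinct individuals, binomial crossover governed by *Cr*, and greedy one-to-one selection) fits in a few lines. The objective and all parameter values below are arbitrary choices made for the example:

```python
import numpy as np

def sphere(x):
    # Toy objective used only for illustration
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)

n = 5       # problem dimension
N = 20      # population size
F = 0.8     # differential weight
Cr = 0.9    # crossover rate

pop = rng.uniform(-5, 5, size=(N, n))
fit = np.array([sphere(x) for x in pop])
initial_best = fit.min()

for g in range(100):
    for i in range(N):
        # Pick three distinct individuals r1, r2, r3, all different from i
        r1, r2, r3 = rng.choice([j for j in range(N) if j != i],
                                size=3, replace=False)

        # DE/rand/1 mutation: build the variant (donor) vector
        v = pop[r1] + F * (pop[r2] - pop[r3])

        # Binomial crossover: each coordinate comes from v with probability
        # Cr; index j_rand guarantees at least one coordinate is taken from v
        j_rand = rng.integers(n)
        mask = rng.random(n) < Cr
        mask[j_rand] = True
        u = np.where(mask, v, pop[i])

        # Greedy selection: the trial replaces the target only if no worse
        fu = sphere(u)
        if fu <= fit[i]:
            pop[i], fit[i] = u, fu
```

The greedy selection makes the best fitness in the population monotonically non-increasing, one of the properties that make DE easy to reason about in practice.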

As explained in [27], the space complexity of DE is low compared to that of the most competitive optimizers such as CMA-ES. Although CMA-ES remains very competitive on problems with up to 100 variables, it is difficult to extend to higher-dimensional problems, mainly because of the cost of computing and updating the covariance matrix.
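The cost referred to here is easy to quantify: a symmetric covariance matrix over *n* variables has *n*(*n* + 1)/2 distinct entries, so the storage (and the work needed to update it) grows quadratically with the dimension, whereas DE only stores and recombines the population itself. A minimal illustration:

```python
# Number of distinct entries of a symmetric n-by-n covariance matrix.
# This quadratic growth is the main obstacle to scaling CMA-ES to
# high-dimensional problems.
def cov_entries(n):
    return n * (n + 1) // 2

for n in (10, 100, 1000):
    print(n, cov_entries(n))  # prints 55, 5050, 500500 respectively
```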

Evolving programs, which typically mix discrete and continuous features (e.g. regression problems), is an interesting challenge for these heuristics, since they were not designed for this kind of task.
