**Surrogate-Based Optimization**

Zhong-Hua Han and Ke-Shi Zhang

*School of Aeronautics, Northwestern Polytechnical University, Xi'an, P.R. China* 

### **1. Introduction**


Surrogate-based optimization (Queipo et al. 2005, Simpson et al. 2008) represents a class of optimization methodologies that make use of surrogate modeling techniques to quickly find local or global optima. It provides a novel optimization framework in which conventional optimization algorithms, e.g. gradient-based or evolutionary algorithms, are used for the sub-optimization(s). Surrogate modeling techniques are of particular interest for engineering design when high-fidelity, and thus expensive, analysis codes (e.g. Computational Fluid Dynamics (CFD) or Computational Structural Dynamics (CSD)) are used. They can greatly improve design efficiency and are very helpful in finding global optima, filtering numerical noise, enabling parallel design optimization and integrating simulation codes of different disciplines into a process chain. Here the term "surrogate model" has the same meaning as "response surface model", "metamodel", "approximation model", "emulator", etc. This chapter aims to give an overview of existing surrogate modeling techniques and of how to use them for optimization.

### **2. Overview of surrogate modeling techniques**

For optimization problems, surrogate models can be regarded as approximation models of the cost function(s) and state function(s), built from sampled data obtained by probing the design space (called sampling via Design of Experiments (DoE)). Once the surrogate models are built, an optimization algorithm such as a Genetic Algorithm (GA) can be used to search, based on the surrogate models, for the new design that is most likely to be the optimum. Since prediction with a surrogate model is generally much more efficient than evaluation with a numerical analysis code, the computational cost of the search based on the surrogate models is generally negligible.
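As a toy illustration of this workflow, the sketch below runs one cycle of sampling, surrogate building and surrogate-based search. All ingredients are stand-ins chosen for brevity: plain random sampling replaces a proper DoE, a quadratic least-squares fit replaces a general surrogate, a dense random search replaces a GA, and the "expensive" function is invented for the example.

```python
import numpy as np

def expensive_analysis(x):
    # Stand-in for an expensive CFD/CSD code (illustrative toy function)
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

# 1. DoE: sample the design space (plain random sampling for brevity)
rng = np.random.default_rng(0)
S = rng.random((30, 2))
y = np.array([expensive_analysis(x) for x in S])

# 2. Build a surrogate: quadratic response surface via least squares
U = np.column_stack([np.ones(30), S[:, 0], S[:, 1],
                     S[:, 0] * S[:, 1], S[:, 0] ** 2, S[:, 1] ** 2])
beta, *_ = np.linalg.lstsq(U, y, rcond=None)
surrogate = lambda x: beta @ np.array([1.0, x[0], x[1],
                                       x[0] * x[1], x[0] ** 2, x[1] ** 2])

# 3. Optimize on the cheap surrogate instead of the expensive code
#    (dense random search as a stand-in for a GA or gradient method)
cands = rng.random((20000, 2))
x_opt = min(cands, key=surrogate)
```

Because the toy cost function is itself quadratic, the surrogate recovers it exactly and the surrogate optimum coincides with the true optimum near (0.3, 0.7); for a real analysis code the surrogate would only approximate the optimum, which is why refinement strategies are discussed later in this chapter.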

Surrogate modeling refers to the technique of using sampled data (observed by running the computer code) to build surrogate models that suffice to predict the output of an expensive computer code at untried points in the design space. Thus, how to choose sample points, how to build surrogate models, and how to evaluate the accuracy of surrogate models are the key issues of surrogate modeling.

### **2.1 Design of experiments**

To build a surrogate model, DoE methods are usually used to determine the locations of the sample points in the design space. DoE is a procedure with the general goal of maximizing the amount of information gained from a limited number of sample points (Giunta et al., 2001). The various DoE methods can be classified into two categories: "classic" DoE methods and "modern" DoE methods. The classic DoE methods, such as full-factorial design, central composite design (CCD), Box-Behnken and D-Optimal Design (DOD), were developed for the arrangement of laboratory experiments, with the aim of reducing the effect of random error. In contrast, the modern DoE methods, such as Latin Hypercube Sampling (LHS), Orthogonal Array Design (OAD) and Uniform Design (UD) (Fang et al., 2000), were developed for deterministic computer experiments, which are free of the random error that arises in laboratory experiments. An overview of the classic and modern DoE methods was presented by Giunta et al. (2001); a more detailed description of existing DoE methods is beyond the scope of this chapter.

The schematics of 40 sample points selected by LHS and UD for a two-dimensional problem are sketched in Figure 1.

Fig. 1. Schematics of 40 sample points selected by Design of Experiments for a two-dimensional problem (left: Latin Hypercube Sampling; right: Uniform Design).
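As an illustration, an LHS plan such as the one in Figure 1 can be generated in a few lines. This is a minimal sketch; library routines such as SciPy's `scipy.stats.qmc.LatinHypercube` provide optimized variants.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng=None):
    """Draw a Latin Hypercube sample in the unit hypercube [0, 1]^n_dims.

    Each dimension is divided into n_samples equal strata, and every stratum
    contains exactly one point, which spreads points more evenly than pure
    random sampling.
    """
    rng = np.random.default_rng(rng)
    # One random point inside each of the n_samples equal-width strata ...
    samples = (np.arange(n_samples)[:, None]
               + rng.random((n_samples, n_dims))) / n_samples
    # ... with the strata shuffled independently in every dimension.
    for j in range(n_dims):
        rng.shuffle(samples[:, j])
    return samples

S = latin_hypercube(40, 2, rng=0)   # 40 sample points, as in Figure 1
```

By construction, projecting the 40 points onto either axis yields exactly one point per stratum, which is the defining property of a Latin Hypercube plan.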

#### **2.2 Surrogate models**

There are a number of surrogate models available in the literature. Here we limit our discussion to three popular techniques: RSM (polynomial Response Surface Model), Kriging, and RBFs (Radial Basis Functions).

For an *m*-dimensional problem, suppose we are concerned with predicting the output of a high-fidelity, and thus expensive, computer code that corresponds to an unknown function $y: \mathbb{R}^m \to \mathbb{R}$. By running the computer code, $y$ is observed at *n* sites (determined by DoE)

$$\mathbf{S} = [\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)}]^{\mathrm{T}} \in \mathbb{R}^{n \times m},\quad \mathbf{x} = [x_1, \ldots, x_m]^{\mathrm{T}} \in \mathbb{R}^m \tag{1}$$

with the corresponding responses

$$\mathbf{y}\_S = [y^{(1)}, \dots, y^{(n)}]^\mathrm{T} = [y(\mathbf{x}^{(1)}), \dots, y(\mathbf{x}^{(n)})]^\mathrm{T} \in \mathbb{R}^n \tag{2}$$


The pair $(\mathbf{S}, \mathbf{y}_S)$ denotes the sampled data set in the vector space.

With the above descriptions and assumptions, our objective here is to build a surrogate model for predicting the output of the computer code at any untried site $\mathbf{x}$ (that is, to estimate $y(\mathbf{x})$) based on the sampled data set $(\mathbf{S}, \mathbf{y}_S)$, in an attempt to achieve the desired accuracy with the least possible number of sample points.

#### **2.2.1 Quadratic response surface method**

Here we use "RSM" to denote a polynomial approximation model in which the sampled data are fitted by a least-squares regression technique. In RSM-based optimization applications, the "quadratic" polynomial model usually provides the best compromise between modeling accuracy and computational expense, compared with linear or higher-order polynomial models. An advantage of RSM is that it can smooth out the various scales of numerical noise in the data while capturing the global trend of the variation, which makes it very robust and thus well suited for optimization problems in engineering design.

The true quadratic RSM can be written in the following form:

$$
y\left(\mathbf{x}\right) = \hat{y}\left(\mathbf{x}\right) + \varepsilon,\quad \mathbf{x} \in \mathbb{R}^m \tag{3}
$$

where $\hat{y}(\mathbf{x})$ is the quadratic polynomial approximation and $\varepsilon$ is the random error, which is assumed to be normally distributed with zero mean and variance $\sigma^2$. The errors $\varepsilon_i$ at the observations are assumed to be independent and identically distributed. The quadratic RSM predictor $\hat{y}(\mathbf{x})$ can be defined as:

$$\hat{y}\left(\mathbf{x}\right) = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + \sum_{i=1}^{m} \beta_{ii} x_i^2 + \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \beta_{ij} x_i x_j \tag{4}$$

where $\beta_0$, $\beta_i$, $\beta_{ii}$ and $\beta_{ij}$ are unknown coefficients to be determined. Since there are in total $p = (m+1)(m+2)/2$ unknown coefficients in Eq. (4), building a quadratic RSM with *m* variables requires at least *p* sample points. Let $\boldsymbol{\beta} \in \mathbb{R}^p$ be the column vector containing these *p* unknown coefficients. The least-squares estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}} = (\mathbf{U}^{\mathrm{T}} \mathbf{U})^{-1} \mathbf{U}^{\mathrm{T}} \mathbf{y}_S\,, \tag{5}$$

where

$$\mathbf{U} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_m^{(1)} & x_1^{(1)}x_2^{(1)} & \cdots & x_{m-1}^{(1)}x_m^{(1)} & \left(x_1^{(1)}\right)^2 & \cdots & \left(x_m^{(1)}\right)^2\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_1^{(n)} & \cdots & x_m^{(n)} & x_1^{(n)}x_2^{(n)} & \cdots & x_{m-1}^{(n)}x_m^{(n)} & \left(x_1^{(n)}\right)^2 & \cdots & \left(x_m^{(n)}\right)^2 \end{bmatrix} \in \mathbb{R}^{n \times p}. \tag{6}$$

After the unknown coefficients in $\boldsymbol{\beta}$ are determined, the approximate response $\hat{y}$ at any untried $\mathbf{x}$ can be efficiently predicted by Eq. (4).
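Eqs. (4)-(6) translate directly into a few lines of linear algebra. The sketch below is illustrative; it uses a least-squares solve instead of the explicit inverse of Eq. (5) for numerical robustness.

```python
import numpy as np
from itertools import combinations

def quadratic_basis(X):
    """Rows of the regression matrix U of Eq. (6): 1, x_i, cross terms, squares."""
    X = np.atleast_2d(X)
    n, m = X.shape
    cols = ([np.ones(n)] + [X[:, i] for i in range(m)]
            + [X[:, i] * X[:, j] for i, j in combinations(range(m), 2)]
            + [X[:, i] ** 2 for i in range(m)])
    return np.column_stack(cols)

def fit_quadratic_rsm(S, y_S):
    """Least-squares estimate of beta (Eq. (5)); needs n >= p = (m+1)(m+2)/2 samples."""
    beta, *_ = np.linalg.lstsq(quadratic_basis(S), y_S, rcond=None)
    return lambda x: (quadratic_basis(x) @ beta).item()   # predictor of Eq. (4)
```

Because a quadratic target lies exactly in the span of the basis, fitting to noise-free samples of a quadratic function reproduces it to machine precision.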


#### **2.2.2 Kriging model**

Different from RSM, Kriging (Krige, 1951) is an interpolating method that honors the observed data at all sample points. Kriging provides a statistical prediction of an unknown function by minimizing its Mean Squared Error (MSE). It can be equivalent to any order of polynomials and is thus well suited for highly nonlinear functions with multiple extrema. In the derivation of Kriging (Sacks et al., 1989), the output of a deterministic computer experiment is treated as a realization of a random function (or stochastic process), defined as the sum of a global trend function $\mathbf{f}^{\mathrm{T}}(\mathbf{x})\boldsymbol{\beta}$ and a Gaussian random function $Z(\mathbf{x})$ as follows:

$$y(\mathbf{x}) = \mathbf{f}^{\mathrm{T}}(\mathbf{x})\boldsymbol{\beta} + Z(\mathbf{x}),\quad \mathbf{x} \in \mathbb{R}^{m},\tag{7}$$

where $\mathbf{f}(\mathbf{x}) = [f_0(\mathbf{x}),\ldots,f_{p-1}(\mathbf{x})]^{\mathrm{T}} \in \mathbb{R}^p$ is defined with a set of regression basis functions and $\boldsymbol{\beta} = [\beta_0,\ldots,\beta_{p-1}]^{\mathrm{T}} \in \mathbb{R}^p$ denotes the vector of the corresponding coefficients. In general, $\mathbf{f}^{\mathrm{T}}(\mathbf{x})\boldsymbol{\beta}$ is taken as either a constant or low-order polynomials. Practice suggests that a constant trend function is sufficient for most problems. Thus, $\mathbf{f}^{\mathrm{T}}(\mathbf{x})\boldsymbol{\beta}$ is taken as a constant $\beta_0$ in the text hereafter. In Eq. (7), $Z(\cdot)$ denotes a stationary random process with zero mean, variance $\sigma^2$ and nonzero covariance of

$$\text{Cov}[Z(\mathbf{x}), Z(\mathbf{x'})] = \sigma^2 R(\mathbf{x}, \mathbf{x'}) \,. \tag{8}$$

Here $R(\mathbf{x}, \mathbf{x}')$ is the correlation function, which depends only on the Euclidean distance between any two sites $\mathbf{x}$ and $\mathbf{x}'$ in the design space. In this study, a Gaussian exponential correlation function is adopted; it is of the form

$$R(\mathbf{x}, \mathbf{x}') = \exp\{-\sum_{k=1}^{m} \theta_k \,|\, x_k - x'_k \,|^{\,p_k}\},\quad 1 < p_k \le 2,\tag{9}$$

where $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_m]^{\mathrm{T}}$ and $\mathbf{p} = [p_1, p_2, \ldots, p_m]^{\mathrm{T}}$ denote the vectors of the unknown model parameters (hyperparameters) to be tuned. A schematic of the Gaussian exponential correlation function for a one-dimensional problem is sketched in Figure 2.

Fig. 2. Schematics of Gaussian exponential correlation function for different hyper parameters (left: varying θ with a fixed *p* ; right: varying *p* with a fixed θ)
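The curves of Figure 2 come from evaluating Eq. (9) directly; a minimal sketch:

```python
import numpy as np

def gauss_exp_corr(x1, x2, theta, p):
    """Gaussian exponential correlation function of Eq. (9).

    theta and p are the per-dimension hyperparameter vectors: larger theta
    makes the correlation decay faster with distance, and p = 2 recovers
    the infinitely smooth Gaussian kernel.
    """
    d = np.abs(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
    return float(np.exp(-np.sum(np.asarray(theta) * d ** np.asarray(p))))
```

At zero distance the correlation is exactly one, and it decreases monotonically as the two sites move apart, which is the behavior plotted in Figure 2.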

From the derivation by Sacks et al. (1989), the Kriging predictor $\hat{y}(\mathbf{x})$ for any untried $\mathbf{x}$ can be written as

$$
\hat{y}(\mathbf{x}) = \beta\_0 + \mathbf{r}^\mathrm{T}(\mathbf{x})\mathbf{R}^{-1}(\mathbf{y}\_S - \beta\_0 \mathbf{1})\,\mathrm{.}\tag{10}
$$

where the generalized least-squares estimate of $\beta_0$ is

$$\beta_{0} = (\mathbf{1}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{1})^{-1} \mathbf{1}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{y}_S, \tag{11}$$

and $\mathbf{1} \in \mathbb{R}^n$ is a vector filled with ones, and $\mathbf{R}$, $\mathbf{r}$ are the correlation matrix and the correlation vector, respectively, defined as

$$\mathbf{R} = \begin{bmatrix} R(\mathbf{x}^{(1)}, \mathbf{x}^{(1)}) & R(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) & \cdots & R(\mathbf{x}^{(1)}, \mathbf{x}^{(n)}) \\ R(\mathbf{x}^{(2)}, \mathbf{x}^{(1)}) & R(\mathbf{x}^{(2)}, \mathbf{x}^{(2)}) & \cdots & R(\mathbf{x}^{(2)}, \mathbf{x}^{(n)}) \\ \vdots & \vdots & \ddots & \vdots \\ R(\mathbf{x}^{(n)}, \mathbf{x}^{(1)}) & R(\mathbf{x}^{(n)}, \mathbf{x}^{(2)}) & \cdots & R(\mathbf{x}^{(n)}, \mathbf{x}^{(n)}) \end{bmatrix} \in \mathbb{R}^{n \times n},\quad \mathbf{r} = \begin{bmatrix} R(\mathbf{x}^{(1)}, \mathbf{x}) \\ R(\mathbf{x}^{(2)}, \mathbf{x}) \\ \vdots \\ R(\mathbf{x}^{(n)}, \mathbf{x}) \end{bmatrix} \in \mathbb{R}^{n} \tag{12}$$

where $R(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$ denotes the correlation between any two observed points $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$; $R(\mathbf{x}^{(i)}, \mathbf{x})$ denotes the correlation between the *i*-th observed point $\mathbf{x}^{(i)}$ and the untried point $\mathbf{x}$.

A unique feature of the Kriging model is that it provides an uncertainty estimate (or MSE) of the prediction, which is very useful for sample-point refinement. It is of the form

$$\hat{s}^2(\mathbf{x}) = \sigma^2 \left[ 1.0 - \mathbf{r}^T \mathbf{R}^{-1} \mathbf{r} + (\mathbf{r}^T \mathbf{R}^{-1} \mathbf{1} - \mathbf{1})^2 / \mathbf{1}^T \mathbf{R}^{-1} \mathbf{1} \right]. \tag{13}$$

Assuming that the sampled data are distributed according to a Gaussian process, the responses at sampling sites are considered to be correlated random functions with the corresponding likelihood function given by

$$L(\beta_{0}, \sigma^{2}, \boldsymbol{\theta}, \mathbf{p}) = \frac{1}{\sqrt{(2\pi)^{n} (\sigma^{2})^{n} |\mathbf{R}|}} \exp\left(-\frac{1}{2} \frac{(\mathbf{y}_{S} - \beta_{0} \mathbf{1})^{\mathrm{T}} \mathbf{R}^{-1} (\mathbf{y}_{S} - \beta_{0} \mathbf{1})}{\sigma^{2}}\right). \tag{14}$$

The optimal estimates of $\beta_0$ and the process variance $\sigma^2$,

$$\begin{aligned} \beta\_0(\boldsymbol{\theta}, \mathbf{p}) &= (\mathbf{1}^\mathrm{T} \mathbf{R}^{-1} \mathbf{1})^{-1} \mathbf{1}^\mathrm{T} \mathbf{R}^{-1} \mathbf{y}\_S \\ \sigma^2(\beta\_0, \boldsymbol{\theta}, \mathbf{p}) &= \frac{1}{n} (\mathbf{y}\_S - \beta\_0 \mathbf{1})^\mathrm{T} \mathbf{R}^{-1} (\mathbf{y}\_S - \beta\_0 \mathbf{1}) \end{aligned} \tag{15}$$

are obtained analytically, yet depend on the remaining hyperparameters $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_m]^{\mathrm{T}}$ and $\mathbf{p} = [p_1, p_2, \ldots, p_m]^{\mathrm{T}}$. Substituting them into Eq. (14) and taking the logarithm, we are left with maximizing

$$\text{MLE}(\boldsymbol{\theta}, \mathbf{p}) = -n \ln \sigma^2(\boldsymbol{\theta}, \mathbf{p}) - \ln \left| \mathbf{R}(\boldsymbol{\theta}, \mathbf{p}) \right|, \tag{16}$$


which can be solved by a numerical optimization algorithm such as GA. Hyperparameter tuning strategies are discussed by Toal et al. (2008). Note that the above Kriging formulation can be extended to include gradient information obtained by the adjoint method (Han et al. 2009, Laurenceau et al. 2008) or lower-fidelity data from a lower-fidelity analysis code (Han et al. 2010, Forrester et al. 2007).
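For illustration, the sketch below implements Eqs. (10)-(16) under simplifying assumptions made for brevity: an isotropic model with a single $\theta$, $p_k = 2$ fixed, and a crude grid search over the concentrated likelihood of Eq. (16). A practical code would tune all hyperparameters with a GA or a gradient method and factor $\mathbf{R}$ by Cholesky decomposition.

```python
import numpy as np

def kriging_fit(S, y, thetas=np.logspace(-1, 2, 25)):
    """Fit a constant-trend Kriging model with Gaussian correlation (p = 2)."""
    n = len(y)
    one = np.ones(n)
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    best = None
    for theta in thetas:
        R = np.exp(-theta * d2) + 1e-10 * np.eye(n)       # Eq. (12) plus a tiny nugget
        b0 = (one @ np.linalg.solve(R, y)) / (one @ np.linalg.solve(R, one))  # Eq. (11)
        w = np.linalg.solve(R, y - b0)                    # R^{-1}(y_S - b0*1)
        s2 = (y - b0) @ w / n                             # Eq. (15)
        _, logdet = np.linalg.slogdet(R)
        mle = -n * np.log(s2) - logdet                    # Eq. (16)
        if best is None or mle > best[0]:
            best = (mle, theta, b0, w)
    _, theta, b0, w = best
    def predict(x):
        r = np.exp(-theta * ((S - x) ** 2).sum(-1))       # correlation vector of Eq. (12)
        return b0 + r @ w                                 # Eq. (10)
    return predict
```

Since Kriging interpolates, the fitted predictor reproduces the sampled responses at the sample sites (up to the small nugget added for numerical stability).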

#### **2.2.3 Radial basis functions**

In addition to Kriging, the RBFs model (Hardy, 1971) is known as an alternative interpolation method for surrogate modeling. In the RBFs approach by Powell (1987), the approximation of the unknown function $y(\mathbf{x})$ at an untried $\mathbf{x}$ is formally defined as the linear combination of radial basis functions and a global trend function:

$$\hat{y}(\mathbf{x}) = \sum_{i=1}^{n} \omega_i \varphi_i(\mathbf{x}) + P(\mathbf{x}), \tag{17}$$

where $\omega_i$ is the *i*-th unknown weight coefficient, $\varphi_i(\mathbf{x}) = \varphi(\|\mathbf{x} - \mathbf{x}^{(i)}\|)$ are the basis functions that depend on the Euclidean distance between the observed point $\mathbf{x}^{(i)}$ and the untried point $\mathbf{x}$ (similar to the correlation function of the Kriging model); $P(\mathbf{x})$ is the global trend function, which is taken as a constant $\beta_0$ here. To ensure that the function values at the observed points are reproduced by the RBFs predictor, the following constraints should be satisfied:

$$
\hat{y}(\mathbf{x}^{(i)}) = y^{(i)},\quad i = 1, \ldots, n. \tag{18}
$$

Then an additional constraint is imposed to close the system:

$$\sum_{i=1}^{n} \omega_i = 0. \tag{19}$$

Solving the linear equations formed by Eq. (18) and Eq. (19) for $\omega_i$ and $\beta_0$, and substituting them into Eq. (17), yields the RBFs predictor

$$
\hat{y}(\mathbf{x}) = \beta_0 + \boldsymbol{\varphi}^{\mathrm{T}}(\mathbf{x}) \boldsymbol{\Psi}^{-1} (\mathbf{y}_S - \beta_0 \mathbf{1})\,. \tag{20}
$$

where $\beta_0 = (\mathbf{1}^{\mathrm{T}} \boldsymbol{\Psi}^{-1} \mathbf{1})^{-1} \mathbf{1}^{\mathrm{T}} \boldsymbol{\Psi}^{-1} \mathbf{y}_S$, and $\boldsymbol{\Psi}$, $\boldsymbol{\varphi}(\mathbf{x})$ are defined as

$$\boldsymbol{\Psi} := [\varphi(\left\|\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\right\|)]_{ij} \in \mathbb{R}^{n \times n},\quad \boldsymbol{\varphi}(\mathbf{x}) := [\varphi(\left\|\mathbf{x}^{(i)} - \mathbf{x}\right\|)]_{i} \in \mathbb{R}^{n}. \tag{21}$$

When the above RBFs predictor is compared with the Kriging predictor (see Eq. (10)), one can observe that they are essentially similar: only the basis-function matrix $\boldsymbol{\Psi}$ (also called the Gram matrix) and the basis-function vector $\boldsymbol{\varphi}(\mathbf{x})$ differ from the correlation matrix $\mathbf{R}$ and the correlation vector $\mathbf{r}(\mathbf{x})$ of the Kriging predictor, respectively. In addition, RBFs differs from Kriging in two respects: 1) RBFs does not provide an uncertainty estimate of the prediction; 2) the model parameters cannot be tuned by MLE as in Kriging. Generally, Kriging can be regarded as a particular form of RBFs.
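A minimal sketch of the RBFs predictor: instead of forming Eq. (20) explicitly, it solves the augmented linear system given by Eqs. (18)-(19) for the weights and the constant trend. The Gaussian basis width used here is an assumed example value.

```python
import numpy as np

def rbf_fit(S, y, phi=lambda r: np.exp(-2.0 * r ** 2)):
    """RBFs interpolation with a constant trend beta_0 (Eqs. (17)-(19))."""
    n = len(y)
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    Psi = phi(dist)                                  # Gram matrix of Eq. (21)
    # Augmented system: the n rows of Eq. (18) plus the side condition Eq. (19).
    A = np.block([[Psi, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(y, 0.0))
    omega, beta0 = sol[:n], sol[n]
    def predict(x):
        return beta0 + omega @ phi(np.linalg.norm(S - x, axis=-1))   # Eq. (17)
    return predict
```

As with Kriging, the resulting model interpolates: it reproduces the sampled responses exactly (up to round-off), and approximates the underlying function between samples.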


To build an RBFs model, one needs to prescribe the type of basis function, which depends only on the Euclidean distance $r = \|\mathbf{x}-\mathbf{x}'\|$ between any two sites $\mathbf{x}$ and $\mathbf{x}'$. Compared to the correlation functions used for a Kriging model, more choices are available for an RBFs model; some of them are listed in Table 1.


Table 1. Basis functions for RBFs surrogate model

All the basis functions listed in Table 1 can be classified into two categories: decaying functions (such as GAUSS and HIMQ) and growing functions (POW, TPS and HMQ). The decaying functions yield a positive definite matrix $\boldsymbol{\Psi}$, which allows for the use of the Cholesky decomposition for its inversion; the growing functions generally result in a non-positive definite matrix $\boldsymbol{\Psi}$, and thus the LU decomposition is usually used instead. Schematics of the basis functions for a one-dimensional problem are sketched in Figure 3.

Fig. 3. Schematics of basis functions for Radial Basis Functions (left: decaying functions; right: growing functions).
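The RBFs predictor of Eqs. (17)–(21) is compact enough to sketch in code. The following is a minimal illustration of our own (not from a particular library) in Python with NumPy, assuming a Gaussian basis $\varphi(r) = e^{-(r/\theta)^2}$ with a user-chosen width `theta`; the names `fit_rbf` and `predict_rbf` are hypothetical.

```python
import numpy as np

def fit_rbf(X, y, theta=1.0):
    """Fit a Gaussian-basis RBF model with constant trend beta0
    (Eqs. (17)-(21)): beta0 by generalized least squares, then the
    weights via the Gram matrix Psi."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Psi = np.exp(-(d / theta) ** 2)            # Gram matrix of Eq. (21)
    ones = np.ones(n)
    Psi_inv_y = np.linalg.solve(Psi, y)
    Psi_inv_1 = np.linalg.solve(Psi, ones)
    beta0 = (ones @ Psi_inv_y) / (ones @ Psi_inv_1)  # (1' Psi^-1 1)^-1 1' Psi^-1 y
    w = np.linalg.solve(Psi, y - beta0)              # satisfies Eqs. (18)-(19)
    return X, w, beta0, theta

def predict_rbf(model, x):
    """RBF predictor of Eq. (20): beta0 + phi(x)' Psi^-1 (y - beta0*1)."""
    X, w, beta0, theta = model
    phi = np.exp(-(np.linalg.norm(X - x, axis=1) / theta) ** 2)
    return beta0 + phi @ w
```

Because the weights solve the interpolation conditions of Eq. (18), this predictor reproduces the sampled responses exactly at the sample sites.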

#### **2.2.4 The big(ger) picture**

In addition to the models described above, a few other surrogate models are available in the literature, such as Artificial Neural Networks (ANN) (Elanayar et al. 1994, Park et al. 1991), Multivariate Adaptive Regression Splines (MARS) and Support Vector Regression (SVR)


(Smola & Schoelkopf 2004). Although these methods come from different research communities, the underlying idea is similar when they are used for function prediction in surrogate modeling. They are not described in detail here due to limited space; readers are referred to the paper by Wang & Shan (2007) and the books by Keane et al. (2005) and Forrester et al. (2008) for more on surrogate modeling techniques.

#### **2.3 Evaluation of approximation models**

An important issue in surrogate modeling is how to estimate the error of the approximation models: only when a surrogate model with sufficient accuracy is built can a reliable optimum be obtained. Here we use two measures ($\bar{e}$ and $\sigma_e$) to evaluate the error of the approximation models at test points, which are also chosen by a DoE method. The average relative error is

$$\overline{e} = \frac{1}{n\_t} \sum\_{i=1}^{n\_t} e^{(i)}, \quad e^{(i)} = \left\| \frac{\hat{y}\_t^{(i)} - y\_t^{(i)}}{y\_t^{(i)}} \right\|, \tag{22}$$

where $n_t$ is the number of test points; $y_t^{(i)}$ and $\hat{y}_t^{(i)}$ are the true value and the predicted value at the *i*-th test point, respectively. The root mean squared error is defined by

$$
\sigma_e = \sqrt{\frac{1}{n_t}\sum_{i=1}^{n_t}\left(e^{(i)}\right)^2}\,. \tag{23}
$$
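For illustration, Eqs. (22) and (23) amount to a few lines of NumPy. This hypothetical helper (our own, not from the chapter) assumes the true and predicted responses at the $n_t$ test points are given as arrays.

```python
import numpy as np

def model_errors(y_true, y_pred):
    """Average relative error (Eq. 22) and root mean squared
    relative error (Eq. 23) over the n_t test points."""
    e = np.abs((y_pred - y_true) / y_true)  # e^(i) at each test point
    return e.mean(), np.sqrt(np.mean(e ** 2))
```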

#### **2.4 Framework of building surrogate models**

A generic framework for building a surrogate model is sketched in Figure 4. Note that the initial surrogate model can be evaluated by Eqs. (22) and (23), and a branch for resampling is denoted by the black dashed line in Figure 4.

Fig. 4. Frameworks of building surrogate models

### **3. Use of surrogate models for optimization**

Suppose we are concerned with solving a general optimization problem as

$$\begin{array}{ll}\text{Objective} & \text{minimize } \mathbf{y}(\mathbf{x})\\ \text{s.t.} & \mathbf{g}\_i(\mathbf{x}) \le 0, \; i = 1, \ldots, n\_c \\ & \mathbf{x}\_l \le \mathbf{x} \le \mathbf{x}\_u \end{array} \tag{24}$$

where $n_c$ is the number of state functions, which equals the number of inequality constraints (assuming that all equality constraints have been transformed into inequality constraints); $\mathbf{x}_l$ and $\mathbf{x}_u$ are the lower and upper bounds of the design variables, respectively; the objective function $\mathbf{y}(\mathbf{x})$ and the state functions $\mathbf{g}_i(\mathbf{x})$ are evaluated by an expensive analysis code. Traditionally, the optimization problem is solved by either a gradient-based algorithm or a gradient-free algorithm such as GA. This may become prohibitive due to the large computational cost associated with running the expensive analysis code. Alternatively, here we are concerned with using surrogate modeling techniques to solve the optimization problem, in an attempt to dramatically improve the efficiency.


### **3.1 Framework of surrogate-based optimization**

### **3.1.1 A simple framework**

The basic idea of using surrogate models in optimization can be quite simple. First, surrogate models for the objective function(s) and state function(s) with sufficient accuracy are built (see Figure 2); second, the optimum is found by an optimizer, with the objective function(s) and state function(s) evaluated by the surrogate models rather than by the expensive analysis code. Since prediction with the surrogate models is much more efficient than that by the expensive analysis code, the optimization efficiency can be greatly improved. The conventional optimization and the surrogate-based optimization are compared in Figure 5. In addition to improving optimization efficiency, surrogate models also serve as an interface between the analysis code and the optimizer, which makes establishing an optimization procedure much easier. An example of such a surrogate-based optimization framework can be found in the paper by Han et al. (2010).

### **3.1.2 A bi-level framework**

Although the framework of surrogate-based optimization sketched in Figure 5(b) is very intuitive and simple, questions may arise: Are the surrogate models accurate enough? Has the true optimum been reached? In fact, the optimum gained by the surrogate models is only an approximation to the true optimum (see Figure 5(a)). One has to refine the surrogate models by adding new sample points, which are observed by running the analysis code. The procedure of augmenting new sample point(s) to the current sampled data sets is the so-called "sample-point refinement". The rules for determining the new sample sites towards the true optimum are called "infill criteria", which will be discussed in section 3.2. The flowchart of a surrogate-based optimization with the additional process of


sample-point refinement is sketched in Figure 6. It can be regarded as a bi-level optimization framework, with the process of building and refining the surrogate models (which needs to run the expensive analysis code) acting as the main optimization and the process of using surrogate models to determine the new sample sites acting as the sub-optimization(s).

Fig. 5. Comparison of frameworks for conventional optimization and surrogate-based optimization with a simple framework

Fig. 6. Flowchart of the surrogate-based optimization with a bi-level framework (main optimization: building and refining the surrogate models which needs to run the expensive analysis code; sub-optimization(s): using surrogate models to determine new sample sites)

#### **3.2 Infill criteria**


The infill criterion determines the new sample sites by solving sub-optimization problem(s). Three infill criteria are discussed here: Searching Surrogate Model(s) (SSM), Expected Improvement (EI) and Lower-Bounding Confidence (LCB).

#### **3.2.1 Searching surrogate model(s)**

Provided that the initial surrogate models have been built, an optimizer such as GA can be used to find the optimum, which in turn can be employed to refine the surrogate models.

The mathematical model of the sub-optimization for determining the new sample site is of the form

$$\begin{aligned} \text{Objective} & \quad \text{minimize } \hat{\mathbf{y}}(\mathbf{x})\\ \text{s.t.} & \quad \hat{\mathbf{g}}\_i(\mathbf{x}) \le 0, \, i = 1, \ldots, n\_c \quad , \\ & \mathbf{x}\_l \le \mathbf{x} \le \mathbf{x}\_u \end{aligned} \tag{25}$$

where $\hat{y}(\mathbf{x})$ and $\hat{g}_i(\mathbf{x})$ are surrogate models of $y(\mathbf{x})$ and $g_i(\mathbf{x})$, respectively. With the optimal design variables $\hat{\mathbf{x}}_{\text{opt}}$ gained by the surrogate models in hand, one needs to run the expensive analysis code to compute the corresponding true function value and compare it with what is predicted by the surrogate models. If the error between them is below a threshold, the optimization process can be terminated; if not, the new sample point is augmented to the sampled data sets and the surrogate models are rebuilt; the process is repeated until the optimum solution is approached.

This criterion applies to all surrogate models and is very efficient for local exploitation of the promising region in the design space.
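The SSM loop above can be sketched as follows. This is a toy one-dimensional, unconstrained illustration of our own: a quadratic response surface stands in for the surrogate of Eq. (25), and a dense grid search stands in for the GA sub-optimizer the chapter uses.

```python
import numpy as np

def fit_quad(X, y):
    # quadratic response surface as a stand-in surrogate (1-D)
    A = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 0] ** 2])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_quad(c, x):
    return c[0] + c[1] * x + c[2] * x * x

def ssm_infill(f, X, y, lo, hi, n_iter=10, tol=1e-6):
    """SSM infill of section 3.2.1: minimize the surrogate, evaluate the
    true f at the surrogate optimum, stop when the two values agree,
    otherwise add the new sample and refit."""
    for _ in range(n_iter):
        c = fit_quad(X, y)
        grid = np.linspace(lo, hi, 2001)
        yg = predict_quad(c, grid)
        x_new = grid[np.argmin(yg)]        # surrogate optimum
        y_new = f(x_new)                   # expensive "analysis code" call
        if abs(y_new - yg.min()) < tol:    # surrogate and truth agree
            break
        X = np.vstack([X, [x_new]])
        y = np.append(y, y_new)
    return x_new, y_new
```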

#### **3.2.2 Expected improvement**

A surrogate model such as Kriging provides not only a function prediction but also an estimation of the mean squared error (MSE). In fact, the prediction of a Kriging model, $\hat{y}(\mathbf{x})$, at any point can be regarded as a Gaussian random variable with the mean given by the Kriging predictor and the variance given by the mean squared error $s^2(\mathbf{x})$ (see section 2.2.2). Viewed in this way, a probability can be computed that the function value at any untried $\mathbf{x}$ would fall below the minimum among the sample points observed so far. The Expected Improvement (EI) function (Jones et al. 1998, Jeong et al. 2005) can then be calculated to account for the improvement of the objective function we expect to achieve at any untried $\mathbf{x}$. The definition of EI is of the form

$$E[I(\mathbf{x})] = \begin{cases} (y\_{\min} - \hat{y}(\mathbf{x})) \Phi\left(\frac{y\_{\min} - \hat{y}(\mathbf{x})}{\hat{s}(\mathbf{x})}\right) + \hat{s}(\mathbf{x}) \phi\left(\frac{y\_{\min} - \hat{y}(\mathbf{x})}{\hat{s}(\mathbf{x})}\right) & \text{if } \hat{s} > 0\\ 0 & \text{if } \hat{s} = 0 \end{cases} \tag{26}$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ are the cumulative distribution function and the probability density function of a standard normal distribution, respectively, and $y_{\min} = \min(y^{(1)}, y^{(2)}, \ldots, y^{(n)})$

denotes the minimum of the observed data so far. The greater the EI, the more improvement we expect to achieve. The point with maximum EI is located by a global optimizer such as GA and then observed by running the analysis code. For this infill criterion, the constraints can be accounted for by introducing the probability that the constraints are satisfied. The corresponding sub-optimization problem can be modeled as

$$\begin{aligned} \text{Objective} \quad & \text{maximize } E[I(\mathbf{x})] \cdot \prod\_{i=1}^{n\_c} P[G\_i(\mathbf{x}) \le 0] \\ \text{s.t.} \quad & \mathbf{x}\_l \le \mathbf{x} \le \mathbf{x}\_u \end{aligned} \tag{27}$$

where $P[G_i(\mathbf{x}) \le 0]$ denotes the probability that the *i*-th constraint is satisfied, and $G_i(\mathbf{x})$ is a random function corresponding to the *i*-th state function $g_i(\mathbf{x})$: $P[G_i(\mathbf{x}) \le 0] \to 1$ when the constraint is satisfied and $P[G_i(\mathbf{x}) \le 0] \to 0$ when the constraint is violated. $P[G_i(\mathbf{x}) \le 0]$ can be calculated by

$$P[G\_i(\mathbf{x}) \le 0] = \frac{1}{\hat{s}\_i(\mathbf{x})\sqrt{2\pi}} \int\_{-\infty}^0 e^{-[G\_i(\mathbf{x}) - \hat{g}\_i(\mathbf{x})]^2 / 2\hat{s}\_i^2(\mathbf{x})} dG\_i(\mathbf{x}) = \Phi\left(\frac{-\hat{g}\_i(\mathbf{x})}{\hat{s}\_i(\mathbf{x})}\right) \tag{28}$$

where $\hat{s}_i(\mathbf{x})$ denotes the estimated standard error corresponding to the surrogate model $\hat{g}_i(\mathbf{x})$.

The optimum site $\hat{\mathbf{x}}_{\text{opt}}$ obtained by solving Eq. (27) is observed by running the analysis code, and the new sample point is added to the sampled data sets; the surrogate models are rebuilt and the whole process is repeated until the global optimum is approached.
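Eqs. (26)–(28) can be evaluated directly from the surrogate's prediction and standard error at a candidate point. The sketch below (our own helper functions, using only the Python standard library) computes the EI merit and the constrained objective of Eq. (27) for given $\hat{y}$, $\hat{s}$ and per-constraint $(\hat{g}_i, \hat{s}_i)$ pairs.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(y_min, y_hat, s_hat):
    """EI of Eq. (26): expected amount by which a Gaussian prediction
    N(y_hat, s_hat^2) falls below the current best y_min."""
    if s_hat == 0.0:
        return 0.0
    z = (y_min - y_hat) / s_hat
    return (y_min - y_hat) * norm_cdf(z) + s_hat * norm_pdf(z)

def prob_feasible(g_hat, s_hat):
    """P[G_i(x) <= 0] of Eq. (28) for one constraint."""
    return norm_cdf(-g_hat / s_hat)

def constrained_ei(y_min, y_hat, s_hat, constraints):
    """Objective of the sub-optimization Eq. (27): EI times the product
    of the per-constraint feasibility probabilities."""
    p = 1.0
    for g_hat, sg in constraints:
        p *= prob_feasible(g_hat, sg)
    return expected_improvement(y_min, y_hat, s_hat) * p
```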

#### **3.2.3 Lower-bounding confidence (LCB)**

The LCB function is defined as a weighted combination of the predicted function value $\hat{y}(\mathbf{x})$ and the standard error of the prediction $\hat{s}(\mathbf{x})$. For an optimization problem of finding the minimum of the unknown function $y(\mathbf{x})$, a simple expression for the LCB function is of the form

$$\mathbf{LCB} = \hat{y}(\mathbf{x}) - A\hat{s}(\mathbf{x}) \,. \tag{29}$$

where *A* is a constant which balances the influence of the predicted function and the corresponding uncertainty. Best practice suggests *A* = 1 works well for a number of realistic problems. The corresponding sub-optimization problem can be modeled as

$$\begin{array}{lll}\text{Objective} & \text{minimize} & \hat{y}(\mathbf{x}) - A\hat{s}(\mathbf{x})\\ \text{s.t.} & \hat{g}\_{i}(\mathbf{x}) \le 0, \, i = 1, \ldots, n\_{c} \\ & \mathbf{x}\_{l} \le \mathbf{x} \le \mathbf{x}\_{u} \end{array} \tag{30}$$

The above optimization problem can be solved via a global optimizer such as GA. Since the point with the smallest value of LCB indicates the possible minimum of the unknown function, the optimum site $\hat{\mathbf{x}}_{\text{opt}}$ is then observed and added to the sampled data sets to refine the surrogate models. This procedure is performed iteratively until the global optimum is reached.
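As a small illustration of Eq. (29): on a finite candidate set, the sub-optimization of Eq. (30) (ignoring constraints) reduces to picking the candidate with the smallest LCB merit. The helper below is a hypothetical sketch of our own.

```python
import numpy as np

def lcb_infill(y_hat, s_hat, A=1.0):
    """Return the index of the candidate minimizing the LCB merit of
    Eq. (29); y_hat and s_hat are surrogate predictions and standard
    errors at the candidate points. A = 1 is the rule of thumb the
    text mentions."""
    merit = np.asarray(y_hat) - A * np.asarray(s_hat)
    return int(np.argmin(merit))
```

Note how a candidate with a slightly worse prediction but large uncertainty (index 2 below) can win, which is exactly the exploration behavior LCB is meant to add.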

### **4. Examples for applications to aircraft design**

#### **4.1 Airfoil design**


Using an in-house Navier-Stokes flow solver, the objective of the problem is to minimize the drag of an RAE2822 airfoil at the flow condition $Ma = 0.73$, $\alpha = 2.7^{\circ}$, $Re = 6.5\times 10^{6}$, subject to three constraints:

$$\begin{array}{llll}\text{Objective} & : & \text{Minimize} & C\_d \\ \text{s.t.} & : & \text{(1) } Area \geq 0.99 \times Area\_0 \\ & : & \text{(2) } C\_l \geq 0.99 \times C\_{l0} \\ & : & \text{(3) } |C\_m| \leq |C\_{m0}| \end{array} \tag{31}$$

where $Area_0$, $C_{l0}$, $C_{m0}$ are the area, lift coefficient, and moment coefficient of the baseline airfoil, respectively. The first constraint takes the structural design of the wing into consideration, to guarantee the volume of the wing; the second enforces a constant lift of the wing in order to balance the weight of the aircraft at the cruise condition; the third controls the pitching moment of the airfoil to avoid a large drag penalty of the horizontal tail paid for balancing the aircraft.

The initial number of samples for Kriging is set to 20, selected by Latin Hypercube Sampling (LHS). The airfoil is parameterized by 10 Hicks-Henne bump functions (Hicks & Henne, 1978), and the maximum amplitude of each bump is $A_{\max}/c = 0.544\%$. Both the SSM and EI infill strategies are adopted in the surrogate refinement. Table 2 presents the results of the two optimization methods. The optimized and initial airfoils and the corresponding pressure coefficient distributions are compared in Figure 7. Note that the aerodynamic coefficients of the initial airfoil RAE2822 are normalized to 100. Obviously, the Kriging-based optimization method gives a better result with higher efficiency, and is more likely to find the global optimum.

Fig. 7. Aerodynamic shape optimization of a transonic airfoil (RAE 2822) via Kriging and quadratic Response Surface Model (left: pressure distribution; right: airfoil shape); by using the Kriging model with the Expected Improvement infill criterion, the drag is reduced by 33.6% with only 56 calls to the Navier-Stokes flow solver.

Surrogate-Based Optimization 357

(Wwing). Then the average relative errors and the root mean squared errors are calculated to evaluate the approximation models, as listed in Table 4. In this case Kriging and RSM have

Design variable Unit Lower limit Upper limit Span m 26 34 Taper ratio 0.2 0.4 Linear twist angle degree -3 -1 Sweep angle on leading edge degree 25 35 Thickness of front-spar web mm 2 6 Thickness of back-spar web mm 2 6 Thickness of lower skin mm 3 7 Thickness of Upper skin mm 3 7 Table 3. Definition of design variables for preliminary design of a high-subsonic transport-

*<sup>e</sup>* Parameter Surrogate

Kriging 0.0362 0.0213 Kriging 0.0515 0.0522

Kriging 0.0123 0.0097 Kriging 0.0227 0.0241

Kriging 0.0071 0.0051 Kriging 0.0142 0.0107

L RSM 0.0360 0.0213 σmax RSM 0.0563 0.0535

L/D RSM 0.0122 0.0099 δmax RSM 0.0227 0.0241

Swing RSM 0.0071 0.0051 Wwing RSM 0.0140 0.0104

Then the multi-objective optimization for the supercritical wing is performed based on RSM due to its higher computational efficiency. Weighted sum method is used to transform the multi-objective optimization into a single-objective optimization. Sequential quadratic programming method is employed to solve the optimization. One of the candidate wings with better performance, are selected as the initial point for optimization. The optimal design is observed by running the analysis codes and the results are listed in Table 5. Where <sup>0</sup> **X** and <sup>0</sup> **Y** is the initial wing scheme and its response, respectively; **\* X** and **\* Y** is the optimal wing scheme and its actual response, respectively; **Y**ˆ is the response at **\* X** calculated by the approximation models. For the optimal wing scheme, the largest relative error of approximation models is no more than 3 percent. It again proves the high accuracy

Figure 8 shows the contour of the equivalent stress of the optimal wing. It shows that the stress is larger in the location of intersection of inner wing and outer wing due to the inflexion. Figure 9 shows pressure distribution of the optimal wing. It shows that the wing

model

*e*

σ*e*

comparative high accuracy.

aircraft wing

Parameter Surrogate

model

Table 4. Evaluation of modeling accuracy

of the approximation models.

*e*

σ


Table 2. Drag reduction of an RAE 2822 airfoil via Kriging and RSM-based optimizations

### **4.2 Wing design**

Here we are concerned with the preliminary design for a high-subsonic transport-aircraft wing of a wing/body combination, considering aerodynamic, structure and static aeroelastic effect. The calculation of the external flow is carried out by numerical solutions of the full potential equation in conservative form (Kovalev & Karas, 1991). The FEM-based commercial software ANSYS is used for analyzing the structural performance of the wing with double-beams sandwich structure. Weak coupling method is adopted for static aeroelastic analysis.

The optimization objectives are to maximize the aircraft lift-to-drag ratio and minimize the weight of wing for a fixed maximum take-off weight of 54 tons and cruise Mach number of 0.76 at 10,000 meters high. The wing is composed of inner and outer wing. The reference area of wing is 105 square meter. The mathematical model for optimization is of the form

$$\begin{aligned} \max \quad & L/D \\ \min \quad & W_{\text{wing}} \\ \text{s.t.} \quad & L \ge 54 && /10^3\,\text{kg} \\ & 100 \le S_{\text{wing}} \le 110 && /\text{m}^2 \\ & \sigma_{\max} \le \sigma_b && /10^9\,\text{Pa} \\ & \delta_{\max} \le 1 && /\text{m} \end{aligned} \tag{32}$$

Eight supercritical airfoils are configured along the span. The optimization is subject to four constraints. The first constraint enforces a constant wing lift in order to balance the weight of the aircraft at the cruise condition; the second guarantees a near-constant wing loading; the third and fourth ensure that the strength and rigidity requirements are satisfied. The limits of the design variables are listed in Table 3. The first four design variables define the aerodynamic configuration of the wing; the remaining four are for the structural design. Details can be found in the paper by Zhang et al. (2008).

The uniform design table U<sub>100</sub>(10<sup>8</sup>) is used to create one hundred candidate wings for building the surrogate models. Another forty-five candidate wings are created by the uniform design table U<sub>45</sub>(10<sup>8</sup>) for evaluating the approximation models. For each wing, the static aeroelastic analysis is performed to obtain the responses of lift (*L*), lift-to-drag ratio (*L*/*D*), wing area (*S*<sub>wing</sub>), maximum stress (*σ*<sub>max</sub>), maximum deformation (*δ*<sub>max</sub>) and wing weight (*W*<sub>wing</sub>). Then the average relative errors and the root mean squared errors are calculated to evaluate the approximation models, as listed in Table 4. In this case, Kriging and RSM have comparably high accuracy.

Table 3. Definition of design variables for preliminary design of a high-subsonic transport-aircraft wing


Table 4. Evaluation of modeling accuracy
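The two error measures used to evaluate the approximation models can be sketched as follows (the response values below are hypothetical stand-ins, not the chapter's data):

```python
import numpy as np

# Hypothetical validation data: "true" responses from the aeroelastic
# analysis at five check points, and the surrogate's predictions there.
y_true = np.array([27.8, 27.4, 27.5, 27.9, 27.6])   # e.g. L/D values
y_pred = np.array([27.7, 27.5, 27.4, 28.0, 27.5])   # surrogate predictions

# Average relative error: mean of |y_pred - y_true| / |y_true|
avg_rel_err = np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

# Root mean squared error (RMSE)
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

print(f"average relative error = {avg_rel_err:.4%}, RMSE = {rmse:.3f}")
```

Small relative errors on the check points (as in Table 4) indicate that the surrogates are accurate enough to be searched in place of the analysis codes.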

Then the multi-objective optimization of the supercritical wing is performed based on RSM, due to its higher computational efficiency. The weighted-sum method is used to transform the multi-objective optimization into a single-objective optimization, and the sequential quadratic programming (SQP) method is employed to solve it. One of the candidate wings with better performance is selected as the initial point for the optimization. The optimal design is verified by running the analysis codes, and the results are listed in Table 5, where **X**<sup>0</sup> and **Y**<sup>0</sup> are the initial wing scheme and its response, respectively; **X**\* and **Y**\* are the optimal wing scheme and its actual response, respectively; and **Ŷ** is the response at **X**\* calculated by the approximation models. For the optimal wing scheme, the largest relative error of the approximation models is no more than 3 percent, which again proves their high accuracy.
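As an illustrative sketch of the weighted-sum scalarization solved by an SQP-type method, SciPy's SLSQP solver can be used; the two quadratic "surrogates", the weights, and the constraint below are hypothetical stand-ins for the fitted RSM models, not the chapter's actual functions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy quadratic "surrogates" standing in for the RSM models (illustrative only)
def lift_to_drag(x):          # objective to be maximized
    return 28.0 - (x[0] - 0.3) ** 2 - (x[1] - 0.5) ** 2

def wing_weight(x):           # objective to be minimized
    return 3.0 + (x[0] - 0.7) ** 2 + (x[1] - 0.4) ** 2

# Weighted-sum scalarization: minimize  -w1*(L/D) + w2*W_wing
w1, w2 = 0.5, 0.5
objective = lambda x: -w1 * lift_to_drag(x) + w2 * wing_weight(x)

# One illustrative inequality constraint (x0 + x1 <= 1) plus box bounds
cons = ({'type': 'ineq', 'fun': lambda x: 1.0 - x[0] - x[1]},)
res = minimize(objective, x0=[0.5, 0.5], method='SLSQP',
               bounds=[(0.0, 1.0), (0.0, 1.0)], constraints=cons)
print(res.x, res.fun)
```

Since the surrogates are cheap analytic functions, the SQP sub-optimization itself costs essentially nothing compared with one run of the analysis codes.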

Figure 8 shows the contour of the equivalent stress of the optimal wing; the stress is larger at the intersection of the inner and outer wings due to the inflexion there. Figure 9 shows the pressure distribution of the optimal wing, which indicates that the wing basically meets the design requirements of a supercritical wing; the slight non-smoothness of the pressure distribution may be caused by the non-uniform deformation of the skin. Figure 10 shows the convergence history of the aeroelastic deformation, which indicates fast convergence for the optimal wing.

The optimization, together with the aeroelastic analysis of all candidate wings, takes only about two days on a personal computer with a Pentium(R) 4 CPU at 2.8 GHz. If more computers are used to calculate the performance of different candidate wings concurrently, the cost can be reduced much further.
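The concurrent evaluation of candidate wings can be sketched as follows; the `analyze_candidate` function is a hypothetical stand-in for the coupled flow/ANSYS analysis, and a thread pool is used here for simplicity, whereas real CFD jobs would typically be dispatched as separate processes or to separate machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for one expensive aeroelastic analysis (flow solver + ANSYS run);
# here a trivial function, purely for illustration.
def analyze_candidate(x):
    return x[0] ** 2 + x[1] ** 2  # dummy scalar "response"

# Hypothetical DoE: ten candidate wings, two design variables each
candidates = [(0.1 * i, 0.2 * i) for i in range(10)]

# The candidate analyses are independent of one another,
# so they can all run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(analyze_candidate, candidates))
```

Because the DoE samples do not depend on each other, the wall-clock time scales down almost linearly with the number of workers until the individual analyses dominate.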


| | *B*/m | *λ* | *θ*/(°) | *Λ*/(°) | *T*<sub>FS</sub>/mm | *T*<sub>BS</sub>/mm | *T*<sub>LS</sub>/mm | *T*<sub>US</sub>/mm |
|---|---|---|---|---|---|---|---|---|
| **X**<sup>0</sup> | 34.00 | 0.244 | -1.667 | 29.444 | 3.333 | 3.778 | 6.556 | 3.889 |
| **X**\* | 31.69 | 0.200 | -1.563 | 28.233 | 2.232 | 2.000 | 4.396 | 4.057 |

| | *L*/10³kg | *L*/*D* | *S*<sub>wing</sub>/m² | *σ*<sub>max</sub>/10⁹Pa | *δ*<sub>max</sub>/m | *W*<sub>wing</sub>/10³kg |
|---|---|---|---|---|---|---|
| **Y**<sup>0</sup> | 51.50 | 27.81 | 111.94 | 0.311 | 1.191 | 3.850 |
| **Y**\* | 53.14 | 27.40 | 107.25 | 0.275 | 0.991 | 3.063 |
| **Ŷ** | 54.00 | 27.45 | 106.27 | 0.267 | 1.000 | 3.003 |
| Modeling error | 1.61% | 0.16% | 0.91% | 2.97% | 0.87% | 1.85% |

Table 5. Optimization results when considering the aeroelastic effect

Fig. 8. Contour of equivalent stress of the optimal wing

Fig. 9. Pressure distribution of the optimal wing (Ma = 0.76, Re = 0.257×10⁸, α = 0°)

Fig. 10. Convergence history of the Y-direction and torsional deformation at the wing tip


### **5. Conclusion**

An overview of the existing surrogate models and techniques for using them in optimization has been presented in this chapter. Among the surrogate models, regression models such as the quadratic response surface model (RSM) are well suited to local optimization problems with a relatively simple design space; interpolation models such as Kriging or RBFs can handle highly nonlinear, multi-modal functions and are thus well suited to global problems with a more complicated design space. From an application point of view, the simple framework of surrogate-based optimization is a good choice for engineering design, since the surrogate model acts as an interface between the expensive analysis code and the optimizer, and one does not need to change the analysis code itself. The drawback of this framework is that the accuracy of the optimum depends solely on the approximation accuracy of the surrogate model, so one generally obtains only an approximation to the true optimum. In contrast, the bi-level framework with different infill criteria provides an efficient way to quickly find the true optimum without the need to build globally accurate surrogate models. Multiple infill criteria appear to be a better way to overcome the drawbacks of any single infill criterion.

Examples of airfoil and wing design show that surrogate-based optimization is very promising for aerodynamic problems with fewer than about ten design variables. For higher-dimensional problems, the computational cost increases very quickly and can become prohibitive. Thus, the use of surrogate models for higher-dimensional optimization problems will be an important issue for future work.

### **6. Acknowledgements**

This research was sponsored by the National Natural Science Foundation of China (NSFC) under grant No. 10902088 and the Aeronautical Science Foundation of China under grant No. 2011ZA53008.

### **7. References**

Elanayar, S. V. T., and Shin, Y. C., "Radial Basis Function Neural Network for Approximation and Estimation of Nonlinear Stochastic Dynamic Systems," *IEEE Transactions on Neural Networks*, Vol. 5, No. 4, 1994, pp. 594-603.

Fang, K. T., Lin, D., Winker, P., and Zhang, Y., "Uniform Design: Theory and Application," *Technometrics*, Vol. 42, No. 3, 2000, pp. 237-248.

Forrester, A. I. J., Sóbester, A., and Keane, A. J., "Multi-Fidelity Optimization via Surrogate Modeling," *Proceedings of the Royal Society A*, Vol. 463, No. 2088, 2007, pp. 3251-3269.

Forrester, A. I. J., Sóbester, A., and Keane, A. J., *Engineering Design via Surrogate Modeling: A Practical Guide*, Progress in Astronautics and Aeronautics Series, 226, John Wiley & Sons, 2008.

Giunta, A. A., Wojtkiewicz Jr., S. F., and Eldred, M. S., "Overview of Modern Design of Experiments Methods for Computational Simulations," AIAA Paper 2003-649, 2003.

Han, Z.-H., Görtz, S., and Zimmermann, R., "On Improving Efficiency and Accuracy of Variable-Fidelity Surrogate Modeling in Aero-Data for Loads Context," Proceedings of the CEAS 2009 European Air and Space Conference, Manchester, UK, Oct. 26-29, 2009.

Han, Z.-H., Zhang, K.-S., Song, W.-P., and Qiao, Z.-D., "Optimization of Active Flow Control over an Airfoil Using a Surrogate-Management Framework," *Journal of Aircraft*, Vol. 47, No. 2, 2010, pp. 603-612.

Han, Z.-H., Zimmermann, R., and Görtz, S., "A New Cokriging Method for Variable-Fidelity Surrogate Modeling of Aerodynamic Data," AIAA Paper 2010-1225, 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, Orlando, Florida, Jan. 4-7, 2010.

Hardy, R. L., "Multiquadric Equations of Topography and Other Irregular Surfaces," *Journal of Geophysical Research*, Vol. 76, March 1971, pp. 1905-1915.

Hicks, R. M., and Henne, P. A., "Wing Design by Numerical Optimization," *Journal of Aircraft*, Vol. 15, No. 7, 1978, pp. 407-412.

Jeong, S., Murayama, M., and Yamamoto, K., "Efficient Optimization Design Method Using Kriging Model," *Journal of Aircraft*, Vol. 42, No. 2, 2005, pp. 413-420.

Jones, D., Schonlau, M., and Welch, W., "Efficient Global Optimization of Expensive Black-Box Functions," *Journal of Global Optimization*, Vol. 13, 1998, pp. 455-492.

Keane, A. J., and Nair, P. B., *Computational Approaches for Aerospace Design: The Pursuit of Excellence*, John Wiley & Sons, Ltd, Chichester, 2005.

Kovalev, V. E., and Karas, O. V., "Computation of a Transonic Airfoil Flow Considering Viscous Effects and Thin Separated Regions," *La Recherche Aerospatiale* (English Edition, ISSN 0379-380X), No. 1, 1991, pp. 1-15.

Krige, D. G., "A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand," *Journal of the Chemical, Metallurgical and Mining Engineering Society of South Africa*, Vol. 52, No. 6, 1951, pp. 119-139.

Laurenceau, J., and Sagaut, P., "Building Efficient Response Surfaces of Aerodynamic Functions with Kriging and Cokriging," *AIAA Journal*, Vol. 46, No. 2, 2008, pp. 498-507.

Park, J., and Sandberg, I. W., "Universal Approximation Using Radial-Basis-Function Networks," *Neural Computation*, Vol. 3, No. 2, 1991, pp. 246-257.

Powell, M. J. D., "Radial Basis Functions for Multivariable Interpolation: A Review," *Algorithms for Approximation*, edited by J. C. Mason and M. G. Cox, Oxford Univ. Press, New York, 1987, Chap. 3, pp. 141-167.

Queipo, N. V., Haftka, R. T., Shyy, W., Goel, T., Vaidyanathan, R., and Tucker, P. K., "Surrogate-Based Analysis and Optimization," *Progress in Aerospace Sciences*, Vol. 41, 2005, pp. 1-28.

Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P., "Design and Analysis of Computer Experiments," *Statistical Science*, Vol. 4, 1989, pp. 409-423.

Simpson, T. W., Mauery, T. M., Korte, J. J., et al., "Kriging Models for Global Approximation in Simulation-Based Multidisciplinary Design Optimization," *AIAA Journal*, Vol. 39, No. 12, 2001, pp. 2233-2241.

Simpson, T. W., Toropov, V., Balabanov, V., and Viana, F. A. C., "Design and Analysis of Computer Experiments in Multidisciplinary Design Optimization: A Review of How Far We Have Come – or Not," AIAA Paper 2008-5802, 2008.

Smola, A. J., and Schölkopf, B., "A Tutorial on Support Vector Regression," *Statistics and Computing*, Vol. 14, 2004, pp. 199-222.

Toal, D. J. J., Bressloff, N. W., and Keane, A. J., "Kriging Hyperparameter Tuning Strategies," *AIAA Journal*, Vol. 46, No. 5, 2008, pp. 1240-1252.

Wang, G. G., and Shan, S., "Review of Metamodeling Techniques in Support of Engineering Design Optimization," *Journal of Mechanical Design*, Vol. 129, No. 4, 2007, pp. 370-380.

Zhang, K.-S., Han, Z.-H., Li, W.-J., and Song, W.-P., "Coupled Aerodynamic/Structural Optimization of a Subsonic Transport Wing Using a Surrogate Model," *Journal of Aircraft*, Vol. 45, No. 6, 2008, pp. 2167-2171.
