### 2. Conjugate gradient method (CG)

The conjugate gradient method is intermediate between the steepest descent method and the Newton method.

In fact, the conjugate gradient method deflects the direction of the steepest descent method by adding to it a positive multiple of the direction used in the previous step.

Restarting and preconditioning are very important for improving the conjugate gradient method [47].

Some well-known CG methods are defined by the following choices of the parameter β_k, where y_k = g_{k+1} − g_k and d_k is the search direction [12, 19, 20, 23, 24, 31, 39, 40, 49]:

$$\begin{aligned} \beta\_{k}^{\text{HS}} &= \frac{\mathbf{g}\_{k+1}^{T} \mathbf{y}\_{k}}{d\_{k}^{T} \mathbf{y}\_{k}} \\ \beta\_{k}^{\text{FR}} &= \frac{\|\mathbf{g}\_{k+1}\|^{2}}{\|\mathbf{g}\_{k}\|^{2}} \\ \beta\_{k}^{\text{PRP}} &= \frac{\mathbf{g}\_{k+1}^{T} \mathbf{y}\_{k}}{\|\mathbf{g}\_{k}\|^{2}} \\ \beta\_{k}^{\text{CD}} &= \frac{\|\mathbf{g}\_{k+1}\|^{2}}{-d\_{k}^{T} \mathbf{g}\_{k}} \\ \beta\_{k}^{\text{LS}} &= \frac{\mathbf{g}\_{k+1}^{T} \mathbf{y}\_{k}}{-d\_{k}^{T} \mathbf{g}\_{k}} \\ \beta\_{k}^{\text{DY}} &= \frac{\|\mathbf{g}\_{k+1}\|^{2}}{d\_{k}^{T} \mathbf{y}\_{k}} \\ \beta\_{k}^{\text{N}} &= \left(\mathbf{y}\_{k} - 2 d\_{k} \frac{\|\mathbf{y}\_{k}\|^{2}}{d\_{k}^{T} \mathbf{y}\_{k}}\right)^{T} \frac{\mathbf{g}\_{k+1}}{d\_{k}^{T} \mathbf{y}\_{k}} \end{aligned}$$
 
$$\beta\_{k}^{\text{WYL}} = \frac{\mathbf{g}\_{k}^{T} \left(\mathbf{g}\_{k} - \frac{\|\mathbf{g}\_{k}\|}{\|\mathbf{g}\_{k-1}\|} \mathbf{g}\_{k-1}\right)}{\left\|\mathbf{g}\_{k-1}\right\|^{2}}$$
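
For illustration, the following minimal sketch (Python with NumPy; the vectors g_k, g_{k+1}, and d_k are assumed to be supplied by the surrounding iteration) shows how a few of the classical parameters listed above can be computed:

```python
import numpy as np

def beta_hs(g_new, g_old, d):
    """Hestenes-Stiefel: beta = g_{k+1}^T y_k / (d_k^T y_k)."""
    y = g_new - g_old
    return (g_new @ y) / (d @ y)

def beta_fr(g_new, g_old):
    """Fletcher-Reeves: beta = ||g_{k+1}||^2 / ||g_k||^2."""
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp(g_new, g_old):
    """Polak-Ribiere-Polyak: beta = g_{k+1}^T y_k / ||g_k||^2."""
    y = g_new - g_old
    return (g_new @ y) / (g_old @ g_old)

def beta_dy(g_new, g_old, d):
    """Dai-Yuan: beta = ||g_{k+1}||^2 / (d_k^T y_k)."""
    y = g_new - g_old
    return (g_new @ g_new) / (d @ y)
```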

Consider the positive definite quadratic function

$$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{G} \mathbf{x} + \mathbf{b}^T \mathbf{x} + c,\tag{2}$$

where G is an n × n symmetric positive definite matrix, b ∈ R^n, and c is a real number.

Theorem 1.2.1. [47] (Property theorem of the conjugate gradient method) For the positive definite quadratic function (2), the FR conjugate gradient method with the exact line search terminates after m ≤ n steps, and the following properties hold for all i, 0 ≤ i ≤ m:

$$\begin{aligned} d\_i^T \mathbf{G} d\_j &= 0, \quad j = 0, 1, \dots, i - 1;\\ \mathbf{g}\_i^T \mathbf{g}\_j &= 0, \quad j = 0, 1, \dots, i - 1;\\ d\_i^T \mathbf{g}\_i &= -\mathbf{g}\_i^T \mathbf{g}\_i;\\ \operatorname{span}\left\{ \mathbf{g}\_0, \mathbf{g}\_1, \dots, \mathbf{g}\_i \right\} &= \operatorname{span}\left\{ \mathbf{g}\_0, \mathbf{G} \mathbf{g}\_0, \dots, \mathbf{G}^i \mathbf{g}\_0 \right\};\\ \operatorname{span}\left\{ d\_0, d\_1, \dots, d\_i \right\} &= \operatorname{span}\left\{ \mathbf{g}\_0, \mathbf{G} \mathbf{g}\_0, \dots, \mathbf{G}^i \mathbf{g}\_0 \right\}, \end{aligned}$$

where m is the number of distinct eigenvalues of G.
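
The termination property stated in Theorem 1.2.1 can be checked numerically. The following sketch (Python with NumPy; the matrix G and the vector b are arbitrary illustrative choices, and the constant c is omitted since it does not affect the gradient) runs the CG recurrence with the exact line search on a quadratic whose matrix has only two distinct eigenvalues:

```python
import numpy as np

# G has only two distinct eigenvalues (1 and 2), so m = 2 in Theorem 1.2.1.
G = np.diag([1.0, 1.0, 1.0, 2.0, 2.0])
b = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

x = np.zeros(5)                        # x_0
g = G @ x + b                          # gradient of (1/2) x^T G x + b^T x
d = -g
iters = 0
while np.linalg.norm(g) > 1e-10:
    t = -(g @ d) / (d @ G @ d)         # exact line search step for a quadratic
    x = x + t * d
    g_new = G @ x + b
    d = -g_new + ((g_new @ g_new) / (g @ g)) * d   # FR update
    g = g_new
    iters += 1

print(iters)   # expected: 2, the number of distinct eigenvalues of G
```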

Now, we give the algorithm of the conjugate gradient method.

Algorithm 1.2.1. (CG method).

Assumptions: ε > 0 and x_0 ∈ R^n. Let k = 0, t_0 = 0, d_{-1} = 0, d_0 = −g_0, β_{-1} = 0, and β_0 = 0.


Step 1. If ∥g_k∥ ≤ ε, then STOP.
Step 2. Calculate the step-size t_k by a line search.
Step 3. Calculate β_k by any of the conjugate gradient formulas.
Step 4. Calculate d_k = −g_k + β_{k−1} d_{k−1}.
Step 5. Set x_{k+1} = x_k + t_k d_k.
Step 6. Set k = k + 1 and go to Step 1.
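
A minimal sketch of Algorithm 1.2.1 in Python with NumPy is given below; it uses the FR parameter and a simple Armijo backtracking line search (with a steepest descent restart as a safeguard) in place of an exact or Wolfe line search, and the Rosenbrock function is an arbitrary test example, not one of the problems from [3]:

```python
import numpy as np

def cg_method(f, grad, x0, eps=1e-6, max_iter=5000):
    """Sketch of Algorithm 1.2.1 with the FR parameter and Armijo backtracking."""
    x = x0.astype(float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:          # Step 1
            break
        if g @ d >= 0:                        # safeguard: restart if d is not a descent direction
            d = -g
        # Step 2: simple backtracking (Armijo) line search for the step-size t_k
        t, slope, fx = 1.0, g @ d, f(x)
        for _ in range(60):
            if f(x + t * d) <= fx + 1e-4 * t * slope:
                break
            t *= 0.5
        x = x + t * d                         # Step 5: x_{k+1} = x_k + t_k d_k
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # Step 3: FR parameter from consecutive gradients
        d = -g_new + beta * d                 # Step 4
        g = g_new                             # Step 6: next iteration
    return x

# Example usage on the Rosenbrock function (an arbitrary test problem)
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                           200.0 * (x[1] - x[0]**2)])
print(cg_method(f, grad, np.array([-1.2, 1.0])))
```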

#### 2.1 Convergence of conjugate gradient methods

Theorem 1.2.2. [47] (Global convergence of the FR conjugate gradient method) Suppose that f : R^n → R is continuously differentiable on a bounded level set

$$L = \{ \mathfrak{x} \in \mathbb{R}^n | f(\mathfrak{x}) \le f(\mathfrak{x}\_0) \},$$

and let the FR method be implemented with the exact line search. Then the produced sequence {x_k} has at least one accumulation point, which is a stationary point, i.e.:

1. When {x_k} is a finite sequence, the final point x* is a stationary point of f.

2. When {x_k} is an infinite sequence, it has a limit point, which is a stationary point.

In [35], a comparison of two methods used for solving systems of linear equations, the steepest descent method and the conjugate gradient method, is presented. The aim of the research is to analyze which method solves these equations faster and how many iterations each method needs.

The system of linear equations in the general form is considered:

$$A\mathfrak{x} = B,\tag{3}$$

where matrix A is symmetric and positive definite.

The conclusion is that the SD method solves these systems in less CPU time than the CG method.

On the other hand, the authors find that the CG method, although slower in time, is more productive than the SD method, because it converges in fewer iterations.

So, one method can be preferred when we want a solution in the shortest time, while the other converges in a smaller number of iterations.
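
A small sketch of such a comparison (Python with NumPy; the matrix A and the right-hand side are arbitrary illustrative choices) counts the iterations needed by the two methods for the system (3):

```python
import numpy as np

def sd_iterations(A, b, x, tol=1e-8, max_iter=100000):
    """Steepest descent for A x = b with the exact step length."""
    it, r = 0, b - A @ x
    while np.linalg.norm(r) > tol and it < max_iter:
        t = (r @ r) / (r @ (A @ r))
        x = x + t * r
        r = b - A @ x
        it += 1
    return it

def cg_iterations(A, b, x, tol=1e-8, max_iter=100000):
    """Linear conjugate gradient method for A x = b."""
    it, r = 0, b - A @ x
    d = r.copy()
    while np.linalg.norm(r) > tol and it < max_iter:
        Ad = A @ d
        t = (r @ r) / (d @ Ad)
        x = x + t * d
        r_new = r - t * Ad
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
        it += 1
    return it

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)          # symmetric positive definite test matrix
b = rng.standard_normal(50)
print(sd_iterations(A, b, np.zeros(50)), cg_iterations(A, b, np.zeros(50)))
```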

Again, we consider the problem (1), where f : R^n → R is a smooth function whose gradient is available.

A hybrid conjugate gradient method is a certain combination of different conjugate gradient methods; it is made to improve the behavior of these methods and to avoid the jamming phenomenon.

An excellent survey of hybrid conjugate gradient methods is given in [5].

Three-term conjugate gradient methods were studied in the past (see, e.g., [8, 32, 34]); but, judging from recent papers about CG methods, we can conclude that the mainstream now consists of three-term and even four-term conjugate gradient methods. An interesting paper about a five-term hybrid conjugate gradient method is [1]. Also, from recent papers we can conclude that different modifications of the existing CG methods are made, as well as different hybridizations of CG and BFGS methods.

Consider the unconstrained optimization problem (1), where f : R^n → R is a continuously differentiable function, bounded from below. Starting from an initial point x_0 ∈ R^n, the three-term conjugate gradient method with line search generates a sequence {x_k}, given by the next iterative scheme:

$$\mathbf{x}\_{k+1} = \mathbf{x}\_k + t\_k d\_k, \tag{4}$$

where tk is a step-size which is obtained from the line search, and

$$d\_0 = -\mathbf{g}\_0, \qquad d\_{k+1} = -\mathbf{g}\_{k+1} + \delta\_k \mathbf{s}\_k + \eta\_k \mathbf{y}\_k,$$

In the last relation, δ_k and η_k are the conjugate gradient parameters, s_k = x_{k+1} − x_k, g_k = ∇f(x_k), and y_k = g_{k+1} − g_k. We can see that the search direction d_{k+1} is computed as a linear combination of −g_{k+1}, s_k, and y_k.
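
As a small sketch (Python with NumPy; δ_k and η_k are left as user-supplied values, since each concrete three-term method defines them differently), the generic direction update reads:

```python
import numpy as np

def three_term_direction(x, x_new, g, g_new, delta_k, eta_k):
    """Generic three-term CG direction:
    d_{k+1} = -g_{k+1} + delta_k * s_k + eta_k * y_k,
    with s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k."""
    s = x_new - x            # s_k
    y = g_new - g            # y_k
    return -g_new + delta_k * s + eta_k * y
```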

In [6], the author suggests another way to obtain three-term conjugate gradient algorithms, by minimization of the one-parameter quadratic model of the function f. The idea is to consider the quadratic approximation of the function f at the current point and to determine the search direction by minimization of this quadratic model. It is assumed that the symmetric approximation B_{k+1} of the Hessian matrix satisfies the general quasi-Newton equation, which depends on a positive parameter:

$$B\_{k+1} \mathbf{s}\_k = \alpha^{-1} \mathbf{y}\_k, \quad \alpha > 0.\tag{5}$$

In this paper the quadratic approximation of the function f is considered:

$$
\Phi\_{k+1}(d) = f\_{k+1} + \mathbf{g}\_{k+1}^T d + \frac{1}{2} d^T B\_{k+1} d.
$$

The direction d_{k+1} is computed as

$$d\_{k+1} = -\mathbf{g}\_{k+1} + \beta\_k \mathbf{s}\_k,\tag{6}$$

where the scalar β<sup>k</sup> is determined as the solution of the following minimizing problem:

$$\min\_{\beta\_k \in \mathbb{R}} \Phi\_{k+1}(d\_{k+1}).\tag{7}$$

From (6) and (7), the author obtains

$$\beta\_k = \frac{\mathbf{g}\_{k+1}^T B\_{k+1} \mathbf{s}\_k - \mathbf{g}\_{k+1}^T \mathbf{s}\_k}{s\_k^T B\_{k+1} \mathbf{s}\_k}. \tag{8}$$

Using (5) in (8), the next expression for β_k is obtained:

$$\beta\_k = \frac{\mathbf{g}\_{k+1}^T \mathbf{y}\_k - \alpha\, \mathbf{g}\_{k+1}^T \mathbf{s}\_k}{\mathbf{y}\_k^T \mathbf{s}\_k}. \tag{9}$$

Using the idea of Perry [36], the author obtains

$$d\_{k+1} = -\mathbf{g}\_{k+1} + \frac{\mathbf{y}\_k^T \mathbf{g}\_{k+1} - \alpha\, \mathbf{s}\_k^T \mathbf{g}\_{k+1}}{\mathbf{y}\_k^T \mathbf{s}\_k} \mathbf{s}\_k - \frac{\mathbf{s}\_k^T \mathbf{g}\_{k+1}}{\mathbf{y}\_k^T \mathbf{s}\_k} \mathbf{y}\_k.$$

In fact, in this approach the author gets a family of three-term conjugate gradient algorithms depending on the positive parameter α.
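
A minimal sketch of this direction (Python with NumPy; the positive parameter and the vectors are assumed to come from the surrounding iteration) is:

```python
import numpy as np

def perry_type_direction(g_new, s, y, alpha):
    """Three-term direction of the family above:
    d_{k+1} = -g_{k+1}
              + ((y_k^T g_{k+1} - alpha * s_k^T g_{k+1}) / (y_k^T s_k)) * s_k
              - ((s_k^T g_{k+1}) / (y_k^T s_k)) * y_k,   with alpha > 0."""
    ys = y @ s
    delta = (y @ g_new - alpha * (s @ g_new)) / ys
    eta = -(s @ g_new) / ys
    return -g_new + delta * s + eta * y
```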

Next, in [52], the WYL conjugate gradient (CG) formula, with β_k^{WYL} ≥ 0, is further studied. A three-term WYL CG algorithm is presented, which has the sufficient descent property without any conditions. The global convergence and the linear convergence are proven; moreover, the n-step quadratic convergence with a restart strategy is established if the initial step length is appropriately chosen.


The first three-term Hestenes-Stiefel (HS) method (TTHS method) can be found in [55].

Baluch et al. [7] describe a modified three-term Hestenes-Stiefel (HS) method. Although the earliest conjugate gradient method HS achieves global convergence using an exact line search, this is not guaranteed in the case of an inexact line search. In addition, the HS method does not usually satisfy the descent property. The modified three-term conjugate gradient method from [7] possesses a sufficient descent property regardless of the type of line search and guarantees global convergence using the inexact Wolfe-Powell line search [50, 51]. The authors also prove the global convergence of this method. The search direction, which is considered in [7], has the next form:

$$d\_k = \begin{cases} -\mathbf{g}\_k, & \text{if } k = 0, \\ -\mathbf{g}\_k + \beta\_k^{\text{BZA}} d\_{k-1} - \theta\_k^{\text{BZA}} \mathbf{y}\_{k-1}, & \text{if } k \ge 1, \end{cases}$$

where

$$\beta\_k^{\text{BZA}} = \frac{\mathbf{g}\_k^T (\mathbf{g}\_k - \mathbf{g}\_{k-1})}{d\_{k-1}^T \mathbf{y}\_{k-1} + \mu \left|\mathbf{g}\_k^T d\_{k-1}\right|}, \qquad \theta\_k^{\text{BZA}} = \frac{\mathbf{g}\_k^T d\_{k-1}}{d\_{k-1}^T \mathbf{y}\_{k-1} + \mu \left|\mathbf{g}\_k^T d\_{k-1}\right|}, \qquad \mu > 1.$$

In [13], an accelerated three-term conjugate gradient method is proposed, in which the search direction satisfies the sufficient descent condition as well as the extended Dai-Liao conjugacy condition:

$$d\_k^T \mathbf{y}\_{k-1} = -t\, \mathbf{g}\_k^T \mathbf{s}\_{k-1}, \quad t \ge 0.$$

This method differs from the existing methods. Next, the Li-Fukushima quasi-Newton equation is

$$
\nabla^2 f(\varkappa\_k) s\_{k-1} = z\_{k-1}, \tag{10}
$$

where

$$\mathbf{z}\_{k-1} = \mathbf{y}\_{k-1} + C \|\mathbf{g}\_{k-1}\|^r \mathbf{s}\_{k-1} + \max\left\{-\frac{\mathbf{s}\_{k-1}^T \mathbf{y}\_{k-1}}{\left\|\mathbf{s}\_{k-1}\right\|^2}, 0\right\} \mathbf{s}\_{k-1},$$

where C and r are two given positive constants. Based on (10), Zhou and Zhang [56] propose a modified version of the DL method, called the ZZ method in [13].

In [30], some new conjugate gradient methods are extended, and then some three-term conjugate gradient methods are constructed. Namely, the authors recall [41, 42], with their conjugate gradient parameters, respectively:

$$\boldsymbol{\beta}\_{k}^{\text{RMIL}} = \frac{\mathbf{g}\_{k}^{T}\mathbf{y}\_{k-1}}{\left\|\mathbf{d}\_{k-1}\right\|^{2}},\tag{11}$$

$$\boldsymbol{\beta}\_{k}^{\text{MRMIL}} = \frac{\mathbf{g}\_{k}^{T} \left(\mathbf{g}\_{k} - \mathbf{g}\_{k-1} - d\_{k-1}\right)}{\left\|d\_{k-1}\right\|^{2}},\tag{12}$$

wherefrom it is obvious that β_k^{MRMIL} = β_k^{RMIL} for the exact line search. Let us say that these methods, presented in [41, 42], are the RMIL and MRMIL methods.

The three-term RMIL and MRMIL methods are introduced in [30].

The search direction dk can be expressed as

$$d\_0 = -\mathbf{g}\_0, \qquad d\_k = -\mathbf{g}\_k + \beta\_k d\_{k-1} + \theta\_k \mathbf{y}\_{k-1},$$

where β<sup>k</sup> is given by (11) or (12), and


$$\theta\_k = -\frac{\mathbf{g}\_k^T d\_{k-1}}{\|d\_{k-1}\|^2}.$$

An important property of the proposed methods is that the search direction always satisfies the sufficient descent condition without any line search, that is, the next relation always holds

$$\mathbf{g}\_k^T d\_k \le -\|\mathbf{g}\_k\|^2.$$

Under the standard Wolfe line search and the classical assumptions, the global convergence properties of the proposed methods are proven.
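
The sufficient descent property of these directions can be verified numerically. The following sketch (Python with NumPy; the test vectors are random and purely illustrative) builds the three-term direction with β_k from (11) and θ_k as above and checks that g_k^T d_k = −‖g_k‖²:

```python
import numpy as np

def ttrmil_direction(g, g_prev, d_prev):
    """Three-term direction d_k = -g_k + beta_k d_{k-1} + theta_k y_{k-1},
    with beta_k from (11) and theta_k = -(g_k^T d_{k-1}) / ||d_{k-1}||^2."""
    y = g - g_prev
    denom = d_prev @ d_prev
    beta = (g @ y) / denom
    theta = -(g @ d_prev) / denom
    return -g + beta * d_prev + theta * y

rng = np.random.default_rng(1)
g, g_prev, d_prev = rng.standard_normal((3, 6))
d = ttrmil_direction(g, g_prev, d_prev)
print(np.isclose(g @ d, -(g @ g)))     # expected: True
```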

Having in view the conjugate gradient parameter suggested in [49], in [45] the next two conjugate gradient parameters are presented:

$$\beta\_k^{\text{MHS}} = \frac{\|\mathbf{g}\_k\|^2 - \frac{\|\mathbf{g}\_k\|}{\|\mathbf{g}\_{k-1}\|} \mathbf{g}\_k^T \mathbf{g}\_{k-1}}{d\_{k-1}^T (\mathbf{g}\_k - \mathbf{g}\_{k-1})},\tag{13}$$

$$\boldsymbol{\beta}\_{k}^{MLS} = \frac{\|\mathbf{g}\_{k}\|^2 - \frac{\|\mathbf{g}\_{k}\|}{\|\mathbf{g}\_{k-1}\|} \mathbf{g}\_{k}^{T} \mathbf{g}\_{k-1}}{-\mathbf{d}\_{k-1}^{T} \mathbf{g}\_{k-1}}.\tag{14}$$

Motivated by [49], as well as by [45], in [1] a new hybrid nonlinear CG method is proposed; it combines five different CG methods, with the aim of exploiting the positive features of the non-hybrid methods. The proposed method generates descent directions independently of the line search. Under some assumptions on the objective function, the global convergence is proven under the standard Wolfe line search. The conjugate gradient parameter proposed in [1] is

$$\beta\_k^{\text{hAO}} = \frac{\|\mathbf{g}\_k\|^2 - \max\left\{ 0, \frac{\|\mathbf{g}\_k\|}{\|\mathbf{g}\_{k-1}\|} \mathbf{g}\_k^T \mathbf{g}\_{k-1} \right\}}{\max\left\{ \|\mathbf{g}\_{k-1}\|^2, d\_{k-1}^T (\mathbf{g}\_k - \mathbf{g}\_{k-1}), -d\_{k-1}^T \mathbf{g}\_{k-1} \right\}}. \tag{15}$$
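
A minimal sketch of evaluating (15) (Python with NumPy; the gradients and the previous direction are assumed to be supplied by the surrounding iteration):

```python
import numpy as np

def beta_hao(g, g_prev, d_prev):
    """Hybrid parameter of Eq. (15)."""
    num = g @ g - max(0.0, (np.linalg.norm(g) / np.linalg.norm(g_prev)) * (g @ g_prev))
    den = max(g_prev @ g_prev,
              d_prev @ (g - g_prev),
              -(d_prev @ g_prev))
    return num / den
```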

Let us note that the proposed method is a hybrid of the FR, DY, WYL, MHS, and MLS methods. The behaviors of the methods BZA, TTRMIL, MRMIL, MHS, MLS, and hAO are illustrated by the next tables.

The test criterion is CPU time.

The tests are performed on a workstation with an Intel Celeron CPU at 1.9 GHz.

The experiments are made on the test functions from [3].

Each problem is tested for a number of variables n = 1000 and n = 5000.

The average CPU time values are given in the last rows of these tables (Tables 1–4).

In [2], based on the numerical efficiency of the Hestenes-Stiefel (HS) method, a new modified HS algorithm is proposed for unconstrained optimization. The new direction satisfies the sufficient descent condition independently of the line search. Motivated by theoretical and numerical features of the three-term conjugate gradient (CG) methods proposed in [33], and similar to the approach in [10], the new direction is computed by minimizing the distance between the CG direction and the direction of the three-term CG methods proposed in [33]. Under some mild conditions, the global convergence of the new method for general functions is established when the standard Wolfe line search is used. In this paper the conjugate gradient parameter is given by

$$
\beta\_k = \beta\_k^{\text{HS}} \theta\_k. \tag{16}
$$



#### Table 1.

n = 1000.

#### Table 2.

n = 1000.

#### Table 3.

n = 5000.

where

$$\theta\_k = \mathbf{1} - \frac{\left(\mathbf{g}\_k^T d\_{k-1}\right)^2}{\left\|\mathbf{g}\_k\right\|^2 \left\|d\_{k-1}\right\|^2}.$$

But this new CG direction does not fulfill a descent condition, so a further modification is made; namely, having in view [53], the authors of [2] introduce



#### Table 4.

n = 5000.

$$
\overline{\beta}\_k = \beta\_k - \lambda \left(\frac{\|\mathbf{y}\_{k-1}\|\, \theta\_k}{d\_{k-1}^T \mathbf{y}\_{k-1}}\right)^2 \mathbf{g}\_k^T d\_{k-1},
$$

where λ > 1/4 is a parameter. Also, the global convergence is proven under standard conditions.

It is worth mentioning the next papers about this theme, which can be interesting: [4, 14–17, 25–27].

### 3. Trust region methods

We recall that the basic idea of the Newton method is to approximate the objective function f(x) around x_k by a quadratic model:

$$q^{(k)}(\mathbf{s}) = f(\mathbf{x}\_k) + \mathbf{g}\_k^T \mathbf{s} + \frac{1}{2} \mathbf{s}^T G\_k \mathbf{s},$$

where g_k = ∇f(x_k) and G_k = ∇²f(x_k), and to use the minimizer s_k of q^{(k)}(s) to set x_{k+1} = x_k + s_k.

Also, recall that the Newton method can only guarantee local convergence, i.e., the method is convergent only locally, when s is small enough.

Further, the Newton method cannot be used when the Hessian is not positive definite.

There exists another class of methods, known as trust region methods. These methods achieve global convergence without using a line search, and they avoid the difficulty that a Hessian which is not positive definite causes in line search methods.

Furthermore, they can produce a greater reduction of the function f than line search approaches.

Here, we define the region around the current iterate:

$$\Omega\_k = \{ \mathbf{x} : \|\mathbf{x} - \mathbf{x}\_k\| \le \Delta\_k \},$$

where Δ<sup>k</sup> is the radius of Ωk, inside which the model is trusted to be adequate to the objective function.

Our further intention is to choose a step which should be the approximate minimizer of the quadratic model in the trust region. In fact, xk þ sk should be the approximately best point on the sphere:

$$\{\mathbf{x}\_{k} + \mathbf{s} : \|\mathbf{s}\| \le \Delta\_{k}\},$$

with the center xk and the radius Δk.

In the case that this step is not acceptable, we reduce the size of the step, and then we find a new minimizer.

This method has a rapid local convergence rate, which is a property of the Newton method and quasi-Newton methods too, but an important additional characteristic of the trust region method is its global convergence.

Since the step is restricted by the trust region, this method is also called the restricted step method.

The model subproblem of the trust region method is

$$\min\, q^{(k)}(\mathbf{s}) = f(\mathbf{x}\_k) + \mathbf{g}\_k^T \mathbf{s} + \frac{1}{2} \mathbf{s}^T B\_k \mathbf{s},\tag{17}$$

$$\text{s.t.} \|s\| \le \Delta\_k,\tag{18}$$

where Δ<sup>k</sup> is the trust region radius and Bk is a symmetric approximation of the Hessian Gk.

In the case that we use the standard l_2 norm ∥·∥_2, s_k is the minimizer of q^{(k)}(s) in the ball of radius Δ_k. Generally, different norms define different shapes of the trust region.

Setting B_k = G_k in (17)–(18), the method becomes a Newton-type trust region method.

A problem in itself is the choice of Δ_k at each iteration.

If the agreement between the model q^{(k)}(s) and the objective function f(x_k + s) is satisfactory enough, the value Δ_k should be chosen as large as possible. The expression Ared_k = f(x_k) − f(x_k + s_k) is called the actual reduction, and the expression Pred_k = q^{(k)}(0) − q^{(k)}(s_k) is called the predicted reduction; here, we emphasize that

$$r\_k = \frac{\text{Ared}\_k}{\text{Pred}\_k}$$

measures the agreement between the model function q^{(k)}(s) and the objective function f(x_k + s).

If rk is close to 0 or it is negative, the trust region is going to shrink; otherwise, we do not change the trust region.

The conclusion is that r_k is important in making the choice of the new iterate x_{k+1} as well as in updating the trust region radius Δ_k. Now, we give the trust region algorithm.

Algorithm 1.3.1. (Trust region method).

Assumptions: x_0, Δ̄, Δ_0 ∈ (0, Δ̄), ε ≥ 0, 0 < η_1 ≤ η_2 < 1, and 0 < γ_1 < 1 < γ_2. Let k = 0.

Step 1. If ∥g_k∥ ≤ ε, then STOP.
Step 2. Approximately solve the problem (17)–(18) for s_k.


Step 3. Compute f(x_k + s_k) and r_k. Set

$$\mathbf{x}\_{k+1} = \begin{cases} \mathbf{x}\_k + \mathbf{s}\_k, & \text{if } r\_k \ge \eta\_1, \\ \mathbf{x}\_k, & \text{otherwise}. \end{cases}$$

Step 4. If r_k < η_1, then Δ_{k+1} ∈ (0, γ_1 Δ_k). If r_k ∈ [η_1, η_2), then Δ_{k+1} ∈ (γ_1 Δ_k, Δ_k). If r_k ≥ η_2 and ∥s_k∥ = Δ_k, then Δ_{k+1} ∈ [Δ_k, min{γ_2 Δ_k, Δ̄}].
Step 5. Generate B_{k+1}, update q^{(k)}, set k = k + 1, and go to Step 1.

In Algorithm 1.3.1, Δ̄ is a bound for all Δ_k. Those iterations with the property r_k ≥ η_2 (and so those for which Δ_{k+1} ≥ Δ_k) are called very successful iterations; the iterations with the property r_k ≥ η_1 (and so those for which x_{k+1} = x_k + s_k) are called successful iterations; and the iterations with the property r_k < η_1 (and so those for which x_{k+1} = x_k) are called unsuccessful iterations. Generally, the iterations from the first two cases are called successful iterations.
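
A minimal sketch of Algorithm 1.3.1 in Python with NumPy is given below; the subproblem (17)–(18) is solved only approximately by the Cauchy point (minimization of the model along −g_k within the trust region), B_k is taken to be the exact Hessian, and the Rosenbrock function is an arbitrary test example:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Approximate solution of (17)-(18): minimize the model along -g inside the ball."""
    t = delta / np.linalg.norm(g)
    gBg = g @ (B @ g)
    if gBg > 0:
        t = min(t, (g @ g) / gBg)
    return -t * g

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=100.0, eps=1e-6,
                 eta1=0.01, eta2=0.75, gamma1=0.5, gamma2=2.0, max_iter=10000):
    """Sketch of Algorithm 1.3.1 with B_k = Hessian and the Cauchy-point step."""
    x, delta = x0.astype(float), delta0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:                       # Step 1
            break
        B = hess(x)
        s = cauchy_point(g, B, delta)                      # Step 2 (approximate)
        ared = f(x) - f(x + s)                             # actual reduction
        pred = -(g @ s + 0.5 * s @ (B @ s))                # predicted reduction
        r = ared / pred                                    # Step 3
        if r >= eta1:
            x = x + s
        if r < eta1:                                       # Step 4: update the radius
            delta = gamma1 * delta
        elif r >= eta2 and np.isclose(np.linalg.norm(s), delta):
            delta = min(gamma2 * delta, delta_max)
    return x

# Example usage on the Rosenbrock function
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                           200.0 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                           [-400.0 * x[0], 200.0]])
print(trust_region(f, grad, hess, np.array([-1.2, 1.0])))
```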

Some choices of the parameters are η_1 = 0.01, η_2 = 0.75, γ_1 = 0.5, γ_2 = 2, and Δ_0 = 1 or Δ_0 = (1/10)∥g_0∥. The algorithm is insensitive to changes of these parameters.

Next, if r_k < 0.01, then Δ_{k+1} can be chosen from the interval (0.01∥s_k∥, 0.5∥s_k∥) on the basis of a polynomial interpolation.

In the case of quadratic interpolation, we set

$$
\Delta\_{k+1} = \lambda \| \mathbf{s}\_k \|,
$$

where

$$\lambda = \frac{-\mathbf{g}\_k^T \mathbf{s}\_k}{2\left(f(\mathbf{x}\_k + \mathbf{s}\_k) - f(\mathbf{x}\_k) - \mathbf{g}\_k^T \mathbf{s}\_k\right)}.$$
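
A small sketch of this update (Python with NumPy; f, g_k, x_k, and s_k are assumed to come from the current trust region iteration):

```python
import numpy as np

def radius_from_quadratic_interpolation(f, x, g, s):
    """Delta_{k+1} = lambda * ||s_k||, with lambda given by the quadratic
    interpolation formula above."""
    gs = g @ s
    lam = -gs / (2.0 * (f(x + s) - f(x) - gs))
    return lam * np.linalg.norm(s)
```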

#### 3.1 Convergence of trust region methods
