2.9 Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

The BFGS update is given by the formula

$$B\_{k+1}^{BFGS} = B\_k + \frac{y\_k y\_k^T}{y\_k^T s\_k} - \frac{B\_k s\_k s\_k^T B\_k}{s\_k^T B\_k s\_k}. \tag{52}$$

The BFGS update is said to be complementary (dual) to the DFP update: formula (52) is obtained from the DFP formula by interchanging the roles of $s\_k$ and $y\_k$ (and of the Hessian and inverse Hessian approximations).
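As a minimal numerical sketch of the update (52), assuming a NumPy setting (the function name is illustrative, not from the cited literature):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update (52) of the Hessian approximation B_k."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
```

Provided the curvature condition $y\_k^T s\_k > 0$ holds (which the Wolfe line search guarantees), this update preserves the positive definiteness of $B\_k$.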

In [62], an adaptive scaled BFGS method for unconstrained optimization is presented. The author emphasizes that the BFGS method is one of the most efficient quasi-Newton methods for solving small- and medium-size unconstrained optimization problems. The third term in the standard BFGS update formula is scaled in order to reduce the large eigenvalues of the approximation to the Hessian of the minimizing function. Specifically, [62] considers the general scaling BFGS updating formula:

$$B\_{k+1} = B\_k - \frac{B\_k s\_k s\_k^T B\_k}{s\_k^T B\_k s\_k} + \gamma\_k \frac{y\_k y\_k^T}{y\_k^T s\_k},\tag{53}$$

where $\gamma\_k$ is a positive parameter. Obviously, taking $\gamma\_k = 1$ for all $k = 0, 1, \ldots$ recovers the standard BFGS formula. Several procedures have been proposed for selecting the scaling parameter $\gamma\_k$; see, for example, [62–69]. The approach in [62] for determining the scaling parameters of the terms of the BFGS update is to minimize the measure function of Byrd and Nocedal.
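As a sketch, the scaled update (53) differs from (52) only in the scaled third term (again an illustrative helper, assuming NumPy):

```python
import numpy as np

def scaled_bfgs_update(B, s, y, gamma):
    """Scaled BFGS update (53); gamma = 1 recovers the standard formula (52)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + gamma * np.outer(y, y) / (y @ s)
```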

Namely, in [70], the following function was introduced:

$$
\varphi(A) = tr(A) - \ln\left(\det(A)\right),
\tag{54}
$$

which is defined on positive definite matrices.

This function is a measure of matrices that involves all the eigenvalues of $A$, not only the smallest and the largest ones, which are what the traditional analysis of quasi-Newton methods based on the condition number of matrices relies on.

Observe that the function φ works simultaneously with the trace and the determinant, thus simplifying the analysis of quasi-Newton methods. Fletcher [71] proved that this function is strictly convex on the set of symmetric and positive definite matrices and that it is minimized by $A = I$. Moreover, the function becomes unbounded as $A$ approaches singularity or grows without bound, and therefore it acts as a barrier function that keeps $A$ positive definite. It is worth noting that the BFGS update tends to generate updates with large eigenvalues.
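To make the barrier behavior concrete, here is a small sketch of (54) (illustrative NumPy code; the name `phi` is hypothetical):

```python
import numpy as np

def phi(A):
    """Byrd-Nocedal measure function (54): tr(A) - ln(det(A)),
    defined on symmetric positive definite matrices."""
    sign, logdet = np.linalg.slogdet(A)  # numerically safer than log(det(A))
    assert sign > 0, "A must be positive definite"
    return np.trace(A) - logdet

print(phi(np.eye(4)))                       # minimum value n = 4, attained at A = I
print(phi(np.diag([0.1, 1.0, 1.0, 50.0])))  # ~50.5: extreme eigenvalues are penalized
```

In terms of the eigenvalues $\lambda\_i$ of $A$, $\varphi(A) = \sum\_i (\lambda\_i - \ln \lambda\_i)$, which makes explicit that both very small and very large eigenvalues are penalized.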

Further, in [62], a double-parameter scaling BFGS update is considered, in which the first two terms on the right-hand side of the BFGS update (52) are scaled with a positive parameter, while the third one is scaled with another positive parameter:

$$B\_{k+1} = \delta\_k \left[ B\_k - \frac{B\_k s\_k s\_k^T B\_k}{s\_k^T B\_k s\_k} \right] + \gamma\_k \frac{y\_k y\_k^T}{y\_k^T s\_k},\tag{55}$$

where $\delta\_k$ and $\gamma\_k$ are two positive parameters that have to be determined. In [62], the following proposition is proved.

Proposition 1.2.1. If the step size $t\_k$ is determined by the standard Wolfe line search (12) and (13), $B\_k$ is positive definite, and $\gamma\_k > 0$, then $B\_{k+1}$, given by (55), is also positive definite.

From (55), it can be seen that $\varphi(B\_{k+1})$ depends on the scaling parameters $\delta\_k$ and $\gamma\_k$. In [62], these scaling parameters are determined as the solution of the minimization problem:

$$\min\_{\delta\_k > 0, \gamma\_k > 0} \varphi(B\_{k+1}).\tag{56}$$

Solving this problem yields the following values of the scaling parameters $\delta\_k$ and $\gamma\_k$:

$$\delta\_k = \frac{n - 1}{tr(B\_k) - \frac{\|B\_k s\_k\|^2}{s\_k^T B\_k s\_k}}, \tag{57}$$

$$\gamma\_k = \frac{y\_k^T s\_k}{\|y\_k\|^2}. \tag{58}$$
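A sketch of the resulting double-parameter update, combining (55), (57), and (58) (illustrative NumPy code, not the author's implementation):

```python
import numpy as np

def double_scaled_bfgs_update(B, s, y):
    """Double-parameter scaled BFGS update (55) with delta_k from (57)
    and gamma_k from (58)."""
    n = B.shape[0]
    Bs = B @ s
    delta = (n - 1) / (np.trace(B) - (Bs @ Bs) / (s @ Bs))  # (57)
    gamma = (y @ s) / (y @ y)                               # (58)
    return delta * (B - np.outer(Bs, Bs) / (s @ Bs)) + gamma * np.outer(y, y) / (y @ s)
```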

Consider the relation

$$
x\_{k+1} = x\_k + t\_k d\_k,\tag{59}
$$

where $d\_k$ is the BFGS search direction obtained as the solution of the linear algebraic system

$$B\_k d\_k = -g\_k,$$

where the matrix $B\_k$ is the BFGS approximation to the Hessian $\nabla^2 f(x\_k)$, updated by the classical formula (52).
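Putting these pieces together, below is a minimal sketch of the classical BFGS iteration (59); as an assumption, it uses SciPy's standard Wolfe line search in place of conditions (12) and (13), a small fallback step when the line search fails, and the Rosenbrock function as a hypothetical test problem:

```python
import numpy as np
from scipy.optimize import line_search  # standard Wolfe conditions

def bfgs_method(f, grad, x0, tol=1e-6, max_iter=200):
    """BFGS iteration (59): direction from B_k d_k = -g_k,
    B_k updated by the classical formula (52)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(B, -g)         # B_k d_k = -g_k
        t = line_search(f, grad, x, d)[0]  # Wolfe step size t_k
        if t is None:                      # line search failed; fallback step
            t = 1e-3
        s = t * d                          # s_k = x_{k+1} - x_k
        x_new = x + s                      # (59)
        y = grad(x_new) - g                # y_k = g_{k+1} - g_k
        if y @ s > 1e-12:                  # curvature condition keeps B_k p.d.
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        x = x_new
    return x

# Example: minimize the Rosenbrock function from the standard starting point
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(bfgs_method(f, grad, np.array([-1.2, 1.0])))  # ≈ [1., 1.]
```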

The following theorems are also given in [62].

Theorem 1.2.11. If the step size in (59) is determined by the Wolfe line search conditions (12) and (13), then the scaling parameters given by (57) and (58) are the unique global solutions of the problem (56).

Theorem 1.2.12. Let $\delta\_k$ be computed by (57). Then, for any $k = 0, 1, \ldots$, $\delta\_k$ is positive and close to 1.

Next, in [72], using the chain rule, a modified secant equation is given in order to obtain a more accurate approximation of the second-order curvature of the objective function. Then, based on this modified secant equation, a new BFGS method is presented. The proposed method makes use of both gradient and function values, and it utilizes information from the two most recent steps, whereas the usual secant relation uses only the latest step. Under appropriate conditions, the proposed method is shown to be globally convergent without any convexity assumption on the objective function.

Some interesting applications of Newton, modified Newton, inexact Newton, and quasi-Newton methods can be found, for example, in [73–83].

Further results of particular interest are presented in [84], and an interesting application of the BFGS method can be found in [85].
