62 Nonlinear Systems - Design, Analysis, Estimation and Control

## **1. Introduction**

The optimal control approach provides solutions to dynamic real-world practical problems. In particular, linear problems that are disturbed by a random noise sequence are well understood: the optimal state estimate is applied in designing the optimal feedback control law. In such a situation, the optimal state estimator and the optimal controller are designed separately to optimize and control the dynamical system. This is called the separation principle [1–4]. By virtue of this principle, research on stochastic optimal control and its applications is growing widely; see, for example, linear systems [5, 6], the fleet composition problem [7], optimal parameter selection problems [8], Markov jump processes [9], power management [10], multiagent systems [11], the portfolio selection model [12], the 2-DOF vehicle model [13], the sensorimotor system [14], and the advertising model [15].

In fact, the exact solution of stochastic optimal control problems is impossible to obtain, especially for problems involving nonlinear system dynamics. To obtain an optimal solution of the discrete-time nonlinear stochastic optimal control problem, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed to solve this kind of problem iteratively [16–18]. In this algorithm, the linear quadratic Gaussian (LQG) model is applied as a model-based optimal control problem, where the state estimation procedure is carried out using the Kalman filtering theory. The adjusted parameters are added into this model so that system optimization and parameter estimation are integrated interactively. On this basis, the differences between the real plant and the model used are measured repeatedly in order to update the optimal solution of the model used. In addition, the output that is measured from the real plant is fed back into the model used for the state estimator design. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem, despite model-reality differences. This optimal solution is the optimal filtering solution obtained with the IOCPE algorithm, whose efficiency has been proven in Refs. [16–18].

However, the output trajectory of the model obtained from the IOCPE algorithm is less accurate in estimating the exact output measurement of the original optimal control problem. In this chapter, our aim is to improve the IOCPE algorithm using the fixed-interval smoothing approach, where the output residual is reduced within an appropriate tolerance to generate a better output trajectory. In our model, the state dynamics, which is disturbed by Gaussian noise sequences, is estimated by using the Kalman filtering theory and then smoothed over a fixed interval. With this state estimation procedure, a smoothed state estimate is predicted backward in time and used in designing the feedback optimal control law. Notice that the output residual of this smoothed state estimate is smaller than the output residual obtained by using the Kalman filtering theory alone; see [17]. The solution procedure discussed in this chapter is almost the same as that presented in the study of Kek et al. [17], but the accuracy of the optimal solution with the modified fixed-interval smoothing is definitely increased.

## **2. Problem description**

Consider a general class of the dynamical system given below:

$$\mathbf{x}(k+1) = f(\mathbf{x}(k), \mathbf{u}(k), k) + G\boldsymbol{\omega}(k) \tag{1a}$$

$$\mathbf{y}(k) = h(\mathbf{x}(k), k) + \eta(k) \tag{1b}$$

where $\mathbf{u}(k) \in \Re^{m}$, $k = 0, 1, \ldots, N-1$, $\mathbf{x}(k) \in \Re^{n}$, $k = 0, 1, \ldots, N$, and $\mathbf{y}(k) \in \Re^{p}$, $k = 0, 1, \ldots, N$ are the control sequence, the state sequence, and the output sequence, respectively. $\boldsymbol{\omega}(k) \in \Re^{q}$, $k = 0, 1, \ldots, N-1$, which is the process noise sequence, and $\boldsymbol{\eta}(k) \in \Re^{p}$, $k = 0, 1, \ldots, N$, which is the measurement noise sequence, are stationary Gaussian white noise sequences with zero mean, and their covariance matrices are given by $Q_\omega \in \Re^{q \times q}$ and $R_\eta \in \Re^{p \times p}$, respectively. Both of these covariance matrices are positive definite. In addition, $f : \Re^{n} \times \Re^{m} \times \Re \rightarrow \Re^{n}$ represents the real plant and $h : \Re^{n} \times \Re \rightarrow \Re^{p}$ is the real output measurement, both of which are assumed to be continuously differentiable with respect to their respective arguments, whereas $G \in \Re^{n \times q}$ is a process coefficient matrix.

The initial state is

$$\mathbf{x}(0) = \mathbf{x}_0$$

where $\mathbf{x}_0 \in \Re^{n}$ is a random vector with mean and covariance given, respectively, by

$$E[\mathbf{x}(0)] = \overline{\mathbf{x}}_0 \text{ and } E[(\mathbf{x}_0 - \overline{\mathbf{x}}_0)(\mathbf{x}_0 - \overline{\mathbf{x}}_0)^{\mathrm{T}}] = M_0.$$

Here, $M_0 \in \Re^{n \times n}$ is a positive definite matrix and $E[\,\cdot\,]$ is the expectation operator. It is assumed that the initial state, the process noise, and the measurement noise are statistically independent.

Therefore, our aim is to find an admissible control sequence $\mathbf{u}(k) \in \Re^{m}$, $k = 0, 1, \ldots, N-1$, subject to the dynamical system given in Eq. (1) such that the scalar cost function

$$J_0(u) = E\Big[\varphi(\mathbf{x}(N), N) + \sum_{k=0}^{N-1} L(\mathbf{x}(k), \mathbf{u}(k), k)\Big] \tag{2}$$

is minimized, where $\varphi : \Re^{n} \times \Re \rightarrow \Re$ is the terminal cost and $L : \Re^{n} \times \Re^{m} \times \Re \rightarrow \Re$ is the cost under summation. It is assumed that these functions are continuously differentiable with respect to their respective arguments.

This problem is regarded as the discrete‐time nonlinear stochastic optimal control problem and is referred to as Problem (P).
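To make the setup of Problem (P) concrete, the stochastic dynamics (1a) and (1b) can be simulated directly. The following is a minimal sketch for a scalar system; the plant $f$, the output map $h$, the noise covariances, and the horizon are illustrative assumptions, not taken from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50                       # horizon (assumed)
G = np.array([[1.0]])        # process coefficient matrix
Q_w = np.array([[0.01]])     # process noise covariance Q_omega (assumed)
R_eta = np.array([[0.04]])   # measurement noise covariance R_eta (assumed)

def f(x, u, k):
    # hypothetical nonlinear plant, for illustration only
    return 0.9 * x + 0.1 * np.tanh(x) + u

def h(x, k):
    # hypothetical output measurement map
    return x

def simulate(u_seq, x0):
    """Roll out Eqs. (1a)-(1b) under a given control sequence."""
    x = np.zeros(N + 1)
    y = np.zeros(N + 1)
    x[0] = x0
    y[0] = h(x[0], 0) + rng.normal(0.0, np.sqrt(R_eta[0, 0]))
    for k in range(N):
        w = rng.normal(0.0, np.sqrt(Q_w[0, 0]))          # omega(k)
        x[k + 1] = f(x[k], u_seq[k], k) + G[0, 0] * w    # (1a)
        y[k + 1] = h(x[k + 1], k + 1) \
                   + rng.normal(0.0, np.sqrt(R_eta[0, 0]))  # (1b)
    return x, y

x, y = simulate(np.zeros(N), x0=1.0)
```

The cost (2) would then be estimated by averaging $\varphi + \sum L$ over many such rollouts, which is exactly why a closed-form solution of Problem (P) is out of reach for nonlinear $f$ and $h$.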

Notice that, in general, the exact solution of Problem (P) cannot be obtained, and estimating the state of the real plant by applying the nonlinear filtering theory is computationally demanding. For these reasons, a smoothing model-based optimal control problem, which is referred to as Problem (M), is proposed:

$$\begin{aligned} \min_{u(k)} J_m(u) &= \frac{1}{2} \hat{\mathbf{x}}_s(N)^{\mathrm{T}} S(N) \hat{\mathbf{x}}_s(N) + \gamma(N) \\ &\quad + \sum_{k=0}^{N-1} \left( \frac{1}{2} \big(\hat{\mathbf{x}}_s(k)^{\mathrm{T}} Q \hat{\mathbf{x}}_s(k) + u(k)^{\mathrm{T}} R u(k)\big) + \gamma(k) \right) \end{aligned} \tag{3}$$

subject to

$$\hat{\mathbf{x}}_s(k) = \hat{\mathbf{x}}(k) + K_s(k)\big(\hat{\mathbf{x}}_s(k+1) - \overline{\mathbf{x}}(k+1)\big), \qquad \hat{\mathbf{y}}_s(k) = C\hat{\mathbf{x}}_s(k)$$

with the following state estimation procedure

$$\overline{\mathbf{x}}(k+1) = A\hat{\mathbf{x}}(k) + Bu(k) + \alpha_1(k) \tag{4a}$$

$$
\hat{\mathbf{x}}(k) = \overline{\mathbf{x}}(k) + K\_f(k)(\mathbf{y}(k) - \overline{\mathbf{y}}(k)) \tag{4b}
$$

$$\overline{\mathbf{y}}(k) = C\overline{\mathbf{x}}(k) + \alpha_2(k) \tag{4c}$$

where $\hat{\mathbf{x}}_s(k) \in \Re^{n}$, $k = 0, 1, \ldots, N$ and $\hat{\mathbf{y}}_s(k) \in \Re^{p}$, $k = 0, 1, \ldots, N$ are, respectively, the smoothed state sequence and the smoothed output sequence. The matrices involved are given as follows: $A$ is an $n \times n$ state transition matrix, $B$ is an $n \times m$ control coefficient matrix, $C$ is a $p \times n$ output coefficient matrix, $S(N)$ and $Q$ are $n \times n$ positive semidefinite matrices, and $R$ is an $m \times m$ positive definite matrix. The extra parameters $\alpha_1(k) \in \Re^{n}$, $k = 0, 1, \ldots, N-1$, $\alpha_2(k) \in \Re^{p}$, $k = 0, 1, \ldots, N$, and $\gamma(k) \in \Re$, $k = 0, 1, \ldots, N$ are introduced as adjustable parameters.

The state estimation procedure given in (4a), (4b), and (4c) follows from the Kalman filtering theory, where $\hat{\mathbf{x}}(k) \in \Re^{n}$, $k = 0, 1, \ldots, N$ and $\overline{\mathbf{x}}(k) \in \Re^{n}$, $k = 0, 1, \ldots, N$ are, respectively, the filtered state sequence and the predicted state sequence, whereas $\overline{\mathbf{y}}(k) \in \Re^{p}$, $k = 0, 1, \ldots, N$ is the expected output sequence. The filter and smoother gains, $K_f(k) \in \Re^{n \times p}$ and $K_s(k) \in \Re^{n \times n}$, are, respectively, given by

$$K_f(k) = M_x(k)C^{\mathrm{T}} M_y(k)^{-1} \tag{5a}$$

$$K_s(k) = P(k)A^{\mathrm{T}} M_x(k+1)^{-1} \tag{5b}$$

whereas the state error covariance matrices are


$$P(k) = M\_x(k) - M\_x(k)C^T M\_y(k)^{-1} C M\_x(k) \tag{6a}$$

$$M_x(k+1) = AP(k)A^{\mathrm{T}} + GQ_\omega G^{\mathrm{T}} \tag{6b}$$

$$P_s(k) = P(k) + K_s(k)\big(P_s(k+1) - M_x(k+1)\big)K_s(k)^{\mathrm{T}} \tag{6c}$$

and the output error covariance matrix is

$$M_y(k) = CM_x(k)C^{\mathrm{T}} + R_\eta \tag{6d}$$

with the boundary conditions $M_x(0) = M_0$ and $P_s(N) = P(N)$. The filtered state error covariance $P(k) \in \Re^{n \times n}$, the predicted state error covariance $M_x(k) \in \Re^{n \times n}$, the smoothed state error covariance $P_s(k) \in \Re^{n \times n}$, and the output error covariance $M_y(k) \in \Re^{p \times p}$ are positive definite matrices.
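Since the recursions (5a)-(6d) involve only the system matrices, they can be computed offline before any measurement arrives. The following sketch runs them for an illustrative two-state system; all matrix values and the horizon are assumptions for demonstration.

```python
import numpy as np

# Illustrative system matrices (assumptions, not from the chapter)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Q_w = 0.01 * np.eye(2)       # Q_omega
R_eta = np.array([[0.04]])
M0 = np.eye(2)
N = 20

Mx = [None] * (N + 1); P = [None] * (N + 1); My = [None] * (N + 1)
Kf = [None] * (N + 1); Ks = [None] * N
Mx[0] = M0
for k in range(N + 1):
    My[k] = C @ Mx[k] @ C.T + R_eta                                 # (6d)
    Kf[k] = Mx[k] @ C.T @ np.linalg.inv(My[k])                      # (5a)
    P[k] = Mx[k] - Mx[k] @ C.T @ np.linalg.inv(My[k]) @ C @ Mx[k]   # (6a)
    if k < N:
        Mx[k + 1] = A @ P[k] @ A.T + G @ Q_w @ G.T                  # (6b)
        Ks[k] = P[k] @ A.T @ np.linalg.inv(Mx[k + 1])               # (5b)

# Smoothed covariance (6c), run backward from P_s(N) = P(N)
Ps = [None] * (N + 1)
Ps[N] = P[N]
for k in range(N - 1, -1, -1):
    Ps[k] = P[k] + Ks[k] @ (Ps[k + 1] - Mx[k + 1]) @ Ks[k].T
```

As expected from fixed-interval smoothing, the smoothed covariance never exceeds the filtered one, $P_s(k) \preceq P(k)$, which is the quantitative reason the smoothed output residual is smaller than the filtered one.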

Here, the cost function given in Eq. (3) is evaluated from the expectation of quadratic forms [2], both for random and deterministic terms, with the trace operator tr(·); this expectation is simplified as follows:

**a.** $E\left[\mathbf{x}(N)^{\mathrm{T}} S(N)\mathbf{x}(N)\right] = \mathrm{tr}\left\{S(N)M_x(N)\right\} + \overline{\mathbf{x}}(N)^{\mathrm{T}} S(N)\overline{\mathbf{x}}(N)$

**b.** $E\left[\mathbf{x}(k)^{\mathrm{T}} Q\mathbf{x}(k)\right] = \mathrm{tr}\left\{QM_x(k)\right\} + \overline{\mathbf{x}}(k)^{\mathrm{T}} Q\overline{\mathbf{x}}(k)$

**c.** $E\left[u(k)^{\mathrm{T}} R u(k)\right] = u(k)^{\mathrm{T}} R u(k)$

**d.** $E[\gamma(k)] = \gamma(k)$, $E[\alpha_1(k)] = \alpha_1(k)$, and $E[\alpha_2(k)] = \alpha_2(k)$.

Following this simplification, the trace terms, which depend on the state error covariance matrices, are ignored in the model used since they are constant values. In this way, the cost function of the linear model-based optimal control problem can be evaluated.
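The quadratic-form identities in (a) and (b) are easy to verify numerically for a Gaussian vector; the matrices and mean below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of E[x^T Q x] = tr(Q M) + xbar^T Q xbar for x ~ N(xbar, M)
rng = np.random.default_rng(1)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
xbar = np.array([1.0, -2.0])
M = np.array([[0.3, 0.1], [0.1, 0.2]])

samples = rng.multivariate_normal(xbar, M, size=200_000)
mc = np.mean(np.einsum('ni,ij,nj->n', samples, Q, samples))
exact = np.trace(Q @ M) + xbar @ Q @ xbar   # tr(QM) = 0.9, mean term = 4.0
assert abs(mc - exact) / exact < 0.01       # agreement within MC error
```

Since the trace term $\mathrm{tr}(QM)$ does not depend on the control, dropping it, as done above, changes the cost only by a constant and leaves the minimizer unchanged.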

Notice that the separation principle [1–4] is applied in solving Problem (M), where the optimal feedback control law and the optimal state estimate are designed separately, as discussed in [16–18]. Furthermore, the accuracy of the optimal state estimate is increased by smoothing the state estimate over the fixed interval [2, 4]. Then, based on this smoothed state estimate, the smoothing optimal control law is designed. On the other hand, the output measured from the real plant is fed back into the model used, in turn, to improve the state estimation procedure and to update the solution of the model used. Moreover, solving Problem (M) without adding the adjusted parameters into the model used would not approximate the optimal solution of Problem (P). Hence, by taking the adjusted parameters into the model used and solving Problem (M) iteratively, the correct optimal solution of the original optimal control problem can be obtained, in spite of model-reality differences.

## **3. Modified smoothing with model-reality differences**

Now, let us introduce an expanded optimal control problem with the smoothed state estimate, which is referred to as Problem (E), given below:

$$\begin{aligned} \min_{u(k)} J_e(u) &= \frac{1}{2} \hat{\mathbf{x}}_s(N)^{\mathrm{T}} S(N) \hat{\mathbf{x}}_s(N) + \gamma(N) \\ &\quad + \sum_{k=0}^{N-1} \Big( \frac{1}{2} \big(\hat{\mathbf{x}}_s(k)^{\mathrm{T}} Q \hat{\mathbf{x}}_s(k) + u(k)^{\mathrm{T}} R u(k)\big) + \gamma(k) \\ &\qquad + \frac{1}{2} r_1 \| v(k) - u(k) \|^2 + \frac{1}{2} r_2 \| z(k) - \hat{\mathbf{x}}_s(k) \|^2 \Big) \end{aligned} \tag{7}$$

subject to

$$\hat{\mathbf{x}}_s(k) = \hat{\mathbf{x}}(k) + K_s(k)\big(\hat{\mathbf{x}}_s(k+1) - \overline{\mathbf{x}}(k+1)\big), \qquad \hat{\mathbf{y}}_s(k) = C\hat{\mathbf{x}}_s(k)$$

Smoothing Solution for Discrete-Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences http://dx.doi.org/10.5772/64564 67

$$\frac{1}{2}z(N)^{\mathrm{T}} S(N)z(N) + \gamma(N) = \varphi(z(N), N)$$



$$\frac{1}{2}\big(z(k)^{\mathrm{T}} Qz(k) + v(k)^{\mathrm{T}} R v(k)\big) + \gamma(k) = L(z(k), v(k), k)$$

$$Az(k) + Bv(k) + \alpha\_1(k) = f(z(k), v(k), k)$$

<sup>2</sup> *Cz k k h z k k* ( ) ( ) ( ( ), ) + = a

$$v(k) = u(k)$$

$$z(k) = \hat{x}\_s(k)$$

where $v(k) \in \Re^{m}$, $k = 0, 1, \ldots, N-1$ and $z(k) \in \Re^{n}$, $k = 0, 1, \ldots, N$ are introduced to separate the control and the smoothed state from the respective signals in the parameter estimation problem, and $\|\cdot\|$ denotes the usual Euclidean norm. The terms $\frac{1}{2}r_1\|v(k) - u(k)\|^2$ and $\frac{1}{2}r_2\|z(k) - \hat{\mathbf{x}}_s(k)\|^2$ are introduced to improve convexity and to enhance the convergence of the iterative algorithm. The main purpose of designing the algorithm in this way is to ensure that the constraints $v(k) = u(k)$ and $z(k) = \hat{\mathbf{x}}_s(k)$ are satisfied at the end of the iterations. More specifically, applying the smoothed state estimate $\hat{\mathbf{x}}_s(k)$ and the control $u(k)$ in the computation of the parameter estimation and matching schemes increases the practical usefulness of the algorithm. Moreover, reserving the corresponding smoothed state $z(k)$ and control $v(k)$ for optimizing the model-based optimal control problem leads the iterative solution toward the true optimal solution of the original optimal control problem.

**Figure 1** shows the block diagram of the approach proposed. The methodology of the approach proposed is further discussed in the following sections.

From the block diagram in **Figure 1**, the definition of the principle of model-reality differences can be given.

Definition 3.1: The principle of model-reality differences is a unified framework that integrates system optimization and parameter estimation interactively to define an expanded optimal control problem; it aims to give the correct optimal solution of the original optimal control problem by solving the model-based optimal control problem iteratively.

**Figure 1.** Block diagram of the approach proposed.

#### **3.1. Optimality conditions**

Define the Hamiltonian function for Problem (E) as follows:

$$\begin{aligned} H_e(k) &= \frac{1}{2} \big(\hat{\mathbf{x}}_s(k)^{\mathrm{T}} Q \hat{\mathbf{x}}_s(k) + u(k)^{\mathrm{T}} R u(k)\big) + \gamma(k) \\ &\quad + \frac{1}{2} r_1 \| v(k) - u(k) \|^2 + \frac{1}{2} r_2 \| z(k) - \hat{\mathbf{x}}_s(k) \|^2 \\ &\quad - \lambda(k)^{\mathrm{T}} u(k) - \beta(k)^{\mathrm{T}} \hat{\mathbf{x}}_s(k) + q(k)^{\mathrm{T}} \big(C\hat{\mathbf{x}}_s(k) - \hat{\mathbf{y}}_s(k)\big) \\ &\quad + p(k+1)^{\mathrm{T}} \big(\hat{\mathbf{x}}_s(k) - \hat{\mathbf{x}}(k) - K_s(k)(\hat{\mathbf{x}}_s(k+1) - \overline{\mathbf{x}}(k+1))\big). \end{aligned} \tag{8}$$

Then, the augmented cost function becomes


$$\begin{aligned} J_e'(k) &= \frac{1}{2} \hat{\mathbf{x}}_s(N)^{\mathrm{T}} S(N) \hat{\mathbf{x}}_s(N) + \gamma(N) + \Gamma^{\mathrm{T}} \big(\hat{\mathbf{x}}_s(N) - z(N)\big) \\ &\quad + \xi(N)\Big(\varphi(z(N), N) - \frac{1}{2} z(N)^{\mathrm{T}} S(N) z(N) - \gamma(N)\Big) \\ &\quad + \sum_{k=0}^{N-1} H_e(k) + \lambda(k)^{\mathrm{T}} v(k) + \beta(k)^{\mathrm{T}} z(k) \\ &\quad + \xi(k)\Big(L(z(k), v(k), k) - \frac{1}{2} \big(z(k)^{\mathrm{T}} Qz(k) + v(k)^{\mathrm{T}} R v(k)\big) - \gamma(k)\Big) \\ &\quad + \mu(k)^{\mathrm{T}} \big(f(z(k), v(k), k) - Az(k) - Bv(k) - \alpha_1(k)\big) \\ &\quad + \pi(k)^{\mathrm{T}} \big(h(z(k), k) - Cz(k) - \alpha_2(k)\big) \end{aligned} \tag{9}$$

where $\Gamma$, $\xi(k)$, $\lambda(k)$, $\beta(k)$, $\mu(k)$, $\pi(k)$, $p(k)$, and $q(k)$ are the appropriate multipliers to be determined later.

The following necessary conditions for optimality result from applying the calculus of variations [2, 4, 17] to the augmented cost function given in Eq. (9):

(a) Stationary condition:



$$Ru(k) + B^{\mathrm{T}} K_s(k)p(k+1) - \lambda(k) - r_1\big(v(k) - u(k)\big) = 0. \tag{10a}$$

(b) Smoothed costate equation:

$$p(k) = Q\hat{\mathbf{x}}_s(k) + p(k+1) - \beta(k) - r_2\big(z(k) - \hat{\mathbf{x}}_s(k)\big). \tag{10b}$$

(c) Smoothed state equation:

$$\hat{\mathbf{x}}_s(k) = \hat{\mathbf{x}}(k) + K_s(k)\big(\hat{\mathbf{x}}_s(k+1) - \overline{\mathbf{x}}(k+1)\big) \tag{10c}$$

with the boundary conditions $\hat{\mathbf{x}}_s(N) = \hat{\mathbf{x}}(N)$ and $p(N) = S(N)\hat{\mathbf{x}}_s(N) + \Gamma$.

(d) Adjustable parameter equations:

$$\varphi(\boldsymbol{z}(N),N) = \frac{1}{2}\boldsymbol{z}(N)^{\mathrm{T}}S(N)\boldsymbol{z}(N) + \boldsymbol{\gamma}(N) \tag{11a}$$

$$L(z(k), v(k), k) = \frac{1}{2} \big(z(k)^{\mathrm{T}} Qz(k) + v(k)^{\mathrm{T}} R v(k)\big) + \gamma(k) \tag{11b}$$

$$f(z(k), v(k), k) = Az(k) + Bv(k) + \alpha_1(k) \tag{11c}$$

$$h(z(k), k) = Cz(k) + \alpha_2(k). \tag{11d}$$

(e) Multiplier equations:

$$\Gamma - \nabla_{z(N)}\varphi + S(N)z(N) = 0 \tag{12a}$$

$$\lambda(k) + \big(\nabla_{v(k)} L - Rv(k)\big) + \left(\frac{\partial f}{\partial v(k)} - B\right)^{\mathrm{T}} \hat{p}(k+1) = 0 \tag{12b}$$

$$\beta(k) + \big(\nabla_{z(k)} L - Qz(k)\big) + \left(\frac{\partial f}{\partial z(k)} - A\right)^{\mathrm{T}} \hat{p}(k+1) = 0 \tag{12c}$$

with $\xi(k) = 1$, $\mu(k) = \hat{p}(k+1)$, and $\pi(k) = q(k) = 0$.

(f) Separable variables:

$$v(k) = u(k), \quad z(k) = \hat{\mathbf{x}}_s(k), \quad \hat{p}(k) = p(k). \tag{13}$$

In view of these necessary optimality conditions, conditions (10a), (10b), and (10c) define the modified model-based optimal control problem; conditions (11a), (11b), (11c), and (11d) define the parameter estimation problem; and conditions (12a), (12b), and (12c) are used to compute the multipliers. They are further discussed as follows.

#### **3.2. Modified model-based optimal control problem**

The modified model‐based optimal control problem, which is referred to as Problem (MM), is given below:

$$\begin{aligned} \min_{u(k)} J_{mm}(u) &= \frac{1}{2} \hat{\mathbf{x}}_s(N)^{\mathrm{T}} S(N) \hat{\mathbf{x}}_s(N) + \gamma(N) + \Gamma^{\mathrm{T}} \hat{\mathbf{x}}_s(N) \\ &\quad + \sum_{k=0}^{N-1} \Big( \frac{1}{2} \big(\hat{\mathbf{x}}_s(k)^{\mathrm{T}} Q \hat{\mathbf{x}}_s(k) + u(k)^{\mathrm{T}} R u(k)\big) + \gamma(k) \\ &\qquad + \frac{1}{2} r_1 \| v(k) - u(k) \|^2 + \frac{1}{2} r_2 \| z(k) - \hat{\mathbf{x}}_s(k) \|^2 \\ &\qquad - \lambda(k)^{\mathrm{T}} u(k) - \beta(k)^{\mathrm{T}} \hat{\mathbf{x}}_s(k) \Big) \end{aligned} \tag{14}$$

subject to


$$\hat{\mathbf{x}}_s(k) = \hat{\mathbf{x}}(k) + K_s(k)\big(\hat{\mathbf{x}}_s(k+1) - \overline{\mathbf{x}}(k+1)\big), \qquad \hat{\mathbf{y}}_s(k) = C\hat{\mathbf{x}}_s(k).$$

From the outcomes of Problem (E) and Problem (MM), the theorem of the smoothed optimal control law that is applied to solve Problem (MM) is now described.

Theorem 3.1: Suppose the expanded optimal control law for Problem (E) exists. Then, this control law is the smoothed feedback control law for Problem (MM), given by

$$u(k) = -K(k)\hat{\mathbf{x}}_s(k) + u_{ff}(k) \tag{15}$$

where

<sup>2</sup> *h z k k Cz k k* ( ( ), ) ( ) ( ). = +

( ) ( )( ) 0 *z k* G-Ñ + = j

( ) ( ) ( ( )) ˆ( 1) 0 ( ) *v k <sup>f</sup> k L Rv k B pk v k*

( ) ( ) ( ( )) ˆ( 1) 0 ( ) *z k <sup>f</sup> k L Qz k A pk z k*

In view of these necessary optimality conditions, the conditions (10a), (10b), and (10c) define the modified model‐based optimal control problem, the conditions (11a), (11b), (11c), and (11d) define the parameter estimation problem and the conditions (12a), (12b), and (12c) are used to

The modified model‐based optimal control problem, which is referred to as Problem (MM), is

1 T T

min ( ) ( ) ( ) ( ) ( ) ( ) ˆˆ ˆ

*J u x N SNx N N x N*

= + +G

*mm s <sup>s</sup> <sup>s</sup> u k*

1 T T

+ -+ -

+ ++

1 2 2 2 T T

*s s*

() () () () ˆ

 b

*k uk k x k*

1 1 2 2

*r vk uk r zk x k*

*s*


*s*

å (14)

g

( ( ) ( ) ( ) ( )) ( ) ˆ ˆ

g

*x k Qx k u k Ru k k*

æ ö ¶ +Ñ - + - + = ç ÷ ¶è ø

æ ö ¶ +Ñ - + - + = ç ÷ ¶è ø

(e) Multiplier equations:

(f) Separable variables:

given below:

subject to

l

70 Nonlinear Systems - Design, Analysis, Estimation and Control

b

with () = 1, () = ( + 1) and () = ()=0.

compute the multipliers. They are further discussed as follows.

1


*N*

*k*

=

l

2 0


**3.2. Modified model-based optimal control problem**

2 ( )

a

T

T

( ) ( ), ( ) ( ), ( ) ( ). ˆ ˆ *<sup>s</sup> vk uk zk x k pk pk* == = (13)

(11d)

(12b)

(12c)

*SNzN* (12a)

$$\begin{aligned} u_{ff}(k) = -\big(R_a + B^{\mathrm{T}} K_s(k)S(k+1)B\big)^{-1}\big(&B^{\mathrm{T}} K_s(k)s(k+1) - \lambda_a(k) \\ &+ B^{\mathrm{T}} K_s(k)S(k+1)\big((A - K_s(k)^{-1})\hat{\mathbf{x}}(k) + \alpha_1(k)\big)\big) \end{aligned} \tag{16a}$$

$$K(k) = \big(R_a + B^{\mathrm{T}} K_s(k)S(k+1)B\big)^{-1} B^{\mathrm{T}} K_s(k)S(k+1)K_s(k)^{-1} \tag{16b}$$

$$S(k) = Q_a + S(k+1)\big(K_s(k)^{-1} - BK(k)\big) \tag{16c}$$

$$s(k) = S(k+1)\big((A - K_s(k)^{-1})\hat{\mathbf{x}}(k) + Bu_{ff}(k) + \alpha_1(k)\big) + s(k+1) - \beta_a(k) \tag{16d}$$

with the boundary conditions $S(N)$ given and $s(N) = 0$, and

$$\begin{aligned} R_a &= R + r_1 I_m; \quad Q_a = Q + r_2 I_n; \\ \lambda_a(k) &= \lambda(k) + r_1 v(k); \quad \beta_a(k) = \beta(k) + r_2 z(k). \end{aligned}$$

Proof: From the necessary optimality condition (10a), we have

$$R_a u(k) = -B^{\mathrm{T}} K_s(k)p(k+1) + \lambda_a(k). \tag{17}$$

Applying the sweep method [2, 4],

$$p(k) = S(k)\hat{\mathbf{x}}_s(k) + s(k) \tag{18}$$

we substitute Eq. (18), evaluated at $k+1$, into Eq. (17), which yields

$$R_a u(k) = -B^{\mathrm{T}} K_s(k)S(k+1)\hat{\mathbf{x}}_s(k+1) - B^{\mathrm{T}} K_s(k)s(k+1) + \lambda_a(k). \tag{19}$$

Rewrite the smoothed state equation from Eq. (10c),

$$\hat{\mathbf{x}}_s(k+1) = \overline{\mathbf{x}}(k+1) + K_s(k)^{-1}\big(\hat{\mathbf{x}}_s(k) - \hat{\mathbf{x}}(k)\big). \tag{20}$$

Then, substitute Eq. (20) into Eq. (19). After some algebraic manipulations, the smoothed control law (15) is obtained, where Eqs. (16a) and (16b) are satisfied.

From the smoothed costate equation (10b), we substitute Eq. (18), evaluated at $k+1$, to give

$$p(k) = Q_a \hat{\mathbf{x}}_s(k) + S(k+1)\hat{\mathbf{x}}_s(k+1) + s(k+1) - \beta_a(k). \tag{21}$$

Substituting Eq. (20) into Eq. (21), we obtain

$$p(k) = Q_a \hat{\mathbf{x}}_s(k) + S(k+1)\big(\overline{\mathbf{x}}(k+1) + K_s(k)^{-1}(\hat{\mathbf{x}}_s(k) - \hat{\mathbf{x}}(k))\big) + s(k+1) - \beta_a(k). \tag{22}$$

By doing some algebraic manipulations and comparing with Eq. (18), it is found that Eqs. (16c) and (16d) are satisfied. This completes the proof.

From Eqs. (4a), (10c), and (15), the smoothed state equation becomes

$$\begin{aligned} \hat{\mathbf{x}}_s(k) = \big(I_n - K_s(k)BK(k)\big)^{-1}\big(&(I_n - K_s(k)A)\hat{\mathbf{x}}(k) \\ &+ K_s(k)\big(\hat{\mathbf{x}}_s(k+1) - Bu_{ff}(k) - \alpha_1(k)\big)\big) \end{aligned} \tag{23}$$

and the smoothed output is measured from

$$
\hat{\mathbf{y}}\_s(k) = C\hat{\mathbf{x}}\_s(k) \tag{24}
$$

with the boundary condition $\hat{\mathbf{x}}_s(N) = \hat{\mathbf{x}}(N)$.
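As a sanity check of Theorem 3.1, the backward sweep (16b)-(16c) can be iterated literally once the smoother gains are available. The following sketch does this for an illustrative two-state system with zero adjusted parameters and multipliers; all numerical values are assumptions for demonstration only.

```python
import numpy as np

# Illustrative system and weights (assumptions, not from the chapter)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Qc = np.eye(2); Rc = np.array([[1.0]]); SN = np.eye(2)
G = np.eye(2); Qw = 0.01 * np.eye(2); Reta = np.array([[0.04]])
M0 = np.eye(2); N = 20
r1, r2 = 0.0, 0.0
Ra = Rc + r1 * np.eye(1)   # R_a = R + r1*I_m
Qa = Qc + r2 * np.eye(2)   # Q_a = Q + r2*I_n

# Forward covariance pass for the smoother gain K_s(k): Eqs. (6a), (6b), (5b)
Mx = M0.copy()
Ks = [None] * N
for k in range(N):
    P = Mx - Mx @ C.T @ np.linalg.inv(C @ Mx @ C.T + Reta) @ C @ Mx
    Mx_next = A @ P @ A.T + G @ Qw @ G.T
    Ks[k] = P @ A.T @ np.linalg.inv(Mx_next)
    Mx = Mx_next

# Backward sweep for the feedback gain K(k) and S(k): Eqs. (16b), (16c)
S = [None] * (N + 1); K = [None] * N
S[N] = SN
for k in range(N - 1, -1, -1):
    Ks_inv = np.linalg.inv(Ks[k])
    den = Ra + B.T @ Ks[k] @ S[k + 1] @ B
    K[k] = np.linalg.inv(den) @ B.T @ Ks[k] @ S[k + 1] @ Ks_inv   # (16b)
    S[k] = Qa + S[k + 1] @ (Ks_inv - B @ K[k])                     # (16c)
```

With $K(k)$ and $S(k)$ in hand, the feedforward terms $u_{ff}(k)$ and $s(k)$ of (16a) and (16d) follow in a second backward pass once $\lambda_a(k)$, $\beta_a(k)$, and $\alpha_1(k)$ are supplied by the parameter estimation step.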

#### **3.3. Parameter estimation**


After solving Problem (MM), the separable variables defined in Eq. (13) are used for the subsequent computations. In particular, in the parameter estimation problem, the differences between the real plant and the model used are taken into account through the matching schemes. In view of this, the adjusted parameters, which result from the parameter estimation problem defined by Eq. (11), are calculated from

$$\alpha\_1(k) = f\left(z(k), \mathbf{v}(k), k\right) - Az(k) - B\mathbf{v}(k) \tag{25a}$$

$$a\_2(k) = h(z(k), k) - Cz(k) \tag{25b}$$

$$\gamma(N) = \varphi(z(N), N) - \frac{1}{2}z(N)^{\mathrm{T}} S(N)z(N) \tag{25c}$$

$$\gamma(k) = L(z(k), v(k), k) - \frac{1}{2}\big(z(k)^{\mathrm{T}} Qz(k) + v(k)^{\mathrm{T}} R v(k)\big). \tag{25d}$$
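The matching computation (25a)-(25b) is a one-line residual evaluation per time step. The sketch below uses a hypothetical nonlinear plant $f$ and output map $h$ (both assumptions) to show that, by construction, the linear model plus adjusted parameters reproduces the plant exactly at the current iterate.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])

def f(z, v, k):
    # hypothetical nonlinear plant (assumed for illustration)
    return A @ z + B @ v + 0.05 * np.array([np.sin(z[0]), 0.0])

def h(z, k):
    # hypothetical nonlinear output map (assumed for illustration)
    return C @ z + 0.01 * np.array([z[1] ** 2])

# Current iterate of the separable variables
z = np.array([0.5, -0.2]); v = np.array([0.1]); k = 0

alpha1 = f(z, v, k) - A @ z - B @ v   # (25a)
alpha2 = h(z, k) - C @ z              # (25b)

# The adjusted model now matches the plant exactly at (z, v):
assert np.allclose(A @ z + B @ v + alpha1, f(z, v, k))
assert np.allclose(C @ z + alpha2, h(z, k))
```

This exact matching at the iterate is what lets the linear model-based Problem (MM) converge to the solution of the nonlinear Problem (P) despite model-reality differences.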

#### **3.4. Computation of multipliers**

The multipliers, which are related to the Jacobian matrices of the functions $f$ and $L$ with respect to $z(k)$ and $v(k)$, are computed from

$$\Gamma = \nabla_{z(N)}\varphi - S(N)z(N) \tag{26a}$$

$$\lambda(k) = -\big(\nabla_{v(k)} L - Rv(k)\big) - \left(\frac{\partial f}{\partial v(k)} - B\right)^{\mathrm{T}} \hat{p}(k+1) \tag{26b}$$

$$\mathcal{J}\beta(k) = -(\nabla\_{\boldsymbol{\varepsilon}(k)}L - \mathcal{Q}\boldsymbol{\varepsilon}(k)) - \left(\frac{\partial f}{\partial \boldsymbol{\varepsilon}(k)} - A\right)^{\mathrm{T}}\boldsymbol{\hat{p}}(k+\mathrm{l})\tag{26c}$$
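The modifiers in Eq. (26) require the gradients of L and the Jacobians of f. A minimal sketch for one time step k, assuming a hypothetical scalar plant and stage cost and using central finite differences for the derivatives:

```python
import numpy as np

# Hypothetical model matrices and nonlinear functions (illustration only).
A, B = np.array([[0.9]]), np.array([[0.1]])
Q, R = np.array([[1.0]]), np.array([[0.5]])

def f(z, v):
    return A @ z + B @ v + 0.05 * np.sin(z)

def L(z, v):
    return float(0.5 * (z @ Q @ z + v @ R @ v) + 0.01 * z[0] ** 4)

def num_grad(fun, x, eps=1e-6):
    """Central-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (fun(x + d) - fun(x - d)) / (2 * eps)
    return g

def num_jac(fun, x, eps=1e-6):
    """Central-difference Jacobian of a vector function."""
    cols = []
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        cols.append((fun(x + d) - fun(x - d)) / (2 * eps))
    return np.stack(cols, axis=1)

def modifiers(z, v, p_next):
    """Eqs. (26b) and (26c) evaluated at one time step."""
    lam = -(num_grad(lambda vv: L(z, vv), v) - R @ v) \
          - (num_jac(lambda vv: f(z, vv), v) - B).T @ p_next      # (26b)
    beta = -(num_grad(lambda zz: L(zz, v), z) - Q @ z) \
           - (num_jac(lambda zz: f(zz, v), z) - A).T @ p_next     # (26c)
    return lam, beta
```

Note that both modifiers vanish wherever the plant and cost are exactly linear-quadratic, which is the mechanism that leaves the model-based solution untouched in the absence of model-reality differences.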

#### **3.5. Iterative algorithm**

From the previous sections, the derivation of the relevant equations and the formulation of the resulting algorithm have been discussed. Following these discussions, the iterative algorithm is summarized as follows:

Data: The system matrices A, B, C, the weighting matrices Q, R, S(N), the horizon N, the initial state estimate and the noise covariances, the convexity parameters r1, r2, the relaxation scalars kv, kz, kp, and the functions f, h, L, φ. Note that *A* and *B* may be chosen through the linearization of *f*, and *C* is obtained from the linearization of *h*.
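The note on linearization can be sketched numerically: A, B, and C are taken as finite-difference Jacobians of the plant and output maps about a nominal point. The functions f and h below are hypothetical examples, not the chapter's.

```python
import numpy as np

def jacobian(fun, x, eps=1e-6):
    """Central-difference Jacobian of fun at x."""
    cols = []
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        cols.append((fun(x + d) - fun(x - d)) / (2 * eps))
    return np.stack(cols, axis=1)

# Hypothetical nonlinear plant and output map (illustration only).
def f(x, u):
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * np.sin(x[0]) + 0.1 * u[0]])

def h(x):
    return np.array([np.tanh(x[0])])

x0, u0 = np.zeros(2), np.zeros(1)          # nominal operating point
A = jacobian(lambda x: f(x, u0), x0)       # A = df/dx at (x0, u0)
B = jacobian(lambda u: f(x0, u), u0)       # B = df/du at (x0, u0)
C = jacobian(h, x0)                        # C = dh/dx at x0
```

Any standard linearization (analytic Jacobians, automatic differentiation) serves equally well; the only requirement is that A, B, and C be reasonable local approximations of f and h.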

Step 0: Compute a nominal solution. Assume α1(k) = 0, k = 0, 1, ..., N − 1, α2(k) = 0, k = 0, 1, ..., N, and r1 = r2 = 0. Calculate the state-estimation quantities from Eqs. (5a) and (5b) and from Eqs. (6a), (6b), (6c), and (6d), and solve Problem (M) defined by Eq. (3) to obtain u(k)⁰, k = 0, 1, ..., N − 1, together with the corresponding state, output, and costate sequences for k = 0, 1, ..., N. Then, with α1(k) = 0, k = 0, 1, ..., N − 1, α2(k) = 0, k = 0, 1, ..., N, and r1, r2 from the data, calculate the solutions of Eqs. (16b) and (16c). Set l = 0 and initialize z(k)⁰, v(k)⁰, and p̂(k)⁰ from this nominal solution.

Step 1: Calculate the adjustable parameters α1(k), k = 0, 1, ..., N − 1, α2(k), k = 0, 1, ..., N, and γ(k), k = 0, 1, ..., N, from Eq. (25). This is called the *parameter estimation* step.

Step 2: Compute the modifiers Γ, λ(k), and β(k), k = 0, 1, ..., N − 1, from Eq. (26). This requires the partial derivatives of f, h, and L with respect to z(k) and v(k).

Step 3: With the determined α1(k), α2(k), γ(k), Γ, λ(k), and β(k), solve Problem (MM) defined by Eq. (14) using the result in Theorem 3.1. This is called the *system optimization* step.


Step 4: Update the optimal smoothing solution of Problem (P) and test the convergence of the algorithm. To regulate the convergence, a simple relaxation mechanism is employed:

$$z(k)^{l+1} = z(k)^l + k\_z(\hat{\mathbf{x}}\_s(k)^l - z(k)^l) \tag{27a}$$

$$v(k)^{l+1} = v(k)^l + k\_v\left(u(k)^l - v(k)^l\right) \tag{27b}$$

$$\left(\hat{p}(k)\right)^{l+1} = \hat{p}(k)^{l} + k\_{p}\left(p(k)^{l} - \hat{p}(k)^{l}\right) \tag{27c}$$

where the scalar gains kz, kv, kp range in the interval (0, 1]. If the iterates z(k), k = 0, 1, ..., N, and v(k), k = 0, 1, ..., N − 1, agree with their previous values within a given tolerance, stop; otherwise, set l = l + 1 and repeat from Step 1.
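Step 4's relaxation update, Eq. (27), and a simple convergence test might be coded as follows. This is a sketch under the assumptions that the iterates are stacked into arrays over k and that the three gains are supplied per update; the function names are illustrative.

```python
import numpy as np

def relax_update(z, v, p, x_s, u, p_new, kz=0.6, kv=0.6, kp=0.6):
    """One relaxation step of Eq. (27); kz, kv, kp lie in (0, 1]."""
    z_next = z + kz * (x_s - z)        # (27a): move z toward the new state
    v_next = v + kv * (u - v)          # (27b): move v toward the new control
    p_next = p + kp * (p_new - p)      # (27c): move p-hat toward the new costate
    return z_next, v_next, p_next

def converged(z_new, z_old, v_new, v_old, tol=1e-6):
    """Stop when successive state and control iterates agree within tol."""
    return (np.max(np.abs(z_new - z_old)) < tol and
            np.max(np.abs(v_new - v_old)) < tol)
```

With gains strictly inside (0, 1] and fixed targets, the update is a contraction toward those targets, which is what makes the simple relaxation a safe default.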

Remarks:

**a.** Obtain the solution of Eq. (16d), k = 0, 1, ..., N, by solving it backward, and the solution of Eq. (16a), k = 0, 1, ..., N − 1, either backward or forward.

**b.** Calculate the new control u(k), k = 0, 1, ..., N − 1, using Eq. (15).

**c.** Calculate the new state x̂_s(k), k = 0, 1, ..., N, using Eq. (23).

**d.** Calculate the new costate p̂(k), k = 0, 1, ..., N, using Eq. (18).

**e.** Calculate the new output ŷ_s(k), k = 0, 1, ..., N, using Eq. (24).

**f.** The convergence of the control and state iterates is measured by the norms
$$||v^{l+1} - v^{l}||\_2 = \left(\frac{1}{N-1} \sum\_{k=0}^{N-1} ||v(k)^{l+1} - v(k)^{l}||^2\right)^{1/2} \tag{28a}$$

$$||z^{l+1} - z^{l}||\_2 = \left(\frac{1}{N} \sum\_{k=0}^{N} ||z(k)^{l+1} - z(k)^{l}||^2\right)^{1/2} \tag{28b}$$

**g.** The relaxation scalars (*kv*, *kz*, *kp*) are the step sizes that regulate the convergence mechanism. These scalars are normally chosen in the range (0, 1], but an arbitrary choice may not give the optimal number of iterations; the optimal choice of *kv*, *kz*, *kp* ∈ (0, 1] is problem dependent. As a rule of thumb, the algorithm (Step 1 to Step 4) is run a few times: initially, the scalars are set to *kv* = *kz* = *kp* = 1, and the algorithm is then rerun with different values chosen from 0.1 to 0.9, after which the value giving the smallest number of iterations can be determined. The parameters *r*1 and *r*2 are applied to enhance the convexity so that the convergence of the algorithm can be improved.
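The tuning rule in remark g can be sketched as a small search over candidate step sizes. Here `run_algorithm` is a toy stand-in for Steps 1 to 4 (a scalar relaxed fixed-point iteration, not the chapter's algorithm) that reports how many iterations a given common gain needs:

```python
def run_algorithm(k_gain, tol=1e-8, max_iter=500):
    """Toy stand-in for Steps 1-4: the relaxed iteration
    z <- z + k_gain * (g(z) - z) with g(z) = 0.5*z + 0.5."""
    z = 0.0
    for it in range(1, max_iter + 1):
        z_new = z + k_gain * (0.5 * z + 0.5 - z)
        if abs(z_new - z) < tol:       # convergence test, cf. Step 4
            return it
        z = z_new
    return max_iter

# First run with gain 1, then rerun with 0.1, ..., 0.9 and keep the
# gain giving the fewest iterations (remark g's procedure).
candidates = [1.0] + [round(0.1 * i, 1) for i in range(1, 10)]
best_gain = min(candidates, key=run_algorithm)
```

For this toy contraction, larger gains converge faster, so the search returns 1.0; for a real problem, the iteration counts (and the winning gain) would differ, which is exactly why the remark recommends the trial runs.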
