**1. Introduction**

The optimal control approach provides solutions to dynamic real-world problems. In particular, linear problems disturbed by random noise sequences are well defined through the application of the optimal state estimate in designing the optimal feedback control law. In such situations, the optimal state estimator and the optimal controller are designed separately to optimize and control the dynamical system. This is known as the separation principle [1–4]. By virtue of this principle, research on stochastic optimal control and its applications has grown widely; see, for example, linear systems [5, 6], the fleet composition problem [7], optimal parameter selection problems [8], Markov jump processes [9], power management [10], multiagent systems [11], the portfolio selection model [12], the 2-DOF vehicle model [13], the sensorimotor system [14], and the advertising model [15].

In fact, the exact solution of a stochastic optimal control problem cannot, in general, be obtained, especially for problems involving nonlinear system dynamics. To obtain the optimal solution of a discrete-time nonlinear stochastic optimal control problem, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed to solve this kind of problem iteratively [16–18]. In this algorithm, the linear quadratic Gaussian (LQG) model is applied as a model-based optimal control problem, where the state estimation procedure is carried out using the Kalman filtering theory. Adjusted parameters are added into this model so that system optimization and parameter estimation are integrated interactively. On this basis, the differences between the real plant and the model used are measured repeatedly in order to update the optimal solution of the model used. In turn, the output measured from the real plant is fed back into the model used for the state estimator design. When convergence is achieved, the iterative solution approaches the true optimal solution of the original optimal control problem in spite of model-reality differences. This optimal solution is the optimal filtering solution obtained using the IOCPE algorithm, whose efficiency has been demonstrated in Refs. [16–18].
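The interplay between system optimization and parameter estimation can be made concrete with a deliberately simplified scalar sketch. This is not the chapter's algorithm: the plant gain `a`, model gain `b`, setpoint `target`, and relaxation `gain` are all hypothetical. The idea illustrated is the one described above: the measured model-reality output difference is absorbed into an adjusted parameter, the adjusted model is re-optimized, and the two steps are repeated until convergence.

```python
# Hypothetical scalar illustration of integrating optimization with
# parameter estimation via model-reality differences. The "real plant"
# output is y* = a*u; the simplified model output is y = b*u + alpha,
# where alpha is the adjusted parameter. The objective is to drive the
# plant output to a setpoint, i.e., minimize (y - target)**2.
a, b, target = 2.0, 1.0, 4.0      # assumed plant gain, model gain, setpoint
u, alpha, gain = 0.0, 0.0, 0.5    # initial iterate and relaxation gain

for _ in range(100):
    alpha = a * u - b * u          # measured model-reality difference at u
    u_new = (target - alpha) / b   # optimum of the adjusted model
    u = u + gain * (u_new - u)     # relaxed update of the control iterate
```

At convergence the adjusted model reproduces the plant output at the current iterate, so the model optimum coincides with the true plant optimum `u = target / a`, even though the model gain `b` is wrong.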

However, the output trajectory of the model obtained from the IOCPE algorithm is less accurate in estimating the exact output measurement of the original optimal control problem. In this chapter, our aim is to improve the IOCPE algorithm using the fixed-interval smoothing approach, where the output residual is reduced to within an appropriate tolerance so as to generate a better output trajectory. In our model, the state dynamics, which is disturbed by Gaussian noise sequences, is estimated using the Kalman filtering theory and then smoothed over a fixed interval. With this modified estimation procedure, a smoothed state estimate is computed backward in time and is used in designing the feedback optimal control law. Notably, the output residual of this smoothed state estimate is smaller than the output residual obtained using the Kalman filtering theory alone; see [17]. The solution procedure discussed in this chapter is largely the same as that presented in the study of Kek et al. [17], but the modified fixed-interval smoothing increases the accuracy of the optimal solution.
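The estimation step described above can be sketched on a linear surrogate model (the chapter's nonlinear dynamics and notation are not reproduced here): a forward Kalman filter followed by a backward fixed-interval pass in the Rauch-Tung-Striebel form. The system matrices, noise statistics, and horizon below are hypothetical placeholders chosen only for illustration.

```python
import numpy as np

def kalman_rts(A, C, Q, R, y, x0, P0):
    """Forward Kalman filter plus fixed-interval (RTS) smoother for
    x_{k+1} = A x_k + w_k,  y_k = C x_k + v_k,
    with w_k ~ N(0, Q), v_k ~ N(0, R), and prior x_0 ~ N(x0, P0)."""
    N, n = len(y), len(x0)
    x_pred = np.zeros((N, n)); P_pred = np.zeros((N, n, n))
    x_filt = np.zeros((N, n)); P_filt = np.zeros((N, n, n))
    x, P = x0, P0
    for k in range(N):
        if k > 0:                          # time update (prediction)
            x, P = A @ x, A @ P @ A.T + Q
        x_pred[k], P_pred[k] = x, P
        S = C @ P @ C.T + R                # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ (y[k] - C @ x)         # measurement update
        P = (np.eye(n) - K @ C) @ P
        x_filt[k], P_filt[k] = x, P
    # Backward fixed-interval smoothing pass (Rauch-Tung-Striebel).
    x_sm = x_filt.copy(); P_sm = P_filt.copy()
    for k in range(N - 2, -1, -1):
        G = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])
        x_sm[k] = x_filt[k] + G @ (x_sm[k + 1] - x_pred[k + 1])
        P_sm[k] = P_filt[k] + G @ (P_sm[k + 1] - P_pred[k + 1]) @ G.T
    return x_filt, P_filt, x_sm, P_sm

# Hypothetical demonstration: a double-integrator-like system with a
# noisy position measurement over a fixed interval of 50 steps.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
y = np.zeros((50, 1))
x = np.array([0.0, 1.0])
for k in range(50):
    y[k] = C @ x + rng.normal(0.0, 0.5)
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
x_filt, P_filt, x_sm, P_sm = kalman_rts(A, C, Q, R, y, np.zeros(2), np.eye(2))
```

Because the smoother conditions each state estimate on the whole measurement record rather than only the past, its error covariance never exceeds that of the filter, which mirrors the reduced output residual reported in [17].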

The structure of the chapter is outlined as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem and its simplified model-based optimal control problem are described. In Section 3, an expanded optimal control model is introduced, in which system optimization and parameter estimation are integrated mutually. The feedback control law, which incorporates the Kalman filtering theory and the fixed-interval smoothing, is designed. Then, an iterative algorithm based on the principle of model-reality differences is derived so that the discrete-time nonlinear stochastic optimal control problem can be solved. In Section 4, a convergence result for the proposed algorithm is provided. In Section 5, an example of the optimal control of a continuous stirred-tank reactor is illustrated. Finally, some concluding remarks are made.
