where *ut* is the action taken by the decision maker at time *t* = 0, 1, ⋯, *n*, taking values in some measurable space *U* of allowable controls. The decision rule is described by considering a class

$$d = \left(d\_{0}, d\_{1}, \cdots, d\_{T-1}\right) \tag{50}$$

of randomized history-dependent strategies consisting of a sequence of functions *π* : *Ht<sup>d</sup>* → *R*<sup>1</sup>, and also by considering the following sequence of events:

**•** an initial state *x*0 is obtained;

**•** having known *x*0, the response official (the controller) selects a control *u*0∈*U* ;

**•** a state *x*1 is attained according to a known probability measure *P*(*x*<sup>1</sup> | *x*0, *u*0); and

**•** knowing *x*1, the response official selects a control *u*1∈*U* .

The basic problem therefore is to find a policy *π* = (*d*0, *d*1) consisting of *d*0 and *d*1 that will minimize the objective functional *J*(*x*0) = ∫ *f*(*x*1, *d*1(*x*1)) *P*(*x*<sup>1</sup> | *x*0, *d*0(*x*0)), using Eq. (50) at decision epochs *t*, *t* + 1, ⋯, *T* − 1. With the assumption that the history at decision epoch *t* is *ht<sup>d</sup>* ∈ *Ht<sup>d</sup>*, the decision rule follows *μt* with *P*(*ut* | *xt*) = *π*(*ut* | *xt*); hence, we set *μt* = *π*. Let *Vt<sup>π</sup>*(*ht<sup>d</sup>*) denote the total expected reward obtained by *π* for *t* < *T* such that

$$V\_{t}^{\pi}\left(h\_{t}^{d}\right) = E\_{h\_{t}}^{\pi}\left[\sum\_{k=t}^{T-1} r\_{k}\left(x\_{k}, u\_{k}\right) + r\_{T}\left(x\_{T}\right)\right] \tag{51}$$

In particular, if the SMD processes (i) to (iii) are stationary, then, for a given rule *π* and an initial state *x*, the future rewards can be estimated. Let *V<sup>π</sup>*(*x*) be the value function; then, the expected discounted return could be measured as

$$V^{\pi}\left(x\right) = E^{\pi}\left[\left.\sum\_{t=0}^{\infty} \theta^{t}\, r\_{t}\left(x\_{t}\right) \right| x\_{0} = x\right] \tag{52}$$

144 Robust Control - Theoretical Models and Case Studies

However, the entire cast of players involved in oil spill control (the contingency planners, response officials, government agencies, pipeline operators, tanker owners, etc.) shares a keen interest in being able to anticipate oil spill response costs for planning purposes, according to Arapostathis et al. [9]. This means that the type of decision and/or action chosen at a given point in time is a function of the clean-up cost. In other words, the clean-up/response cost is a key indicator for the optimal control. Thus, to set a pace for rapid response, it is important to introduce cost concepts into the control paradigm, as discussed in the next section.

**5. Optimal costs model**

Consider the following synthesis: the system starts in state *x*0 and the response team takes a permitted action *ut*(*x*0), resulting in an output (reward) *r<sup>t</sup>*. This decision determines the cost to be incurred. Now, define a cost function that assigns a cost to each sequence of controls as
$$C\left(\mathbf{x}\_{0}, u\_{0:T-1}\right) = \sum\_{t=0}^{T-1} \beta\left(t, \mathbf{x}\_{t}, u\_{t}\right) + \omega\left(\mathbf{x}\_{T}\right) \tag{53}$$

where *β*(*t*, *x*, *u*) is the cost associated with taking action *u* at time *t* in state *x* and *ω*(*xT* ) is the terminal cost at the state *xT* reached after the actions taken up to time *T*; the optimal control problem is to find the sequence *u*0:*<sup>T</sup>* <sup>−</sup>1 that minimizes Eq. (53). Thus, we introduce the optimal cost functional:

$$C\left(t, \mathbf{x}\_t\right) = \min\_{u\_{t:T-1}} \left( \sum\_{k=t}^{T-1} \beta\left(k, \mathbf{x}\_k, u\_k\right) + \omega\left(\mathbf{x}\_T\right) \right) \tag{54}$$

which solves the optimal control problem from an intermediate time *t* until the fixed end time *T*, starting at an arbitrary state *x <sup>t</sup>* . Here, the minimum of Eq. (53) is denoted by *C*(0, *x*0). Hence, a procedure to compute *C*(*t*, *x*) from *C*(*t* + 1, *x*) for all *x* recursively using dynamic programming is given as follows:

Set

$$C(T, x) = \omega(x)$$

So that

$$\begin{split} C\left(t,\mathbf{x}\_{t}\right) &= \min\_{u\_{t:T-1}} \left\{ \sum\_{k=t}^{T-1} \beta\left(k,\mathbf{x}\_{k},u\_{k}\right) + \omega\left(\mathbf{x}\_{T}\right) \right\} \\ &= \min\_{u\_{t}} \left\{ \beta\left(t,\mathbf{x}\_{t},u\_{t}\right) + \min\_{u\_{t+1:T-1}} \left[ \sum\_{k=t+1}^{T-1} \beta\left(k,\mathbf{x}\_{k},u\_{k}\right) + \omega\left(\mathbf{x}\_{T}\right) \right] \right\} \\ &= \min\_{u\_{t}} \left\{ \beta\left(t,\mathbf{x}\_{t},u\_{t}\right) + C\left(t+1,\mathbf{x}\_{t+1}\right) \right\} \\ &= \min\_{u\_{t}} \left\{ \beta\left(t,\mathbf{x}\_{t},u\_{t}\right) + C\left(t+1,\mathbf{x}\_{t} + f\left(t,\mathbf{x}\_{t},u\_{t}\right)\right) \right\} \end{split} \tag{55}$$
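For intuition, the cost functional in Eq. (53) can be evaluated directly for any fixed control sequence by stepping the dynamics forward, exactly as the innermost expression of Eq. (55) suggests. The sketch below is illustrative only: the quadratic `beta`, `omega`, and the increment dynamics `f` are hypothetical stand-ins, not the chapter's models.

```python
# Minimal sketch: evaluate Eq. (53) for one candidate control sequence,
# assuming deterministic dynamics x_{t+1} = x_t + f(t, x_t, u_t).
# beta, omega, f below are illustrative stand-ins, not the chapter's models.

def beta(t, x, u):
    # stage cost of taking action u in state x at time t (assumed quadratic)
    return x**2 + u**2

def omega(x):
    # terminal cost at the final state x_T (assumed quadratic)
    return 10.0 * x**2

def f(t, x, u):
    # state increment, so the next state is x + f(t, x, u)
    return u

def total_cost(x0, controls):
    """Accumulate Eq. (53): stage costs along the trajectory plus terminal cost."""
    x, cost = x0, 0.0
    for t, u in enumerate(controls):
        cost += beta(t, x, u)
        x += f(t, x, u)  # advance the state
    return cost + omega(x)

print(total_cost(1.0, [-0.5, -0.25]))  # → 2.1875
```

Enumerating `total_cost` over every control sequence would reproduce the minimum in Eq. (54); the point of the recursion in Eq. (55) is to obtain that minimum without exhaustive search.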

It could be seen that the reduction from the minimization over the whole path *u*0:*<sup>T</sup>* <sup>−</sup>1 to a sequence of minimizations over *ut* is due to the Markovian nature of the problem: the future depends on the past only through the present. Thus, it could be seen that, in the last line of Eq. (55), the minimization is done for each *x <sup>t</sup>* separately and also depends explicitly on time. The procedure for the dynamic programming is illustrated as follows:

**Step 1:** Initialization: *C*(*T* , *x*)=*ω*(*x*)

**Step 2:** Backwards: For *t* =*T* −1, ⋯, 0 and for all *x*, compute

$$u\_t^\star \left(\mathbf{x}\right) = \arg\min\_u \left\{ \beta \left(t, \mathbf{x}, u\right) + C \left(t + 1, \mathbf{x} + f \left(t, \mathbf{x}, u\right)\right) \right\}$$

$$C \left(t, \mathbf{x}\right) = \beta \left(t, \mathbf{x}, u\_t^\star \left(\mathbf{x}\right)\right) + C \left(t + 1, \mathbf{x} + f \left(t, \mathbf{x}, u\_t^\star \left(\mathbf{x}\right)\right)\right)$$

**Step 3:** Forwards: For *t* =0, ⋯, *T* −1, compute

$$\mathbf{x}\_{t+1}^{\star} = \mathbf{x}\_t^{\star} + f\left(t, \mathbf{x}\_t^{\star}, u\_t^{\star}\left(\mathbf{x}\_t^{\star}\right)\right), \qquad \mathbf{x}\_0^{\star} = \mathbf{x}\_0.$$
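Steps 1–3 can be run end-to-end on a small discretized problem. Every concrete choice below (the integer state grid, the control set, the quadratic `beta` and `omega`, and the increment `f`) is an illustrative assumption rather than the chapter's model; controls that would leave the grid are simply excluded.

```python
# Backward-then-forward dynamic programming (Steps 1-3) on a toy discretized
# problem; all numerical choices are illustrative assumptions.

T = 3
STATES = range(-4, 5)                  # admissible discretized states
CONTROLS = (-1, 0, 1)                  # admissible controls U

beta = lambda t, x, u: x * x + u * u   # stage cost (assumed)
omega = lambda x: 10 * x * x           # terminal cost (assumed)
f = lambda t, x, u: u                  # state increment (assumed)

# Step 1 (initialization) and Step 2 (backward pass): tabulate C(t, x) and
# the minimizing control u_star(t, x) for every grid state.
C = {T: {x: omega(x) for x in STATES}}
u_star = {}
for t in range(T - 1, -1, -1):
    C[t], u_star[t] = {}, {}
    for x in STATES:
        C[t][x], u_star[t][x] = min(
            (beta(t, x, u) + C[t + 1][x + f(t, x, u)], u)
            for u in CONTROLS
            if x + f(t, x, u) in C[t + 1]   # stay on the grid
        )

# Step 3 (forward pass): roll out the optimal trajectory from x_0.
def rollout(x0):
    xs, us = [x0], []
    for t in range(T):
        u = u_star[t][xs[-1]]
        us.append(u)
        xs.append(xs[-1] + f(t, xs[-1], u))
    return xs, us

xs, us = rollout(2)
print(C[0][2], us)  # → 7 [-1, -1, 0]
```

Because the backward pass stores *u\**(*t*, *x*) for every state, the forward pass is a pure table lookup; this is the sense in which Eq. (55) replaces one minimization over whole paths by *T* minimizations over single controls.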

**Lemma 2:** Let *π* \* = (*u*<sup>0</sup> \* , *u*<sup>1</sup> \* , ⋯, *uT* <sup>−</sup><sup>1</sup> \* ) be an optimal control policy for the control problem and assume that, when using *π* \* , a given state *xi* occurs at time *i* (*i* ≥ *t*) a.e. Suppose that the state is *x <sup>i</sup>* at time *i*, and we wish to minimize the cost functional from time *i* to *T*:

$$E\left[\omega(\mathbf{x}\_T) + \sum\_{t=i}^{T-1} \beta\left(\mathbf{x}\_t, u\_t\left(\mathbf{x}\_t\right)\right)\right] \tag{56}$$

Then, *ui* \* , *ui*+1 \* , <sup>⋯</sup>, *uT* <sup>−</sup><sup>1</sup> \* is the optimal path for this problem and *ut* \* is the optimal control. *Proof:* Define *C* \* (*t*, *x*) as the optimal cost-to-go:

$$\min\_{u} \left[ g \left( \mathbf{x}, u \right) + \nabla\_{t} C^{\star} \left( t, \mathbf{x} \right) + \nabla\_{\mathbf{x}} C^{\star} \left( t, \mathbf{x} \right)^{\prime} f \left( \mathbf{x}, u \right) \right] = 0 \tag{57}$$

where *C* \*(*T*, *x*) = *ω*(*x*). We can say that ∇*x C* \*(*T*, *x*) = ∇*ω*(*x*). If we define ∇*x C* \*(*t*, *xt* \*) = *λt*, then, by introducing the Hamiltonian *H*(*x*, *u*, *λ*) = *g*(*x*, *u*) + *λ*′ *f*(*x*, *u*), where *λ*˙*t* = −∇*x H*(*xt* \*, *ut* \*, *λt*), it follows from the optimality principle that

$$u\_{t}^{\star} = \underset{u}{\arg\min} \left[ g \left( \mathbf{x}\_{t}^{\star}, u \right) + \lambda\_{t}^{\prime} f \left( \mathbf{x}\_{t}^{\star}, u \right) \right] \quad \forall t \in \left[ 0, T \right] \tag{58}$$
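As a purely illustrative special case (these *g* and *f* are assumptions for exposition, not the chapter's models), take *g*(*x*, *u*) = *x*² + *u*² and *f*(*x*, *u*) = *u*. Minimizing the Hamiltonian in Eq. (58) is then a one-line computation:

```latex
H(x, u, \lambda) = x^2 + u^2 + \lambda u,
\qquad
\frac{\partial H}{\partial u} = 2u + \lambda = 0
\;\Rightarrow\;
u_t^{\star} = -\tfrac{1}{2}\lambda_t,
\qquad
\dot{\lambda}_t = -\nabla_x H = -2x_t^{\star}.
```

Here the optimal action is a linear feedback on the costate, which itself integrates backwards from the terminal condition *λT* = ∇*ω*(*xT* \*).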

**Theorem 2:** Let

$$\min\_{u} \left[ g \left( \mathbf{x}, u \right) + \nabla\_{t} V \left( t, \mathbf{x} \right) + \nabla\_{\mathbf{x}} V \left( t, \mathbf{x} \right)^{\prime} f \left( \mathbf{x}, u \right) \right] = 0 \quad \forall t, \mathbf{x} \tag{59}$$

with the condition that *V*(*T*, *x*) = *ω*(*x*) ∀ *x*.

Suppose that *ut* \* attains the minimum in Eq. (59) for all *t* and *x*. Let (*xt* \* | *t* ∈ [0, *T*]) be the oil trajectory obtained from the known quantity of spill at the initial state *x*0 when the control trajectory *ut* \* = *u* \*(*t*, *xt* \*) is used and *x*˙*<sup>t</sup>* = *f*(*xt* \*, *u* \*(*t*, *xt* \*)) ∀ *t* ∈ [0, *T*]. Then,

$$V\left(t, \mathbf{x}\right) = \boldsymbol{C}^{\star}\left(t, \mathbf{x}\right) \quad \forall t, \mathbf{x} \tag{60}$$

and {*ut* \* | *t* ∈ [0, *T*]} is an optimal control [7].
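To see Theorem 2 at work, keep the same illustrative scalar case (*g*(*x*, *u*) = *x*² + *u*², *f*(*x*, *u*) = *u*) and assume *ω* ≡ 0; these choices are for exposition only. The quadratic ansatz *V*(*t*, *x*) = *k*(*t*)*x*² reduces Eq. (59) to a scalar Riccati equation:

```latex
\min_u\!\left[x^2 + u^2 + \dot{k}(t)\,x^2 + 2k(t)\,x\,u\right] = 0
\;\Rightarrow\;
u^{\star}(t, x) = -k(t)\,x,
\qquad
\dot{k}(t) = k(t)^2 - 1,\quad k(T) = 0,
```

whose solution is *k*(*t*) = tanh(*T* − *t*). Thus *V*(*t*, *x*) = tanh(*T* − *t*) *x*² satisfies the terminal condition *V*(*T*, *x*) = 0 = *ω*(*x*) and, by the theorem, coincides with the optimal cost-to-go *C* \*(*t*, *x*).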

### **6. Conclusion**

This chapter has presented a mathematical abstraction of the optimal control process in which decisions must be made in several stages along an optimal control path, so as to minimize the apparent toxicological effect of the oil spill clean-up technique by determining the control measure that causes the process to satisfy the physical constraints while at the same time optimizing a performance criterion for all future earnings from marine biota. Hence, if the optimal policy is followed in the future, the recursive method for sequential optimization will converge to the optimal cost control and value function, which optimizes the probable future value at any node of the decision tree.
