**2.2. A dynamic programming approach**

The proposed dynamic programming approach to model the decision-making problem of selecting the best Binomial population is similar to an optimal stopping problem.

Using Dynamic Programming Based on Bayesian Inference in Selection Problems
http://dx.doi.org/10.5772/57423

Let us assume that a limited number of stages, $s$, is available to find the best population. Then the general framework of the decision-making process in each stage is proposed as:

**1.** Take an independent sample of size $m$ from each population.

**2.** Calculate the posterior beliefs in terms of the prior beliefs using the Bayesian approach.

**3.** Select the two biggest beliefs.

**4.** Based upon the values of the two biggest beliefs, calculate the minimum acceptable belief.

**5.** If the maximum belief is more than the minimum acceptable belief, then we can conclude that the corresponding subspace is the optimal one. Otherwise, go to step 1.

In step 3 of the above framework, let populations $i$ and $j$ be the two candidates for the best population (that is, the beliefs of populations $i$ and $j$ are the two biggest beliefs), and suppose we have $s$ decision-making stages. If the biggest belief is more than a threshold (minimum acceptable belief) $d_{i,j}(s)$, $0 \le d_{i,j}(s) \le 1$, we select the corresponding subspace of that belief as the solution. Otherwise, the decision-making process continues by taking more observations. We determine the value of $d_{i,j}(s)$ such that the belief of making the correct decision is maximized. To do this, suppose that for each population a new observation, $(\alpha_{j,k}, \beta_{j,k})$, is available at a given stage $k$. At this stage, we define $V_{i,j}\big(s, d_{i,j}(s)\big)$ to be the expected belief of making the correct decision in $s$ stages when the two populations $i$ and $j$ are the candidates for the optimal population. In other words, if we let $CS$ denote the event of making the correct decision, we define $V_{i,j}\big(s, d_{i,j}(s)\big) = E\big[B_{i,j}\{CS\}\big]$, where $B_{i,j}\{CS\}$ is the belief of making the correct decision. Furthermore, assume that the maximum of $V_{i,j}\big(s, d_{i,j}(s)\big)$ occurs at $d_{i,j}^{*}(s)$. Then we will have

$$V_{i,j}\left(s, d_{i,j}^{*}(s)\right) = \max_{d_{i,j}(s)} \left\{ V_{i,j}\left(s, d_{i,j}(s)\right) \right\} = \max_{d_{i,j}(s)} \left\{ E\left[ B_{i,j}\{CS\} \right] \right\} \tag{10}$$

We denote this optimal point by $V_{i,j}^{*}(s)$; in other words, $V_{i,j}^{*}(s) = V_{i,j}\big(s, d_{i,j}^{*}(s)\big)$. Moreover, let us define $S_i$ and $S_j$ to be the states of selecting population $i$ and $j$, respectively, as the candidate for the optimal population, and $NS_{i,j}$ as the state of choosing neither of these populations. Then, by conditioning on the above states, we have

$$V_{i,j}^{*}(s) = \max_{d_{i,j}(s)} \left\{ E\left[ B_{i,j}\{CS\} \right] \right\} = \max_{d_{i,j}(s)} \left\{ E\left[ B_{i,j}\{CS \mid S_i\} B_{i,j}\{S_i\} + B_{i,j}\{CS \mid S_j\} B_{i,j}\{S_j\} + B_{i,j}\{CS \mid NS_{i,j}\} B_{i,j}\{NS_{i,j}\} \right] \right\} \tag{11}$$

In order to evaluate $V_{i,j}^{*}(s)$, in what follows we find the belief terms of equation (11).

**a.** $B_{i,j}\{CS \mid S_i\}$ and $B_{i,j}\{CS \mid S_j\}$

These are the beliefs of making the correct decision if population $i$ or $j$, respectively, is selected as the optimal population. To make the evaluation easier, we denote these beliefs by $B_{i,j}(i)$ and $B_{i,j}(j)$. Then, using equation (2) we have

$$B_{i,j}\{CS \mid S_i\} = B_{i,j}(i) = \frac{B\left(\alpha_{i,k-1}, \beta_{i,k-1}\right)\overline{p_{i,k}}}{B\left(\alpha_{i,k-1}, \beta_{i,k-1}\right)\overline{p_{i,k}} + B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)\overline{p_{j,k}}} \tag{12}$$

Similarly,
$$B_{i,j}\{CS \mid S_j\} = B_{i,j}(j) = \frac{B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)\overline{p_{j,k}}}{B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)\overline{p_{j,k}} + B\left(\alpha_{i,k-1}, \beta_{i,k-1}\right)\overline{p_{i,k}}} \tag{13}$$
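As a quick numerical illustration of equations (12) and (13), the following sketch computes the pair of beliefs. It assumes $B(\cdot,\cdot)$ is the beta function and treats the stage-$(k-1)$ posterior parameters and the estimates $\overline{p_{i,k}}$, $\overline{p_{j,k}}$ as given inputs; all numbers are hypothetical.

```python
from math import gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Γ(a)·Γ(b) / Γ(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beliefs(alpha_i, beta_i, pbar_i, alpha_j, beta_j, pbar_j):
    """Beliefs B_{i,j}(i) and B_{i,j}(j) of equations (12)-(13).

    alpha/beta are the stage-(k-1) posterior parameters and pbar the
    current success-probability estimates (hypothetical inputs).
    """
    w_i = beta_fn(alpha_i, beta_i) * pbar_i
    w_j = beta_fn(alpha_j, beta_j) * pbar_j
    total = w_i + w_j
    return w_i / total, w_j / total

# Hypothetical posteriors: population i looks better than population j.
b_i, b_j = beliefs(3, 2, 0.6, 2, 3, 0.4)
```

By construction the two beliefs always sum to one, which is exactly the partitioning property used below equation (15).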

**b.** $B_{i,j}\{S_i\}$ and $B_{i,j}\{S_j\}$

These are the beliefs of selecting population *i* or *j* as the optimal population, respectively. Regarding the decision-making strategy, we have:

$$B_{i,j}(i) = \max\left\{ B_{i,j}(i), B_{i,j}(j) \right\} \quad \text{and} \quad B_{i,j}(i) \ge d_{i,j}^{*}(s) \tag{14}$$

Hence, we define the event $S_i$ as

$$S_i \equiv \left\{ B_{i,j}(i) = \max\left\{ B_{i,j}(i), B_{i,j}(j) \right\},\; B_{i,j}(i) \ge d_{i,j}^{*}(s) \right\} \tag{15}$$

Since $B_{i,j}(i) + B_{i,j}(j) = 1$ and the beliefs are nonnegative, we conclude that $\max\{B_{i,j}(i), B_{i,j}(j)\} \ge 0.5$. Furthermore, since the decision making is performed based upon the maximum value of the beliefs, without violating any assumptions we can change the variation interval of $d_{i,j}^{*}(s)$ from $[0,1]$ to $[0.5,1]$. Now, by considering $d_{i,j}^{*}(s) \ge 0.5$ implicitly, we have $S_i \equiv \{B_{i,j}(i) \ge d_{i,j}^{*}(s)\}$. By similar reasoning, $S_j \equiv \{B_{i,j}(j) \ge d_{i,j}^{*}(s)\}$. Hence,

$$B_{i,j}\{S_j\} = \Pr\left\{ B_{i,j}(j) \ge d_{i,j}^{*}(s) \right\} = \Pr\left\{ \frac{B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)\overline{p_{j,k}}}{B\left(\alpha_{i,k-1}, \beta_{i,k-1}\right)\overline{p_{i,k}} + B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)\overline{p_{j,k}}} \ge d_{i,j}^{*}(s) \right\} = \Pr\left\{ \frac{\overline{p_{j,k}}}{\overline{p_{i,k}}} \ge h\left(d_{i,j}^{*}(s)\right) \right\} \tag{16}$$

$$\text{In which, } h\left(d_{i,j}^{*}(s)\right) = \frac{d_{i,j}^{*}(s)\, B\left(\alpha_{i,k-1}, \beta_{i,k-1}\right)}{\left(1 - d_{i,j}^{*}(s)\right) B\left(\alpha_{j,k-1}, \beta_{j,k-1}\right)}.$$

To evaluate $\Pr\left\{ \frac{\overline{p_{j,k}}}{\overline{p_{i,k}}} \ge h\left(d_{i,j}^{*}(s)\right) \right\}$ in equation (16), let $f_1\left(\overline{p_{j,k}}\right)$ and $f_2\left(\overline{p_{i,k}}\right)$ be the probability distributions of $\overline{p_{j,k}}$ and $\overline{p_{i,k}}$, respectively. Then,

$$\begin{split} f\_{2}(\overline{p\_{i,k}}) &= \frac{\Gamma(\alpha\_{i,k} + \beta\_{i,k} + 1)}{\Gamma(\alpha\_{i,k} + 0.5)\Gamma(\beta\_{i,k} + 0.5)} \overline{p\_{i,k}}^{\alpha\_{i,k} - 0.5} (1 - \overline{p\_{i,k}})^{\beta\_{i,k} - 0.5} \\ f\_{1}(\overline{p\_{j,k}}) &= \frac{\Gamma(\alpha\_{j,k} + \beta\_{j,k} + 1)}{\Gamma(\alpha\_{j,k} + 0.5)\Gamma(\beta\_{j,k} + 0.5)} \overline{p\_{j,k}}^{\alpha\_{j,k} - 0.5} (1 - \overline{p\_{j,k}})^{\beta\_{j,k} - 0.5} \end{split} \tag{17}$$
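The densities in equation (17) are exactly $\text{Beta}(\alpha_{\cdot,k}+0.5,\ \beta_{\cdot,k}+0.5)$ distributions, so the probability term of equation (16) can be estimated by straightforward Monte Carlo simulation. The sketch below assumes $B(\cdot,\cdot)$ is the beta function; the stage parameters and the value $d_{i,j}^{*}(s) = 0.8$ are hypothetical, and for brevity the same parameters stand in for stages $k-1$ and $k$.

```python
import random
from math import gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Γ(a)·Γ(b) / Γ(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def h(d_star, alpha_i, beta_i, alpha_j, beta_j):
    """Threshold h(d*) defined below equation (16), from stage-(k-1) parameters."""
    return (d_star * beta_fn(alpha_i, beta_i)) / ((1 - d_star) * beta_fn(alpha_j, beta_j))

def prob_ratio_exceeds(threshold, alpha_i, beta_i, alpha_j, beta_j, n=20_000, seed=1):
    """Monte Carlo estimate of Pr{ pbar_j / pbar_i >= threshold }.

    Equation (17) identifies the densities of pbar as Beta(alpha + 0.5, beta + 0.5),
    so we sample them directly with the stage-k parameters.
    """
    rng = random.Random(seed)
    hits = sum(
        rng.betavariate(alpha_j + 0.5, beta_j + 0.5)
        >= threshold * rng.betavariate(alpha_i + 0.5, beta_i + 0.5)
        for _ in range(n)
    )
    return hits / n

t = h(0.8, 3, 2, 2, 3)            # hypothetical d* = 0.8; here t = 4.0
p = prob_ratio_exceeds(t, 3, 2, 2, 3)  # estimate of B_{i,j}{S_j} in equation (16)
```

Equations (18)-(20) below obtain the same probability analytically; a simulation like this is a convenient cross-check of the quadrature.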

Hence,

$$\begin{split} \Pr\left\{\overline{p_{j,k}} \ge h\left(d_{i,j}^{*}(s)\right)\overline{p_{i,k}}\right\} &= \int_{0}^{1}\int_{h\left(d_{i,j}^{*}(s)\right)\overline{p_{i,k}}}^{1} f_{1}\left(\overline{p_{j,k}}\right) f_{2}\left(\overline{p_{i,k}}\right) d\overline{p_{j,k}}\, d\overline{p_{i,k}} \\ &= \int_{0}^{1}\int_{h\left(d_{i,j}^{*}(s)\right)\overline{p_{i,k}}}^{1} A_{i}\,\overline{p_{i,k}}^{\,\alpha_{i,k}-0.5}\left(1-\overline{p_{i,k}}\right)^{\beta_{i,k}-0.5} A_{j}\,\overline{p_{j,k}}^{\,\alpha_{j,k}-0.5}\left(1-\overline{p_{j,k}}\right)^{\beta_{j,k}-0.5} d\overline{p_{j,k}}\, d\overline{p_{i,k}} \end{split} \tag{18}$$

where

$$A_{i} = \frac{\Gamma(\alpha_{i,k} + \beta_{i,k} + 1)}{\Gamma(\alpha_{i,k} + 0.5)\,\Gamma(\beta_{i,k} + 0.5)}, \qquad A_{j} = \frac{\Gamma(\alpha_{j,k} + \beta_{j,k} + 1)}{\Gamma(\alpha_{j,k} + 0.5)\,\Gamma(\beta_{j,k} + 0.5)} \tag{19}$$

By the change-of-variables technique, with $U = \frac{\overline{p_{j,k}}}{\overline{p_{i,k}}}$ and $V = \overline{p_{i,k}}$, we have:

$$\begin{aligned} f(U) &= A_{i} A_{j}\, U^{\alpha_{j,k}-0.5} \int_{0}^{1} V^{\alpha_{i,k}+\alpha_{j,k}} \left(1-V\right)^{\beta_{i,k}-0.5} \left(1-UV\right)^{\beta_{j,k}-0.5} dV \\ \Pr\left\{\frac{\overline{p_{j,k}}}{\overline{p_{i,k}}} \ge h\left(d_{i,j}^{*}(s)\right)\right\} &= 1 - \int_{0}^{h\left(d_{i,j}^{*}(s)\right)} f(U)\, dU = 1 - F\Big(h\left(d_{i,j}^{*}(s)\right)\Big) \end{aligned} \tag{20}$$

For $B_{i,j}\{S_i\}$ we have

$$B_{i,j}\{S_i\} = \Pr\left\{ B_{i,j}(i) \ge d_{i,j}^{*}(s) \right\} = \Pr\left\{ 1 - B_{i,j}(j) \ge d_{i,j}^{*}(s) \right\} = \Pr\left\{ B_{i,j}(j) \le 1 - d_{i,j}^{*}(s) \right\} = F\Big(h\left(1 - d_{i,j}^{*}(s)\right)\Big) \tag{21}$$

**c.** $B_{i,j}\{CS \mid NS_{i,j}\}$

$B_{i,j}\{CS \mid NS_{i,j}\}$ is the belief of making the correct decision when neither of the subspaces $i$ and $j$ has been chosen as the optimal one; in other words, the maximum belief has been less than $d_{i,j}^{*}(s)$ and the decision-making process continues to the next stage. In terms of the stochastic dynamic programming approach, the belief of this event equals the maximum belief of making the correct decision in $(s-1)$ stages. Since the value of this belief is discounted in the current stage, using discount factor $\alpha$,

$$B_{i,j}\{CS \mid NS_{i,j}\} = \alpha V_{i,j}^{*}(s-1) \tag{22}$$

Having all the belief terms of equation (11) evaluated in equations (12), (13), (14), (15), and (16), and knowing that by partitioning the state space we have $B_{i,j}\{NS_{i,j}\} = 1 - \big(B_{i,j}\{S_i\} + B_{i,j}\{S_j\}\big)$, equation (11) can now be evaluated by substitution:

$$\begin{split} V_{i,j}^{*}(s) = \max_{0.5 \le d_{i,j}(s) \le 1} \Big\{\, & B_{i,j}(i)\Pr\left\{B_{i,j}(i) \ge d_{i,j}(s)\right\} + B_{i,j}(j)\Pr\left\{B_{i,j}(j) \ge d_{i,j}(s)\right\} \\ & + B_{i,j}\{CS \mid NS_{i,j}\}\Big(1 - \Pr\left\{B_{i,j}(i) \ge d_{i,j}(s)\right\} - \Pr\left\{B_{i,j}(j) \ge d_{i,j}(s)\right\}\Big) \Big\} \\ = \max_{0.5 \le d_{i,j}(s) \le 1} \Big\{\, & B_{i,j}(i)\Pr\left\{B_{i,j}(i) \ge d_{i,j}(s)\right\} + B_{i,j}(j)\Pr\left\{B_{i,j}(j) \ge d_{i,j}(s)\right\} \\ & + \alpha V_{i,j}^{*}(s-1)\Big(1 - \Pr\left\{B_{i,j}(i) \ge d_{i,j}(s)\right\} - \Pr\left\{B_{i,j}(j) \ge d_{i,j}(s)\right\}\Big) \Big\} \end{split} \tag{23}$$

*2.2.1. Making the decision*

Assuming that for the two biggest beliefs we have $B_{i,j}(i) \ge B_{i,j}(j)$, equation (23) can be written as

$$V_{i,j}^{*}(s) = \max_{0.5 \le d_{i,j}(s) \le 1} \Big\{ \big(B_{i,j}(i) - \alpha V_{i,j}^{*}(s-1)\big)\Pr\left\{B_{i,j}(i) \ge d_{i,j}(s)\right\} + \big(B_{i,j}(j) - \alpha V_{i,j}^{*}(s-1)\big)\Pr\left\{B_{i,j}(j) \ge d_{i,j}(s)\right\} + \alpha V_{i,j}^{*}(s-1) \Big\} \tag{24}$$

For the decision-making problem at hand, three cases may happen:

**1.** $B_{i,j}(i) < \alpha V_{i,j}^{*}(s-1)$: In this case, both $\big(B_{i,j}(i) - \alpha V_{i,j}^{*}(s-1)\big)$ and $\big(B_{i,j}(j) - \alpha V_{i,j}^{*}(s-1)\big)$ are negative. Since we are maximizing $V_{i,j}\big(s, d_{i,j}(s)\big)$, the two probability terms in equation (24) must be minimized. This only happens when we let $d_{i,j}^{*}(s) = 1$, making the probability terms equal to zero. Now, since $B_{i,j}(i) < d_{i,j}^{*}(s) = 1$, we continue to the next stage.
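The recursion in equations (23) and (24) can be sketched as a backward induction over stages, maximizing over a grid of thresholds $d_{i,j}(s) \in [0.5, 1]$. The belief values and the probability curves passed in below are hypothetical stand-ins; in this chapter the probability terms come from $F(h(\cdot))$ via equations (20) and (21).

```python
def v_star(s, b_i, b_j, prob_i, prob_j, alpha=0.95, grid=101):
    """Backward induction for V*_{i,j}(s) via equation (24).

    b_i, b_j  -- current beliefs B_{i,j}(i) >= B_{i,j}(j)
    prob_i(d) -- Pr{B_{i,j}(i) >= d};  prob_j(d) -- Pr{B_{i,j}(j) >= d}
    alpha     -- discount factor
    The probability functions are hypothetical stand-ins for F(h(1 - d))
    and 1 - F(h(d)) of equations (20)-(21).
    """
    v = 0.0  # V*(0): with no stages left, no correct decision can be made
    ds = [0.5 + 0.5 * t / (grid - 1) for t in range(grid)]  # grid on [0.5, 1]
    for _ in range(s):
        v = max(
            (b_i - alpha * v) * prob_i(d)
            + (b_j - alpha * v) * prob_j(d)
            + alpha * v
            for d in ds
        )
    return v

# Hypothetical, decreasing probability curves with prob_i(d) + prob_j(d) <= 1.
v = v_star(3, 0.6, 0.4, lambda d: 1 - d, lambda d: 0.5 * (1 - d))
```

Note that when both bracketed coefficients are negative, the inner maximum is attained at $d_{i,j}(s) = 1$, where both probability terms vanish and $V_{i,j}^{*}(s) = \alpha V_{i,j}^{*}(s-1)$, so the grid search reproduces case 1 above automatically.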
