**Theorem 1**

If the *i*th population is the best, then $\lim\_{k \to \infty} B(\alpha\_{i,k}, \beta\_{i,k}) = B\_i = 1$.

In order to prove the theorem, we first prove the following two lemmas.

## **Lemma 1:**

Define a recursive sequence $\{R\_{k,j};\ j = 1, 2, \dots, l\}$ as

$$R\_{k,j} = \begin{cases} \dfrac{c\_j R\_{k-1,j}}{\sum\_{i=1}^{l} c\_i R\_{k-1,i}} & \text{for} \quad k = 1, 2, 3, \dots \\[2ex] P\_j & \text{for} \quad k = 0 \end{cases} \tag{5}$$

where $c\_1, c\_2, \dots, c\_l$ are distinct positive constants, $\sum\_{j=1}^{l} P\_j = 1$, and $P\_j > 0$. Then, if $\lambda\_j = \lim\_{k \to \infty} (R\_{k,j})$, there exists at most one non-zero $\lambda\_j$.
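A minimal numerical sketch of the recursion in equation (5); the constants `c` and prior weights `P` below are arbitrary illustrative choices.

```python
# Iterate R_{k,j} = c_j * R_{k-1,j} / sum_i(c_i * R_{k-1,i}), starting from R_{0,j} = P_j.
def iterate_R(c, P, steps):
    R = list(P)
    for _ in range(steps):
        total = sum(cj * rj for cj, rj in zip(c, R))
        R = [cj * rj / total for cj, rj in zip(c, R)]
    return R

c = [0.3, 0.5, 0.9]   # distinct positive constants; the largest is the third, 0.9
P = [0.5, 0.3, 0.2]   # prior weights: positive and summing to one
print([round(r, 6) for r in iterate_R(c, P, steps=200)])  # → [0.0, 0.0, 1.0]
```

Consistent with the lemma, after enough iterations all of the weight concentrates on the index with the largest $c\_j$, regardless of the starting weights.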

**Proof**

Let $g$ be an index for the maximum value of the $c\_j$'s, i.e., $c\_g = \max\_{j \in \{1, \dots, l\}} \{c\_j\}$. We show that the sequence $R\_{k,j}$ converges to one for $j = g$ and converges to zero for $j \neq g$. Suppose, to the contrary, that $\lambda\_i > 0$ for some $i \neq g$, and consider the ratio $H\_{k,i} = R\_{k,g} / R\_{k,i}$. By equation (5), we have

$$H\_{k,i} = \frac{c\_g}{c\_i} H\_{k-1,i} = \left( \frac{c\_g}{c\_i} \right)^{k} H\_{0,i}$$

Since $H\_{0,i} > 0$ and the $c\_j$'s are distinct positive constants, $c\_g / c\_i > 1$, so $\lim\_{k \to \infty} H\_{k,i} = \infty$. That is a contradiction, because $\lim\_{k \to \infty} H\_{k,i} = \lambda\_g / \lambda\_i$ is finite whenever $\lambda\_i > 0$. Hence $\lim\_{k \to \infty} (R\_{k,i}) = 0$ for every $i \neq g$. From equation (6), we know that $\sum\_{j=1}^{l} \lambda\_j = 1$, so $\lambda\_g = 1$.

**Lemma 2:**

From the law of large numbers, we know that $\lim\_{k \to \infty} \bar{p}\_{j,k} = p\_j$, where $\bar{p}\_{j,k}$ is the observed proportion of successes of the *j*th population after $k$ stages and $p\_j$ is its probability of success.

Now we are ready to prove the convergence property of the proposed method. Taking the limit on both sides of equation (3), we will have

$$\lim\_{k \to \infty} B(\alpha\_{i,k}, \beta\_{i,k}) = \lim\_{k \to \infty} \frac{B(\alpha\_{i,k-1}, \beta\_{i,k-1}) \Pr\{(\alpha\_{i,k}, \beta\_{i,k}) \mid i\text{th population is the best}\}}{\sum\_{j=1}^{n} B(\alpha\_{j,k-1}, \beta\_{j,k-1}) \Pr\{(\alpha\_{j,k}, \beta\_{j,k}) \mid j\text{th population is the best}\}} \tag{8}$$

By lemma 2, $\bar{p}\_{j,k}$ converges to the probability of success $p\_j$ of the *j*th population. Hence, using equation (7), we have

$$B\_i = \frac{B\_i p\_i}{\sum\_{j=1}^{n} B\_j p\_j} \tag{9}$$

Equation (9) has the same recursive form as equation (5) with $c\_j = p\_j$, and $\sum\_{j=1}^{n} B\_j = 1$. Then, by lemma 1, $B\_i = 1$ for only one $i$. Now, assuming population $i$ is the best, i.e., it possesses the largest value of the $p\_j$'s, by lemmas 1 and 2 we conclude that $B\_i = 1$ and $B\_j = 0$ for $j \neq i$. This concludes the convergence property of the proposed method: given enough sampling stages, it is guaranteed to select the best population.

In real-world applications, since there is a cost associated with the data-gathering process, we need to select the best population in a finite number of decision-making stages. In the next section, we present the proposed decision-making method in the form of a stochastic dynamic programming model in which there is a limited number of decision-making stages available.

*Using Dynamic Programming Based on Bayesian Inference in Selection Problems*, http://dx.doi.org/10.5772/57423
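The convergence stated in Theorem 1 can also be checked by simulation. The sketch below is a simplification, not the paper's exact update of equation (3): each stage draws one Bernoulli observation per population, estimates each $p\_j$ by a smoothed success proportion, and reweights the beliefs $B\_j$ by those estimates as in the limiting form (9). The success probabilities, the smoothing choice, and the function name `select_best` are illustrative assumptions.

```python
import random

def select_best(p_true, stages, seed=0):
    """Reweight the beliefs B_j by smoothed success-rate estimates, mimicking
    the limiting form (9).  Laplace smoothing (a Beta(1,1) posterior mean)
    keeps every estimate strictly positive."""
    rng = random.Random(seed)
    n = len(p_true)
    successes, trials = [0] * n, [0] * n
    B = [1.0 / n] * n                          # uniform prior belief over "which is best"
    for _ in range(stages):
        for j in range(n):                     # one Bernoulli draw per population per stage
            trials[j] += 1
            successes[j] += rng.random() < p_true[j]
        p_bar = [(successes[j] + 1) / (trials[j] + 2) for j in range(n)]
        total = sum(B[j] * p_bar[j] for j in range(n))
        B = [B[j] * p_bar[j] / total for j in range(n)]
    return B

B = select_best([0.3, 0.6, 0.8], stages=2000)
print([round(b, 4) for b in B])   # the belief for the p = 0.8 population tends to one
```

Because the per-stage weight ratio between the best and second-best population is bounded away from one, the belief on the best population dominates after a moderate number of stages, matching the asymptotic claim of the theorem.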
