**Proof**

From equation (6), we know that $\sum_{j=1}^{m} l_j = 1$. Then, by lemma 1, we have $l_i = 1$ for exactly one $i$. Now suppose that $c_g = \max_{j \in \{1, \dots, m\}} \{c_j\}$ and $g \neq i$; we will show that this leads to a contradiction. Consider $H_{k,i} = \dfrac{R_{k,g}}{R_{k,i}}$. By equation (5), we have $H_{k,i} = \dfrac{c_g}{c_i} H_{k-1,i}$. Since $H_{0,i} > 0$ and $c_g > c_i$ (taking the maximum to be attained only at $g$), we have

$$H_{k,i} = \frac{c_g}{c_i} H_{k-1,i} = \left(\frac{c_g}{c_i}\right)^k H_{0,i} \quad \Rightarrow \quad \lim_{k \to \infty} H_{k,i} = \infty \tag{8}$$

This is a contradiction because, with $l_i = 1$ and hence $l_g = 0$ for $g \neq i$, we would have $\lim_{k \to \infty} H_{k,i} = \dfrac{\lim_{k \to \infty} R_{k,g}}{\lim_{k \to \infty} R_{k,i}} = \dfrac{l_g}{l_i} = 0$. Hence $g = i$, that is, $l_g = 1$.
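The geometric growth in equation (8) can be checked numerically. The following is a minimal sketch, not the authors' code. Equation (5) is not reproduced in this section, so the sketch assumes it has the normalized form $R_{k,j} = c_j R_{k-1,j} / \sum_{m} c_m R_{k-1,m}$, which is consistent with the ratio identity used in the proof. The function names, the constants `c`, and the starting values `r0` are hypothetical.

```python
def normalized_step(r, c):
    """One step of the ASSUMED recursion R_{k,j} = c_j R_{k-1,j} / sum_m c_m R_{k-1,m}."""
    weighted = [cj * rj for cj, rj in zip(c, r)]
    total = sum(weighted)
    return [w / total for w in weighted]


def ratio_history(r0, c, g, i, steps):
    """Return the list of H_{k,i} = R_{k,g} / R_{k,i} for k = 0, ..., steps."""
    r = list(r0)
    history = [r[g] / r[i]]
    for _ in range(steps):
        r = normalized_step(r, c)
        history.append(r[g] / r[i])
    return history


# Hypothetical inputs: c[1] is the strict maximum, r0 is positive and sums to 1.
c = [0.5, 0.8, 0.6]
r0 = [0.5, 0.2, 0.3]
h = ratio_history(r0, c, g=1, i=0, steps=20)

# Closed form from equation (8): H_{k,i} = (c_g / c_i)^k * H_{0,i}.
closed_form = [(c[1] / c[0]) ** k * h[0] for k in range(len(h))]
```

Under these assumptions the simulated ratios in `h` agree with `closed_form` up to floating-point error and grow without bound, so the sequence indexed by `i` cannot be the one whose limit equals 1.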

Now we are ready to prove the convergence property of the proposed method. Taking the limit of both sides of equation (3), we have

$$\lim_{k\to\infty} B\left(\alpha_{i,k},\beta_{i,k}\right) = B_i = \lim_{k\to\infty}\left[\frac{B\left(\alpha_{i,k-1},\beta_{i,k-1}\right)\Pr\left\{\left(\alpha_{i,k},\beta_{i,k}\right) \mid i^{\text{th}}\text{ population is the best}\right\}}{\sum_{j=1}^{n}\left[B\left(\alpha_{j,k-1},\beta_{j,k-1}\right)\Pr\left\{\left(\alpha_{j,k},\beta_{j,k}\right) \mid j^{\text{th}}\text{ population is the best}\right\}\right]}\right] \tag{9}$$

From the law of large numbers, we know that $\lim_{k\to\infty} \bar{p}_{j,k} = p_j$, where $p_j$ is the probability of success of the $j$th population. Hence, using equation (7), we have $B_i = \dfrac{B_i\, p_i}{\sum_{j=1}^{n} B_j\, p_j}$. Then, assuming population $i$ is the best, i.e., it possesses the largest of the $p_j$'s, by lemmas 1 and 2 we conclude that $B_i = 1$ and $B_j = 0$ for $j \neq i$. This concludes the proof of the convergence property of the proposed method.
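The convergence claim can be illustrated with a short numerical sketch. It is not the authors' implementation: it iterates only the limiting form of the update, $B_{i} \leftarrow B_i\, p_i / \sum_{j} B_j\, p_j$, in which the likelihood term has already been replaced by $p_j$ as in the law-of-large-numbers step. The success probabilities `p`, the uniform starting beliefs, and all function names are hypothetical.

```python
def limiting_update(b, p):
    """Apply B_i <- B_i * p_i / sum_j (B_j * p_j), the limiting form of the update."""
    weighted = [bj * pj for bj, pj in zip(b, p)]
    total = sum(weighted)
    return [w / total for w in weighted]


def iterate_beliefs(p, steps):
    """Start from uniform beliefs (an assumption) and apply the update `steps` times."""
    n = len(p)
    b = [1.0 / n] * n
    for _ in range(steps):
        b = limiting_update(b, p)
    return b


# Hypothetical success probabilities; the population at index 1 has the largest p_j.
p = [0.30, 0.50, 0.45]
beliefs = iterate_beliefs(p, steps=500)
best = max(range(len(p)), key=lambda j: p[j])

# The degenerate vector that puts all mass on the best population is a fixed point.
fixed_point = limiting_update([0.0, 1.0, 0.0], p)
```

With these inputs the belief mass concentrates on the population with the largest $p_j$, and the degenerate vector is left unchanged by the update, which matches $B_i = 1$ and $B_j = 0$ for $j \neq i$. The sketch says nothing about how fast the finite-sample procedure converges, since it skips the sampling step entirely.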

In real-world applications, since there is a cost associated with the data-gathering process, we need to select the best population in a finite number of decision-making stages. In the next section, we present the proposed decision-making method in the form of a stochastic dynamic programming model in which only a limited number of decision-making stages is available to select the best population.
