**4.1 Selection of individual control vectors**

Since the conjugate prior of the Poisson distribution is the Gamma distribution [28], the posterior *p Q*<sup>0</sup> ð Þ j**r**0*; d*1:*<sup>k</sup>* is also a Gamma distribution with updated parameters *κ<sup>k</sup>* and *ϑk*, i.e., *p Q*<sup>0</sup> ð Þ¼ j**r**0*; d*1:*<sup>k</sup>* G *Q*<sup>0</sup> ð Þ ; *κk; ϑ<sup>k</sup>* . The computation of *κ<sup>k</sup>* and *ϑ<sup>k</sup>* can

*k, <sup>ϑ</sup><sup>k</sup>* <sup>¼</sup> *<sup>ϑ</sup>k*�<sup>1</sup>

The parameters of the prior for source strength, *p Q*<sup>0</sup> ð Þ¼ Gð Þ *κ*0*; ϑ*<sup>0</sup> are chosen so

factorised form (8). Given *p*ð Þ **r**0j*d*1:*k*�<sup>1</sup> , the update step of the particle filter using *dk*

The problem in using (11) is that the likelihood function *g d*ð Þ *<sup>k</sup>*j**r**0*; d*<sup>1</sup>:*k*�<sup>1</sup> is unknown; only *g dk*j*η*<sup>0</sup> ð Þ of (6) is known. Fortunately, it is possible to derive an analytic

> *ϑκk <sup>k</sup>* Γð Þ *κ<sup>k</sup>*

The Rao-Blackwellised particle filter (RBPF) fully describes the posterior

n o

*<sup>k</sup>*�<sup>1</sup> <sup>Γ</sup>ð Þ *<sup>κ</sup><sup>k</sup>*�<sup>1</sup>

*<sup>k</sup>; <sup>ϑ</sup>m,i k*

> *m,i k,*0 n o

<sup>0</sup> <sup>¼</sup> *<sup>κ</sup>*<sup>0</sup> and *<sup>ϑ</sup>m,i*

*<sup>k</sup>* and *<sup>ϑ</sup>m,i*

corresponding Gamma distribution for the source strength. Initially, at time *k* ¼ 0,

putation of the posterior *pi η*<sup>0</sup> ð Þ j*d*<sup>1</sup>:*<sup>k</sup>* using the RBPF is carried out by a recursive

In decentralised multi-robot search, each platform autonomously makes a

its break-up, there is a need to impose a form of coordination between the

However, in order to maintain the geometric shape of the formation and thus avoid

*<sup>k</sup>* over time.

*ϑκk*�<sup>1</sup>

<sup>1</sup> <sup>þ</sup> *<sup>ϑ</sup>k*�<sup>1</sup>∑*<sup>N</sup>*

*g d*ð Þ *<sup>k</sup>*j**r**0*; d*<sup>1</sup>:*k*�<sup>1</sup> *p*ð Þ **r**0j*d*<sup>1</sup>:*k*�<sup>1</sup> *f d*ð Þ *<sup>k</sup>*j*d*<sup>1</sup>:*k*�<sup>1</sup>

*g d*ð Þ *<sup>k</sup>*j**r**0*; d*<sup>1</sup>:*k*�<sup>1</sup> *p*ð Þ **r**0j*d*<sup>1</sup>:*k*�<sup>1</sup> *d***r**<sup>0</sup> is a normalisation constant.

Y *N*

*ρ* **r**0*;* **r***<sup>i</sup> k* � �*<sup>z</sup><sup>i</sup> k*

*zi*

*<sup>k</sup>* is a (normalised) weight associated with

are the points on a regular grid

<sup>0</sup> ¼ *ϑ*0. The sequential com-

*<sup>k</sup>*, as described in Section 4.1.

*<sup>k</sup>* are the parameters of the

*i*¼1

<sup>1</sup><sup>≤</sup> *<sup>m</sup>* <sup>≤</sup> *<sup>M</sup>:*

*<sup>i</sup>*¼1*<sup>ρ</sup>* **<sup>r</sup>**0*;* **<sup>r</sup>***<sup>i</sup>*

*k*

� � *:* (10)

*<sup>k</sup>*! (12)

ð Þ **r**0j*d*1:*<sup>k</sup>* in the

(11)

be carried out analytically as a function of **r**<sup>0</sup> and the measurement set

Next, we turn our attention to the posterior of source position *pi*

*N i*¼1 *zi*

that this density covers a large span of possible values of *Q*0.

*p*ð Þ¼ **r**0j*d*<sup>1</sup>:*<sup>k</sup>*

*g d*ð Þ¼ *<sup>k</sup>*j**r**0*; d*<sup>1</sup>:*k*�<sup>1</sup>

S*i*

Here, *M* is the number of particles, *wm,i*

the weights are uniform (and equal to 1*=M*), **r**

covering a specified search area, while *κ<sup>i</sup>*

**4. Decentralised formation control**

decision at time *tk*�<sup>1</sup> about its next control vector **<sup>u</sup>***<sup>i</sup>*

platforms. This will be explained in Section 4.2.

*<sup>k</sup>* � *wm,i*

*m,i* <sup>0</sup>*,k*, while *κ<sup>i</sup>*

*<sup>k</sup> ;* **r** *m,i* <sup>0</sup>*,k; <sup>κ</sup><sup>i</sup>*

Ð

*dk* <sup>¼</sup> **<sup>r</sup>***<sup>i</sup>*

*<sup>k</sup>; zi k* � � � �

applies the Bayes rule:

where *f d*ð Þ¼ *<sup>k</sup>*j*d*<sup>1</sup>:*k*�<sup>1</sup>

expression for *g d*ð Þ *<sup>k</sup>*j**r**0*; d*<sup>1</sup>:*k*�<sup>1</sup> :

*pi η*<sup>0</sup> ð Þ j*d*<sup>1</sup>:*<sup>k</sup>* by a particle system

the source position sample **r**

update of the particle system S*<sup>i</sup>*

**20**

<sup>1</sup>≤*i*<sup>≤</sup> *<sup>N</sup>* [27]:

*Unmanned Robotic Systems and Applications*

*κ<sup>k</sup>* ¼ *κk*�<sup>1</sup> þ ∑

A robot platform *i* autonomously decides on the control vector **u***<sup>i</sup> <sup>k</sup>* using the infotaxis strategy [13], which can be formulated as a partially observed Markov decision process (POMDP) [29]. The elements of POMDP are (i) the information state, (ii) the set of admissible actions and (iii) the reward function. The information state at time *tk*�<sup>1</sup> is the posterior density *pi η*<sup>0</sup> ð Þ j*d*1:*k*�<sup>1</sup> ; it accurately specifies the *i*th platform current knowledge about the source position and its release rate. Admissible actions can be formed with one or multiple steps ahead. A decision in the context of search is the selection of a motion control vector **u***<sup>i</sup> <sup>k</sup>* ∈U which will maximise the reward function. According to Section 2, the space of admissible actions U is continuous with dimensions: linear velocity *V*, angular velocity Ω and duration of motion *T*. In order to reduce the computational complexity of numerical optimisation, U is adopted as a discrete set with only myopic (one step ahead) controls. In addition, U is timeinvariant and identical for all platforms. If V, O and T denote the sets of possible discrete-values of *V*, Ω and *T*, respectively, then U is the Cartesian product V � O � T . The myopic selection of the control vector at time *tk* on platform *i* is expressed as:

$$\mathbf{u}\_k^i = \arg\max\_{\mathbf{v}\in\mathcal{U}} \mathbb{E}\left\{ \mathcal{D}\left[p\_i(\eta\_0|d\_{1:k-1}), z\_k^i(\mathbf{v})\right] \right\} \tag{13}$$

where D is the reward function and *z<sup>i</sup> <sup>k</sup>* is the future concentration measurement collected by the *i*th platform if the platform moved under the control **v**∈U to position *xi <sup>k</sup>; yi k* � �. In reality, this future measurement is not available (the decision has to be made at time *tk*�1), and therefore the expectation operator E with respect to the prior measurement PDF features in (13).

Previous studies of search strategies [3, 20] found that the reward function defined as the *reduction of entropy*, results in the most efficient search. Hence, we adopt the expected reward defined as

$$\mathcal{R}\_i = \mathbb{E}\{\mathcal{D}[p\_i(\eta\_0|d\_{1:k-1}), z\_k^i(\mathbf{v})] \} = H\_{k-1}^i - \mathbb{E}\{H\_k^i(z\_k^i(\mathbf{v})) \} \tag{14}$$

where *Hk*�<sup>1</sup> is the current differential entropy, defined as

$$H\_{k-1}^i = -\int p\_i(\eta\_0|d\_{1:k-1}) \, \ln p\_i(\eta\_0|d\_{1:k-1})d\eta\_0,\tag{15}$$

while *H<sup>i</sup> <sup>k</sup> zi <sup>k</sup>*ð Þ **<sup>v</sup>** � �<sup>≤</sup> *Hk*�<sup>1</sup> is the future differential entropy (after a hypothetical control vector **v** has been applied to collect *zi k*):

$$H\_k^i(z\_k^i(\mathbf{v})) = -\int p\_i(\eta\_0 | d\_{1:k-1}, d\_k^i(\mathbf{v})) \ln p\_i(\eta\_0 | d\_{1:k-1}, d\_k^i(\mathbf{v})) d\eta\_0,\tag{16}$$

where *di <sup>k</sup>* <sup>¼</sup> *xi <sup>k</sup>; y<sup>i</sup> <sup>k</sup>; z<sup>i</sup> k* � �. The expectation operator E in (14) is with respect to the probability mass function *P z<sup>i</sup> <sup>k</sup>*j*d*<sup>1</sup>:*k*�<sup>1</sup> � � <sup>¼</sup> <sup>Ð</sup> ℓ *z<sup>i</sup> <sup>k</sup>*j*η*<sup>0</sup> � �*pi <sup>η</sup>*<sup>0</sup> ð Þ <sup>j</sup>*d*<sup>1</sup>:*k*�<sup>1</sup> *<sup>d</sup>η*0, that is:

$$\mathbb{E}\left\{H\_k^i\left(z\_k^i(\mathbf{v})\right)\right\} = \sum\_{z\_k^i} P\left\{z\_k^i | d\_{1:k-1}\right\} \cdot H\_k\left(z\_k^i(\mathbf{v})\right). \tag{17}$$

Given that *pi <sup>η</sup>*<sup>0</sup> ð Þ <sup>j</sup>*d*<sup>1</sup>:*k*�<sup>1</sup> is approximated by a particle system <sup>S</sup>*<sup>i</sup> <sup>k</sup>*�1, one can approximately compute *H<sup>i</sup> <sup>k</sup>*�1, which features in (14), as

$$H\_{k-1}^{i} \approx -\sum\_{m=1}^{M} w\_{k-1}^{m,i} \text{ ln } w\_{k-1}^{m,i}.\tag{18}$$

**u**<sup>1</sup> 6¼ **u**<sup>2</sup> 6¼ ⋯*,* 6¼ **u***N*. In order to maintain the shape of the formation during the motion period (from time *tk*�<sup>1</sup> to *tk*), the platforms need to reach an agreement on the common action **u***k*, to be applied to all platforms at the same time. But this is not sufficient; according to the motion model in Section 2, the platforms also need to agree on the formation centroid coordinates and the common heading angle *ϕk*�<sup>1</sup> to

*Decentralised Scalable Search for a Hazardous Source in Turbulent Conditions*

We apply decentralised cooperative control based on the average consensus [30, 31]. In a network of collaborating agents, consensus is an iterative protocol designed to reach an agreement regarding a certain quantity of interest. Suppose that every platform, as a node in the communication network, initially has an individual scalar value. The goal of average consensus is for every node in the network to compute the average of initial scalar values, in a completely

decentralised manner: by communicating only with the neighbours in the communication graph (without knowing the topology of the communication graph).

*<sup>k</sup>*, two formation centroid coordinates, i.e., *xi*

Let us denote the scalar value of interest by *bi*, that is

*bi* ∈ *V<sup>i</sup>*

*<sup>k</sup>;* Ω*<sup>i</sup> <sup>k</sup>; T<sup>i</sup> <sup>k</sup>; x i <sup>k</sup>*�<sup>1</sup>*; <sup>y</sup><sup>i</sup>*

*<sup>k</sup>*�<sup>1</sup> and *<sup>y</sup><sup>i</sup>*

*N* � �*bi*ð Þþ *<sup>s</sup>* � <sup>1</sup>

> *m,i* 0*,k*

values *bi*ð Þ*s* after many iterations converge to the mean *b* [32].

In the problem we consider, there is not only a single individual scalar value, but six of them. They include three motion control parameters, i.e., for platform *i*, *V<sup>i</sup>*

n o*:*

*<sup>i</sup>*¼<sup>1</sup>*bi*. If all platforms in the formation were to use identical average values

Ideally, we want every platform in the formation to compute the mean value

for motion control, centroid coordinates and heading, then their motion would be coordinated (except for process noise, which will be taken care of through vector

*<sup>k</sup>*�<sup>1</sup> in (4)) and the shape of the formation would be maintained (provided *R*max is

Average consensus is an iterative algorithm. At iteration *s* ¼ 0, the node in the communication graph (the robot platform) will initialise its state *bi*ð Þ 0 using either

*<sup>k</sup>*�<sup>1</sup> (if *bi* is a formation centroid coordinate or heading angle). This value is locally available. The initial values of centroid coordinates are the actual *i*th platform

*<sup>k</sup>*�<sup>1</sup>ð Þ¼ <sup>0</sup> *<sup>y</sup><sup>i</sup>*

*s* ¼ 1*,* 2*,* ⋯, each platform updates its state with a linear combination of its own state and the states of its current neighbours. Let us denote the set of current neighbours

where ∣J *<sup>i</sup>*∣ is the number of neighbours of platform *i*. This particular linear combination is based on the so-called *maximum degree weights* [32]. Other weights can be also used. It can be shown that if the communication graph is connected, the

The search continues until the global stopping criterion is satisfied. The local stopping criterion is calculated on each platform independently based on the spread

its sample covariance matrix **C***k*. For example, if the spread of particles on platform *i* is smaller than a certain threshold *ϖ*, then the local stopping criterion is satisfied

*<sup>k</sup>*�<sup>1</sup>, *<sup>y</sup><sup>i</sup>*

*<sup>k</sup>*�<sup>1</sup>*; <sup>ϕ</sup><sup>i</sup> k*�1

*<sup>k</sup>* (if *bi* is a motion control parameter) or the platform pose

1 *<sup>N</sup>* <sup>∑</sup> *j*∈J *<sup>i</sup>*

n o, measured by the square-root of the trace of

*<sup>k</sup>*�<sup>1</sup>. At each following iteration

*bj*ð Þ *s* � 1 (21)

*k*,

*<sup>k</sup>*�<sup>1</sup> and the heading angle

be applied in (4).

of each platform *ϕ<sup>i</sup>*

*<sup>k</sup>*�1.

*DOI: http://dx.doi.org/10.5772/intechopen.86540*

*<sup>k</sup>*�<sup>1</sup>ð Þ¼ <sup>0</sup> *xi*

*bi*ðÞ¼ *<sup>s</sup>* <sup>1</sup> � <sup>∣</sup><sup>J</sup> *<sup>i</sup>*<sup>∣</sup>

Ω*i <sup>k</sup>* and *T<sup>i</sup>*

*<sup>b</sup>* <sup>¼</sup> <sup>1</sup> *<sup>N</sup>* <sup>∑</sup>*<sup>N</sup>*

**B***i*

*θi*

**23**

adequate).

a component of vector **u***<sup>i</sup>*

of platform *i* by J *<sup>i</sup>*. Then [30]:

of the local positional particles **r**

coordinates, i.e., *xi*

In order to compute E *H<sup>i</sup> <sup>k</sup> z<sup>i</sup> <sup>k</sup>*ð Þ **<sup>v</sup>** � � � � of (17), first note that *P z<sup>i</sup> <sup>k</sup>*j*d*1:*k*�<sup>1</sup> � � <sup>¼</sup> <sup>P</sup> *zi <sup>k</sup>*; ^*μ i k*�1 � �, where *<sup>μ</sup>*^*<sup>i</sup> <sup>k</sup>*�<sup>1</sup> is the predicted mean rate of chemical particle encounters at location **r***<sup>i</sup> <sup>k</sup>* (where the platform *i* would move after applying a hypothetical control **v**), computed based on *d*1:*k*�1. According to Section 2,

$$\hat{\mu}\_{k-1}^{i} \approx \sum\_{m=1}^{M} \, \omega\_{k-1}^{m,i} \kappa\_{k-1}^{i} \theta\_{k-1}^{m,i} \rho \left( \mathbf{r}\_{0,k-1}^{m,i}, \mathbf{r}\_{k}^{i} \right) \tag{19}$$

where the product *κ<sup>i</sup> <sup>k</sup>*�1*ϑm,i <sup>k</sup>*�<sup>1</sup> approximates the source release rate as the mean of the Gamma distribution with parameters *κ<sup>i</sup> <sup>k</sup>*�1*; <sup>ϑ</sup>m,i k*�1 � �. Next, we find the value of *z*max as the minimum value of *z*<sup>0</sup> such that the cumulative distribution function ∑*<sup>z</sup>*<sup>0</sup> *<sup>z</sup>*¼<sup>0</sup><sup>P</sup> *<sup>z</sup>*; ^*μ<sup>i</sup> k*�1 � � is greater than a certain threshold 1 � *<sup>ε</sup>*, where *<sup>ε</sup>* <sup>≪</sup> 1. The summation (17) is then computed only for *zi <sup>k</sup>* <sup>¼</sup> <sup>0</sup>*,* <sup>1</sup>*,* <sup>⋯</sup>*, z*max. Computation of *Hk <sup>z</sup><sup>i</sup> <sup>k</sup>*ð Þ **<sup>v</sup>** � � is carried out according to (18), except that *wm,i <sup>k</sup>*�<sup>1</sup> is replaced with *wm,i <sup>k</sup>* <sup>¼</sup> *<sup>w</sup>m,i <sup>k</sup>*�<sup>1</sup> � <sup>P</sup> *<sup>z</sup><sup>i</sup> <sup>k</sup>*; *<sup>μ</sup>m,i k*�1 � �, where

$$
\boldsymbol{\mu}\_{k-1}^{m,i} = \boldsymbol{\kappa}\_{k-1}^{i} \boldsymbol{\Phi}\_{k-1}^{m,i} \boldsymbol{\rho} \left( \mathbf{r}\_{0,k-1}^{m,i}, \mathbf{r}\_k^i \right).
$$

Thus, (17) is approximated with

$$\mathbb{E}\{H\_k^i(z\_k^i(\mathbf{v}))\} \approx \sum\_{x=0}^{x\_{\text{max}}} \mathcal{P}\left(x; \hat{\mu}\_{k-1}^i\right) \left[-\sum\_{m=1}^M w\_k^{m,i} \ln w\_k^{m,i}\right] \tag{20}$$

Pseudo-code of the routine for the computation of control vector on platform *i* is given by Algorithm 1.

Algorithm 1 Computation of **u***<sup>i</sup> k* n o

1: **Input:** S*<sup>i</sup> <sup>k</sup>*�<sup>1</sup> � *<sup>w</sup>m,i <sup>k</sup>*�<sup>1</sup>*;* **<sup>r</sup>** *m,i* <sup>0</sup>*,k*�<sup>1</sup>*; <sup>κ</sup><sup>i</sup> <sup>k</sup>*�<sup>1</sup>*; <sup>ϑ</sup>m,i k*�1 <sup>1</sup><sup>≤</sup> *<sup>m</sup>* <sup>≤</sup> *<sup>M</sup>,* 2: Compute *Hk*�<sup>1</sup> using (18) 3: Create admissible set U ¼ V � O � T 4: **for** every **v**∈ U **do** 5: Compute the future platform location **r***<sup>i</sup> <sup>k</sup>*ð Þ **v** 6: Compute *μ*^*<sup>i</sup> <sup>k</sup>*�<sup>1</sup> using (19) 7: Determine *z*max s.t. ∑*<sup>z</sup>*max *<sup>z</sup>*¼<sup>0</sup><sup>P</sup> *<sup>z</sup>*; ^*μ<sup>i</sup> k*�1 � �>1 � *<sup>ε</sup>* 8: Compute E *H<sup>i</sup> <sup>k</sup> z<sup>i</sup> <sup>k</sup>*ð Þ **<sup>v</sup>** � � � � using (20) 9: Calculate the expected reward ℛ*<sup>i</sup>* using (14) 10: **end for** 11: Find **u***<sup>i</sup> <sup>k</sup>* using (13)

12: **Output: u***<sup>i</sup> k*

#### **4.2 Cooperative control through consensus**

So far, we have explained how platform *i* would independent of the other platforms in the formation determine the best action for itself, i.e., **u***<sup>i</sup> <sup>k</sup>*. In general, individual platforms will disagree on the best action, and in the extreme

*Decentralised Scalable Search for a Hazardous Source in Turbulent Conditions DOI: http://dx.doi.org/10.5772/intechopen.86540*

**u**<sup>1</sup> 6¼ **u**<sup>2</sup> 6¼ ⋯*,* 6¼ **u***N*. In order to maintain the shape of the formation during the motion period (from time *tk*�<sup>1</sup> to *tk*), the platforms need to reach an agreement on the common action **u***k*, to be applied to all platforms at the same time. But this is not sufficient; according to the motion model in Section 2, the platforms also need to agree on the formation centroid coordinates and the common heading angle *ϕk*�<sup>1</sup> to be applied in (4).

We apply decentralised cooperative control based on the average consensus [30, 31]. In a network of collaborating agents, consensus is an iterative protocol designed to reach an agreement regarding a certain quantity of interest. Suppose that every platform, as a node in the communication network, initially has an individual scalar value. The goal of average consensus is for every node in the network to compute the average of initial scalar values, in a completely decentralised manner: by communicating only with the neighbours in the communication graph (without knowing the topology of the communication graph).

In the problem we consider, there is not only a single individual scalar value, but six of them. They include three motion control parameters, i.e., for platform *i*, *V<sup>i</sup> k*, Ω*i <sup>k</sup>* and *T<sup>i</sup> <sup>k</sup>*, two formation centroid coordinates, i.e., *xi <sup>k</sup>*�<sup>1</sup>, *<sup>y</sup><sup>i</sup> <sup>k</sup>*�<sup>1</sup> and the heading angle of each platform *ϕ<sup>i</sup> <sup>k</sup>*�1.

Let us denote the scalar value of interest by *bi*, that is

$$b\_i \in \left\{ V\_k^i, \Omega\_k^i, \ T\_k^i, \overline{\varpi}\_{k-1}^i, \overline{\eta}\_{k-1}^i, \phi\_{k-1}^i \right\}.$$

Ideally, we want every platform in the formation to compute the mean value *<sup>b</sup>* <sup>¼</sup> <sup>1</sup> *<sup>N</sup>* <sup>∑</sup>*<sup>N</sup> <sup>i</sup>*¼<sup>1</sup>*bi*. If all platforms in the formation were to use identical average values for motion control, centroid coordinates and heading, then their motion would be coordinated (except for process noise, which will be taken care of through vector **B***i <sup>k</sup>*�<sup>1</sup> in (4)) and the shape of the formation would be maintained (provided *R*max is adequate).

Average consensus is an iterative algorithm. At iteration *s* ¼ 0, the node in the communication graph (the robot platform) will initialise its state *bi*ð Þ 0 using either a component of vector **u***<sup>i</sup> <sup>k</sup>* (if *bi* is a motion control parameter) or the platform pose *θi <sup>k</sup>*�<sup>1</sup> (if *bi* is a formation centroid coordinate or heading angle). This value is locally available. The initial values of centroid coordinates are the actual *i*th platform coordinates, i.e., *xi <sup>k</sup>*�<sup>1</sup>ð Þ¼ <sup>0</sup> *xi <sup>k</sup>*�<sup>1</sup> and *<sup>y</sup><sup>i</sup> <sup>k</sup>*�<sup>1</sup>ð Þ¼ <sup>0</sup> *<sup>y</sup><sup>i</sup> <sup>k</sup>*�<sup>1</sup>. At each following iteration *s* ¼ 1*,* 2*,* ⋯, each platform updates its state with a linear combination of its own state and the states of its current neighbours. Let us denote the set of current neighbours of platform *i* by J *<sup>i</sup>*. Then [30]:

$$b\_i(\mathfrak{s}) = \left(\mathbf{1} - \frac{|\mathcal{I}\_i|}{N}\right) b\_i(\mathfrak{s} - \mathbf{1}) + \frac{\mathbf{1}}{N} \sum\_{j \in \mathcal{I}\_i} b\_j(\mathfrak{s} - \mathbf{1}) \tag{21}$$

where ∣J *<sup>i</sup>*∣ is the number of neighbours of platform *i*. This particular linear combination is based on the so-called *maximum degree weights* [32]. Other weights can be also used. It can be shown that if the communication graph is connected, the values *bi*ð Þ*s* after many iterations converge to the mean *b* [32].

The search continues until the global stopping criterion is satisfied. The local stopping criterion is calculated on each platform independently based on the spread of the local positional particles **r** *m,i* 0*,k* n o, measured by the square-root of the trace of its sample covariance matrix **C***k*. For example, if the spread of particles on platform *i* is smaller than a certain threshold *ϖ*, then the local stopping criterion is satisfied

*Hi*

*<sup>k</sup> z<sup>i</sup>*

, where *μ*^*<sup>i</sup>*

In order to compute E *H<sup>i</sup>*

ticle encounters at location **r***<sup>i</sup>*

where the product *κ<sup>i</sup>*

*k*�1

*<sup>k</sup>*�<sup>1</sup> � <sup>P</sup> *<sup>z</sup><sup>i</sup>*

given by Algorithm 1.

1: **Input:** S*<sup>i</sup>*

10: **end for** 11: Find **u***<sup>i</sup>*

**22**

12: **Output: u***<sup>i</sup>*

*<sup>k</sup>*; ^*μ i k*�1 � �

*Unmanned Robotic Systems and Applications*

*μ*^*i <sup>k</sup>*�<sup>1</sup>≈ ∑ *M m*¼1

the Gamma distribution with parameters *κ<sup>i</sup>*

carried out according to (18), except that *wm,i*

tion (17) is then computed only for *zi*

*<sup>k</sup>*; *<sup>μ</sup>m,i k*�1 � �, where

Thus, (17) is approximated with

E *H<sup>i</sup> <sup>k</sup> z<sup>i</sup> <sup>k</sup>*ð Þ **<sup>v</sup>** � � � � ≈ ∑

Algorithm 1 Computation of **u***<sup>i</sup>*

2: Compute *Hk*�<sup>1</sup> using (18)

4: **for** every **v**∈ U **do**

6: Compute *μ*^*<sup>i</sup>*

8: Compute E *H<sup>i</sup>*

*<sup>k</sup>*�<sup>1</sup> � *<sup>w</sup>m,i*

7: Determine *z*max s.t. ∑*<sup>z</sup>*max

*<sup>k</sup>* using (13)

**4.2 Cooperative control through consensus**

*k*

*<sup>k</sup>*�<sup>1</sup>*;* **<sup>r</sup>** *m,i* <sup>0</sup>*,k*�<sup>1</sup>*; <sup>κ</sup><sup>i</sup>*

5: Compute the future platform location **r***<sup>i</sup>*

*<sup>k</sup> z<sup>i</sup>*

*<sup>k</sup>*�<sup>1</sup> using (19)

9: Calculate the expected reward ℛ*<sup>i</sup>* using (14)

3: Create admissible set U ¼ V � O � T

*<sup>k</sup>*�1*ϑm,i*

*μm,i <sup>k</sup>*�<sup>1</sup> <sup>¼</sup> *<sup>κ</sup><sup>i</sup>*

*P z<sup>i</sup>*

∑*<sup>z</sup>*<sup>0</sup>

*wm,i <sup>k</sup>* <sup>¼</sup> *<sup>w</sup>m,i*

*<sup>z</sup>*¼<sup>0</sup><sup>P</sup> *<sup>z</sup>*; ^*μ<sup>i</sup>*

*<sup>k</sup>*j*d*1:*k*�<sup>1</sup> � � <sup>¼</sup> <sup>P</sup> *zi* *<sup>k</sup>*�<sup>1</sup>≈ � ∑

hypothetical control **v**), computed based on *d*1:*k*�1. According to Section 2,

*wm,i <sup>k</sup>*�1*κ<sup>i</sup>*

*z*max as the minimum value of *z*<sup>0</sup> such that the cumulative distribution function

*<sup>k</sup>*�<sup>1</sup>*ϑm,i <sup>k</sup>*�<sup>1</sup> *<sup>ρ</sup>* **<sup>r</sup>**

P *z*; ^*μ<sup>i</sup> k*�1 � � � <sup>∑</sup>

*<sup>k</sup>*�<sup>1</sup>*; <sup>ϑ</sup>m,i k*�1

*<sup>z</sup>*¼<sup>0</sup><sup>P</sup> *<sup>z</sup>*; ^*μ<sup>i</sup>*

*<sup>k</sup>*ð Þ **<sup>v</sup>** � � � � using (20)

*k*�1 � �>1 � *<sup>ε</sup>*

So far, we have explained how platform *i* would independent of the other

platforms in the formation determine the best action for itself, i.e., **u***<sup>i</sup>*

individual platforms will disagree on the best action, and in the extreme

Pseudo-code of the routine for the computation of control vector on platform *i* is

*z*max *z*¼0

*k*

n o

*M m*¼1

*wm,i*

*<sup>k</sup>*ð Þ **<sup>v</sup>** � � � � of (17), first note that

*<sup>k</sup>*�1*ϑm,i <sup>k</sup>*�<sup>1</sup> *<sup>ρ</sup>* **<sup>r</sup>**

*<sup>k</sup>*�<sup>1</sup> ln *<sup>w</sup>m,i*

*<sup>k</sup>*�1*:* (18)

(19)

*<sup>k</sup>*ð Þ **<sup>v</sup>** � � is

(20)

*<sup>k</sup>*. In general,

*<sup>k</sup>*�<sup>1</sup> is the predicted mean rate of chemical par-

� �. Next, we find the value of

*<sup>k</sup>* (where the platform *i* would move after applying a

*<sup>k</sup>*�<sup>1</sup> approximates the source release rate as the mean of

*<sup>k</sup>* <sup>¼</sup> <sup>0</sup>*,* <sup>1</sup>*,* <sup>⋯</sup>*, z*max. Computation of *Hk <sup>z</sup><sup>i</sup>*

*:*

*wm,i <sup>k</sup>* ln *<sup>w</sup>m,i k*

� �

*<sup>k</sup>*�<sup>1</sup> is replaced with

*M m*¼1

*m,i* <sup>0</sup>*,k*�1*;* **<sup>r</sup>***<sup>i</sup> k* � �

*<sup>k</sup>*�1*; <sup>ϑ</sup>m,i k*�1

*m,i* <sup>0</sup>*,k*�<sup>1</sup>*;* **<sup>r</sup>***<sup>i</sup> k* � �

<sup>1</sup><sup>≤</sup> *<sup>m</sup>* <sup>≤</sup> *<sup>M</sup>,*

*<sup>k</sup>*ð Þ **v**

� � is greater than a certain threshold 1 � *<sup>ε</sup>*, where *<sup>ε</sup>* <sup>≪</sup> 1. The summa-

and is given a value of one, otherwise it is zero. This local stopping criterion value (zero or one) becomes the initial state of the global stopping criterion on platform *i*, denoted *σi*ð Þ 0 :

$$\sigma\_i(\mathbf{0}) = \begin{cases} 1 & \text{if } \sqrt{tr[\mathbf{C}\_k]} < \sigma \\ 0 & \text{otherwise} \end{cases} \tag{22}$$

measurement triples and in the consensus algorithm, was fixed to 30. The local

*Decentralised Scalable Search for a Hazardous Source in Turbulent Conditions*

**Figure 2** displays the top-down view of the search progress at step indices *k* ¼ 0*;* 12*;* 22*;* 32. The formation consists of *N* ¼ 7 platforms, whose trajectories are shown in different colours. The search algorithm terminated at *K* ¼ 33. Note that the plume size is much smaller than the search area. Panels **(a)–(c)** of **Figure 2** show the particles before resampling: the particles are placed on a regular grid, thus mimicking a grid-based approach, with the value of particle weights indicated by the grey-scale intensity plot (white means a zero weight). This provides a good visual representation of the posterior *p*ð Þ **r**0j*d*1:*<sup>k</sup>* . Panel **(d)** shows the situation after a non-zero concentration measurement was collected by the search team. The positional particles have been resampled at this point of time and moved closer to the

Using 200 Monte Carlo simulations, the mean search time for the algorithm was 2525 a.u., with a 5th and 95th quantile of 1840 and 3445 a.u., respectively. Note that in all simulations the formation started from the bottom right hand corner indicated

*Experimental dataset: an illustrative run of the decentralised multi-robot search using N* ¼ *7 platforms. Graphs (a)–(d) show the positions and trajectories of the platforms at step indices k* ¼ *0,12,22 and 32, respectively. The concentration of the plume is represented in grey-scale (darker colours represent higher*

search stopping threshold was *ϖ* ¼ 3.

*DOI: http://dx.doi.org/10.5772/intechopen.86540*

true source location.

in **Figure 2(a)**.

**Figure 2.**

**25**

*concentration).*

The global stopping criterion is computed on each platform using the average consensus algorithm, using (21), but with *bi* replaced by *σi*. After a sufficient number of iterations, *S*, platform *i* decides to stop the search if at least one of the platforms in the formation has reached the local stopping criterion, that is, if *σi*ð Þ*S* >0.

We point out that both estimation and control are based on the consensus algorithm. While the cooperative control is using the *average* consensus (21), the decentralised measurement dissemination of Section 3 achieves the consensus on the set of measurements at time *k*. The consensus algorithm is iterative, and hence its convergence properties are very important. First note that, although the network topology changes with time (as the robots move while searching for the source), during the short interval of time when the exchange of information takes place, the topology can be considered as *time-invariant*. Furthermore, assuming bidirectional communication between the robots in formation, the network topology can be represented by an undirected graph. The convergence of the consensus algorithm for a time-invariant undirected communication topology is guaranteed if the graph is connected [31–33]. Note that this theoretical result is valid for an infinite number of iterations. In practice, if the communication graph at some point of time is not connected, or if an insufficient number of consensus iterations are performed, it may happen that one or more robots are lost (they could re-join the formation only by coincidence). This event, however, does not mean that the search mission has failed: the emitting source will be found eventually, albeit by a smaller formation in possibly longer interval of time.
