**3. Decentralised sequential estimation**

Estimation and robot motion control are carried out using the measurement dissemination-based decentralised fusion architecture [25]. Measurement locations<sup>2</sup> and the corresponding measured concentration values, i.e., the triples $\left(x\_k^i, y\_k^i, z\_k^i\right)$, are exchanged via the communication network. The protocol is iterative. In the first iteration, platform $i$ broadcasts its triple to its neighbours and receives their measurement triples in return. In the second and all subsequent iterations, platform $i$ broadcasts its newly acquired triples to its neighbours and accepts from them only the triples that it has not seen before. Provided that the communication graph is connected, after a sufficient number of iterations (which depends on the topology of the graph), a complete list of measurement triples from all platforms in the formation, denoted $d\_k = \left\{\left(x\_k^i, y\_k^i, z\_k^i\right)\right\}\_{1 \le i \le N}$, will be available at each platform.
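The iterative exchange described above can be sketched as a simple flooding loop. This is a minimal illustration, not the implementation of [25]: the function name `disseminate`, the dictionary-based graph and the four-platform line graph are all assumptions made for the example, and the communication links are taken to be static and error free.

```python
def disseminate(neighbours, own_triples):
    """Flood measurement triples over an undirected communication graph.

    neighbours:  dict mapping platform id -> set of neighbouring platform ids
    own_triples: dict mapping platform id -> its own (x, y, z) triple
    Returns (known, iterations), where known[i] is the set of triples held by
    platform i once no platform receives anything new.
    """
    known = {i: {own_triples[i]} for i in neighbours}
    # "fresh" holds what each platform acquired in the previous iteration;
    # in the first iteration this is just its own triple.
    fresh = {i: {own_triples[i]} for i in neighbours}
    iterations = 0
    while any(fresh.values()):
        iterations += 1
        received = {i: set() for i in neighbours}
        for i, outgoing in fresh.items():
            for j in neighbours[i]:
                # Platform j accepts only triples it has not seen before.
                received[j] |= outgoing - known[j]
        for i in neighbours:
            known[i] |= received[i]
        fresh = received
    return known, iterations


# Hypothetical line graph 0 - 1 - 2 - 3 with made-up measurements.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
triples = {i: (float(i), 0.0, i % 3) for i in graph}
known, iterations = disseminate(graph, triples)
```

On a connected graph every platform ends with the same complete list, and the number of iterations grows with the graph diameter, which matches the remark that it depends on the topology.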

Let the posterior density function of the source at discrete time $k-1$ on platform $i$ be denoted $p\_i(\eta\_0|d\_{1:k-1})$, where $d\_{1:k-1} \equiv d\_1, d\_2, \cdots, d\_{k-1}$. Given $p\_i(\eta\_0|d\_{1:k-1})$ and $d\_k$, the problem of sequential estimation is to compute the posterior at time $k$, i.e., $p\_i(\eta\_0|d\_{1:k})$. Using Bayes' rule, the posterior is

$$p\_i(\eta\_0|d\_{1:k}) = \frac{\operatorname{g}(d\_k|\eta\_0)p\_i(\eta\_0|d\_{1:k-1})}{\int \operatorname{g}(d\_k|\eta\_0)p\_i(\eta\_0|d\_{1:k-1})d\eta\_0} \tag{5}$$

where $g(d\_k|\eta\_0)$ is the likelihood function. Assuming that individual platform measurements are conditionally independent, $g(d\_k|\eta\_0)$ can be expressed as

$$g(d\_k | \eta\_0) = \prod\_{i=1}^{N} \ell(z\_k^i | \eta\_0) = \prod\_{i=1}^{N} \mathcal{P}\left(z\_k^i; Q\_0\, \rho(\mathbf{r}\_0, \mathbf{r}\_k^i)\right) \tag{6}$$

where

*Unmanned Robotic Systems and Applications*

$$\beta\left(\theta\_{t-\delta}^i, \mathbf{u}\_k^i\right) = \theta\_{t-\delta}^i + \delta \begin{bmatrix} V\_k^i \cos\left(\phi\_{k-1}^i\right) \\ V\_k^i \sin\left(\phi\_{k-1}^i\right) \\ \Omega\_k^i \end{bmatrix} + \mathbf{B}\_{k-1}^i, \tag{4}$$

where vector $\mathbf{B}\_{k-1}^i = \left[\varepsilon\_x^i \delta / T\_k^i \;\; \varepsilon\_y^i \delta / T\_k^i \;\; 0\right]^{\intercal}$ is introduced to compensate for a distortion of the formation due to process noise, with parameters:

$$\varepsilon\_x^i = \hat{x}\_{k-1}^i - \left(x\_{k-1}^i - \Delta x^i\right) \tag{4a}$$

$$\varepsilon\_y^i = \hat{y}\_{k-1}^i - \left(y\_{k-1}^i - \Delta y^i\right). \tag{4b}$$

Here, $\hat{x}\_{k-1}^i$ and $\hat{y}\_{k-1}^i$ are the estimates of the coordinates of the formation centroid at $k-1$ (that is, of $x\_{k-1}^c$ and $y\_{k-1}^c$, respectively) available to the $i$th platform. Coordinates $x\_{k-1}^i$ and $y\_{k-1}^i$ refer to the *known* $i$th vehicle position at $k-1$. **Figure 1** illustrates the trajectories of $N = 7$ autonomous vehicles in a formation using the described transitional density $\pi\left(\theta\_t^i | \theta\_{t-\delta}^i, \mathbf{u}\_k^i\right)$. In the absence of process noise (i.e., $\mathbf{Q} = 0$), the vehicles would move in a perfect formation if (a) all control vectors are identical (i.e., $\mathbf{u}\_k^1 = \mathbf{u}\_k^2 = \cdots = \mathbf{u}\_k^N$), and (b) all headings are identical (i.e., $\phi\_{k-1}^1 = \phi\_{k-1}^2 = \cdots = \phi\_{k-1}^N$). In this case, each platform would know the true coordinates of the formation centroid (i.e., $\hat{x}\_k^i = x\_k^c$, $\hat{y}\_k^i = y\_k^c$, for $i = 1, \dots, N$), and hence the correction vectors $\mathbf{B}\_{k-1}^i$ would be zero.

**Figure 1.**
*An example of a formation of N = 7 searching platforms at k = 1, 2. The communication graphs (based on established links between the platforms) are indicated with green lines. Note that communication network topology is time-varying. The red line, starting from the centroid of the formation, indicates the instantaneous velocity vector.*

A robotic platform can communicate with another platform of the formation if their mutual distance is smaller than a certain range $R\_{\max}$. Because of process noise in motion, the distance between the vehicles in the formation will vary and consequently the topology of the communication network graph may also vary. For simplicity, we will assume that communication links (when established) are error free. **Figure 1** illustrates the communication graphs of a formation consisting of $N = 7$ searching platforms at two consecutive time instants.

$$\rho\left(\mathbf{r}\_{0},\mathbf{r}\_{k}^{i}\right) = t\_{0} R\left(\eta\_{0},\mathbf{r}\_{k}^{i}\right)/Q\_{0} = \frac{t\_{0}}{\ln\left(\frac{\lambda}{a}\right)}\exp\left[\frac{\left(X\_{0} - x\_{k}^{i}\right)U}{2D}\right] \cdot K\_{0}\left(\frac{d\_{k}^{i}\left(\mathbf{r}\_{0},\mathbf{r}\_{k}^{i}\right)}{\lambda}\right) \tag{7}$$

is independent of $Q\_0$. The posterior density $p\_i(\eta\_0|d\_{1:k})$ is computed using the Rao-Blackwell dimension reduction scheme [26]. Using the chain rule, the posterior can be expressed as:

$$p\_i(\eta\_0|d\_{1:k}) = p\_i(Q\_0|\mathbf{r}\_0, d\_{1:k}) \cdot \ p\_i(\mathbf{r}\_0|d\_{1:k}) \tag{8}$$

where the posterior of source strength $p\_i(Q\_0|\mathbf{r}\_0, d\_{1:k})$ will be worked out analytically, while the posterior of source position $p\_i(\mathbf{r}\_0|d\_{1:k})$ will be computed using a particle filter. Following [27], we express the posterior $p\_i(Q\_0|\mathbf{r}\_0, d\_{1:k-1})$ with the Gamma distribution whose shape and scale parameters are $\kappa\_{k-1}$ and $\vartheta\_{k-1}$, respectively. That is

$$\begin{split} p\_i(Q\_0|\mathbf{r}\_0, d\_{1:k-1}) &= \mathcal{G}(Q\_0; \kappa\_{k-1}, \vartheta\_{k-1}) \\ &= \frac{Q\_0^{\kappa\_{k-1}-1} e^{-Q\_0/\vartheta\_{k-1}}}{\vartheta\_{k-1}^{\kappa\_{k-1}} \Gamma(\kappa\_{k-1})}. \end{split} \tag{9}$$

<sup>2</sup> Because the measurement locations are assumed to be known exactly, they will not be treated as random variables.

Since the conjugate prior of the Poisson distribution is the Gamma distribution [28], the posterior $p(Q\_0|\mathbf{r}\_0, d\_{1:k})$ is also a Gamma distribution with updated parameters $\kappa\_k$ and $\vartheta\_k$, i.e., $p(Q\_0|\mathbf{r}\_0, d\_{1:k}) = \mathcal{G}(Q\_0; \kappa\_k, \vartheta\_k)$. The computation of $\kappa\_k$ and $\vartheta\_k$ can be carried out analytically as a function of $\mathbf{r}\_0$ and the measurement set $d\_k = \left\{\left(\mathbf{r}\_k^i, z\_k^i\right)\right\}\_{1 \le i \le N}$ [27]:

$$\kappa\_k = \kappa\_{k-1} + \sum\_{i=1}^{N} z\_k^i, \quad \vartheta\_k = \frac{\vartheta\_{k-1}}{1 + \vartheta\_{k-1} \sum\_{i=1}^{N} \rho\left(\mathbf{r}\_0, \mathbf{r}\_k^i\right)}. \tag{10}$$
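As an illustration, the update (10) amounts to two lines of arithmetic per hypothesised source position. The sketch below assumes the values of $\rho$ have already been evaluated for that position; the function name `update_gamma` and all numbers are made up for the example.

```python
def update_gamma(kappa_prev, theta_prev, counts, rhos):
    """Equation (10): conjugate Gamma update of the source-strength posterior.

    counts: measured z_k^i for i = 1..N
    rhos:   rho(r_0, r_k^i) for the same platforms and one hypothesised r_0
    """
    kappa = kappa_prev + sum(counts)
    theta = theta_prev / (1.0 + theta_prev * sum(rhos))
    return kappa, theta


# Made-up prior parameters and measurements from N = 3 platforms.
kappa, theta = update_gamma(2.0, 5.0, counts=[3, 0, 1], rhos=[0.4, 0.1, 0.2])
posterior_mean = kappa * theta  # mean of a Gamma(shape, scale) density
```

Because the shape parameter accumulates counts and the inverse of the scale parameter accumulates values of $\rho$, processing the platforms one at a time gives the same result as processing them in a single batch.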

*Decentralised Scalable Search for a Hazardous Source in Turbulent Conditions DOI: http://dx.doi.org/10.5772/intechopen.86540*

**4.1 Selection of individual control vectors**

A robot platform $i$ autonomously decides on the control vector $\mathbf{u}\_k^i$ using the infotaxis strategy [13], which can be formulated as a partially observed Markov decision process (POMDP) [29]. The elements of POMDP are (i) the information state, (ii) the set of admissible actions and (iii) the reward function. The information state at time $t\_{k-1}$ is the posterior density $p\_i(\eta\_0|d\_{1:k-1})$; it accurately specifies the $i$th platform current knowledge about the source position and its release rate. Admissible actions can be formed with one or multiple steps ahead. A decision in the context of search is the selection of a motion control vector $\mathbf{u}\_k^i \in \mathcal{U}$ which will maximise the reward function. According to Section 2, the space of admissible actions $\mathcal{U}$ is continuous with dimensions: linear velocity $V$, angular velocity $\Omega$ and duration of motion $T$. In order to reduce the computational complexity of numerical optimisation, $\mathcal{U}$ is adopted as a discrete set with only myopic (one step ahead) controls. In addition, $\mathcal{U}$ is time-invariant and identical for all platforms. If $\mathcal{V}$, $\mathcal{O}$ and $\mathcal{T}$ denote the sets of possible discrete values of $V$, $\Omega$ and $T$, respectively, then $\mathcal{U}$ is the Cartesian product $\mathcal{V} \times \mathcal{O} \times \mathcal{T}$. The myopic selection of the control vector at time $t\_k$ on platform $i$ is expressed as:

$$\mathbf{u}\_k^i = \arg\max\_{\mathbf{v} \in \mathcal{U}} \mathbb{E}\left\{ \mathcal{D}\left(p\_i(\eta\_0|d\_{1:k-1}), z\_k^i(\mathbf{v})\right) \right\} \tag{13}$$

where $\mathcal{D}$ is the reward function and $z\_k^i$ is the future concentration measurement collected by the $i$th platform if the platform moved under the control $\mathbf{v} \in \mathcal{U}$ to position $\left(x\_k^i, y\_k^i\right)$. In reality, this future measurement is not available (the decision has to be made at time $t\_{k-1}$), and therefore the expectation operator $\mathbb{E}$ with respect to the prior measurement PDF features in (13).

Previous studies of search strategies [3, 20] found that the reward function defined as the *reduction of entropy* results in the most efficient search. Hence, we adopt the expected reward defined as

$$\mathcal{R}^i = \mathbb{E}\left\{ \mathcal{D}\left(p\_i(\eta\_0|d\_{1:k-1}), z\_k^i(\mathbf{v})\right) \right\} = H\_{k-1}^i - \mathbb{E}\left\{ H\_k^i\left(z\_k^i(\mathbf{v})\right) \right\} \tag{14}$$

where $H\_{k-1}^i$ is the current differential entropy, defined as

$$H\_{k-1}^i = -\int p\_i(\eta\_0|d\_{1:k-1}) \ln p\_i(\eta\_0|d\_{1:k-1})\, d\eta\_0, \tag{15}$$

while $H\_k^i\left(z\_k^i(\mathbf{v})\right) \le H\_{k-1}^i$ is the future differential entropy (after a hypothetical control vector $\mathbf{v}$ has been applied to collect $z\_k^i$):

$$H\_k^i\left(z\_k^i(\mathbf{v})\right) = -\int p\_i\left(\eta\_0|d\_{1:k-1}, d\_k^i(\mathbf{v})\right) \ln p\_i\left(\eta\_0|d\_{1:k-1}, d\_k^i(\mathbf{v})\right) d\eta\_0, \tag{16}$$

where $d\_k^i = \left(x\_k^i, y\_k^i, z\_k^i\right)$. The expectation operator $\mathbb{E}$ in (14) is with respect to the probability mass function $P\left(z\_k^i|d\_{1:k-1}\right) = \int \ell\left(z\_k^i|\eta\_0\right) p\_i(\eta\_0|d\_{1:k-1})\, d\eta\_0$, that is:

$$\mathbb{E}\left\{ H\_k^i\left(z\_k^i(\mathbf{v})\right) \right\} = \sum\_{z\_k^i} P\left(z\_k^i|d\_{1:k-1}\right) \cdot H\_k^i\left(z\_k^i(\mathbf{v})\right). \tag{17}$$

Given that $p\_i(\eta\_0|d\_{1:k-1})$ is approximated by a particle system $\mathcal{S}\_{k-1}^i$, one can approximately compute $H\_{k-1}^i$, which features in (14), as

The parameters of the prior for source strength, $p(Q\_0) = \mathcal{G}(\kappa\_0, \vartheta\_0)$, are chosen so that this density covers a large span of possible values of $Q\_0$.

Next, we turn our attention to the posterior of source position $p\_i(\mathbf{r}\_0|d\_{1:k})$ in the factorised form (8). Given $p(\mathbf{r}\_0|d\_{1:k-1})$, the update step of the particle filter using $d\_k$ applies the Bayes rule:

$$p(\mathbf{r}\_0|d\_{1:k}) = \frac{g(d\_k|\mathbf{r}\_0, d\_{1:k-1})p(\mathbf{r}\_0|d\_{1:k-1})}{f(d\_k|d\_{1:k-1})} \tag{11}$$

where $f(d\_k|d\_{1:k-1}) = \int g(d\_k|\mathbf{r}\_0, d\_{1:k-1})\, p(\mathbf{r}\_0|d\_{1:k-1})\, d\mathbf{r}\_0$ is a normalisation constant. The problem in using (11) is that the likelihood function $g(d\_k|\mathbf{r}\_0, d\_{1:k-1})$ is unknown; only $g(d\_k|\eta\_0)$ of (6) is known. Fortunately, it is possible to derive an analytic expression for $g(d\_k|\mathbf{r}\_0, d\_{1:k-1})$:

$$g(d\_k | \mathbf{r}\_0, d\_{1:k-1}) = \frac{\vartheta\_k^{\kappa\_k}}{\vartheta\_{k-1}^{\kappa\_{k-1}}} \frac{\Gamma(\kappa\_k)}{\Gamma(\kappa\_{k-1})} \prod\_{i=1}^N \frac{\rho\left(\mathbf{r}\_0, \mathbf{r}\_k^i\right)^{z\_k^i}}{z\_k^i!} \tag{12}$$
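A hedged sketch of how (12) might be evaluated in practice is given below. It works in the log domain because the Gamma functions and factorials overflow quickly; the function name `log_marginal_likelihood` and the numerical values are assumptions made for the example. For a single platform, (12) reduces to a negative binomial probability mass function in the measured count, which gives a convenient sanity check.

```python
import math


def log_marginal_likelihood(kappa_prev, theta_prev, counts, rhos):
    """Natural log of equation (12) for one hypothesised source position r_0."""
    kappa = kappa_prev + sum(counts)                      # equation (10)
    theta = theta_prev / (1.0 + theta_prev * sum(rhos))   # equation (10)
    log_g = kappa * math.log(theta) - kappa_prev * math.log(theta_prev)
    log_g += math.lgamma(kappa) - math.lgamma(kappa_prev)
    for z, rho in zip(counts, rhos):
        # Contribution of rho^z / z! for one platform (rho must be positive).
        log_g += z * math.log(rho) - math.lgamma(z + 1)
    return log_g


# Sanity check with one platform: the probabilities over all counts sum to one.
total = sum(
    math.exp(log_marginal_likelihood(2.0, 5.0, [z], [0.4])) for z in range(200)
)
```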

The Rao-Blackwellised particle filter (RBPF) fully describes the posterior $p\_i(\eta\_0|d\_{1:k})$ by a particle system

$$\mathcal{S}\_k^i \equiv \left\{ \boldsymbol{w}\_k^{m,i}, \mathbf{r}\_{0,k}^{m,i}, \kappa\_k^i, \,\vartheta\_k^{m,i} \right\}\_{1 \le m \le M}.$$

Here, $M$ is the number of particles, $w\_k^{m,i}$ is a (normalised) weight associated with the source position sample $\mathbf{r}\_{0,k}^{m,i}$, while $\kappa\_k^i$ and $\vartheta\_k^{m,i}$ are the parameters of the corresponding Gamma distribution for the source strength. Initially, at time $k = 0$, the weights are uniform (and equal to $1/M$), $\left\{\mathbf{r}\_{0,0}^{m,i}\right\}$ are the points on a regular grid covering a specified search area, while $\kappa\_0^i = \kappa\_0$ and $\vartheta\_0^{m,i} = \vartheta\_0$. The sequential computation of the posterior $p\_i(\eta\_0|d\_{1:k})$ using the RBPF is carried out by a recursive update of the particle system $\mathcal{S}\_k^i$ over time.
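One recursion of this particle system could look as follows. This is a sketch under stated assumptions rather than the authors' implementation: `toy_rho` is a hypothetical stand-in for (7) that depends on distance only, the dictionary layout and function names are invented for the example, and any resampling or regularisation step is omitted.

```python
import math


def toy_rho(r0, r):
    """Hypothetical stand-in for equation (7): decays with distance only."""
    return math.exp(-math.dist(r0, r))


def init_particles(xs, ys, kappa0, theta0):
    """Uniform weights on a regular grid; every particle starts at theta0."""
    grid = [(x, y) for x in xs for y in ys]
    m = len(grid)
    return {"w": [1.0 / m] * m, "r0": grid, "kappa": kappa0, "theta": [theta0] * m}


def rbpf_update(p, positions, counts, rho=toy_rho):
    """One measurement update of weights and Gamma parameters (no resampling)."""
    kappa_new = p["kappa"] + sum(counts)                  # shared by all particles
    log_w, theta_new = [], []
    for w, r0, theta in zip(p["w"], p["r0"], p["theta"]):
        rhos = [rho(r0, r) for r in positions]
        th = theta / (1.0 + theta * sum(rhos))            # equation (10)
        log_g = (kappa_new * math.log(th) - p["kappa"] * math.log(theta)
                 + math.lgamma(kappa_new) - math.lgamma(p["kappa"])
                 + sum(z * math.log(q) - math.lgamma(z + 1)
                       for z, q in zip(counts, rhos)))    # log of equation (12)
        log_w.append(math.log(w) + log_g)                 # Bayes rule (11)
        theta_new.append(th)
    top = max(log_w)                                      # shift for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return {"w": [v / total for v in w], "r0": p["r0"],
            "kappa": kappa_new, "theta": theta_new}


# Made-up scenario: two platforms, a high count near (0, 0) and none at (4, 4).
particles = init_particles(range(5), range(5), kappa0=1.0, theta0=10.0)
particles = rbpf_update(particles, positions=[(0.0, 0.0), (4.0, 4.0)], counts=[6, 0])
best = max(zip(particles["w"], particles["r0"]))[1]
```

The shape parameter is stored once because its update in (10) does not depend on the source position, whereas the scale parameter is stored per particle.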

## **4. Decentralised formation control**

In decentralised multi-robot search, each platform autonomously makes a decision at time $t\_{k-1}$ about its next control vector $\mathbf{u}\_k^i$, as described in Section 4.1. However, in order to maintain the geometric shape of the formation and thus avoid its break-up, there is a need to impose a form of coordination between the platforms. This will be explained in Section 4.2.
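The myopic selection rule of Section 4.1 can be illustrated with a deliberately simplified sketch. Two assumptions depart from the text and are made only to keep the example short: the entropy is approximated by the discrete entropy of the particle weights, and each particle predicts a plain Poisson count with a known mean (the source strength is not marginalised). The function names, the truncation `z_max` and the two candidate controls are hypothetical.

```python
import math


def poisson_pmf(z, mu):
    """Poisson probability of count z for a positive mean mu."""
    return math.exp(z * math.log(mu) - mu - math.lgamma(z + 1))


def entropy(weights):
    """Discrete entropy of the weights, used here as a proxy for H."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def expected_reward(weights, mus, z_max=60):
    """Expected entropy reduction, in the spirit of (14), for one control.

    mus[m] is the mean count that particle m predicts at the location the
    candidate control would reach.
    """
    expected_future = 0.0
    for z in range(z_max + 1):
        joint = [w * poisson_pmf(z, mu) for w, mu in zip(weights, mus)]
        p_z = sum(joint)                      # predictive pmf of the count
        if p_z > 0.0:
            # Entropy of the hypothetical posterior, weighted as in (17).
            expected_future += p_z * entropy([j / p_z for j in joint])
    return entropy(weights) - expected_future


def select_control(weights, mus_by_control):
    """Arg max over the discrete control set, as in (13)."""
    return max(mus_by_control,
               key=lambda v: expected_reward(weights, mus_by_control[v]))


weights = [0.25, 0.25, 0.25, 0.25]
candidates = {
    "stay": [2.0, 2.0, 2.0, 2.0],     # every hypothesis predicts the same count
    "advance": [0.5, 2.0, 4.0, 8.0],  # hypotheses disagree, so z is informative
}
best = select_control(weights, candidates)
```

A control under which all particles predict the same count cannot change the weights, so its expected reward is essentially zero, while a control that separates the hypotheses earns a positive reward.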

