Object Recognition and Tracking Using the Particle Estimator

*Edgardo Comas and Adrián Stácul*

#### **Abstract**

In this chapter we describe particle estimators and their effectiveness for tracking objects in video sequences. Particle estimators are particularly advantageous for state transition and measurement models, especially when these are non-linear and non-Gaussian. Once the target object to follow has been identified (in position and size), its main characteristics are obtained using algorithms such as FAST, SURF, BRIEF or ORB. The particle estimator is a recursive Bayesian estimator, in which observations update the probability of validating a hypothesis; that is, it uses all the available information to reduce the amount of uncertainty present in an inference or decision problem. Therefore, the main characteristics of the object to follow are those that determine the probability of validating the hypothesis in the particle estimator. Finally, as an example, the application of a particle estimator is described in a real case of tracking an object in a sequence of infrared images.

**Keywords:** recognition, tracking, estimator, image analysis, image processing

#### **1. Introduction**

The first step in tracking an object in an image sequence is to identify the reference object to be tracked; this allows determining its attributes so that it can be identified by means of some of the main characteristics of its image, such as its characteristic points. It should be noted that if the initial position of the object in camera coordinates and the mathematical model of the camera are known, it is possible, in addition to tracking the object, to estimate its coordinates and motion in this reference system. One of the techniques applied for object tracking, to which we will particularly refer, is the particle estimator. This technique is a special type of sequential Monte Carlo method, one of its main advantages being its applicability to any model of state transitions and observations, especially when these are non-linear and non-Gaussian [1]. The Kalman estimator, by contrast, applies to systems where the evolution and measurement models are linear, with Gaussian noise of known mean and variance; although it has extensions to non-linear models obtained by linearization techniques, the constraint of additive Gaussian noise cannot be overcome [1, 2].

Therefore, particle estimators do not carry the restrictive hypotheses of the Kalman estimator, so they can be applied to non-linear models with non-Gaussian and multimodal noise, where a reliable numerical estimate is a function of an adequate number of samples [3].

This estimator distributes N particles over the image, and the observations made on each of them update its probability of validating a hypothesis; that is, they use all the available information to reduce the uncertainty present in an inference or decision problem [2]. In some cases where the images are unclear or noisy, particularly those acquired through infrared cameras, it is necessary to enhance them before applying the estimator. Generally this enhancement is based on reducing the incidence of the image background; for the special case of infrared images, subtracting the mean intensity value and modifying the histogram to increase the contrast turns out to be a good choice.

To cope with rapid and unpredictable movements of the reference object or the camera, the resampling process considers a scattering value based on the number of valid particles, so that the area covered during tracking is dynamically modified. Basically, the particle estimator provides us with a framework into which it is possible to insert different algorithms for the recognition of the reference image in each particle, such as SURF, BRIEF and ORB. These algorithms have the ability to generate a set of features invariant to some image variations, such as scaling, rotation and illumination, with robustness against occlusion conditions.

#### **2. Particle estimator tracking**

To define the state estimation problem let us consider the system model, composed of the state evolution and observation models described by the following equations:

$$\mathbf{X}\_{\mathbf{k}} = \mathbf{f}(\mathbf{X}\_{\mathbf{k-1}}, \mathbf{v}\_{\mathbf{k-1}}),\tag{1}$$

$$\mathbf{Z}\_{\mathbf{k}} = \mathbf{h}(\mathbf{X}\_{\mathbf{k}}, \mathbf{n}\_{\mathbf{k}}). \tag{2}$$

Here $X \in \mathbb{R}^n$ contains all the state variables that will be dynamically estimated, f is the non-linear function of the state variables, $v \in \mathbb{R}^n$ represents the state noise of the system, $Z \in \mathbb{R}^n$ contains all the observations, related to the state variables by (Eq. (2)), $n \in \mathbb{R}^n$ is the measurement noise, and h is known as the observation model. Recalling that $P(a|b)$ is the conditional probability of a given b, the evolution and observation models given by (Eqs. (1) and (2)) are based on the following hypotheses referring to the following sequences [2, 4]:

a. $X\_k$, $k = 1, 2, \dots$ is a Markov process

$$\mathbf{P}(\mathbf{X}\_{\mathbf{k}}|\mathbf{X}\_{0}, \mathbf{X}\_{1}, \dots \mathbf{X}\_{\mathbf{k}-1}) = \mathbf{P}(\mathbf{X}\_{\mathbf{k}}|\mathbf{X}\_{\mathbf{k}-1}),\tag{3}$$

b. $Z\_k$, $k = 1, 2, \dots$ is a Markov process regarding the history of X, such that

$$\mathbf{P}(\mathbf{Z}\_{\mathbf{k}}|\mathbf{X}\_{0}, \mathbf{X}\_{1}, \dots \mathbf{X}\_{\mathbf{k}}) = \mathbf{P}(\mathbf{Z}\_{\mathbf{k}}|\mathbf{X}\_{\mathbf{k}}),\tag{4}$$

c. and the current state, given the previous one, does not depend on past observations, that is

$$\mathbf{P}(\mathbf{X}\_{\mathbf{k}}|\mathbf{X}\_{\mathbf{k}-1}, \mathbf{Z}\_{\mathbf{k}-1}) = \mathbf{P}(\mathbf{X}\_{\mathbf{k}}|\mathbf{X}\_{\mathbf{k}-1}).\tag{5}$$

*Object Recognition and Tracking Using the Particle Estimator DOI: http://dx.doi.org/10.5772/intechopen.99615*

Considering that the system and observation noises $v\_i$, $v\_j$ and $n\_i$, $n\_j$ are mutually independent for all $i \neq j$, and also independent of the initial state; and knowing that $P(X\_0|Z\_0) = P(X\_0)$, the two-step Bayesian estimator, prediction and update, allows us to obtain the posterior probability density $P(X\_i|Z\_i)$ [3, 5].

The particle estimator represents the posterior probability density by a random set of samples, each with its probability of validating the hypothesis, so it is possible to estimate the most likely particle from this set. As the number of particles approaches infinity, this representation approaches the posterior likelihood function, and the solution approaches the optimal Bayesian estimator [5].

To go into the details of the particle estimator for object tracking, we will rely on the importance sampling method, taking a set of samples from the state space which characterizes the posterior probability density function $p(X\_{0:k}|Z\_{1:k})$ for the state:

$$X\_{0:k}^i = \{ \mathbf{X}\_i, \mathbf{i} = \mathbf{0}, \dots, \mathbf{k} \}, \tag{6}$$

while the corresponding observations are $Z\_{1:k} = \{Z\_i,\ i = 1, \dots, k\}$; then, the posterior density at $t\_k$ can be approximated by:

$$p(\mathbf{X}\_{0:k}|\mathbf{Z}\_{1:k}) \approx \sum\_{i=1}^{N} W\_k^i \delta(\mathbf{X}\_{0:k} - \mathbf{X}\_{0:k}^i),\tag{7}$$

where $\delta(\cdot)$ is Dirac's delta function, $N$ the total number of particles and $\{W\_k^i\}\_{i=1}^{N}$ are the assigned weights.

Considering the hypotheses corresponding to the expressions (Eqs. (1) and (2)), the posterior density (Eq. (7)) can be written as [6, 7]:

$$p(\mathbf{X}\_k | \mathbf{Z}\_{1:k}) \approx \sum\_{i=1}^{N} W\_k^i \delta(\mathbf{X}\_k - \mathbf{X}\_k^i),\tag{8}$$

and the evaluation of the weights within the importance sampling principle assumes that there is an evaluable probability density function $p(X)$ such that:

$$W\_k^i \propto p(\mathbf{X}\_k|\mathbf{Z}\_{1:k}) \tag{9}$$

and *W<sup>i</sup> <sup>k</sup>* are normalized according to (Eqs. (10) and (11)).

$$\sum\_{i=1}^{N} W\_k^i = 1, \tag{10}$$

$$W\_k^i = \frac{w\_k\left(\mathbf{X}\_{0:k}^i\right)}{\sum\_{j=1}^{N} w\_k\left(\mathbf{X}\_{0:k}^j\right)} \tag{11}$$

This algorithm has a common problem known as the degeneracy phenomenon, which manifests itself after a few states, when all but a few particles (usually one) have negligible weight [3, 6]. This can be solved by resampling the particles; however, this creates another problem, namely the increase in information uncertainty arising from the random sampling process [8]. The degeneracy problem can be detected by means of what is known as the effective sample size $N\_{eff}$, which can be estimated by (Eq. (12)).

$$N\_{eff} = \frac{1}{\sum\_{i=1}^{N}\left(W\_k^i\right)^2} \tag{12}$$

When all particles have the same weight, i.e. $W\_k^i = \frac{1}{N}$ for $i = 1, \dots, N$, the effectiveness is maximum and equal to $N\_{eff} = N$; but in the case where all but one particle have zero weight, the effectiveness is minimum and equal to $N\_{eff} = 1$ [7].

While the correct choice of the probability density function $p(X)$ used to evaluate the particle weights (Eq. (9)) minimizes the degeneracy phenomenon, solving it requires incorporating a resampling step into the algorithm; this incorporation is known as Sequential Importance Resampling (SIR). This technique is applied when the effective sample size $N\_{eff}$ falls below a threshold value $N\_T$; its effect is to remove particles with small weights and replicate those with greater weights.
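As an illustration, the following minimal sketch computes the effective sample size (Eq. (12)) and performs a SIR-style step, implemented here as multinomial resampling with NumPy; the function names and the degenerate weight set are illustrative assumptions, not part of the original text.

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights (Eq. (12))."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / np.sum(w ** 2)

def sir_resample(particles, weights, rng=None):
    """SIR step: draw N indices with probability proportional to the weights,
    keep the selected particles and reset all weights to 1/N. Low-weight
    particles tend to disappear; high-weight ones are replicated."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx].copy(), np.full(n, 1.0 / n)

# A degenerate weight set: one particle carries almost all the mass,
# so N_eff collapses toward 1 and resampling is warranted.
w = np.array([0.97, 0.01, 0.01, 0.01])
print(effective_sample_size(w))   # close to 1
```

After the step, the weights are uniform again and the particle set is concentrated on the formerly high-weight states, which is exactly the replicate/remove behavior described above.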

## **2.1 Particle estimator algorithm**

In the following, we describe the six steps of the particle estimator algorithm applied to video object tracking. The following design parameters must first be defined:

- An observation function $H(k)$, used to evaluate the probability of similarity of each particle with the reference object.
- Whether image pre-processing is required.
- The number of particles N.
- The threshold number of particles $N\_T$, compared with the effective number $N\_{eff}$ to decide when resampling is applied.
- The noise distribution function $\chi(k)$, applied during resampling.

The six steps of the algorithm are then:

a. Identification on the image of the object to be tracked, in position and size; this operation is performed by the user and defines the reference particle. Its main characteristics are determined by means of the observation function $H(k)$, generating the information that identifies the reference particle.

b. Initialization: a set of N particles is generated at random positions over the whole image; if there is a priori information on the object's location, the particles are centered on it with a random distribution around it. The set of particles is initialized with equal normalized weights, $W\_k = \frac{1}{N}$.

c. Update: for each particle, the normalized weight is calculated, based on the probability of similarity of its state to the reference.

d. Resampling: the effective number of particles $N\_{eff}$ is evaluated, and if it is lower than the threshold value $N\_T$, the lowest-weight particles are discarded. From this new subset of particles, those most likely to match the reference particle, the new set of N particles for the next state is created.

e. Diffusion: the state of this new set of particles is modified by introducing the additive noise $\chi(k)$, which brings variability to the system.

f. Return to the Update stage c), as long as the data sequence is not finished.
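The steps above can be condensed into a generic tracking loop. The sketch below uses a toy Gaussian similarity score as a stand-in for the observation function H(k) and Gaussian noise for χ(k); the target position, noise scales and particle counts are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(particles, reference):
    """Stand-in for the observation function H(k): a Gaussian similarity
    score on the distance between each particle and the reference."""
    d2 = np.sum((particles - reference) ** 2, axis=1)
    return np.exp(-0.5 * d2 / 5.0 ** 2)

def resample(particles, weights):
    idx = rng.choice(len(weights), size=len(weights), p=weights)
    return particles[idx].copy()

def diffuse(particles, sigma=2.0):
    """Stand-in for the additive noise chi(k) that brings variability."""
    return particles + rng.normal(0.0, sigma, particles.shape)

N, N_T = 500, 250
reference = np.array([60.0, 40.0])            # a) reference particle (x, y)
particles = rng.uniform(0.0, 100.0, (N, 2))   # b) random initial spread
weights = np.full(N, 1.0 / N)

for frame in range(20):
    w = weights * observe(particles, reference)        # c) update
    w /= w.sum()
    n_eff = 1.0 / np.sum(w ** 2)                       # Eq. (12)
    if n_eff < N_T:                                    # d) resampling
        particles = resample(particles, w)
        w = np.full(N, 1.0 / N)
    particles = diffuse(particles)                     # e) diffusion
    weights = w                                        # f) loop

estimate = np.average(particles, weights=weights, axis=0)
print(estimate)   # converges near the reference position
```

In a real tracker, `observe` would be one of the FAST/SURF/BRIEF/ORB similarity measures described below, and each particle would carry position and size as in Section 3.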

#### **2.2 Observation function**

Observation functions are those that allow us to extract the main characteristics of an image. In the particle estimator they are used to obtain the main characteristics of the reference image and those of the particles. These sets of main characteristics allow us to determine the probability of similarity between the reference and each particle. Some of the main algorithms are described below.

#### *2.2.1 Features from accelerated segment test (FAST) algorithm*

The FAST algorithm basically searches the whole image for points where the intensity changes significantly in all directions (a corner detection method) [9]. The principal advantage of this algorithm is its high speed, which makes it very suitable for real-time applications in computer vision. Several techniques exist to find a characteristic point in an image; two of them are described below.

In the first one, and as parameters of the algorithm, a threshold value T and a radius r are defined for the evaluation. On the pixel p to evaluate and which has an intensity Ip, a Bresenham circle of radius r is considered (see **Figure 1**).

This circle of radius r defines a set of N points; if on this circle there is a set of n contiguous pixels whose intensity is greater than $I\_p + T$ or less than $I\_p - T$, then the pixel p can be considered a characteristic point. The values taken by the authors after experimental results are: $n \geq 0.75N$ to consider a pixel p as a characteristic point, and a radius $r = 3$, which defines $N = 16$ and $n = 12$.

In a first step, the intensity $I\_p$ of pixel p is compared with the intensities of pixels 1, 5, 9 and 13; if at least 3 of these 4 pixels meet the threshold criterion, it is then checked whether there are at least 12 pixels on the circle that meet the criterion, to consider p a characteristic pixel.

This procedure must be repeated for all pixels in the image, and its drawbacks are: for values of n < 12 a large number of characteristic points are generated, and

**Figure 1.** *A Bresenham circle of radius r.*

having to evaluate all points of the circle slows down the algorithm. To improve the speed of the algorithm a proposal of the authors is to give it a machine learning approach [10].
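A minimal sketch of the segment test on a Bresenham circle of radius 3 (N = 16, n = 12), including the quick rejection on pixels 1, 5, 9 and 13; the synthetic blob image and the helper names are illustrative assumptions.

```python
import numpy as np

# Pixel offsets of a Bresenham circle of radius 3 (N = 16 points).
CIRCLE = [(0,-3),(1,-3),(2,-2),(3,-1),(3,0),(3,1),(2,2),(1,3),
          (0,3),(-1,3),(-2,2),(-3,1),(-3,0),(-3,-1),(-2,-2),(-1,-3)]

def is_fast_corner(img, x, y, T=20, n=12):
    """Segment test: p = (x, y) is characteristic if at least n contiguous
    circle pixels are all brighter than Ip + T or all darker than Ip - T."""
    Ip = int(img[y, x])
    vals = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    # First step: quick rejection using pixels 1, 5, 9 and 13.
    quick = vals[[0, 4, 8, 12]]
    if np.sum(quick > Ip + T) < 3 and np.sum(quick < Ip - T) < 3:
        return False
    # Full test: longest contiguous run (with wrap-around) of bright/dark.
    for mask in (vals > Ip + T, vals < Ip - T):
        run = best = 0
        for m in np.concatenate([mask, mask]):
            run = run + 1 if m else 0
            best = max(best, run)
        if best >= n:
            return True
    return False

# A small bright blob: every circle pixel around its center is darker
# than Ip - T, so the full 16-pixel dark run fires the test.
img = np.zeros((20, 20), dtype=np.uint8)
img[9:12, 9:12] = 200
print(is_fast_corner(img, 10, 10))   # True
print(is_fast_corner(img, 5, 5))     # False (uniform region)
```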

In the second one, known as the Fast Radial Blob Detector (FRBD), the technique consists of applying the Laplacian of Gaussian filter to an image $I\_{(x,y)}$; the Laplacian operator, while detecting edges very well, also responds strongly to noise [11]. Therefore a Gaussian filter must previously be applied to the image to reduce its noise level. A Gaussian kernel of width σ to convolve with the image is represented by (Eq. (13)); it suppresses the noise before the Laplacian is used for edge detection (Eq. (14)). Finally, (Eq. (15)) represents the kernel of the Laplacian of Gaussian filter, and (Eq. (16)) its convolution with the image.

$$G\_{(x,y,\sigma)} = \frac{1}{2\pi\sigma^2} e^{-\left(\frac{x^2 + y^2}{2\sigma^2}\right)}\tag{13}$$

$$L\_{(x,y)} = \nabla^2 I\_{(x,y)} = \frac{\partial^2 I\_{(x,y)}}{\partial x^2} + \frac{\partial^2 I\_{(x,y)}}{\partial y^2} \tag{14}$$

$$\text{LoG}\_{(x,y,\sigma)} = \nabla^2 G\_{(x,y,\sigma)} = \frac{\partial^2 G\_{(x,y,\sigma)}}{\partial x^2} + \frac{\partial^2 G\_{(x,y,\sigma)}}{\partial y^2} \tag{15}$$

$$L\_{(x,y,\sigma)} = \nabla^2 G\_{(x,y,\sigma)} \ast I\_{(x,y)} \tag{16}$$

This kernel $\nabla^2 G\_{(x,y,\sigma)}$, shown in **Figure 2**, is a feature detector because it finds regions where the image gradients change quickly, for example blobs, corners and edges.

The FRBD algorithm goes one step further by using a second-order finite-difference approximation on the filtered image. An approximation to the LoG that can be computed more rapidly is the Difference of Gaussians (DoG) operator; this approximation uses second-order finite differencing to estimate how the filtered image changes at a given pixel. A circle of radius r is constructed with center at a pixel p, and sampled pixels in eight discrete directions are evaluated with respect to the central pixel (**Figure 3**).

Using the sample points $P\_{[0 \dots 8]}$, the average pixel difference around pixel $P\_{[0]}$ is computed as,

$$\mathbf{F}(\mathbf{x}, \mathbf{y}, \mathbf{r}) = \mathbf{abs} \sum\_{\mathbf{l}=1}^{8} (\mathbf{P}\_{[0]} - \mathbf{P}\_{[\mathrm{i}]}) \tag{17}$$

The average pixel difference (Eq. (17)) is identical to convolving the original image with the kernel of the **Figure 2**.

The features are extracted by maximizing the rate of change of $F(x, y, r)$ with respect to r, calculated as a first-order difference,


**Figure 2.** *Kernel of the Gaussian Laplacian filter.*

**Figure 3.** *Eight discrete directions of the sampled pixels.*

$$\mathbf{R}(\mathbf{x}, \mathbf{y}, \mathbf{r}) = \mathbf{F}(\mathbf{x}, \mathbf{y}, \mathbf{r}) - \mathbf{F}(\mathbf{x}, \mathbf{y}, \mathbf{r} - \mathbf{1}) \tag{18}$$

if R(x, y, r) is multiplied by the minimum pixel difference,

$$\text{Fmin}(\mathbf{x}, \mathbf{y}, \mathbf{r}) = \min\_{\mathbf{i}} \left( \mathbf{P}\_{[0]} - \mathbf{P}\_{[\mathbf{i}]} \right) \tag{19}$$

the pixels that do not exhibit intensity changes in all directions are suppressed.
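A compact sketch of Eqs. (17)–(19) on a synthetic blob; the ring-sampling helper, the blob image and the chosen radius are illustrative assumptions.

```python
import numpy as np

DIRS = [(1,0),(1,1),(0,1),(-1,1),(-1,0),(-1,-1),(0,-1),(1,-1)]

def ring(img, x, y, r):
    """Sampled pixels P[1..8] in eight discrete directions at radius r."""
    return np.array([int(img[y + r * dy, x + r * dx]) for dx, dy in DIRS])

def frbd_score(img, x, y, r):
    """R (Eq. (18)) weighted by Fmin (Eq. (19)); suppresses pixels whose
    intensity does not change in all directions."""
    p0 = int(img[y, x])
    F = lambda rad: abs(np.sum(p0 - ring(img, x, y, rad)))  # Eq. (17)
    R = F(r) - F(r - 1)                                     # Eq. (18)
    Fmin = np.min(p0 - ring(img, x, y, r))                  # Eq. (19)
    return R * Fmin

# A bright 3x3 blob on a dark background: at the blob center the ring at
# r = 2 is all dark while the ring at r = 1 is all bright, so R and Fmin
# are both large; on the flat background the score is zero.
img = np.zeros((21, 21), dtype=np.uint8)
img[9:12, 9:12] = 200
print(frbd_score(img, 10, 10, 2))   # large positive response
print(frbd_score(img, 3, 3, 2))     # 0
```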

#### *2.2.2 Speeded-up robust features (SURF) algorithm*

The particularity of this algorithm is its ability to determine characteristic points in an image which are invariant to changes in scale, rotation and translation, and partially to illumination changes. This algorithm is an optimization of the Scale Invariant Feature Transform (SIFT) algorithm [12], its execution speed being much higher than the latter's [13]. On the other hand, the SURF algorithm produces less information than SIFT for each characteristic point of the image, although the information produced by SURF is more than enough for most applications, including the present one.

The use of integral images in this algorithm makes it very fast at representing the differential features of the original image at different scales. Integral images accelerate the computation of Haar wavelet responses at different scales, and together with the application of the Hessian differential operator allow the determination of key points and their robust descriptor features [2, 11].

Given an image I, and a point $X = (x, y)$ in this image, the Hessian matrix $H\_{(X,\sigma)}$ at $X = (x, y)$ and scale σ is defined as:

$$\mathbf{H}\_{(\mathbf{X},\sigma)} = \begin{bmatrix} \mathbf{L}\_{\mathbf{x},\mathbf{x},(\mathbf{X},\sigma)} & \mathbf{L}\_{\mathbf{y},\mathbf{x},(\mathbf{X},\sigma)} \\ \mathbf{L}\_{\mathbf{x},\mathbf{y},(\mathbf{X},\sigma)} & \mathbf{L}\_{\mathbf{y},\mathbf{y},(\mathbf{X},\sigma)} \end{bmatrix},\tag{20}$$

where $L\_{x,x,(X,\sigma)}$, $L\_{x,y,(X,\sigma)}$, $L\_{y,x,(X,\sigma)}$ and $L\_{y,y,(X,\sigma)}$ represent the convolution of the second derivative of the Gaussian $\frac{\partial^2}{\partial X^2} G\_{(x,y,\sigma)}$ with the image I at the point $X = (x, y)$ [12], see (Eq. (22)).

$$\mathcal{L}\_{\mathbf{x},\mathbf{x},(\mathbf{X},\sigma)} = \frac{\partial^2}{\partial \mathbf{X}^2} \ast \left[ \mathbf{G}\_{\left(\mathbf{x},\mathbf{y},\sigma\right)} \ast \mathbf{I}\_{\left(\mathbf{x},\mathbf{y}\right)} \right] \tag{21}$$

$$L\_{\mathbf{x},\mathbf{x},(\mathbf{X},\sigma)} = \left[\frac{\partial^2}{\partial \mathbf{X}^2} \ast G\_{(\mathbf{x},\mathbf{y},\sigma)}\right] \ast I\_{(\mathbf{x},\mathbf{y})} \tag{22}$$

with

$$G\_{(x,y,\sigma)} = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{23}$$

The determinant of the Hessian matrix allows the calculation of the scale of the point, defined as follows:

$$\left|\mathbf{H}\_{(X,\sigma)}\right| = D\_{x,x} D\_{y,y} - \left(\omega D\_{x,y}\right)^2,\tag{24}$$

where $D\_{x,x}$, $D\_{y,y}$ and $D\_{x,y} = D\_{y,x}$ are the approximations of the partial derivatives, and ω is the balance factor of the determinant [2], obtained from (Eq. (25)), where $|\cdot|\_F$ is the Frobenius norm [3], see (Eq. (26)).

$$\omega = \frac{\left|L\_{x,y,(X,\sigma)}\right|\_F \left|D\_{y,y}\right|\_F}{\left|L\_{y,y,(X,\sigma)}\right|\_F \left|D\_{x,y}\right|\_F} \tag{25}$$

$$|\mathcal{A}|\_F = \sqrt{\sum\_{i=1}^{m} \sum\_{j=1}^{n} \left(a\_{ij}\right)^2} \tag{26}$$
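A small sketch of the Hessian-determinant response of (Eq. (24)), using finite-difference second derivatives of a Gaussian-smoothed image instead of SURF's box-filter approximations; the fixed balance factor (~0.9) and the synthetic blob are illustrative assumptions.

```python
import numpy as np

def hessian_det_response(img, sigma=1.2, w=0.9):
    """Map of det(H) as in (Eq. (24)), with finite-difference second
    derivatives D_xx, D_yy, D_xy of a Gaussian-smoothed image; w plays
    the role of the balance factor of (Eq. (25))."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    # Separable Gaussian smoothing, row-wise then column-wise.
    s = np.apply_along_axis(lambda m: np.convolve(m, g, 'same'), 1, img.astype(float))
    s = np.apply_along_axis(lambda m: np.convolve(m, g, 'same'), 0, s)
    gx = np.gradient(s, axis=1)
    gy = np.gradient(s, axis=0)
    Dxx = np.gradient(gx, axis=1)
    Dyy = np.gradient(gy, axis=0)
    Dxy = np.gradient(gx, axis=0)
    return Dxx * Dyy - (w * Dxy) ** 2          # Eq. (24)

# Gaussian blob: the response peaks at the blob center and is ~0 on the
# flat background, which is how SURF localizes blob-like key points.
ys, xs = np.mgrid[0:41, 0:41]
img = 255.0 * np.exp(-((xs - 20) ** 2 + (ys - 20) ** 2) / (2 * 4.0 ** 2))
resp = hessian_det_response(img)
print(resp[20, 20] > resp[5, 5])   # True
```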

Applying the Haar wavelet filters in a circular area of radius 6σ around each key point provides a set of responses in both directions (dx and dy respectively); the dominant orientation is obtained from the mean of those responses within a sliding window of angle π/3 [12].

Finally, the feature descriptors for a given scale and for each characteristic point are obtained. To do this, a square area of 20σ × 20σ centered on the point is constructed along the dominant orientation. This is divided into 4 × 4 sub-regions, and for each sub-region the Haar wavelet responses are applied, obtaining the horizontal dx and vertical dy responses. For each sub-region a characteristic vector V is formed, so that for each point a total of 64 SURF descriptors is generated [10, 11],

$$\mathbf{V} = \left(\sum d\_{\mathbf{x}}, \sum d\_{\mathbf{y}}, \sum |d\_{\mathbf{x}}|, \sum |d\_{\mathbf{y}}|\right). \tag{27}$$
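The assembly of the 64-value descriptor can be sketched as below; simple pixel differences stand in for the Haar wavelet responses, and the patch size, sub-region layout and normalization are illustrative assumptions rather than the exact SURF implementation.

```python
import numpy as np

def subregion_vector(sub):
    """The four sums of (Eq. (27)) for one sub-region, with simple pixel
    differences standing in for the Haar wavelet responses dx, dy."""
    s = sub.astype(float)
    dx = s[:, 1:] - s[:, :-1]
    dy = s[1:, :] - s[:-1, :]
    return np.array([dx.sum(), dy.sum(), np.abs(dx).sum(), np.abs(dy).sum()])

def surf_like_descriptor(patch):
    """A 20x20 oriented patch split into 4x4 sub-regions of 5x5 pixels;
    each contributes the vector V of (Eq. (27)), giving 4*4*4 = 64 values."""
    assert patch.shape == (20, 20)
    vecs = [subregion_vector(patch[i:i + 5, j:j + 5])
            for i in range(0, 20, 5) for j in range(0, 20, 5)]
    v = np.concatenate(vecs)
    return v / (np.linalg.norm(v) + 1e-12)   # unit norm for contrast invariance

patch = np.random.default_rng(0).integers(0, 255, (20, 20))
desc = surf_like_descriptor(patch)
print(desc.shape)   # (64,)
```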

#### *2.2.3 Binary robust independent elementary features (BRIEF) algorithm*

In terms of execution time, the SURF algorithm performs better than SIFT, but this is still not sufficient for current applications requiring real-time processing of video streams in navigation, augmented reality, etc. To satisfy these applications, simpler concepts are applied to obtain fast detectors and descriptors; FAST [14], FASTER [15], CenSurE [16] and SUSurE [17] are some examples.

In particular, the BRIEF descriptor, like SURF, uses the integral image, and applies a set of very simple binary tests which are well suited to the use of the Hamming distance (the distance between two code words, i.e. the number of bit positions in which they differ). This distance is much simpler and faster to evaluate than the Euclidean distance. It has also been demonstrated in practice that a 32-dimensional BRIEF descriptor achieves results similar to a 64-dimensional SURF descriptor.

This algorithm obtains the descriptors of the characteristic points of the image by defining a neighboring area centered on each point; this area is known as the patch *p*, a square a few pixels high and wide, of size L × L (see **Figure 4**).

Since the BRIEF algorithm works directly on pixel intensity levels, it is very sensitive to noise; therefore it is necessary to pre-smooth the patch to reduce this sensitivity and increase the accuracy of the descriptors. For that reason, after creating a patch centered on the feature point, a Gaussian smoothing filter is applied to it (Eq. (28)),

$$\mathbf{f}\_{\left(\mathbf{x},\mathbf{y}\right)} = \frac{1}{2\pi\sigma^2} \mathbf{e}^{\left(-\frac{\mathbf{x}^2 + \mathbf{y}^2}{2\sigma^2}\right)}.\tag{28}$$

BRIEF converts the patch into a representative binary vector containing only 1s and 0s, so each descriptor of a characteristic point is a string of 128–512 bits. After applying the smoothing (Eq. (28)) to the patch p, the patch is converted to a binary feature vector from the responses of a binary test τ, defined by (Eq. (29)),

$$\tau\_{(\mathbf{p};\,\mathbf{x},\mathbf{y})} = \begin{cases} 1 & \mathbf{p}\_{(\mathbf{x})} < \mathbf{p}\_{(\mathbf{y})} \\ 0 & \mathbf{p}\_{(\mathbf{x})} \geq \mathbf{p}\_{(\mathbf{y})} \end{cases} \tag{29}$$

**Figure 4.** *Patch in an image over a specific key point.*

where $p\_{(x)}$ is the intensity value of the pixel at point x. A set of n location pairs $(x, y)$, with n equal to 128, 256 or 512, must then be defined uniquely as a set of binary tests (see **Figure 4**). Each test pixel $p\_{(x,y)}$ is located inside the patch, and each test location pair is called a random pair; to create a binary feature vector of length n it is necessary to select the random pairs, and the five most useful sampling strategies for this selection are shown in (Eq. (30)).

$$\begin{aligned}
&\text{I.} && (\mathbf{X}, \mathbf{Y}) \sim \text{Uniform}\left(-\frac{L}{2}, +\frac{L}{2}\right) \\
&\text{II.} && (\mathbf{X}, \mathbf{Y}) \sim \text{Gaussian}\left(0, \frac{1}{25}L^2\right) \\
&\text{III.} && \mathbf{X} \sim \text{Gaussian}\left(0, \frac{1}{25}L^2\right), \quad \mathbf{Y} \sim \text{Gaussian}\left(\mathbf{x}\_i, \frac{1}{100}L^2\right) \\
&\text{IV.} && \text{the } (\mathbf{x}\_i, \mathbf{y}\_i) \text{ are randomly sampled} \\
&\text{V.} && \mathbf{x}\_i = (0,0)^T \text{ and } \mathbf{y}\_i \text{ takes all possible values on a coarse polar grid}
\end{aligned} \tag{30}$$

The advantages of the BRIEF descriptor are its high processing speed, low memory usage and robustness to illumination changes and blur; its disadvantages are its weakness to rotation of the viewpoint and to changes in the position of a light source.

(Eq. (31)) describes the BRIEF descriptor; in its application it must be taken into account that its matching performance falls sharply with even a few degrees of in-plane rotation.

$$\mathbf{f}\_{\mathbf{n}(\mathbf{p})} = \sum\_{i=1}^{n} 2^{i-1}\,\tau(\mathbf{p};\, \mathbf{x}\_i, \mathbf{y}\_i)\tag{31}$$
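The binary tests of (Eq. (29)) and the Hamming comparison can be sketched as follows; the patch content, pair count and the omission of the Gaussian pre-smoothing of (Eq. (28)) are illustrative simplifications.

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """Bit i is the binary test of (Eq. (29)): 1 if the patch is darker
    at x_i than at y_i. The bit string is the descriptor of (Eq. (31))."""
    return np.array([1 if patch[xa, ya] < patch[xb, yb] else 0
                     for (xa, ya), (xb, yb) in pairs], dtype=np.uint8)

def hamming(d1, d2):
    """Number of bit positions in which two descriptors differ."""
    return int(np.count_nonzero(d1 != d2))

rng = np.random.default_rng(0)
L, n = 31, 256
# Sampling strategy I of (Eq. (30)): uniform pairs inside the L x L patch.
pairs = rng.integers(0, L, size=(n, 2, 2))

patch = rng.integers(0, 255, (L, L)).astype(float)   # pre-smoothing omitted
d1 = brief_descriptor(patch, pairs)
d2 = brief_descriptor(patch + 10.0, pairs)           # uniform brightness shift
print(hamming(d1, d2))   # 0: the binary tests ignore uniform shifts
```

The zero distance under a uniform brightness shift illustrates the illumination robustness claimed above: the tests compare intensities in pairs, so any constant offset cancels out.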

#### *2.2.4 Oriented FAST and rotate BRIEF (ORB) algorithm*

The main characteristic of the ORB algorithm compared to its equivalents SIFT and SURF is that it performs the feature detection operations as well as they do, but with superior performance, with the additional advantage that its use is free. It is based on the widely proven FAST and BRIEF algorithms, which combine very good performance with low computational cost.

As the FAST algorithm does not have information about features of orientation and multi-scale, these must be implemented. For the multi-scale response, a scale pyramid is generated, where each level of the pyramid contains a version of the original image but at a lower resolution (**Figure 5**).

Applying the FAST algorithm (see 2.2.1) to each level (scale), we obtain the characteristic points at each level (vertices); as this algorithm also responds to edges, these must be discarded. For this, the Harris corner measure is used with a low threshold [18], in order to obtain a large number of characteristic points, which are then ordered according to the Harris measure to retain the desired number of main points.

From there, the orientation is computed by means of the intensity centroid [19], which assumes that the intensity centroid of a corner is offset from its center, and the vector formed between the two points is used to calculate its orientation. Rosin defines the moments of a patch by means of (Eq. (32)) as,


**Figure 5.** *Multiscale scheme for an image.*

$$\mathbf{m}\_{\mathbf{p},\mathbf{q}} = \sum\_{\mathbf{x},\mathbf{y}} \mathbf{x}^{\mathbf{p}} \cdot \mathbf{y}^{\mathbf{q}} \, \mathrm{I}\_{\left(\mathbf{x},\mathbf{y}\right)} \tag{32}$$

and with these moments we can find the centroid by (Eq. (33)),

$$\mathbf{C} = \left(\frac{\mathbf{m}\_{1,0}}{\mathbf{m}\_{0,0}}, \frac{\mathbf{m}\_{0,1}}{\mathbf{m}\_{0,0}}\right). \tag{33}$$

The orientation of the patch can then be calculated by means of the vector $\vec{OC}$ from the center O of the patch to the centroid C, and is obtained by means of (Eq. (34)),

$$\theta = \texttt{atan2}(\mathbf{m}\_{0,1}, \mathbf{m}\_{1,0}).\tag{34}$$

To improve the invariance of this measurement we must ensure that the calculation of the moments is performed with *x* and *y* included in a circular region of radius *r* contained within the patch.

The table of features for each characteristic point is determined by means of the steered BRIEF method [20], according to the orientation θ of the patch at that point. For each characteristic point $(x\_i, y\_i)$ we define a $2 \times n$ matrix S (Eq. (35)), and a rotation matrix $R\_\theta$ as a function of the orientation θ of the patch, obtaining the steered version $S\_\theta$ by (Eq. (36)).

$$\mathbf{S} = \begin{bmatrix} \mathbf{x\_1} & \dots & \mathbf{x\_n} \\ \mathbf{y\_1} & \dots & \mathbf{y\_n} \end{bmatrix} \tag{35}$$

$$\mathbf{S}\_{\theta} = \mathbf{R}\_{\theta} \mathbf{S} \tag{36}$$

This yields the steered BRIEF operator,

$$\mathbf{g}\_{\mathbf{n}}(\mathbf{p}, \theta) = \mathbf{f}\_{\mathbf{n}}(\mathbf{p}) \big|\_{(\mathbf{x}\_i, \mathbf{y}\_i) \in \mathbf{S}\_\theta},\tag{37}$$

where $f\_{n}(\mathbf{p})$ is the binary test operator obtained from (Eq. (31)).

To construct the BRIEF pattern lookup table, the angle θ is discretized in increments of 2π/30 (12 degrees). All the available tests are ordered by their distance from a mean of 0.5, generating a vector *T*. The first test of *T* is then added to the table of results *R*; from there, the next test of *T* is taken and compared with all the tests already in *R*: if its absolute correlation is greater than a threshold it is discarded, otherwise it is added to *R*, until a total of 256 tests is obtained.
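The intensity-centroid orientation of Eqs. (32)–(34) can be sketched as below; the patch contents are illustrative, and coordinates are taken relative to the patch center O, an assumption consistent with measuring the orientation from the key point.

```python
import numpy as np

def moment(patch, p, q):
    """Patch moment m_{p,q} (Eq. (32)), with (x, y) taken relative to the
    patch center O so the orientation is measured from the key point."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (xs - w // 2).astype(float)
    ys = (ys - h // 2).astype(float)
    return np.sum(xs ** p * ys ** q * patch)

def orientation(patch):
    """theta = atan2(m01, m10), the direction of the centroid C (Eq. (34))."""
    return np.arctan2(moment(patch, 0, 1), moment(patch, 1, 0))

patch = np.zeros((15, 15))
patch[:, 10:] = 1.0          # intensity mass to the right of the center
cx = moment(patch, 1, 0) / moment(patch, 0, 0)   # Eq. (33)
cy = moment(patch, 0, 1) / moment(patch, 0, 0)
print(cx, cy)                # centroid to the right, centered vertically
print(orientation(patch))    # ~0 rad: the patch "points" along +x
```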

#### **3. Processes in the particle estimator algorithm**

Basically, this method deploys over the image a series of random particles, possible states of the process, in this case the target position and its size, while their weights represent the posterior probability density estimated from the observations. One of the particularities of the particle estimator is the number of configuration parameters it has; to optimize its performance, in terms of estimation quality and processing time, they must be chosen properly. Some of these parameters refer to the behavior of the estimator itself and others to the behavior of the SURF algorithm (used here for infrared images), as described below.

For the behavior of the particle estimator:


For the behavior of the SURF algorithm:


Then, for the case of this application (tracking objects in video images), we can describe the processes implemented following the guidelines of the algorithm described in 2.1.

#### **3.1 Initialization**

Mainly in this step, the reference particle to follow is selected, and all the parameters described in the previous paragraph are initialized with the design values.

#### **3.2 Particle description**

Each particle is described by means of a rectangle; to characterize it, we define a state vector for each particle (Eq. (38)).

$$X = \begin{bmatrix} x & y & w & h & \dot{x} & \dot{y} & \dot{w} & \dot{h} \end{bmatrix}^T = \begin{bmatrix} p & v \end{bmatrix}^T \tag{38}$$

This state vector contains the position $(x, y)$; the size, given by the width and height $(w, h)$; and their respective velocities $(\dot{x}, \dot{y})$ and $(\dot{w}, \dot{h})$.

The velocity of the particle is calculated at the end of the cycle as the difference $(p\_{k+1} - p\_k)$ divided by the video sampling time.

When a new frame is acquired, the particles are propagated following the state evolution model, in correspondence with (Eq. (1)):

$$X_k = A\,X_{k-1} + A\,v_{k-1},\tag{39}$$

$$A = \begin{bmatrix} I_{4\times4} & \Delta t \, I_{4\times4} \\ 0_{4\times4} & I_{4\times4} \end{bmatrix},\tag{40}$$

where $I_{4\times4}$ is the $4\times4$ identity matrix, $0_{4\times4}$ is the $4\times4$ null matrix, $\Delta t$ is the frame sample time, and $v_{k-1}$ is the process noise. After the particle propagation, a bounds check is applied to verify that all the particles remain within the image.
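
The propagation step of Eqs. (39)-(40) plus the bounds check can be sketched as follows; the function name and the process-noise standard deviation are assumptions for illustration:

```python
import numpy as np

def propagate(X, dt, width, height, rng, noise_std=2.0):
    """Propagate particles with the constant-velocity model of Eqs. (39)-(40).

    X: (N, 8) particle states [x, y, w, h, vx, vy, vw, vh].
    """
    # Block transition matrix A of Eq. (40).
    A = np.block([[np.eye(4), dt * np.eye(4)],
                  [np.zeros((4, 4)), np.eye(4)]])
    X = X @ A.T + rng.normal(0.0, noise_std, X.shape)   # model + noise
    # Bounds check: keep every particle's rectangle inside the image.
    X[:, 0] = np.clip(X[:, 0], 0, width - 1)
    X[:, 1] = np.clip(X[:, 1], 0, height - 1)
    X[:, 2] = np.clip(X[:, 2], 1, width)
    X[:, 3] = np.clip(X[:, 3], 1, height)
    return X
```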

#### **3.3 Particle probabilities estimation**

The determination of the similarity between each particle and the reference depends on the type of image, either color or black and white. The images in this chapter are black and white, acquired from infrared sensors, but we will pause here to consider the color case first.

#### *3.3.1 Color images*

In the case of a color image, the distance between histograms is used to determine the similarity with the reference. In the example of **Figure 6**, the Bhattacharyya distance [21] for histograms (Eq. (41)) was used instead of the Euclidean distance, because a better response was obtained.

$$B_{dst} = \frac{\sum \left( h_{Ir} - \overline{h_{Ir}} \right) \left( h_I - \overline{h_I} \right)}{\sqrt{\sum \left( h_{Ir} - \overline{h_{Ir}} \right)^2 \, \sum \left( h_I - \overline{h_I} \right)^2}}\tag{41}$$

where $B_{dst}$ is the distance, $h_{Ir}$ and $h_I$ are the histograms to be compared, and $\overline{h_{Ir}}$ and $\overline{h_I}$ their mean values. A probability of one is obtained when the sum of the three distances between the red, green, and blue histograms equals zero.
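
A minimal sketch of this similarity measure is given below. The normalized-correlation form follows Eq. (41); the final mapping of the three channel distances to a [0, 1] probability is an assumption for illustration, not the chapter's exact formula:

```python
import numpy as np

def hist_correlation(h1, h2):
    """Normalized correlation between two histograms (Eq. (41))."""
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

def color_probability(ref_hists, cand_hists):
    """Combine the R, G, B channel distances into one probability.

    Distance per channel is 1 - correlation, so identical histograms give
    distance 0 and probability 1, as the text describes.
    """
    d = [1.0 - hist_correlation(r, c) for r, c in zip(ref_hists, cand_hists)]
    return max(0.0, 1.0 - sum(d) / 3.0)
```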

**Figure 7** shows the sequence of images for the tracking of a person, in which the particles can be seen in three colors: green for the best estimate, blue for the most probable particles, and red for the least probable ones. At 3.399 seconds, the action of the increased-dispersion parameter in the resampling procedure is observed, due to a low number of valid particles; this lasted only two frames, quickly returning to normal values.

**Figure 6.** *Airplane sequence of the particle estimator, where [a .. h] correspond to sequences between [4s .. 4.32s].*

**Figure 7.** *Person tracking, sequence of the particle estimator.*

#### *3.3.2 Black and white images*

In the case of black and white images, the SURF algorithm was used to determine the similarity between each particle and the reference. Before applying the SURF algorithm, the background effect is reduced; generally, this improvement is based on subtracting the mean intensity and modifying the histogram to increase the contrast, a good choice for infrared images. In this case, and for this type of image, the procedure used to enhance the image of the object was the following: first,


subtract from the input image *I* a factor α proportional to its mean value; second, multiply the result by a factor β proportional to the ratio between the standard deviation of the image (as a measure of contrast) and its maximum luminosity value, obtaining the enhanced image *IP* (Eq. (42)).

$$I_P = \left[ I - \frac{\alpha}{N M} \sum_{i=1,\,j=1}^{N,\,M} I_{(i,j)} \right] \beta \, \frac{\sigma(I)}{\max\left(I\right)},\tag{42}$$

where N and M are the number of pixels in X and Y, respectively, and $\sigma(I)$ and $\max(I)$ are the standard deviation and the maximum value of the image, respectively. The adaptive factors α and β are obtained experimentally; the best results were obtained with values of α = 0.84314 and β = 25.
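
A minimal sketch of the enhancement of Eq. (42) follows; applying σ and max to the input image (rather than to an intermediate result) is an assumption where the text is ambiguous:

```python
import numpy as np

def enhance(image, alpha=0.84314, beta=25.0):
    """Background suppression and contrast stretch per Eq. (42)."""
    img = image.astype(float)
    gain = beta * img.std() / max(img.max(), 1e-9)  # beta * sigma(I)/max(I)
    return (img - alpha * img.mean()) * gain        # subtract scaled mean
```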

To determine the similarity of a particle with the reference particle, a pairing of the characteristic points is performed. After this pairing, the number of point pairs whose metric is less than a threshold is counted; the ratio of this number of pairs to the total number of characteristic points of the reference particle gives the similarity probability.
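
The pairing-and-counting step can be sketched as below over generic float descriptor arrays (SURF-like); the Euclidean metric, the threshold value, and the nearest-neighbor pairing rule are assumptions for illustration:

```python
import numpy as np

def similarity(ref_desc, cand_desc, metric_threshold=0.25):
    """Fraction of reference descriptors with a close-enough match.

    ref_desc, cand_desc: (n, d) descriptor arrays; the returned ratio is
    matched pairs over total reference points, as described in the text.
    """
    if len(ref_desc) == 0 or len(cand_desc) == 0:
        return 0.0
    # Distance from every reference descriptor to its nearest candidate.
    d = np.linalg.norm(ref_desc[:, None, :] - cand_desc[None, :, :], axis=2)
    best = d.min(axis=1)
    return np.count_nonzero(best < metric_threshold) / len(ref_desc)
```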

#### **3.4 Position and size of the object to be tracked**

The position and size of the object to be tracked is determined by means of the most significant particles, i.e. those particles whose probability exceeds the selected probability threshold.
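
One way to read "by means of the most significant particles" is a weighted average over the particles above the threshold; that averaging choice is an assumption for illustration:

```python
import numpy as np

def estimate_state(X, weights, threshold):
    """Combine the particles whose probability exceeds the threshold.

    X: (N, d) particle states; returns one estimated state vector.
    """
    mask = weights > threshold
    if not np.any(mask):
        return X[np.argmax(weights)]          # fall back to the best particle
    w = weights[mask]
    return (X[mask] * w[:, None]).sum(axis=0) / w.sum()
```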

#### **3.5 Re-sampling**

From the most probable particle obtained in the previous step, a new particle set is distributed according to the dispersion of each variable, always limited to the size of the screen. The dispersion of the variables is adapted according to the number of valid particles: this is implemented as a re-sampling strategy in which the search zone is extended as the number of valid particles decreases.
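
This adaptive re-sampling can be sketched as follows; the exact inverse scaling of the dispersion with the valid-particle fraction and the parameter names are assumptions for illustration:

```python
import numpy as np

def resample(best, n, base_sigma, n_valid, n_total, bounds, rng):
    """Redistribute n particles around the most probable state.

    Fewer valid particles -> larger dispersion -> wider search zone.
    bounds is (low, high) for clipping states to the screen.
    """
    frac = max(n_valid / n_total, 1e-3)
    sigma = base_sigma / frac                 # widen search as frac shrinks
    X = best + rng.normal(0.0, sigma, size=(n, len(best)))
    lo, hi = bounds
    return np.clip(X, lo, hi)                 # keep particles on screen
```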

#### **4. Experimental results**

Three experimental tests were performed with the previously described algorithm to track an object: a) an airplane; b) a six-rotor "UAV"; and c) a heliport. For the three experiments, a Tau 640 infrared camera, in the 8 to 14 micron band, was used; in experiments a) and b) the camera was mounted on a positioning system, while in c) it was mounted on a six-rotor "UAV", taking a zenithal image.

In case b), the positioning system on which the camera was mounted had a tracking system attached, so the *X* and *Y* coordinates of the target, in pixels, could be obtained. Comparing them with those obtained from the particle estimator, the error of the latter was determined.

The images obtained from the video frames were recorded sequentially according to the video acquisition frequency, and before applying the particle estimation algorithm, they were pre-processed by a background suppression and contrast enhancement procedure (Eq. (42)).

In all cases, the same observation function, the SURF routine, was used to determine the likelihood of similarity between the video image and the reference image. The dispersion factors applied to the state vector in the resampling function vary inversely with the number of valid particles, so that as the number of valid particles decreases, the search area increases.
