**4. Object recognition**

The algorithm for object recognition is based on significant information about an object's shape and colour and determines whether it is a target or not [22]. The implemented search algorithm expects targets that are round and monochrome, such as red balls. Extending the system to detect more than one target colour at a time can easily be achieved. To search for other shapes, such as rectangles or human bodies, the object recognition would need to be replaced or changed fundamentally. For the experiments described in this chapter, red balls with a diameter of approximately 7 cm are used as search targets.

In the following sections, the necessary image processing fundamentals will be briefly discussed. Afterwards, the circle detection algorithm used to identify the balls will be introduced. Finally, the recognition procedure, consisting of an initial scan to determine the search parameters and a subsequent search, will be explained.

### **4.1. Image processing fundamentals**

For implementation and guidance, the open source computer vision library OpenCV can be recommended [23]. It contains a variety of basic image processing core algorithms as well as advanced procedures for applications such as object recognition, feature extraction and machine learning.

### *4.1.1. Image representation*

A standard way to represent an image on a computer is the RGB model. Each pixel is described by three intensity values: *r*ed, *g*reen and *b*lue. Here we assume a resolution of 8 bit. Therefore, the range for each value is 0–255 (= $2^8 - 1$).

For many calculations, the RGB representation of a pixel is impractical and a single value per pixel is preferred. In the so‐called greyscale image, the grey value of a pixel *x* is determined from its original RGB values (Formula (6)):

$$\text{grey}(x) = 0.299 \cdot x.R + 0.587 \cdot x.G + 0.114 \cdot x.B \tag{6}$$

The simplest representation is the binary image in which only two values exist: 0 and 1. If a pixel meets certain requirements, for example if it has a specific RGB or a greyscale value, a value of 1 is assigned, otherwise it has a value of 0.

**Figure 8.** Image of two balls in RGB (left), greyscale (middle) and binary (right) representation.

An example image of two balls and the corresponding greyscale image can be seen in **Figure 8**. Furthermore, a binary image was created by assigning the value 1 to each pixel with a red value higher than 100 and with green and blue values lower than 50.
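The two conversions can be sketched as follows. This is a minimal NumPy version under the assumptions above; the function names are illustrative, and the thresholds are those of the binary example:

```python
import numpy as np

def to_greyscale(img):
    """Convert an H x W x 3 RGB image to greyscale using Formula (6)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

def red_binary_mask(img):
    """Binary image: 1 where red > 100 and both green and blue < 50."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return ((r > 100) & (g < 50) & (b < 50)).astype(np.uint8)
```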

### *4.1.2. Filters*

8 Recent Advances in Robotic Systems

In contrast to the operations already introduced, filters use a variety of pixels and not just a single one to determine the new value of a pixel. The idea behind the filters is to perform 2D convolution: a so‐called filter matrix is slid over the original image and simple multiplications of the filter elements with the underlying values of the image pixels are performed. The calculated outcomes are summed up and the result is stored as the new value of the pixel located under the so‐called hot spot, the centre of the filter matrix. The entire process is shown in **Figure 9**.

**Figure 9.** Mode of operation of a linear filter [24].
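The sliding‐window process described above can be sketched as follows. This is a minimal NumPy version; strictly speaking it computes correlation (the kernel is not flipped), which is identical to convolution for the symmetric filters used in this chapter:

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide the filter matrix over the image: multiply the filter elements
    with the underlying pixel values, sum the products and store the result
    at the hot spot (the centre of the filter matrix)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # replicate the border pixels so the output keeps the input size
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out
```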

### *4.1.3. Edge detection*

One of the most frequent applications of filters is edge detection [24]. Edges can be defined as regions in which big intensity changes occur in a certain direction. To detect those changes one or several filters have to be applied to the greyscale image. In most applications, the so‐called Canny edge detector is used [25]: after deploying a Gaussian filter in order to remove noise, the intensity gradients are computed by applying the Sobel operator, which consists of two separate filter matrices. The first filter computes the gradient in the *x*‐direction:

$$H\_x^S = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \tag{7}$$

The second one highlights the change of intensity in the *y*‐direction:

$$H\_y^S = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \tag{8}$$

The local edge strength *E* can then be calculated by combining the resulting images *Dx* and *Dy* for each pixel (*u*, *v*):

$$E\left(u,v\right) = \sqrt{\left(D\_x(u,v)\right)^2 + \left(D\_y(u,v)\right)^2} \tag{9}$$

Furthermore, the local edge orientation angle Φ(*u*, *v*) can be determined as

$$\Phi\left(u,v\right) = \tan^{-1}\left(\frac{D\_y(u,v)}{D\_x(u,v)}\right) \tag{10}$$
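Formulas (7)–(10) can be sketched as follows. This is an illustrative NumPy version; it uses the quadrant-aware `arctan2` in place of a plain $\tan^{-1}$, and the helper names are assumptions:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # Formula (7)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)  # Formula (8)

def _correlate(img, k):
    # valid-mode correlation: output is (H-2) x (W-2) for a 3x3 filter
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(img[y:y + 3, x:x + 3] * k)
    return out

def edge_strength_and_orientation(grey):
    dx = _correlate(grey.astype(float), SOBEL_X)  # D_x
    dy = _correlate(grey.astype(float), SOBEL_Y)  # D_y
    e = np.sqrt(dx ** 2 + dy ** 2)   # local edge strength, Formula (9)
    phi = np.arctan2(dy, dx)         # edge orientation, Formula (10)
    return e, phi
```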

In general, the described procedure leads to blurry edges. Therefore, an edge thinning technique called non‐maximum suppression is applied. The computed first derivatives are combined into four directional derivatives and the resulting local maxima are considered as edge candidates.

Finally, a hysteresis threshold operation is applied to the pixels. Two thresholds have to be defined: an upper one and a lower one. If the local edge strength of a pixel is higher than the upper one, the pixel is immediately accepted as an edge pixel. Pixels whose gradient is below the lower threshold are rejected. If the local edge strength is between the lower and the upper threshold, only pixels adjacent to pixels with gradients above the upper threshold are accepted. This process promotes the detection of connected contours. In this work, values of 20 and 60 were used for the lower and upper thresholds, respectively.
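The hysteresis step can be sketched as follows. This is a minimal NumPy version using 4-connected adjacency, with assumed names; in practice OpenCV's `cv2.Canny` implements the complete detector, including this step:

```python
import numpy as np

def hysteresis(strength, low=20, high=60):
    """Accept pixels above `high` immediately; accept pixels between `low`
    and `high` only if they connect to an already accepted pixel."""
    strong = strength >= high
    weak = strength >= low
    accepted = strong.copy()
    changed = True
    while changed:
        changed = False
        # grow the accepted region by one pixel (4-neighbourhood)
        grown = accepted.copy()
        grown[1:, :] |= accepted[:-1, :]
        grown[:-1, :] |= accepted[1:, :]
        grown[:, 1:] |= accepted[:, :-1]
        grown[:, :-1] |= accepted[:, 1:]
        new = grown & weak
        if (new & ~accepted).any():
            accepted = accepted | new
            changed = True
    return accepted
```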

### **4.2. Hough circle transformation**

A circle is defined by its centre *C*(*xC*, *yC*) and its radius *r*. All points *P*(*xP*, *yP*) on the outline of the circle satisfy the circle equation:

$$\left(x\_C - x\_P\right)^2 + \left(y\_C - y\_P\right)^2 = r^2 \tag{11}$$

Identifying circles in an edge image by using this equation and the naive approach of checking, for every centre candidate, how many edge pixels lie on a circle around it is very inefficient and highly inadvisable. In the following sections, two much faster and more robust approaches are presented.

### *4.2.1. Basic method*


The basic idea behind the Hough transformation can be seen in **Figure 10**. Consider a target circle with radius *r*: if circles with the same radius *r* are drawn around the edge points of the target circle, they will intersect, and the main accumulation of intersections will occur in the centre of the target circle.

**Figure 10.** Intersecting circles (blue) drawn around the edge pixels of the target circle(s) (red) using one radius (left) and a range of radii (right) [26].

To identify a circle with known radius *r* in an edge image, a so‐called accumulator array is used. Typically, it has the same dimensions as the edge image or is scaled down by a small integer factor. If the target radius is exactly known, a two‐dimensional array is sufficient. After initializing every cell with zero, the voting process starts. Each edge pixel is treated as a possible circle outline and all corresponding centre candidates in the accumulator array get a vote, i.e., their value is incremented by 1. After voting, all detected circles can be identified by checking which cells in the accumulator array earned enough votes. Consequently, a threshold is needed. As the value of each cell roughly corresponds to the number of circle outline pixels in the edge image, a useful threshold can be derived from the maximal number of votes, i.e., the circle's circumference [25].
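The voting process for a known radius can be sketched as follows. This is an illustrative NumPy version; the function name and the angular sampling of the candidate circle are assumptions:

```python
import numpy as np

def hough_circle_votes(edge_points, r, shape, n_angles=360):
    """2-D accumulator for a known radius r: every edge pixel votes for all
    centre candidates lying on a circle of radius r around it."""
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    for (y, x) in edge_points:
        cy = np.round(y - r * np.sin(thetas)).astype(int)
        cx = np.round(x - r * np.cos(thetas)).astype(int)
        inside = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        # a cell gets at most one vote per edge pixel
        for cell in set(zip(cy[inside].tolist(), cx[inside].tolist())):
            acc[cell] += 1
    return acc
```

Since every edge pixel contributes at most one vote per cell, the count in a cell is bounded by the number of edge pixels, which motivates deriving the threshold from the circle's circumference.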

However, in real‐world applications the target radius is often not exactly known and the algorithm has to search over a range of radii. This is implemented by extending the accumulator array to three dimensions: two for the already described two‐dimensional arrays and one for the radius. During the voting process each edge pixel votes for all possible circle centres by incrementing the corresponding values in every radius plane.

It can easily be seen that, despite being very robust, the standard approach is not suitable for most real‐world applications. First of all, it is slow, because for every edge pixel approximately 2πr centre candidates have to be calculated per radius *r*. Another problem is that the accumulator arrays can be very memory intensive, especially if the resolution of the input edge image is high and the target radius is not exactly known. Hence, several improvements were introduced and will be discussed in the following section.

### *4.2.2. Gradient method*

As shown in Formula (10), the local edge orientation angle can easily be determined. By exploiting this, the circle detection algorithm can be executed much more efficiently. The key observation is that an edge is perpendicular to the line that connects the edge pixel with the centre of the circle. Therefore, it is not necessary to calculate up to 2πr centre candidates for each edge pixel and vote for them in the accumulator array; because of the edge orientation angle, the number of candidates can be narrowed down to only a few pixels. This is shown in **Figure 11**.

The vertical line in the centre of **Figure 11** shows the respective local edge orientation. If the radius is exactly known, in theory only two possible centres correspond to the given edge direction: *C*1 and *C*2. They are located on a line perpendicular to the edge direction and their distance to the considered edge pixel is *r*. If the algorithm is searching for a range of radii, the sets of possible centres *C*1[ ] and *C*2[ ] are located on the aforementioned line and the distances of the centres to the initial edge pixel vary from *rmin* to *rmax*.

Hence, the accumulator array can be reduced to two dimensions even when searching for several radii [27]. During the voting process the location of each edge pixel that casts a vote is stored. After the vote the centre candidates are selected. To be taken into consideration the accumulator value of a valid centre candidate has to be above the given threshold and higher than the values of all its immediate neighbours. The approved centre candidates are sorted in descending order according to their accumulator values.

Now the best fitting radius has to be determined. For this, the previously stored edge pixels are considered. The distances between each of these pixels and the centre candidates are calculated. Using these distances, the best supported radius can be determined. Finally, it has to be checked, if the resulting centre is not too close to any previously accepted centre and if it is supported by a sufficient number of edge pixels.

**Figure 11.** Possible locations of centres given a specific edge direction (green) using one radius (left) and a range of radii (right) [26].
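The voting step of the gradient method can be sketched as follows: each edge pixel casts votes only along its local gradient direction, on both sides, for every radius in the target range. This is an illustrative NumPy version with assumed names, omitting the bookkeeping of voting pixels and the radius estimation:

```python
import numpy as np

def gradient_votes(edge_points, orientations, r_min, r_max, shape):
    """2-D accumulator for the gradient method: each edge pixel votes only
    along its local gradient direction, on both sides, for every radius in
    [r_min, r_max]."""
    acc = np.zeros(shape, dtype=np.int32)
    for (y, x), phi in zip(edge_points, orientations):
        for r in range(r_min, r_max + 1):
            for sign in (1, -1):
                cy = int(round(y + sign * r * np.sin(phi)))
                cx = int(round(x + sign * r * np.cos(phi)))
                if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                    acc[cy, cx] += 1
    return acc
```

Compared with the basic method, each edge pixel now casts only 2 · (*rmax* − *rmin* + 1) votes instead of roughly 2πr per radius, and the accumulator stays two‐dimensional.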

### *4.2.3. Run‐time comparison*


In Ding *et al.* [22], the run times of the described basic method and the gradient method are compared by detecting red balls in an image with a resolution of 192 × 144 pixels. The achieved results were averaged over 10 measurements and are displayed in **Table 1**.


**Table 1.** Results of the run‐time comparison.

When only one ball is detected, the gradient method is already about three times faster than the basic method. The difference becomes even more significant when the number of balls, and therefore the number of edges, is increased.

### **4.3. Recognition procedure**

The recognition procedure is split into two phases: the initial search to determine the target parameters pre‐flight and the actual search which is performed mid‐air by the quadrocopter.

### *4.3.1. Initial scan*

Prior to the search, some parameters have to be predefined. Thus, a picture of the search target is taken against a plain background and from a height close to the flying height of the quadrocopter. The radius can be determined directly by detecting the ball and storing the radius at which the maximum number of votes in the accumulator array was achieved.

Furthermore, the dominating colour within the detected circle is calculated by combining several of the most frequent red, green and blue values. It is notable that the average values are not used because they can be heavily influenced by bright spots on the search target which emerge because of unfavourable lighting conditions. The final target colour does not consist of exactly one set of RGB values but of ranges for each colour channel which are derived from the original values.

Furthermore, the algorithm searches for more than one radius. The initially detected radius ±2 can therefore be chosen as the target range to compensate for slight variations in height during the flight. To allow bigger changes in flying altitude, the radius range would have to be adjusted according to the currently measured distance of the quadrocopter to the floor.
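The colour estimation of the initial scan can be sketched as follows. This is an illustrative NumPy version; the function name and the `top_k` and `margin` parameters are assumptions, not values from the chapter:

```python
import numpy as np

def dominant_colour(pixels, top_k=5, margin=30):
    """Estimate the target colour from the pixels inside the detected circle:
    combine the most frequent values per channel (not the mean, which bright
    spots can skew) and derive a range for each colour channel."""
    ranges = []
    for c in range(3):
        counts = np.bincount(pixels[:, c], minlength=256)
        top = np.argsort(counts)[-top_k:]                   # most frequent values
        centre = int(np.average(top, weights=counts[top]))  # frequency-weighted
        ranges.append((max(0, centre - margin), min(255, centre + margin)))
    return ranges
```

Because the combination is weighted by frequency, a few saturated highlight pixels shift the estimated channel centre far less than they would shift a plain average.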
