
**Figure 8.** *Experimental setup for 360° camera relative localization.*

**Figure 9.** *Rectified N = 12 partitions for a single "spherical" frame.*

**4.2 Visual object tracking using a pan-tilt-zoom camera**

Having identified the adversary or collaborating drone, a PTZ camera is utilized to track its motion. This Visual Object Tracking (VOT) problem is challenging when the drone is occluded, so Long Term Efficient (LTE) algorithms for moving objects are sought. Although Short Term Efficient (STE) algorithms [33] based on either correlation methods or deep learning have been developed, they require an initial bounding box containing the target. In the authors' case, the developed VOT algorithm employs two methods relying on a comparison: (a) between the points transformed according to the known PTZ parameters and those tracked by optical flow, and (b) between the homography-transformed points and the optical flow.

**Figure 10.** *3D-relative path inferred through the MoCaS and the visual method between two collaborating drones.*

The first method is based on the known PTZ motion and the IMU's accelerometer and gyroscope measurements (**Figure 11**) to estimate the motion of the pixels caused by the motion of the camera relative to the surroundings [34]. An IMU with triple-axis accelerometers, gyroscopes, and magnetometers is attached to the PTZ-camera, as shown in **Figure 11**. While the PTZ camera enables efficient VOT, the need to control its parameters (pan, tilt, and zoom) while it is mounted on a floating base and in the presence of several occlusions needs to be addressed.

The objective is for the camera attached to the "Tracking drone" to provide the bounding box *pt* of the approaching drone, as shown in **Figure 12**. The IMU measurements are fed to an embedded EKF that computes the camera's pan and tilt angles in the global coordinate system (and their angular velocities) at a 100 Hz rate. The angular velocities are used to compute the optical flow, and the angles are used for VOT purposes.
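The chapter does not reproduce the EKF equations. As a rough, hedged illustration of what such a fusion step might look like, the sketch below integrates the gyroscope rates and corrects them with pan/tilt angles derived from the accelerometer and magnetometer; the noise values are placeholders and the model is deliberately simplified (with identity Jacobians it reduces to a linear Kalman filter), so it should not be read as the authors' implementation.

```python
import numpy as np

class PanTiltEKF:
    """Minimal sketch of the fusion step: state x = [pan, tilt] in radians,
    gyroscope rates used as the control input (not the authors' actual filter)."""

    def __init__(self, dt=0.01):                  # 100 Hz rate, as stated in the text
        self.dt = dt
        self.x = np.zeros(2)                      # [pan, tilt] estimate
        self.P = np.eye(2) * 0.1                  # state covariance
        self.Q = np.eye(2) * 1e-4                 # process noise (assumed value)
        self.R = np.eye(2) * 2e-3                 # measurement noise (assumed value)

    def predict(self, gyro_rates):
        """Propagate the angles by integrating the gyroscope angular velocities."""
        self.x = self.x + self.dt * np.asarray(gyro_rates)
        self.P = self.P + self.Q                  # state-transition Jacobian is identity

    def update(self, angle_meas):
        """Correct with pan/tilt angles derived from the accelerometer and magnetometer."""
        y = np.asarray(angle_meas) - self.x       # innovation (measurement model H = I)
        S = self.P + self.R
        K = self.P @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K) @ self.P
        return self.x                             # fused [pan, tilt]
```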

A GPU-based background subtraction technique eliminates the background pixels, leaving only those of the moving object. The bounding box encapsulates all pixels of the moving drone; the pan and tilt angles are adjusted to position the centroid of the moving bounding box at the image's center, while the zoom is adjusted to enlarge this box. The communication between the i7-minicomputer and the PTZ-camera is shown in **Figure 13**, while the VOT algorithm is shown in **Figure 14**.
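The text does not specify which background subtractor is used; the sketch below substitutes OpenCV's MOG2 (a CPU implementation, whereas the authors use a GPU-based technique) and only illustrates the centering logic: the pan/tilt commands are proportional to the offset of the bounding-box centroid from the image center, and zoom is increased while the box stays small. The gains and the `send_ptz_command` helper are hypothetical.

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)

K_PAN, K_TILT, K_ZOOM = 0.05, 0.05, 0.1   # proportional gains (assumed values)
TARGET_BOX_FRACTION = 0.25                # desired bounding-box width / image width

def track_step(frame, send_ptz_command):
    """One VOT iteration: segment the moving drone and re-center/zoom the PTZ camera."""
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                        # nothing moving in the field of view

    # Bounding box that encapsulates all moving-object pixels.
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    x, y, w, h = cv2.boundingRect(points)

    img_h, img_w = frame.shape[:2]
    cx, cy = x + w / 2.0, y + h / 2.0      # centroid of the bounding box
    pan_cmd = K_PAN * (cx - img_w / 2.0)   # steer the centroid toward the image center
    tilt_cmd = K_TILT * (cy - img_h / 2.0)
    zoom_cmd = K_ZOOM * (TARGET_BOX_FRACTION - w / float(img_w))  # enlarge small boxes

    send_ptz_command(pan_cmd, tilt_cmd, zoom_cmd)   # hypothetical camera interface
    return (x, y, w, h)
```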

**Figure 11.** *PTZ-camera for visual object tracking.*

**Figure 12.** *Sample drone tracking setup.*

The feature points are recognized in each frame, and the transformation matrix between successive frames follows [35]; the formulas provide the transformation based on the PTZ parameters, and an augmentation is needed to account for the camera's translation, as provided by the on-board accelerometers. The pixels that correspond to static background objects will follow the motion predicted from the camera motion and coincide with the positions predicted by an optical flow-based estimation, while the rest will be classified as belonging to moving objects of interest (**Figure 15**). The optical flow computation parallels the Lucas-Kanade method using a pyramidal scheme with variable image resolutions [36]. The basic premise of optical flow is to locate, in the current frame captured by the camera, an image feature found in the previous frame.
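As an illustration only (the exact formulas of [35] are not reproduced here), the sketch below approximates the camera-induced motion of a pixel by the pure-rotation homography K·R·K⁻¹ built from the pan/tilt change, tracks the same points with OpenCV's pyramidal Lucas-Kanade routine, and labels as "moving" those points whose flow-tracked position diverges from the camera-motion prediction by more than a threshold. The threshold value and the omission of the translation and zoom terms are simplifying assumptions.

```python
import cv2
import numpy as np

def classify_moving_points(prev_gray, curr_gray, prev_pts, K, d_pan, d_tilt, thresh_px=3.0):
    """Label feature points as background or moving by comparing the position
    predicted from the known pan/tilt change against pyramidal Lucas-Kanade flow."""
    # Camera rotation between the two frames (pan about the y-axis, tilt about the x-axis).
    Ry, _ = cv2.Rodrigues(np.array([0.0, d_pan, 0.0]))
    Rx, _ = cv2.Rodrigues(np.array([d_tilt, 0.0, 0.0]))
    R = Rx @ Ry

    # Pure-rotation homography: p_curr ~ K R K^-1 p_prev (translation term omitted here).
    H = K @ R @ np.linalg.inv(K)
    pts = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    predicted = cv2.perspectiveTransform(pts, H)

    # Pyramidal Lucas-Kanade tracking of the same points.
    flow_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, pts, None, winSize=(21, 21), maxLevel=3)

    # Points whose tracked position deviates from the camera-motion prediction
    # belong to independently moving objects (the drone of interest).
    dist = np.linalg.norm(flow_pts - predicted, axis=2).ravel()
    moving = (status.ravel() == 1) & (dist > thresh_px)
    return moving
```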

The second method relies only on visual feedback and homography calculations [37] between two successive frames and requires neither the PTZ parameters nor the IMU measurements, as shown in **Figure 16**. Initially, a set of "strong image features" is identified in the previous camera frame, and an optical flow technique is used to estimate the position of these features in the current frame. The method involves the discovery of special image areas with specific characteristics.

**Figure 13.** *PTZ-camera hardware tracking and control schematic.*

**Figure 14.** *Pan-tilt-zoom/IMU and optical flow VOT algorithm.*

The algorithm used for finding the strong corner image features relies on the GPU-enhanced "goodFeaturesToTrack" [38]. Under the assumption that the background is formed by the majority of the pixels, a homography is calculated that transforms the feature positions from the previous to the current frame; these correspond to the background pixels. The previous frame features are then transformed using the homography to get their position in the current frame. Herein, it is assumed that the background points transformed with the homography will coincide with those estimated by the optical flow, while the moving objects' features estimated by the optical flow will diverge from the homography-transformed pixels.
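A minimal sketch of this second method using standard OpenCV calls (corner extraction, pyramidal optical flow, and RANSAC homography estimation) is given below; the parameter values and the divergence threshold are assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def homography_vot_step(prev_gray, curr_gray, thresh_px=3.0):
    """Homography-based separation of background and moving-object features."""
    # "Strong" corner features on the previous frame.
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    if prev_pts is None:
        return None

    # Estimate their positions in the current frame with pyramidal optical flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None,
                                                   winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    prev_ok, curr_ok = prev_pts[ok], curr_pts[ok]

    # The majority of pixels are background, so a RANSAC homography fits their motion.
    H, _ = cv2.findHomography(prev_ok, curr_ok, cv2.RANSAC, 3.0)
    if H is None:
        return None

    # Background features transformed by H coincide with the optical-flow estimates;
    # features of moving objects diverge from the homography-transformed positions.
    warped = cv2.perspectiveTransform(prev_ok.reshape(-1, 1, 2), H)
    dist = np.linalg.norm(curr_ok.reshape(-1, 1, 2) - warped, axis=2).ravel()
    moving_pts = curr_ok.reshape(-1, 2)[dist > thresh_px]
    return moving_pts            # candidate pixels of the moving drone
```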

**Figure 15.** *Background/foreground estimation using Homography-based VOT.*

**Figure 16.** *Homography-based VOT.*

One downside of the technique is that when the tracked object remains static and blends with the background, the method is unable to identify it. In this case, a fast correlation-based STE-tracker relying on the MOSSE algorithm [39] is also used to estimate the drone's position until new measurements of a moving drone are available. Several more robust but slower tracking algorithms were evaluated, including KCF [40], CSRT [41], MIL [42], MedianFlow [43], and TLD [44], and the MOSSE algorithm was selected because of its fast implementation (600 Frames per Second (FpS)). A Kalman prediction scheme [45] was used to predict the bounding box, given the one obtained from the MOSSE tracker, in the presence of noisy measurements of the moving-object center, using a 2D constant-acceleration model for the estimated tracking window.
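A sketch of this fallback is given below, assuming the OpenCV-contrib MOSSE tracker (`cv2.legacy.TrackerMOSSE_create`; older builds expose it as `cv2.TrackerMOSSE_create`) and a `cv2.KalmanFilter` with a 2D constant-acceleration model for the bounding-box center; the noise settings are illustrative rather than the authors' values.

```python
import cv2
import numpy as np

def make_center_kf(dt=1 / 30.0):
    """Kalman filter with a 2D constant-acceleration model for the box center."""
    kf = cv2.KalmanFilter(6, 2)              # state: [x, y, vx, vy, ax, ay]
    kf.transitionMatrix = np.array([
        [1, 0, dt, 0, 0.5 * dt**2, 0],
        [0, 1, 0, dt, 0, 0.5 * dt**2],
        [0, 0, 1, 0, dt, 0],
        [0, 0, 0, 1, 0, dt],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1]], dtype=np.float32)
    kf.measurementMatrix = np.zeros((2, 6), dtype=np.float32)
    kf.measurementMatrix[0, 0] = kf.measurementMatrix[1, 1] = 1.0
    kf.processNoiseCov = np.eye(6, dtype=np.float32) * 1e-3      # assumed
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed
    return kf

tracker = cv2.legacy.TrackerMOSSE_create()   # requires opencv-contrib-python
kf = make_center_kf()

def fallback_step(frame, init_box=None):
    """Run the MOSSE short-term tracker and smooth/predict the center with the KF."""
    if init_box is not None:                 # (x, y, w, h) from the moving-object detector
        tracker.init(frame, init_box)
    ok, box = tracker.update(frame)
    prediction = kf.predict()                # predicted center even without a detection
    if ok:
        x, y, w, h = box
        center = np.array([[x + w / 2.0], [y + h / 2.0]], dtype=np.float32)
        kf.correct(center)
    return ok, box, (float(prediction[0, 0]), float(prediction[1, 0]))
```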

**5. Aerial manipulation**


A seven Degree-of-Freedom (DoF) robotic arm has been attached for exerting forces on surfaces in aerial manipulation tasks, such as grinding, cleaning, or physical contact-based inspection [46]. The Kinova Gen 2 Assistive 7-DoF robot [47] was attached through a custom mount. This manipulator is characterized by a 2:1 weight-to-payload ratio, with the available payload at the end-effector being 1.2 kg, grasped by the 3-finger gripper. Torque sensing is provided at each joint, and these measurements, along with the joint angles, are communicated to the main computer at 100 Hz under ROS middleware.
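The chapter does not detail the software interface; as a hedged illustration, a ROS 1 node could subscribe to the arm's joint states (angles and torques/efforts) at the stated 100 Hz, for example on a topic such as `/j2s7s300_driver/out/joint_state` published by the Kinova ROS driver (the topic name is an assumption, not taken from the text).

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import JointState

def joint_state_cb(msg):
    # msg.position holds the joint angles (rad) and msg.effort the joint torques (Nm)
    # as reported by the arm's driver; both arrive with the message timestamp.
    angles = dict(zip(msg.name, msg.position))
    torques = dict(zip(msg.name, msg.effort))
    rospy.logdebug("angles: %s torques: %s", angles, torques)

if __name__ == "__main__":
    rospy.init_node("arm_state_listener")
    # Topic name assumed from the Kinova ROS driver naming convention.
    rospy.Subscriber("/j2s7s300_driver/out/joint_state", JointState, joint_state_cb,
                     queue_size=100)          # ~100 Hz stream from the arm
    rospy.spin()
```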

