*Industrial Robotics - New Paradigms*

It is clear that industrial robotics leads the market worldwide, although social and gaming uses of robots have increased sales. Nevertheless, the most promising scenario at present and in the short term is the use of robots in commercial applications off the plant floor. Emergency systems, inspection and maintenance of facilities of any kind, rescue, surveillance, agriculture, fishing, border patrolling, and many other nonmilitary applications attract users and clients because robots increase productivity in these sectors; low prices and high profitability are the keys.

Many robot morphologies and types exist (surface, underwater, aerial, underground, legged, wheeled, tracked, etc.), but the authors want to draw attention to unmanned aerial vehicles (UAVs), which have several properties that make them attractive for a set of applications that cannot be handled by any other type of robot. First, these autonomous robots can fly and can therefore reach areas that humans or other robots cannot. They are light, easy to move from one area to another, and can be adapted to any area, terrain, soil, building, or facility. Their drawbacks are fragility in adverse weather and an autonomy that is quite limited compared with unmanned surface vehicles (USVs).

UAVs have opened a new era of cheap, easy applications that were unthinkable until now. The authors focus on their use in the maintenance and inspection of industrial facilities, specifically the inspection of pipes in big, complex factories (mainly gas and oil companies), where the manual inspection (and even location and mapping) of pipes becomes an impossible task. Manned helicopters (with thermal engines) cannot fly close to pipes or among a bundle of pipes. Scaffolds cannot be erected on complex, unstable, and fragile pipes to inspect them manually. UAVs can therefore solve a complex problem: inspecting pipes of different diameters, colors, textures, and conditions in hazardous factories. This problem is not new, and some solutions have been brought to an incipient market. Works such as [1, 2] propose building a map of the pipe set by navigating among the pipes with odometry and inertial units [3]. Obstacle avoidance in a crowded 3D world of pipes is of great interest when planning a flight; in [4], some contributions are made in this direction, although the object accuracy achieved is insufficient for a reliable technology. The work in [5] overcomes some of these problems by using a wide range of sensors (cameras, laser, barometer, and ultrasound), but together with a computationally inefficient software scheme, this made the UAV too heavy and unreliable because of the excessive sensor fusion.

Many of the technical developments that have helped robotics grow have had a wider impact, especially those related to increasing computational power and parallelization. Faster processors, with tens of cores and multithreading capabilities, and modern GPUs (graphics processing units) have led to the emergence of GPGPU (general-purpose computing on GPUs). These computing techniques have led to huge advances in artificial intelligence (AI), giving rise to the "deep learning" field. Deep learning (DL) focuses on artificial neural networks (ANNs) with tens or hundreds of layers, exploiting the massive parallelism of modern GPUs. It relies on computational cores (e.g., CUDA cores) that, compared one to one with a processor core, are less powerful and slower but are available in the hundreds or thousands. This has allowed the transition from shallow ANNs to deeper architectures and innovations such as several types of convolutional layers. In this work, the authors present a novel approach to detect pipes in industrial environments based on fully convolutional networks (FCNs). These will be used to extract the apparent contour of the pipes, replacing most of the architecture developed in [6] and discussed in Section 2. To properly train these networks, a custom dataset relevant to the domain is required, so the authors

**2. Related work**

As discussed, inspection and surveying are frequent problems to which UAV technologies are applied. The most common scenario is a hard-to-reach infrastructure that is visually inspected through different sensors onboard a piloted UAV. Some projects have proposed introducing higher-level perception and automation capabilities, depending on the specific problem. In these cases, it is common to combine state-of-the-art academic and industrial expertise to reach functional solutions.

In one of these projects, the specific challenge of accurately detecting and positioning a pipe in real time using only the hardware deployable on a small (by industry standards) UAV platform was considered (**Figure 1**), with several solutions studied and tested (including vision- and LiDAR-based techniques).

**Figure 1.**

*One of the UAVs used for the development of perception tasks in the AEROARMS project. Several sensors were deployed and processed with a set of SBCs (single-board computers), including a Velodyne LiDAR, two different cameras, an ultrasonic range finder (height), and optical flow.*

In the case of LiDAR-based detection, finding a pipe is generally treated as a segmentation problem in the sensor space (using ℝ³ data collected as "*point clouds*"). There are many methods for LiDAR detection, but the most successful are based on stochastic model fitting and registration, commonly RANSAC (Random Sample Consensus [7]) or derived approaches [8, 9]. Three different data density levels were tested using the libraries available through ROS: using RANSAC over a map estimated by a SLAM technique, namely LOAM [10]; detecting the pipe in a small window of consecutive point clouds joined by an ICP-like approach [11]; and finally, simply working with the most recent point cloud. The first approach proved to be computationally unfeasible, no matter what optimization was tested, as even working with a single point cloud could be prohibitive if not done carefully. To enhance performance, the single point cloud approach was optimized with spatial and stochastic filtering to reduce the data volume, and a curvature filter reduced false positives in degenerate configurations, producing robust results at between 1 and 4 Hz.

To solve the same problem with visual sensors, a two-step strategy was used. To estimate the pose of the pipes, they were assumed to be circular and regular enough to be modeled as a straight homogeneous circular cylinder. This allowed using a closed-form conic equation [12], which relates the axis of the pipe (its position and orientation expressed in Plücker coordinates) to the edges of its projection in the image space. While this solves the positioning problem, detection proved to be more challenging: techniques based on edge detection, segmentation, or other classical computer vision methods worked under controlled lighting but failed to perform acceptably in outdoor scenarios. This issue was solved by introducing human supervision: an initial seed for the pipe in the image sensor space was provided (or validated) by a human and then tracked robustly through vision, predicting it with the UAV odometry.
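The relation between a cylinder axis in Plücker coordinates and the two edges of its image projection can be illustrated with a short sketch. This is not the closed-form conic equation of [12]; it is a standard tangent-plane construction, written under the assumptions that the axis is expressed in the camera frame, that the camera centre lies outside the cylinder, and that the intrinsic matrix `K` is known. The function and variable names are hypothetical.

```python
import numpy as np


def plucker_from_point_dir(p, d):
    """Plücker coordinates (unit direction d, moment m = p x d) of a 3D line."""
    d = d / np.linalg.norm(d)
    return d, np.cross(p, d)


def apparent_contour_lines(d, m, r, K):
    """Homogeneous image lines of the two silhouette edges of a cylinder.

    d, m : Plücker coordinates of the axis in the camera frame (|d| = 1).
    r    : cylinder radius.
    K    : 3x3 pinhole intrinsic matrix (assumed known from calibration).
    """
    q = np.cross(d, m)            # axis point closest to the camera centre
    rho = np.linalg.norm(q)       # distance from the camera centre to the axis
    if rho <= r:
        raise ValueError("camera centre is inside or on the cylinder")
    u = q / rho                   # unit vector towards the axis
    w = np.cross(d, u)            # completes an orthonormal frame (d, u, w)
    c = r / rho
    s = np.sqrt(1.0 - c * c)
    lines = []
    for sign in (1.0, -1.0):
        # Plane through the camera centre, parallel to the axis and at
        # distance r from it, i.e. tangent to the cylinder.
        n = c * u + sign * s * w
        # A plane n.X = 0 through the camera centre projects to l = K^{-T} n.
        lines.append(np.linalg.solve(K.T, n))
    return lines
```

For a pipe of radius 1 m whose axis is vertical and 5 m in front of the camera, the two returned lines are vertical image lines placed symmetrically around the principal point, which is the band in which the pipe edges are expected.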

With these results, discussed in [6], it was apparent that a new solution was needed, as the LiDAR approaches were too slow and the vision-based techniques proved unreliable. The final proposed solution was based on integrating data from the laser and the vision sensors: the RANSAC-over-LiDAR approach would robustly detect the pipe and provide an initial position, which would then be projected into the image space (accounting for displacements if odometry is available) and used as a seed for the vision-based pipeline described.
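To make the RANSAC-over-LiDAR step more concrete, the following is a minimal sketch of a cylinder RANSAC over a single point cloud. It is not the ROS-based implementation referred to above: it assumes that unit surface normals are already available for each point (for instance, estimated from local neighbourhoods), it uses a two-oriented-point hypothesis as one common way to generate cylinder candidates, and every name and parameter value is hypothetical.

```python
import numpy as np


def stochastic_subsample(points, normals, keep, rng):
    """Stochastic filtering: keep a uniform random subset of the cloud."""
    n = len(points)
    idx = rng.choice(n, size=min(keep, n), replace=False)
    return points[idx], normals[idx]


def cylinder_from_two_oriented_points(p1, n1, p2, n2):
    """Hypothesise (axis point c, unit axis a, radius r) from two points
    with unit normals; returns None for a degenerate pair."""
    a = np.cross(n1, n2)
    norm = np.linalg.norm(a)
    if norm < 1e-6:  # (anti)parallel normals leave the axis undefined
        return None
    a = a / norm
    # Remove the component along the axis so both points share one
    # cross-section plane.
    q1 = p1 - np.dot(p1, a) * a
    q2 = p2 - np.dot(p2, a) * a
    # The normal lines q1 + s*n1 and q2 + t*n2 meet at the circle centre.
    A = np.stack([n1, -n2], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, q2 - q1, rcond=None)
    c = q1 + s * n1
    r = 0.5 * (abs(s) + abs(t))
    return c, a, r


def ransac_cylinder(points, normals, n_iters=200, tol=0.02,
                    r_range=(0.05, 0.5), rng=None):
    """Return (c, a, r, inlier_mask) of the best hypothesis, or None."""
    rng = np.random.default_rng() if rng is None else rng
    best, best_count = None, 0
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        model = cylinder_from_two_oriented_points(
            points[i], normals[i], points[j], normals[j])
        if model is None:
            continue
        c, a, r = model
        if not (r_range[0] <= r <= r_range[1]):  # prior on the pipe radius
            continue
        rel = points - c
        radial = rel - np.outer(rel @ a, a)      # component orthogonal to axis
        dist = np.linalg.norm(radial, axis=1)
        inliers = np.abs(dist - r) < tol
        count = int(inliers.sum())
        if count > best_count:
            best, best_count = (c, a, r, inliers), count
    return best
```

The radius prior `r_range` plays a role similar to the filtering described in the text: it discards hypotheses cheaply before the more expensive inlier count over the whole cloud.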

In that same work [6], a sensitivity analysis studying the effects of the relative pose between the sensor and the pipes is provided. Once the pipe is detected in the LiDAR sensor space, the cylinder model is projected into the ℝ² image space using a projection matrix derived from the calibrated camera model (assumed to be a thin-lens pinhole model, per classic literature [13]). This provides a region or band of interest in which to look for the edges of the pipe in the image and is useful to solve the degenerate conic equation up to scale (i.e., as a function of the radius). An updated version of the architecture of the process is depicted in **Figure 2**.
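As an illustration of this projection step, the sketch below maps points of a detected pipe from the LiDAR frame to pixel coordinates with a pinhole model, optionally compensating for the sensor motion between the LiDAR capture and the image. It assumes 4x4 homogeneous transforms, a known intrinsic matrix `K`, a known LiDAR-to-camera calibration `T_cam_lidar`, and points with positive depth in the camera frame; all names are hypothetical and the code is not taken from [6].

```python
import numpy as np


def transform(T, pts):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return pts @ T[:3, :3].T + T[:3, 3]


def project_pinhole(K, pts_cam):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def lidar_to_pixels(pts_lidar_t0, K, T_cam_lidar,
                    T_world_lidar_t0=None, T_world_lidar_t1=None):
    """Project LiDAR points captured at time t0 into an image taken at t1.

    T_cam_lidar      : LiDAR-to-camera calibration (rigid, assumed constant).
    T_world_lidar_t* : LiDAR poses from odometry at capture time t0 and at
                       image time t1; if either is missing, no motion
                       compensation is applied.
    """
    pts = np.asarray(pts_lidar_t0, dtype=float)
    if T_world_lidar_t0 is not None and T_world_lidar_t1 is not None:
        # Express the (static) pipe points in the LiDAR frame at time t1.
        T_t1_t0 = np.linalg.inv(T_world_lidar_t1) @ T_world_lidar_t0
        pts = transform(T_t1_t0, pts)
    return project_pinhole(K, transform(T_cam_lidar, pts))
```

Projecting a few points sampled along the detected axis gives the centre line of the band of interest; its width can then be set from the pipe radius and depth.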

The detailed architecture of the multimodal approach shows how the LiDAR-based pipeline reduces the data dimensionality by filtering out non-curved surfaces (i.e., removing walls, floor, etc.) and by removing entire regions of the sensed space if priors, relevant data, or the expected position of the pipe relative to the sensor are available. This was aimed at minimizing the size of the point cloud to be processed by the RANSAC step. To project the detected pipe from the LiDAR sensor space into the camera image, some additional information was required: the rigid transformation between sensors (i.e., the calibration between LiDAR and camera) and an estimate of the UAV odometry. This is because, even in the best case, with a rate slightly over 4 Hz, the delay between the captured point cloud and the resulting pipe estimate would be over 200 ms. Therefore, the projection of the detected pipe used to predict the area of interest in which to search for the apparent contour has to account for the displacement during this period, not only the rigid LiDAR-to-camera transformation. This predicted region of interest is used in the vision pipeline, with predictions of the appearance of the pipes in image space used to refine the contour search. This contour search relies on stacking a Hough transform to join the segments detected by the line segment detector (LSD), to overcome partial obstructions, on the relevant area and allows choosing the nearest correctly aligned lines. Note that, using a visual servoing library [14], an option to use data provided through human interaction was kept available, though the integration of LiDAR detections as seeds into the visual pipeline made it unnecessary. To avoid degenerate or spurious solutions, a validation step (based on reprojection and "matching" of the Plückerian coordinates [15] for a tracked pipe) was later introduced.

**Figure 2.**

*The architecture of the multimodal perception pipeline combining LiDAR and camera vision. An updated version adds to previous works a validation step using odometric measurements.*

This architecture leads to a fast (limited by the performance of the vision-based part) and robust (thanks to RANSAC's resilience to spurious detections) pipe detector with great accuracy, which was deployed and tested on a UAV. The main issue of the approach is its hardware requirements: access to odometry from the avionics systems, LiDAR and camera sensors, and enough computing power to process them (beyond any other task required from the UAV). All this hardware is focused on solving what can be described as a semantic segmentation problem. This is relevant given the enormous changes in the computer vision field over the last decade and how classic problems like semantic segmentation are currently solved.

**3. Semantic segmentation problem: classic approaches and deep learning (DL)**

In the context of computer vision, the semantic segmentation problem consists of determining which regions of an image contain an object of a given category, that is, a class or label is assigned to a given area (be it a pixel, window, or segmented region). The different granularity accepted is produced by how the technique and

*Deep Learning-Based Detection of Pipes in Industrial Environments*

*DOI: http://dx.doi.org/10.5772/intechopen.93164*

