**6. Conclusions**

The field of computer vision has been greatly impacted by the advances in deep learning that have emerged in the last decade. This has allowed solving, with purely vision-based approaches, some problems that were considered unsolvable under this restriction. In the case presented, a detection and positioning problem, solved with limited hardware resources (onboard a UAV) in an industry-like uncontrolled scenario through a multimodal approach, has been solved with a vision-only approach. The previous multimodal approach relied in LiDAR, cameras, and odometric measurements (mainly from GPS and IMU) to extract data with complex algorithms like RANSAC and combine them to predict the position of a pipe and produce a measurement. This system was notable thanks to its robustness and performance but presented the huge requirements detailed in [6]. In order to solve the problem in a simpler and more affordable manner, a pure visual solution was chosen as the way to go, exploring the deep learning opportunities.

Although the switch to a pure visual solution meant that during its use, the procedure would only use the camera sensor, the multimodal approach was still used to capture data, and through a series of modifications, turn it into an automatic labeling tool. This allowed building a small but complete dataset with fully labeled images relevant to the problem that we were trying to solve. Finally, to test this dataset, we train a DL architecture able to solve the semantic segmentation problem. Thus, three different contributions have been discussed in this chapter: firstly, a dataset generator exploiting multimodal data captured by the perception system to be replaced has been designed and implemented; secondly, with this dataset generation tool, the data captured has been properly labeled so it can be used for DL applications; and finally, a sample lightweight network model for semantic segmentation, FCN with AlexNet classification, has been trained and evaluated to test the problem.

By the same reasons that there was no dataset available for our challenge and we had to capture and develop one dedicated to our domain, there were no related works to obtain metrics. In order to have some relevant metrics to compare the results of the developed approach, a modified version of [13] was produced and

**143**

China

**Author details**

Edmundo Guerra1

Barcelona, Spain

Lleida - UdL, Lleida, Spain

, Jordi Palacin<sup>2</sup>

\*Address all correspondence to: antoni.grau@upc.edu

provided the original work is properly cited.

, Zhuping Wang3

3 Department of Control Science and Engineering, Tongji University, Shanghai,

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium,

1 Automatic Control Department, Technical University of Catalonia UPC,

2 Department of Informatics and Industrial Engineering, University of

and Antoni Grau1

\*

*Deep Learning-Based Detection of Pipes in Industrial Environments*

benchmarked without the use of prior knowledge. Under these assumptions, the new CNN-based method was able to clearly surpass the multimodal approach, though it still lacks robustness to be considered ready for industrial standards. Still, these initial tests have proven the viability of the built dataset generator and the utilization of CNN-based semantic segmentation to replace the multimodal approach.

This research was funded by the Spanish Ministry of Economy, Industry and

*DOI: http://dx.doi.org/10.5772/intechopen.93164*

Competitiveness through Project 2016-78957-R.

**Acknowledgements**

*Deep Learning-Based Detection of Pipes in Industrial Environments DOI: http://dx.doi.org/10.5772/intechopen.93164*

benchmarked without the use of prior knowledge. Under these assumptions, the new CNN-based method was able to clearly surpass the multimodal approach, though it still lacks robustness to be considered ready for industrial standards. Still, these initial tests have proven the viability of the built dataset generator and the utilization of CNN-based semantic segmentation to replace the multimodal approach.
