A Study on Traditional and CNN Based Computer Vision Sensors for Detection and Recognition of Road Signs with Realization for ADAS

*Vinay M. Shivanna, Kuan-Chou Chen, Bo-Xun Wu and Jiun-In Guo*

## **Abstract**

The aim of this chapter is to provide an overview of how road signs can be detected and recognized to aid ADAS applications, and thus enhance safety, employing digital image processing and neural network based methods. The chapter also provides a comparison of these methods.

**Keywords:** Advanced Driver Assistance System (ADAS), digit recognition, digital image processing, neural networks, shape detection, road signs detection, road signs recognition

## **1. Introduction**

The increasing population has elevated the demand for personal vehicles, driving advancements in vehicular design, engine design, and the integration of embedded electronics, and making personal vehicles one of the most integrated technologies of everyday life [1, 2]. With personal vehicles becoming ubiquitous, the associated risks have also risen. As per data from the U.S. Census Bureau, 10.8 million vehicular accidents were recorded in 2009 compared to 11.5 million in 1990 [3], marking a reduction of about 6%.

With the evolution of progressive intelligence systems, popularly referred to as Advanced Driver Assistance Systems (ADAS), comprising lane departure warning, forward collision warning, road sign (speed limit and speed regulatory) detection and recognition, and driver drowsiness and behavior detection and alert systems, together with the adoption of passive safety measures such as airbags, antilock brakes, tire pressure monitoring or deflation detection systems, automated parking, infrared night vision, pre-crash safety systems and so on, driver safety has increased and the associated risks have been reduced, as these technologies continuously monitor the driver and the vehicular environment and provide timely information and warnings to the driver.

The detection and recognition of road signs is an important technology for ADAS. Road signs guide drivers about directions and road conditions, and serve as essential warnings under special road conditions; thus, they enhance road safety by providing vital information. However, a driver may be distracted, under the stress of life, work or traffic, lacking concentration, or overwhelmed, leading them to overlook road signs. Therefore, a system that monitors the road ahead of the vehicle, recognizes road signs and alerts the driver about vital road conditions would be an excellent assistance. Pointedly, road sign detection and recognition, the topic presented in this chapter, cautions the driver about the various road signs on a particular stretch of highway or road, enabling drivers to drive within those limits, take care of road conditions and avoid the dangers of overspeeding.

The branch of computer science that enables machines, the ADAS system in this case, to see, identify, interpret, and respond to digital images and videos is termed *Computer Vision*, abbreviated as *CV*. Until the boom of machine learning<sup>1</sup> techniques, CV largely depended on traditional digital image processing (DIP)<sup>2</sup> methods; it is now mostly predicated on artificial neural networks (ANNs)<sup>3</sup>. The once impossible task of facilitating machines to respond to vision is achieved with the help of CV, and it is intertwined with artificial intelligence<sup>4</sup>.

The field of CV comprises all tasks analogous to biological vision: seeing (visual sensing), perceiving what is seen, and extracting detailed information in a form usable for further processing, ultimately providing appropriate responses. In short, it is a modus operandi to instill humanlike perception in a computer. CV finds applications across multiple disciplines, simulating and automating the functions of biological vision systems employing sensors, computers, and various embedded platforms with the assistance of numerous algorithms.

The applications of CV are numerous and broad. Among them, using CV for the detection and recognition of road signs to aid Advanced Driver Assistance Systems (ADAS) is pivotal. This chapter focuses on the detection and recognition of road signs, also termed traffic signs, using key CV techniques.

The novelty of this chapter includes: (i) the proposed CV-based method detects and recognizes speed limit and speed regulatory signs without any template images, as the templates are part of the code; (ii) the proposed CSPJacinto-SSD network enhances detection accuracy while reducing the model parameters and complexity compared to the original Jacinto-SSD.

<sup>1</sup> Machine learning (ML) is a branch of artificial intelligence (AI) and computer science, which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy [4].

<sup>2</sup> Digital Image Processing (DIP) refers to the use of computer algorithms to perform image processing on digital or digitized images, leading to the extraction of attributes from the processed images and to the recognition and mapping of individual objects, features or patterns [5].

<sup>3</sup> An Artificial Neural Network (ANN) is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates [6].

<sup>4</sup> Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving [7].

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*

## **2. Computer vision in ADAS applications**

Computer Vision (CV) is one of the crucial technologies in building smart and advanced vehicles with autonomous driving capabilities, termed Advanced Driver Assistance Systems (ADAS). One key area of active ADAS research is road sign detection and recognition, which is a challenging task. A number of issues, such as the type of camera, the speed of the car, image noise depending on speed and direction, the type and intensity of light, weather conditions, and sometimes backgrounds and other objects similar to the signs, make it tedious to detect and recognize road signs. Additionally, road signs may be damaged, faded, tilted or partially occluded by other objects such as building signboards and trees, leading to confusion in the automated system. The process of detecting road signs of all types is carried out using image/video candidates comprising the targeted road signs, in the case of both DIP and CNN methods. The road signs can be obtained from various datasets such as the German Traffic Sign Detection Benchmark (GTSDB) [8], Tsinghua-Tencent 100K [9], the ImageNet dataset [10], and Pascal VOC [11], to name a few. Most of these widely used datasets may not have all road signs in sufficient numbers captured under different lighting and weather conditions. This leads researchers to build their own datasets or rely on mechanical simulations such as CarSim [12] to synthesize the lacking traffic signs.

This chapter discusses a low-complexity DIP algorithm and a CNN-based method, along with existing research and product embodiments of these technologies, followed by the algorithm design, hardware implementation and performance results of road sign detection and recognition.

#### **2.1 Road signs detection and recognition**

The process of locating road signs from a moving vehicle, followed by recognizing the exact type of sign, is termed *'road signs detection and recognition.'* Although there are various approaches and different algorithms, common patterns emerge across the existing body of work, as in **Figure 1**, which shows the basic steps employed in a road sign detection and recognition flow. The process is generally divided into three parts: road sign detection to locate the potential sign candidates; verification of the detected candidates from the previous stage; and, finally, recognition of the traffic signs to extract the actual information from the detected and verified signs. This task of detecting and then recognizing road signs to aid ADAS can be achieved through both DIP and CNN based methods.

Torresen et al. [13] present a red-colored circular speed limit sign detection method to detect and recognize the speed limit signs of Norway. Moutarde et al. [14] present a robust visual speed limit sign detection and recognition system for American and European speed limit signs. Keller et al. [15] present a rectangular speed limit sign detection scheme aimed at detecting and recognizing the speed limit signs in the United States of America (U.S.A.). A different approach is used by Liu et al. [16], wherein a de-noising method based on the histogram of oriented gradients (HOG) is applied to the Fast Radial Symmetric Transform to detect circular speed limit signs. Zumra et al. [17] and Vavilin et al. [18] both use color segmentation followed by other digital processing methods. Lipo et al. [19] present a method that fuses camera and LIDAR data, followed by HOG and a linear SVM to classify the traffic signs.

Sebastian et al. [20] present an evaluation of traffic sign detection in real-world environments. The traffic signs are detected using the Viola-Jones detector

#### **Figure 1.**

*Basic steps of road signs detection and recognition.*

based on Haar features, and Histograms of Oriented Gradients (HOG) relying on linear classifiers; model-based Hough-like voting methods are tested on the standard GTSDB. It also discusses different methods proposed by Ming et al. [21], who use two separate supervised modules for detection and recognition, respectively. Markus et al. [22] use modern variants of HOG features for detection and sparse representations for classification, and Gangyi et al. [23] present a method that uses HOG and a coarse-to-fine sliding window scheme for the detection and recognition of traffic signs, respectively.

Supreeth et al. [24] present a color and shape based detection scheme aimed at red traffic signs, which are recognized using auto-associative neural networks. Nadra Ben et al. [25] present a traffic sign detection and recognition scheme aimed at recognizing and tracking prohibitory signs: feature vector extraction along with a Support Vector Machine (SVM) is used to recognize the traffic signs, and the recognized signs are tracked by the optical-flow based Lucas-Kanade tracker [26]. Y. Chang et al. [27] adopt a modified radial symmetric transform to detect rectangular patterns and then a Haar-like feature based AdaBoost detector to reject false positives. Abdelhamid Mammeri et al. [28] propose an algorithm for North American speed limit sign detection and recognition. There is plenty of state-of-the-art research based on different CNN models [29–32] to detect and recognize traffic signs, including some hybrid approaches [33, 34].


#### *2.1.1 Traditional digital image processing methods to detect and recognize road signs*

The set of methods employed to perform operations on images, with the aim of obtaining an enhanced image or extracting useful, interpretable information, is termed image processing. It is similar to signal processing, with the distinction that here the input is an image and the output is either an image or features affiliated with that image. In recent decades, image processing has been among the most rapidly growing technologies. It forms the foundation of computer vision and is one of the core research areas within the engineering and computer science disciplines.

Fundamentally, image processing comprises three steps: (i) using image acquisition tools to capture/import images; (ii) analyzing and manipulating the images; and (iii) producing the output, which can be an altered image or a report based on the image analysis.

The methods used for image processing can be broadly classified into two categories: analogue and digital. Analogue image processing (AIP) refers to the use of printouts and photographs analyzed through basic visual interpretation techniques. On the other hand, digital image processing (DIP), as the name implies, comprises techniques that manipulate images digitally using computers. Pre-processing, enhancement, information extraction, and display are the basic, customary processes that all data undergo in DIP.

The process of detection and recognition of speed limit road signs [35] can be broadly classified into three stages: (i) speed limit sign detection, (ii) digit segmentation, and (iii) digit recognition; that of speed regulatory road signs [36] likewise into three stages: (i) speed regulatory sign detection, (ii) feature extraction, and (iii) feature matching. **Figure 2** depicts the proposed algorithm used in the detection and recognition of road signs. The following sections discuss each step of the algorithm and the corresponding implementation specifications of the respective stages.

#### A. Shape Detection

The process of detecting regular and irregular polygonal shapes is termed *shape detection*. Shape detection in this chapter refers to detecting road signs: the entire frame is processed and then potential candidates of size 32x32, 64x64 and so on are selected, comprising the common shapes, either the circle or rectangle of a speed limit sign or the triangle of a speed regulatory sign, using the radial symmetric transform method.

The concept of the radial symmetric transform [37, 38] uses the axes of radial symmetry. A regular polygon of n sides possesses several axes of symmetry, and the radial symmetric transform works based on these symmetric axes.

The voting process is based on the gradient of each pixel [39]. The direction of the gradient generates a vote. The vote generated from each pixel follows the symmetric axes, resulting in the highest votes at the center of the respective symmetric axes. **Figure 3** shows the radial symmetry of common polygons.

Fundamentally, the Sobel operator [40] is applied to calculate the gradient of each pixel using a Sobel mask. The Sobel operator generates the gradient of the intensities in vector form, with the horizontal gradient denoted by *Gx* and the vertical gradient denoted by *Gy*, by convolving the corresponding Sobel masks defining the direction of the gradient for each pixel. Besides, in order to eliminate noise of small magnitude, the

#### **Figure 2.**

*Flow chart of the proposed algorithm used to detect speed limit and speed regulatory signs.*

#### **Figure 3.**

*Radial symmetry of common polygons.*

threshold for the absolute value *Gabs* is set for the horizontal and vertical gradients, as given by Eq. (1)

$$G\_{abs} = |G\_x| + |G\_y| \tag{1}$$
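As an illustration, the Sobel gradient computation and the Eq. (1) magnitude can be sketched in plain NumPy; the function name `sobel_gradients` and the naive convolution loop are ours, not the chapter's implementation:

```python
import numpy as np

# 3x3 Sobel masks for the horizontal (Gx) and vertical (Gy) gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.int32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.int32)

def sobel_gradients(img):
    """Convolve a grayscale image with the Sobel masks and return
    (Gx, Gy, G_abs), where G_abs = |Gx| + |Gy| as in Eq. (1)."""
    h, w = img.shape
    gx = np.zeros((h, w), dtype=np.int32)
    gy = np.zeros((h, w), dtype=np.int32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            gx[y, x] = np.sum(patch * SOBEL_X)
            gy[y, x] = np.sum(patch * SOBEL_Y)
    return gx, gy, np.abs(gx) + np.abs(gy)
```

A vertical intensity edge then produces a strong *Gx* and a zero *Gy*, and only pixels whose *Gabs* exceeds the small threshold enter the voting stage.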


Once the horizontal gradient Gx and the vertical gradient Gy are obtained and the noise is eliminated, the radial symmetric transform can proceed based on the calculated gradients. **Figure 4** shows the results of the horizontal and vertical gradients.

#### i. Rectangular Radial Symmetric Transform

The voting process in the rectangular radial symmetric transform phase is based on the gradient generated from the Sobel operator [13, 27]. Each selected pixel with absolute magnitude Gabs greater than a small threshold is denoted *p*, and its gradient vector is denoted *g(p)*. The direction of *g(p)* can be formulated from the horizontal gradient *Gx* and the vertical gradient *Gy* into an angle using Eq. (2).

$$g(p) = \tan^{-1} \frac{G\_y}{G\_x} \tag{2}$$

For each considered pixel *p*, the votes, along with the known width *W* and height *H*, are divided into two categories: horizontal votes and vertical votes. The direction of the gradient *g(p)* for each pixel is used to implement these two categories. The magnitude ranges and the ratio between *Gx* and *Gy* are used to assign the horizontal and vertical votes with respect to the higher and lower threshold values.


Here, the higher and lower threshold values are chosen experimentally based on the size of the Sobel mask. In the case of a 3x3 Sobel mask, the higher threshold is set in the range of 45–55, and the lower threshold between 15 and 25. In nighttime scenarios, both thresholds are lowered to half of their original values and constraints are set on the ratio of the horizontal and vertical gradients.

Each pixel contributes a positive vote and a negative vote. A voting line is then generated by each pixel with both positive and negative votes. The positive votes indicate the probable center of the speed limit sign while the negative votes indicate

the non-existence of speed limit signs. The positive horizontal votes *Vhorizontal+* and negative horizontal votes *Vhorizontal-* are formulated as in Eqs. (3) and (4). *Lhorizontal(p, m)* describes a line of pixels ahead of and behind each pixel *p* at a distance *W*, given by Eq. (5).

$$V\_{horizontal+} = \left\{ L\_{horizontal}(p, m) \, | \, m \in \left[ -\frac{W}{2}, \frac{W}{2} \right] \right\} \tag{3}$$

$$V\_{horizontal-} = \left\{ L\_{horizontal}(p, m) \, | \, m \in \left[ -W, -\frac{W}{2} \right) \cup \left( \frac{W}{2}, W \right] \right\} \tag{4}$$

$$L\_{horizontal}(p, m) = p + \text{round}\left(m \ast \overline{g}(p) + W \ast g(p)\right) \tag{5}$$

where *ḡ(p)* is a unit vector perpendicular to *g(p)*. **Figure 5(a)** represents the process of horizontal voting. Similarly, the positive and negative vertical votes are formulated as in Eqs. (6) and (7). *Lvertical(p, m)* describes a line of pixels ahead of and behind each pixel *p* at a distance *W*, given by Eq. (8) and shown in **Figure 5(b)**, where *ḡ(p)* is again a unit vector perpendicular to *g(p)*.

$$V\_{vertical+} = \left\{ L\_{vertical}(p, m) \, | \, m \in \left[ -\frac{H}{2}, \frac{H}{2} \right] \right\} \tag{6}$$

$$V\_{vertical-} = \left\{ L\_{vertical}(p, m) \, | \, m \in \left[ -H, -\frac{H}{2} \right) \cup \left( \frac{H}{2}, H \right] \right\} \tag{7}$$

$$L\_{vertical}(p, m) = p + \text{round}\left(m \ast \overline{g}(p) + W \ast g(p)\right) \tag{8}$$

After this voting process, the centers of the sign candidates receive higher votes. The voting image is initialized to zero and then accumulates both the positive and negative votes. **Figure 6** shows the result for rectangular signs after the voting process.
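The horizontal voting of Eqs. (3)-(5) can be sketched as follows. This is a simplified illustration with assumed names (`horizontal_votes`), unit gradient vectors, and a single loop covering both the positive central segment and the negative flanking segments:

```python
import numpy as np

def horizontal_votes(edge_pixels, grads, shape, W):
    """Simplified horizontal voting of Eqs. (3)-(5): each edge pixel p with
    unit gradient g casts positive votes on a line segment of half-width W/2
    centred at p + W*g (along the perpendicular direction g_bar), and
    negative votes on the flanking segments up to distance W."""
    votes = np.zeros(shape, dtype=np.int32)
    for (py, px), (gy, gx) in zip(edge_pixels, grads):
        n = np.hypot(gx, gy)
        if n == 0:
            continue
        ux, uy = gx / n, gy / n            # unit gradient g(p)
        bx, by = -uy, ux                   # perpendicular unit vector g_bar(p)
        for m in range(-W, W + 1):
            vy = int(round(py + m * by + W * uy))
            vx = int(round(px + m * bx + W * ux))
            if 0 <= vy < shape[0] and 0 <= vx < shape[1]:
                votes[vy, vx] += 1 if abs(m) <= W // 2 else -1
    return votes
```

The vertical voting of Eqs. (6)-(8) is analogous, with the height *H* bounding the segment lengths.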

**Figure 5.**

*(a) The voting line corresponding to the horizontal voting. (b) the voting line corresponding to the vertical voting.*


#### **Figure 6.**

*The result after the horizontal voting process.*

#### ii. Circular Radial Symmetric Transform

The detection of circular speed limit signs using the radial symmetric transform is similar to that of rectangular signs, with the difference that the circular radial symmetric transform need not be divided into horizontal and vertical votes. It is based entirely on the direction of the gradient *g(p)* of each pixel, and each considered pixel contributes only positive votes *V+*, as in Eq. (9).

$$V\_{+} = \mathbf{p} + \text{round}\left(\mathbf{R} \ast \mathbf{g}(\mathbf{p})\right) \tag{9}$$

**Figure 7(a)** illustrates the voting process for the circular sign detection and **Figure 7(b)** shows the result for circular signs after the voting process.
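A minimal sketch of the circular voting of Eq. (9), with assumed names and unit gradients (not the chapter's actual implementation): pixels on the rim of a circle of radius *R*, whose gradients point toward the center, all deposit their single vote at the same location.

```python
import numpy as np

def circular_votes(edge_pixels, grads, shape, R):
    """Circular radial symmetric transform of Eq. (9): each edge pixel p
    votes once at p + round(R * g(p)), i.e. R pixels along its unit
    gradient direction."""
    votes = np.zeros(shape, dtype=np.int32)
    for (py, px), (gy, gx) in zip(edge_pixels, grads):
        n = np.hypot(gx, gy)
        if n == 0:
            continue
        vy = int(round(py + R * gy / n))
        vx = int(round(px + R * gx / n))
        if 0 <= vy < shape[0] and 0 <= vx < shape[1]:
            votes[vy, vx] += 1
    return votes
```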

#### iii. Triangular Radial Symmetric Transform

The voting process of the triangular shape detection is also based on the gradient of each pixel [39]. The vote generated from each pixel follows the rules of the proposed triangle detection algorithm shown in **Figure 8**. First, the Sobel operator is applied to calculate the gradient of each pixel: the horizontal and vertical gradients are calculated by convolving the corresponding Sobel masks. Each selected pixel is represented by its absolute magnitude, and the gradient vector is denoted *g(p)*. The direction of *g(p)* can be formulated from the horizontal gradient *Gx* and the vertical gradient *Gy* into an angle, as shown in Eq. (10). Only 180 degrees of gradient are used in this algorithm, followed by morphological erosion to eliminate noise.

**Figure 7.** *(a) The vote center corresponding to Eq. (9); (b) the result of the circular voting.*

**Figure 8.** *(a) Illustration of triangle detection algorithm; (b) illustration of voting for the center.*

Morphological erosion is applied to eliminate the noise. For a pixel *p(x, y)* with intensity *f(x, y)* and a structuring element *b(i, j)*, the formula of erosion is as in Eq. (11).

$$\mathbf{g}(p) = \tan^{-1} \frac{\mathbf{G}\_{\mathbf{y}}}{\mathbf{G}\_{\mathbf{x}}} \tag{10}$$

$$(f \ominus b)(\mathbf{x}, \mathbf{y}) = \varepsilon(\mathbf{x}, \mathbf{y}) = \min\_{(i, j)} (f(\mathbf{x} + i, \mathbf{y} + j) - b(i, j)) \tag{11}$$
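The grayscale erosion of Eq. (11) can be sketched naively as follows; the border handling here (clipping the window at the image edge) is our assumption, and a production version would pad instead:

```python
import numpy as np

def gray_erosion(f, b):
    """Grayscale morphological erosion of Eq. (11):
    (f ⊖ b)(x, y) = min over (i, j) of f(x+i, y+j) - b(i, j)."""
    kh, kw = b.shape
    oy, ox = kh // 2, kw // 2
    h, w = f.shape
    out = np.zeros_like(f)
    for y in range(h):
        for x in range(w):
            vals = []
            for i in range(-oy, kh - oy):
                for j in range(-ox, kw - ox):
                    yy, xx = y + i, x + j
                    if 0 <= yy < h and 0 <= xx < w:
                        vals.append(int(f[yy, xx]) - int(b[i + oy, j + ox]))
            out[y, x] = min(vals)
    return out
```

With a flat 3x3 structuring element, an isolated bright pixel (noise) is suppressed to the level of its neighborhood, while flat regions are preserved.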

Then the proposed algorithm exploits the nature of the triangle for detection, as in **Figure 8(a)**. We look for points having a gradient of 30 degrees, defined as point A. Once point A is obtained, we search for points that have a gradient of 150 degrees on the same row as point A, defined as point B. The last step is to find points C and D with a 90-degree gradient on the same columns as points A and B, respectively. Once all these points are determined, a vote is placed at point G, the centroid of the triangle, as in **Figure 8(b)**.

In order to vote for the center point, the width of the detected position and the size of the target triangle must be calculated. The formulas are shown in Eqs. (12) and (13), where *Centerx* is the *x*-coordinate of *G*, *Centery* is the *y*-coordinate of *G*, *H* is the height of the small triangle, and *D* is the size of the target triangle, as shown in **Figure 8(b)**.

$$Center\_{x} = \frac{A\_{x} + B\_{x}}{2}; \quad H = (B\_{x} - A\_{x}) \ast \frac{\sqrt{3}}{2} \tag{12}$$

$$Center\_{y} = \begin{cases} A\_{y} + D \ast \frac{\sqrt{3}}{3} - H, & H < D \ast \frac{\sqrt{3}}{3} \\ A\_{y} + H - D \ast \frac{\sqrt{3}}{3}, & H > D \ast \frac{\sqrt{3}}{3} \end{cases} \tag{13}$$
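The center voting of Eqs. (12) and (13) reduces to a few lines; `triangle_center` is an illustrative helper, and its inputs are an assumption based on the description of points A and B above:

```python
import math

def triangle_center(ax, ay, bx, d):
    """Centre voting of Eqs. (12)-(13): A = (ax, ay) and B = (bx, ay) are
    the detected 30/150-degree corner points on the same row, H is the
    height of the small triangle, and d is the target triangle size D."""
    center_x = (ax + bx) / 2                  # Eq. (12)
    h = (bx - ax) * math.sqrt(3) / 2          # Eq. (12)
    third = d * math.sqrt(3) / 3
    if h < third:                             # Eq. (13), first case
        center_y = ay + third - h
    else:                                     # Eq. (13), second case
        center_y = ay + h - third
    return center_x, center_y, h
```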

Then the width of the detected position is needed for calculating the detected points A and B. Thus, it is easy to build a look-up table to reduce the computation cost

#### **Figure 9.**

*The steps in detail of the sign candidate extraction: (a) initialization of the buffer elements to zero; (b), (c) insertion of the sign candidates based on their vote values; (d) if the current sign candidate has a greater vote value than an element in the buffer, that element and all elements after it are shifted one position to the right, the last element is abandoned, and the new value is written into the freed position; (e), (f) after several iterations, the buffer is full of sign candidates, and each cluster is merged, leaving only the element with the greatest vote value; (g) the elements in red are merged.*

of the voting process. The pixels with higher votes are judged to be at or near the center of the triangular road sign candidates. In order to reduce the computation cost, candidates that are close to each other are merged into one candidate. The new coordinates of the merged candidate are the weighted arithmetic mean of the coordinates of the merged candidates, weighted by their votes, thereby reducing different candidates representing the same triangle.

#### B. Sign Candidates Extraction

After detecting the shape, the potential candidates of the road signs are extracted. A buffer is created to save all the potential sign candidates according to the following steps:


Each element of the buffer is initialized to zero, and every detected sign candidate is then inserted into the buffer based on its vote value. Every time, this buffer is sorted in decreasing order to ensure that the pixels with greater vote values come first.

The votes of the sign candidates result in clusters of candidates within small regions. To overcome this, a small distance threshold is set and, following the order in the buffer, clusters of sign candidates are merged using non-maximum suppression as per Eq. (14), where *x* and *y* are the coordinates of the currently considered sign candidate, and *xi* and *yi* are the coordinates of the candidates already in the buffer. **Figure 9** illustrates the details of the sign candidate extraction along with the results of merging the clusters of sign candidates.

$$Distance = |\mathbf{x} - \mathbf{x}\_i| + |\mathbf{y} - \mathbf{y}\_i| \tag{14}$$
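The cluster merging with the Manhattan distance of Eq. (14) and the vote-weighted coordinate averaging can be sketched as follows; the list-based buffer and the helper name `merge_candidates` are ours:

```python
def merge_candidates(candidates, dist_thresh):
    """Merge clustered sign candidates (Eq. (14)): candidates are
    (x, y, votes) tuples processed in decreasing vote order; a candidate
    within the Manhattan distance threshold of an already kept one is
    absorbed into it by a vote-weighted mean of coordinates."""
    kept = []  # each entry: [sum_votes*x, sum_votes*y, sum_votes]
    for x, y, v in sorted(candidates, key=lambda c: -c[2]):
        for k in kept:
            kx, ky = k[0] / k[2], k[1] / k[2]      # current cluster centre
            if abs(x - kx) + abs(y - ky) <= dist_thresh:
                k[0] += v * x; k[1] += v * y; k[2] += v
                break
        else:
            kept.append([v * x, v * y, v])
    return [(k[0] / k[2], k[1] / k[2], k[2]) for k in kept]
```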

#### C. Achromatic Decomposition

The key feature of the rectangular speed limit road signs in the USA is that all the common speed limit signs are in gray-scale, as in **Figure 10(a-d)**. There also exist advisory speed limit signs on freeway exits, as in **Figure 10(e-f)**. In order to detect the actual speed limit signs, the achromatic gray-scale color of the signs

**Figure 10.**

*(a-d) the rectangular speed limits road signs in USA. (e-f) the advisory speed signs on freeway exits.*

**Figure 11.** *The schematic of the RGB model and the angle α.*


is extracted by achromatic decomposition, whereas the non-gray-scale advisory speed signs are excluded from further consideration.

The gray-scale vector lies along (1,1,1) in the RGB color space. For each considered pixel, the angle *α* between the pixel's color vector and (1,1,1) is checked via their inner product to apply the decomposition in the RGB domain [41], as illustrated in **Figure 11**.

Each considered pixel is in the vector form (r, g, b). The cosine of *α*, given by the normalized inner product [26], is shown in Eq. (15).

$$\cos\alpha = \frac{(1,1,1) \cdot (r, g, b)}{\|(1,1,1)\| \, \|(r, g, b)\|} = \frac{r + g + b}{\sqrt{3} \times \sqrt{r^2 + g^2 + b^2}} \tag{15}$$

In our proposed implementation, only the value of *cos<sup>2</sup>α* is considered. If the value is near one, *α* is near zero, which implies the considered pixel is gray-scale and is taken into account for the further steps. **Figure 12** shows the results of the achromatic decomposition, where the non-gray-scale speed warning signs found on freeway exits are not acknowledged.
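A sketch of the achromatic test of Eq. (15) using cos²α, which avoids the square root; the 0.95 threshold here is an assumed illustration, not the chapter's tuned value:

```python
def is_achromatic(r, g, b, thresh=0.95):
    """Achromatic check of Eq. (15): compare cos^2(alpha) between the pixel
    (r, g, b) and the gray axis (1, 1, 1) against a threshold. Working with
    cos^2 keeps everything in integer-friendly arithmetic (no sqrt)."""
    num = (r + g + b) ** 2               # squared inner product with (1,1,1)
    den = 3 * (r * r + g * g + b * b)    # squared norms product
    if den == 0:
        return True                      # black pixel lies on the gray axis
    return num / den >= thresh
```

A pure gray pixel such as (100, 100, 100) gives cos²α = 1 and passes, while a saturated red pixel gives cos²α = 1/3 and is discarded.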

#### D. Binarization

In the proposed system, the Otsu threshold method is used for binarization in daylight, while the adaptive threshold method is used during the nighttime. **Figure 13** illustrates the proposed steps.

To differentiate between daylight and night-light, the ROI is set to the center part of the frame, where the sky often lies, choosing the width *ROIw* and height *ROIh* as in Eqs. (16) and (17). Noise with extremely high and low pixel values is filtered out, and then, over the remaining 75% of the pixels, the average pixel value is calculated to judge whether it is a daylight or a night condition. **Figure 14** shows the schematic of the day and nighttime judgment.

$$ROI\_{h} = \frac{\text{Height of the frame}}{6} \tag{16}$$

$$ROI\_{w} = \frac{2}{3} \times \text{Width of the frame, excluding } \frac{\text{Width of the frame}}{6} \text{ on either end} \tag{17}$$

The Otsu method [42] can automatically decide the best threshold and binarizes well in the daytime, but at night the chosen threshold causes breakage of the sign digits.

**Figure 12.** *The results of the achromatic decomposition.*

**Figure 13.** *The proposed steps of binarization.*

In the Otsu method, the best threshold, which divides the content into two groups and minimizes the sum of variances within each group, is found by an iterative process. **Figure 15** shows the schematic steps of the Otsu method. On the other hand, the adaptive threshold is more sensitive: it divides the sign into several sub-blocks, the mean of each sub-block is calculated, and then the threshold for each sub-block is computed based on its mean. The corresponding results of the adaptive threshold are shown in **Figure 16**.

Therefore, we chose the Otsu threshold to automatically select the best-fit threshold in the daytime, and the adaptive threshold at night to handle the low contrast environment.
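The Otsu search described above can be sketched as a didactic histogram scan (this is not the chapter's optimized implementation): for each candidate threshold, compute the weighted sum of the intra-class variances and keep the minimizer.

```python
import numpy as np

def otsu_threshold(img):
    """Minimal Otsu search over an 8-bit histogram: pick the threshold t
    minimizing the weighted sum of the two intra-class variances, i.e. the
    split that makes both groups as homogeneous as possible."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    best_t, best_var = 0, np.inf
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue                      # degenerate split, skip
        m0 = (hist[:t] * np.arange(t)).sum() / w0
        m1 = (hist[t:] * np.arange(t, 256)).sum() / w1
        v0 = (hist[:t] * (np.arange(t) - m0) ** 2).sum() / w0
        v1 = (hist[t:] * (np.arange(t, 256) - m1) ** 2).sum() / w1
        within = (w0 * v0 + w1 * v1) / total
        if within < best_var:
            best_var, best_t = within, t
    return best_t
```

On a cleanly bimodal daytime sign crop the within-class variance drops to near zero between the two modes, which is exactly where the digit/background split should lie; on low-contrast night crops the histogram modes collapse, motivating the adaptive per-block threshold instead.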


**Figure 15.** *The schematic steps of Otsu method.*

**Figure 16.** *Different thresholding results.*

For acceleration, this chapter adopts the integral image [36], in which each pixel is compared to an average of the surrounding pixels. An approximate moving average of the last *s* pixels seen is calculated while traversing the image. If the value of the current pixel is lower than the average, it is set to black; otherwise, it is set to white. In the proposed algorithm, the sum of all *f(x, y)* terms to the left of and above the pixel *(x, y)* is stored at each location *I(x, y)*. This is accomplished in linear time using Eq. (18) for each pixel. Once the integral image is calculated, the sum of the function over any rectangle with upper-left corner *(x1, y1)* and lower-right corner *(x2, y2)* can be computed in constant time using Eq. (19). The schematic of Eq. (18) is illustrated in **Figure 17**, and Eq. (19) can be rewritten as Eq. (20). Finally, the mean of each sub-block can be calculated and then

**Figure 17.** *The schematic figure of Eq. (18).*

each pixel in the sub-block, denoted *A(x, y)*, can be binarized with Eq. (21), where *α* is a scalar based on the contrast under different conditions.

$$I(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}, \mathbf{y}) + I((\mathbf{x} - \mathbf{1}), \mathbf{y}) + I(\mathbf{x}, (\mathbf{y} - \mathbf{1})) - I((\mathbf{x} - \mathbf{1}), (\mathbf{y} - \mathbf{1})) \tag{18}$$

$$\sum\_{\mathbf{x}=\mathbf{x}\_1}^{\mathbf{x}\_2} \sum\_{\mathbf{y}=\mathbf{y}\_1}^{\mathbf{y}\_2} f(\mathbf{x}, \mathbf{y}) = I(\mathbf{x}\_2, \mathbf{y}\_2) - I(\mathbf{x}\_2, \mathbf{y}\_1 - \mathbf{1}) - I(\mathbf{x}\_1 - \mathbf{1}, \mathbf{y}\_2) + I(\mathbf{x}\_1 - \mathbf{1}, \mathbf{y}\_1 - \mathbf{1}) \tag{19}$$

$$D = (A + B + C + D) - (A + B) - (A + C) + A \tag{20}$$

$$A(x, y) = \begin{cases} 255, & \text{if } I(x, y) > T_{\text{avg}} \times \alpha \\ 0, & \text{otherwise} \end{cases} \tag{21}$$
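The integral-image binarization of Eqs. (18)–(21) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; the window size `s` and the contrast scalar `alpha` are placeholder values.

```python
import numpy as np

def integral_image(f):
    """Build I(x, y) per Eq. (18): cumulative sum left of and above each pixel."""
    return f.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def block_sum(I, x1, y1, x2, y2):
    """Sum of f over the rectangle [(x1,y1),(x2,y2)] per Eq. (19), in O(1) time."""
    total = I[y2, x2]
    if x1 > 0:
        total -= I[y2, x1 - 1]
    if y1 > 0:
        total -= I[y1 - 1, x2]
    if x1 > 0 and y1 > 0:
        total += I[y1 - 1, x1 - 1]
    return total

def binarize(f, s=8, alpha=0.9):
    """Adaptive thresholding per Eq. (21): compare each pixel against the
    local mean of an s-by-s neighbourhood, scaled by alpha."""
    h, w = f.shape
    I = integral_image(f)
    out = np.zeros_like(f, dtype=np.uint8)
    r = s // 2
    for y in range(h):
        for x in range(w):
            x1, y1 = max(0, x - r), max(0, y - r)
            x2, y2 = min(w - 1, x + r), min(h - 1, y + r)
            n = (x2 - x1 + 1) * (y2 - y1 + 1)
            t_avg = block_sum(I, x1, y1, x2, y2) / n
            out[y, x] = 255 if f[y, x] > t_avg * alpha else 0
    return out
```

In practice the mean is computed once per sub-block rather than per pixel, but the constant-time rectangle sum is the key idea in both cases.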

#### E. Connected Component Labelling (CCL)

Connected component labelling (CCL) labels the objects inside the sign candidates with height, width, area and coordinate information. The CCL algorithm [28] is divided into two processing passes, namely the first pass and the second pass. Two different actions are taken in these passes when the iterated pixel is not background.

The steps of connected component labeling are illustrated in **Figure 18**. In this case, the equivalent labels are (1, 2), (3, 7) and (4, 6).
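The two-pass labelling described above can be sketched as follows. This is a minimal 4-connectivity illustration with a union-find table for the equivalent labels, not the exact implementation of [28].

```python
import numpy as np

def two_pass_ccl(binary):
    """Two-pass connected component labelling (4-connectivity sketch).
    Pass 1 assigns provisional labels and records equivalences;
    pass 2 rewrites each label to its equivalence-class representative."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = {}          # union-find parent table for label equivalences

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):                     # first pass
        for x in range(w):
            if not binary[y, x]:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            neighbours = [l for l in (left, up) if l > 0]
            if not neighbours:
                parent[next_label] = next_label
                labels[y, x] = next_label
                next_label += 1
            else:
                m = min(neighbours)
                labels[y, x] = m
                for l in neighbours:       # record equivalences such as (1, 2)
                    ra, rb = find(m), find(l)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
    for y in range(h):                     # second pass
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```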

#### F. Digit Segmentation

The purpose of digit segmentation [43] is to extract the digits from the binarized image. In the process of rectangular speed limit sign detection, the signboards carry the characters "SPEED LIMIT" alongside the digits. As a result, it is necessary to set constraints on the size of the digit candidates as per Eqs. (22) and (23). Similarly, the constraints on the size of the digit candidates in circular speed limit signs are given in Eqs. (24) and (25).

$$0.15 \times W \le \text{Digit width} \le 0.5 \times W \tag{22}$$

$$0.15 \times H \le \text{Digit height} \le 0.5 \times H \tag{23}$$

$$0.125 \times R \le \text{Digit width} \le R \tag{24}$$

#### **Figure 18.**

*The steps for connected-component labeling, (a) processing initialization, (b) the result after the first pass, (c) the result after the second pass.*

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*

$$0.5 \times R \le \text{Digit height} \le 1.5 \times H \tag{25}$$

Considering that rectangular speed limit signs consist of two digits alongside the characters, it must be ensured that the selected candidates are the digits of the speed limit sign and not the characters. The pairing rules of sizes and positions proposed in this paper are as follows:


In contrast, for circular speed limit signs, which come in both 2-digit and 3-digit variants, a looser constraint is adopted, since circular speed limit signs contain only digits and no characters. The pairing steps are similar to those followed for the rectangular speed limit signs. **Figures 19** and **20** show the segmentation results of the rectangular and circular speed limit signs, respectively.

There exists a critical challenge in the binarization process, as the digits may appear connected to each other. To overcome this challenge, a two-pass segmentation process is proposed in this paper. Digit segmentation, similar to the previous segmentation

**Figure 19.**

*The example of digit segmentation results of rectangular speed limit signs.*

**Figure 20.** *The example of digit segmentation results of circular speed limit signs.*

process, is applied. If large components are detected, then the second pass of the segmentation is applied.

The horizontal pixel projection is applied in the second-pass segmentation. During this projection, the total number of white pixels in each column is accumulated, and the segmentation line is chosen based on the projection. **Figure 21** shows an example of the horizontal projection and the result of the two-pass segmentation step.
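The projection-based split can be sketched as below. This is an illustrative assumption of how the segmentation line might be chosen (at the weakest column near the middle, where two digits are most likely touching); the chapter does not specify the exact selection rule.

```python
import numpy as np

def split_connected_digits(patch):
    """Second-pass segmentation sketch: project white pixels onto the x-axis
    (count per column) and cut at the column with the fewest white pixels
    within the central third of the patch."""
    projection = (patch > 0).sum(axis=0)        # white-pixel count per column
    w = patch.shape[1]
    mid = slice(w // 3, 2 * w // 3)             # search the central third only
    cut = w // 3 + int(np.argmin(projection[mid]))
    return patch[:, :cut], patch[:, cut:]
```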

#### G. Digit Recognition

In the digit recognition phase, the extracted digits are compared with built-in templates, and the three probable digits with the least matching difference are selected based on the Sum of Absolute Differences (SAD) [44]. Afterwards, the blob and breach features of the digits are applied to verify the final digits [45–47]. **Figure 22** depicts the proposed steps for digit recognition.
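The SAD-based candidate selection can be sketched as follows; the template dictionary and patch sizes are illustrative assumptions.

```python
import numpy as np

def sad(candidate, template):
    """Sum of Absolute Differences between a digit candidate and a template,
    both assumed already scaled to the same size."""
    return int(np.abs(candidate.astype(np.int32) - template.astype(np.int32)).sum())

def top3_digits(candidate, templates):
    """Return the three digits whose templates have the least SAD,
    mirroring the probable-digit selection step described above."""
    scores = {digit: sad(candidate, t) for digit, t in templates.items()}
    return sorted(scores, key=scores.get)[:3]
```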

After selecting the three probable digit candidates, the blob feature is employed to verify the digit. Here, a blob is defined as a closed region inside the digit; it is detected by gathering several rows into a union row. The pixel value of a union row is the union of the values of the rows in that union row. For each union row, the number of runs of white pixels is counted. A blob is formed only if these counts follow the sequence "1, 2, …, 2, 1". This union-row method yields the position and

**Figure 21.**

*Example of horizontal projection and the result of the proposed two-pass segmentation steps.*

**Figure 22.** *The proposed steps for digit recognition.*

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*


#### **Table 1.**

*The blob and breach feature verification for all the digits.*

the number of blobs in the digit candidate, as in **Table 1**. The exact blob feature is defined for each specific digit.

Similarly, the breach feature is also adopted to verify the digits. A breach is defined as an open region formed by a closed region with a gap. The breach is detected by counting, in each column, the number of pixels before the first white pixel appears, from both the right and the left, up to half the width of the digit candidate. If there is a series of such pixels longer than half the digit height, it is regarded as a breach. **Table 1** shows the blob and breach feature verification for the digits from 0 to 9, and **Figure 23** shows the results of the digit recognition in terms of blob and breach feature verification.
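The blob check described above can be sketched as follows. This is a simplified per-row version (each row treated as its own union row), intended only to illustrate the "1, 2, …, 2, 1" run pattern.

```python
import numpy as np

def runs_per_row(patch):
    """Count horizontal runs of white pixels in each row of a binary digit."""
    counts = []
    for row in patch:
        white = row > 0
        # a run starts wherever a white pixel follows a non-white one
        starts = np.logical_and(white, ~np.r_[False, white[:-1]])
        counts.append(int(starts.sum()))
    return counts

def has_blob(patch):
    """A blob (closed region, e.g. inside '0' or '8') shows up as a row-run
    sequence of the form 1, 2, ..., 2, 1 somewhere in the digit."""
    c = runs_per_row(patch)
    for i in range(len(c) - 2):
        if c[i] == 1 and c[i + 1] == 2:
            j = i + 1
            while j < len(c) and c[j] == 2:
                j += 1
            if j < len(c) and c[j] == 1:
                return True
    return False
```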

#### H. FAST Feature Extraction

Features from Accelerated Segment Test (FAST) [48, 49] is a corner detector with high repeatability. As shown in **Figure 24**, it uses a circle of 16 pixels to classify whether or not a candidate point *p* is a corner. The FAST corner condition can be written as in Eq. (26), where *S* is a set of *N* contiguous pixels on the circle, *I<sub>x</sub>* is the intensity of *x*, *I<sub>p</sub>* is the intensity of the candidate *p*, and *t* is the threshold.

**Figure 23.** *Digit recognition results.*

**Figure 24.** *The illustration of FAST algorithm.*

$$\forall x \in S,\ I_x > I_p + t, \quad \text{or} \quad \forall x \in S,\ I_x < I_p - t \tag{26}$$

There are two parameters to choose in the FAST algorithm, namely the number of contiguous pixels *N* and the threshold *t*. *N* is fixed at 9 in the proposed algorithm, whereas the threshold *t* is made dynamic to cope with the intensity variations shown in **Figure 25**. The dynamic threshold is calculated from the image patch of the sign candidate. First, the pixels with intensity greater than 128 are counted. If the number of bright pixels is between 20% and 80% of the total number of pixels in the patch, the threshold is computed from the ratio of bright pixels to total pixels. Two fixed thresholds are used for the cases where the number of bright pixels is lower than 20% or higher than 80% of the total. Accordingly, the threshold updates dynamically with the bright-pixel ratio.
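The dynamic-threshold rule can be sketched as below. The fixed values `t_low` and `t_high` and the linear mapping are illustrative assumptions; the chapter does not state the actual constants.

```python
import numpy as np

def dynamic_fast_threshold(patch, t_low=20, t_high=60):
    """Sketch of the dynamic FAST threshold described above. The bright-pixel
    ratio (intensity > 128) drives t; t_low / t_high for the <20% and >80%
    cases are placeholder values, not the paper's."""
    ratio = float((patch > 128).mean())
    if ratio < 0.2:
        return t_low
    if ratio > 0.8:
        return t_high
    return int(t_low + ratio * (t_high - t_low))   # scales with bright ratio
```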

#### I. Fixed Feature Extraction

There are certain conditions in which the contents of road signs are too simple to be extracted using the FAST feature, as shown in **Figure 26(a**-**c)**. Thus, Fixed Feature Extraction is applied in the proposed algorithm to handle such road signs reliably.

Fixed Feature Extraction uses thirty fixed feature points to describe a road sign, as shown in **Figure 26(d)**. This method is similar to template matching, but it is more robust to noise as it uses descriptors to describe the small area around each feature point.

**Figure 25.** *Different lighting conditions of road signs.*

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*

**Figure 26.**

*(a-c) Road signs with simple contents; (d) thirty fixed feature points used in fixed feature extraction.*

#### J. Feature Matching

The main objective of this phase is to match the features between the pre-built template and the detected sign candidates, as shown in **Figure 27**. The features extracted previously are matched by their coordinates and by the descriptors constructed to describe them. Since the proposed system is aimed at real-time applications, the construction and matching procedures of the descriptor algorithm should be both simple and efficient.

**Figure 27.** *Steps followed in the feature matching.*

Binary Robust Independent Elementary Features (BRIEF) [50] is a simple descriptor with good matching performance and low computation cost. In order to build a BRIEF descriptor of length *n*, *n* pairs (*x<sub>i</sub>*, *y<sub>i</sub>*) are chosen. *X* and *Y*, the vectors of points *x<sub>i</sub>* and *y<sub>i</sub>* respectively, are randomly sampled from a Gaussian distribution and stored in a pre-built array to reduce the computation cost. To build a BRIEF descriptor, the *τ* test is defined as in Eq. (27), and *n* is chosen as 256 to yield the best performance.

$$\tau(p;\, x, y) := \begin{cases} 1, & p(x) < p(y) \\ 0, & p(x) \ge p(y) \end{cases} \tag{27}$$

The advantages of BRIEF are obvious: low computation time and good matching performance. The disadvantage is that BRIEF is neither rotation invariant nor scale invariant. Since the size of the detected signs is fixed and road signs rarely undergo much rotation, these disadvantages do not influence the recognition result.

After descriptor construction, a two-step matching process comprising distance matching and descriptor matching is applied to match the detected sign candidates with the pre-built templates. Distance matching considers only the coordinates of the feature points. In this road sign recognition application, the detected road signs should be regular triangles, sometimes with certain defects such as lighting changes, slight rotation, or partial occlusion by an object. Thus, two otherwise similar feature points are not matched if their coordinates differ.

The goal of descriptor matching is to compute the distance between two descriptors, one from the detected sign candidate and the other from the pre-built template. As with all binary descriptors, the BRIEF distance is the number of differing bits between two binary strings, which can be computed as the population count of the XOR of the strings.
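The BRIEF construction of Eq. (27) and the XOR-based distance can be sketched as follows. The patch size, Gaussian spread and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_test_pairs(patch_size=32, n=256, sigma=6.4):
    """Pre-build n Gaussian-sampled point pairs, clipped to the patch."""
    pts = rng.normal(0, sigma, size=(n, 4)) + patch_size // 2
    return np.clip(pts, 0, patch_size - 1).astype(int)

def brief_descriptor(patch, pairs):
    """BRIEF per Eq. (27): bit i is 1 iff the intensity at x_i is less
    than the intensity at y_i."""
    x1, y1, x2, y2 = pairs[:, 0], pairs[:, 1], pairs[:, 2], pairs[:, 3]
    return (patch[y1, x1] < patch[y2, x2]).astype(np.uint8)

def brief_distance(d1, d2):
    """Hamming distance: number of differing bits (XOR then popcount)."""
    return int(np.count_nonzero(d1 ^ d2))
```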

After all sign candidates are matched, a score comparison is applied to choose the most suitable template for the final recognition result. The template with the highest score is selected as the template-matching result. Moreover, the same result should be recognized several times across consecutive frames of a video to ensure that it is not a false alarm.

The performance of the aforementioned DIP based algorithms in detecting and recognizing road signs is discussed in Section 3.

#### *2.1.2 Convolutional neural network (CNN) methods to detect and recognize road signs*

Artificial Neural Networks (ANNs), generally referred to as Neural Networks (NNs), and in particular Convolutional Neural Networks (CNNs), have been a sensation in the field of CV. ANNs, Artificial Intelligence (AI) and Deep Learning (DL) are interdependent and, importantly, indispensable topics of recent research and applications in engineering and in the technology industry. The reason for this prominence is that they currently provide the best solutions to many problems in image recognition, speech recognition and natural language processing (NLP).

The inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen, defines a neural network as *"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."* In simpler words, ANNs are motivated by biological neural networks to learn from and process the information fed to them. **Figure 28**

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*

**Figure 28.**

*(a) A representative biological neuron with (b) its mathematical model from [34].*

shows the similarity in function of a biological neuron in **Figure 28(a)** with its respective mathematical model in **Figure 28(b)**.

A neuron is the fundamental unit of computation in a biological neural network, whereas the basic unit of an ANN is called a node or a unit. The node/unit receives inputs from external sources and from other nodes within the ANN, processes them and computes an output. Every input has a characteristic weight (*w*) allotted based on its importance relative to the other inputs. The node applies a function to the weighted sum of its inputs.

ANNs are generally organized in layers that are made up of numerous interconnected 'nodes' comprising an 'activation function' as in **Figure 29**. The inputs are presented to the ANN via the 'input layers', which communicates with one or more 'hidden layers' in which a particular processing is done by a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output. For the general model of ANN in **Figure 29**, the net input can be calculated as in Eq. (28) and the output by applying the activation function over the net input can be calculated using Eq. (29).

$$Y_{\text{in}} = X_1 w_1 + X_2 w_2 + \dots + X_n w_n,\ \text{i.e., the net input } Y_{\text{in}} = \sum_{i=1}^{n} X_i w_i \tag{28}$$

$$Y = F(Y\_{\text{in}}) \tag{29}$$
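Eqs. (28) and (29) amount to a dot product followed by an activation function, which can be sketched in a few lines. The sigmoid used here is a representative choice of *F*, not one prescribed by the chapter.

```python
import numpy as np

def neuron_output(x, w, activation=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Eqs. (28)-(29): net input y_in = sum_i x_i * w_i, then Y = F(y_in).
    A sigmoid stands in as a representative activation function F."""
    y_in = float(np.dot(x, w))        # Eq. (28)
    return activation(y_in)           # Eq. (29)
```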

Recent research and publications show that ANNs are extensively used for applications ranging from object detection to learning to paint, creating images from sentences, playing board games such as Go (AlphaGo), and so on. Many more previously unthinkable things are being done by ANNs these days, and research on advancing them further continues rigorously.

**Figure 29.** *The general model of an ANN.*

There exist various models presented by researchers across the world for different applications. Some of the prominent networks are the Single Shot Detector (SSD) [51], the Region-based Convolutional Neural Network (R-CNN) [52], Fast R-CNN [53], Faster R-CNN [54], You Only Look Once (YOLO) [55] and its different versions, Generative Adversarial Networks (GANs) [56] and the different modules [57] based on them, and many more. This chapter also discusses CSPJacinto-SSD, which adds CSPNet [58] features to JacintoNet [59]. These numerous ANNs are extensively used by researchers and industries alike. Researchers and industries go hand-in-hand to investigate further improvements of the existing NNs, expanding them into diversified applications and solving problems using effective and low-cost measures, ultimately manufacturing commercial products to make human lives easier and smarter.

In this chapter, we explore object detection NNs such as SSD, Faster R-CNN and YOLO, and propose a newer CNN model termed 'CSPJacinto-SSD' for the detection and recognition of road signs.

The SSD, as its name suggests, needs only a single shot to detect multiple objects within an image. It has two components: a backbone and an SSD head. The backbone is a pre-trained image classification network. The SSD head consists of one or more convolutional layers added to the backbone, whose outputs are interpreted as the bounding boxes and classes of objects at the spatial locations of the final layers' activations, as in **Figure 30**.

Faster R-CNN [54] comprises two modules: the first is a deep fully convolutional network that proposes regions, and the second is the Fast R-CNN detector [53] that uses the proposed regions. The earlier versions, R-CNN and Fast R-CNN, both use the selective search method to find region proposals. Selective search is slow and hence time-consuming, affecting the performance of the network. To overcome this, an advanced version called Faster R-CNN was proposed [54], which eliminates the selective search algorithm and lets the network learn the region proposals. **Figure 31** shows a Faster R-CNN network.

All object detection algorithms until 2015 used regions to locate objects in the image; the network does not look at the complete image but only at the parts that may have high probabilities of containing an object. In 2015, J. Redmon proposed a new NN called YOLO (You Only Look Once) [55], shown in **Figure 32**. It is an object detection algorithm quite different from the region-based algorithms: in YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for those boxes.

#### **Figure 30.**

*SSD model adds several feature layers to the end of a base network, which predict the offsets to default boxes of different scales and aspect ratios and their associated confidences.*


**Figure 31.** *A single, unified faster R-CNN for object detection.*

**Figure 32.** *A representative of the YOLO architecture for object detection.*

The overall architecture of CSPJacinto-SSD is shown in **Figure 33**. The CSPNet [58] features are added to JacintoNet [59], a simple lightweight model composed of convolution, group convolution, and max-pooling layers. The Cross Stage Partial (CSP) feature has been shown to improve accuracy while reducing model parameters and complexity. The function of CSP is to split the feature maps into two parts along the channel dimension at the input of each stage: one part is sent into the convolution block as usual, while the other part skips all layers and is concatenated with the convolution block output to form the final block output. In **Figure 33**, one blue and one green square together form a convolution block. The blue arrows show the CSP feature as described above, and the red arrows show the output of each stage. The 1×1 convolution before the convolution block increases the feature channels, and the 1×1 convolution after the convolution block merges the context of features from the CSP layer. Out1 to Out5 label the feature maps used by the dense heads to produce the bounding box outputs.

#### **Figure 33.**

*CSPJacinto-SSD model architecture.*
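The CSP split-and-concatenate idea can be illustrated with a shape-level sketch, where NumPy and a placeholder function stand in for the convolution block; the channel layout and counts are arbitrary assumptions.

```python
import numpy as np

def conv_block(x):
    """Placeholder for the convolution block of Figure 33; here it simply
    returns a transformed feature map with the same shape."""
    return np.tanh(x)

def csp_stage(x):
    """Cross Stage Partial stage: half the channels go through the
    convolution block, the other half skip it, and the two halves are
    concatenated along the channel axis as the stage output."""
    c = x.shape[0] // 2                  # layout: (channels, height, width)
    through, skip = x[:c], x[c:]
    return np.concatenate([conv_block(through), skip], axis=0)
```

The skipped half reaches the output untouched, which is what reduces the computation and parameter count relative to processing all channels.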


#### **Table 2.**

*Base size of anchor box.*

The dense heads employed in the proposed CSPJacinto-SSD are based on those in SSD, with some modifications to the anchor boxes following the multi-head SSD concept proposed in [60]. At dense head levels 2 to 4, there is an extra set of anchor box locations with offset 0 in addition to the original offset 0.5. This increases the anchor box density, improving the recall of object detection; it is especially useful for lightweight SSD models, which need more anchor boxes to cover the objects' possible locations.

The anchor box settings also differ slightly from the original SSD model. The 1:2 aspect ratio is changed to 1:1.5, which makes the anchor borders denser, while the 1:3 anchors are preserved. The base sizes of the anchor boxes are modified compared to the original SSD, as shown in **Table 2**; these anchor sizes better fit our model input size of 256×256.
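The modified ratio set and the extra offset-0 locations can be sketched as below. The equal-area convention and the helper names are illustrative assumptions in the style of SSD-like anchor generation, not the chapter's exact implementation.

```python
import math

def anchor_shapes(base, ratios=(1.0, 1.5, 1 / 1.5, 3.0, 1 / 3.0)):
    """Anchor (w, h) per aspect ratio at a given base size, keeping the area
    base*base constant. The 1:1.5 ratio replaces the usual 1:2, and the 1:3
    anchors are preserved, per the modification described above."""
    return [(base * math.sqrt(r), base / math.sqrt(r)) for r in ratios]

def anchor_centers(feature_size, stride, offsets=(0.0, 0.5)):
    """Anchor centre positions along one axis; the extra offset 0 doubles
    the anchor density at dense-head levels 2 to 4."""
    return sorted((i + o) * stride for i in range(feature_size) for o in offsets)
```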

The performance of these ANN object detection networks in detecting and recognizing road signs is discussed in Section 3.

## **3. Results and discussion**

#### **3.1 The DIP based algorithms**

The DIP based algorithms for detecting and recognizing road signs are implemented in C++ on the Visual Studio platform on a desktop computer and on a Freescale i.MX6. Due to the lack of standard video datasets dedicated to speed limit signs, we have examined the algorithms using original video frames captured under different weather conditions such as daylight, backlight, cloudy, night, rain and snow.


#### *3.1.1 System specifications*

The DIP based algorithms discussed in Section 2.1 for speed limit and speed regulatory signs are realized on a standard desktop machine with an Intel® Core™ i7-3770 CPU operating at 3.6 GHz and 8 GB of DDR3-1600 memory, running 64-bit Windows 7 and Ubuntu 14.04. The same DIP based algorithms are also realized on the Freescale i.MX 6, one of the standard development processors for real-time vehicular applications, with an ARM Cortex-A9 CPU operating at 1.2 GHz and 1 GB of memory, running a Linux operating system. It includes a Video Processing Unit (VPU) decoding H.264, MPEG-4, H.263 and MJPEG, and an Image Processing Unit (IPU) providing blending, rotating, scaling, cropping, de-interlacing and color space conversion functions.

#### *3.1.2 Performance: speed*

On both the desktop computer and the Freescale i.MX 6, the size of the speed limit sign images is set to D1 resolution (720×480). The processing speed of the DIP based algorithm reaches up to 150 fps on average on the desktop computer and about 30 fps on the Freescale i.MX 6. With the image resolution set to 1280×720, the performance reaches 161 fps on average on the desktop computer and 17 fps on the Freescale i.MX 6.

## *3.1.3 Performance: accuracy and comparison*

The performance accuracy of rectangular and circular speed limit signs and triangular speed regulatory signs' detection and recognition by DIP based algorithm discussed in Section 2.1 is tabulated in **Table 3**. The overall accuracy is defined as


#### **Table 3.**

*The accuracies of the speed limit signs and speed regulatory signs detection and recognition.*

"when a car passes a road scene with a road sign instance, the final output of the proposed algorithm is correct as that of the road sign visible to naked eyes."

The detection accuracy of rectangular speed limit road signs is 96.10% and the recognition accuracy is 97.30%, giving a total accuracy of 93.51%. The detection accuracy of circular speed limit road signs is 95.58% and the recognition accuracy is 96.30%, giving an overall accuracy of 91.15%, whereas the detection accuracy of the triangular speed regulatory signs is 98.33% and the recognition accuracy is 94.92%, resulting in an overall accuracy of 93.33%. The performance of these algorithms is evaluated under different weather conditions such as daytime, cloudy, strong backlight, nighttime, snow and rain. Some of these results are tabulated in **Tables 4**–**6**.

The efficiency of the proposed algorithm is also compared with the state-of-the-art works on the road signs detection and recognition.

As listed in **Table 7**, the proposed speed limit sign detection and recognition system is compared with some of the previous works. It can be implemented on embedded systems for real-time ADAS applications, as it performs well under limited computing resources, and it supports both circular and rectangular speed limit road signs such as 15, 20, 25, 30, 35,....., 110, irrespective of the digit fonts used in numerous countries, owing to the blob and breach features.


#### **Table 4.**

*Some details of the rectangular speed limit road signs detection and recognition.*


#### **Table 5.**

*Some details of the circular speed limit road signs detection and recognition.*



#### **Table 6.**

*Some details of the triangular speed regulatory road signs detection and recognition.*


#### **Table 7.**

*The comparison of the proposed speed regulatory road signs detection and recognition algorithm with previous works.*

**Table 8** lists the comparison of the proposed speed regulatory sign detection and recognition system with related previous works. It can also be implemented on embedded systems for real-time ADAS applications, as it performs well under embedded computing resources and supports the different types of speed regulatory signs shown in **Figure 34** from numerous countries, owing to its feature extraction and feature matching steps.

From the comparisons listed in **Tables 7** and **8**, it can be seen that the DIP based algorithm discussed in this chapter is more robust, i.e., it supports the speed limit and speed regulatory road signs of the different types existing in most countries. It also performs well with varied fonts of the speed limit signs, and it can be implemented in an embedded system for real-time applications. Importantly, it has decent accuracy when working with video, as in a real camcorder environment, compared to the state-of-the-art methods. Above all, the low complexity of the proposed algorithm yields a higher fps than the other previous works.

**Figure 35** shows the experimental results of the speed limit road signs detection and recognition method for detection and recognition of the rectangular speed limit road signs. **Figure 35(a**-**c)** is the result during the daytime, **Figure 35(d)** is during the cloudy weather, **Figure 35(e**-**f)** is during the rain and **Figure 35(g**-**j)** is during the nighttime.

#### **Figure 34.**

*Some samples of speed regulatory road signs.*


#### **Table 8.**

*The comparison of the proposed speed regulatory road signs detection and recognition algorithm with previous works.*


**Figure 35.**

*The overall results for rectangular speed limit signs detection (a-c) during the daytime (d) during the cloudy weather (e-f) during the rain (g-j) during the nighttime.*

**Figure 36** shows the experimental results of the speed limit road signs detection method for detection as well as recognition of the circular speed limit road signs. **Figure 36(a**-**c)** is the result during the daytime, **Figure 36(d**-**e)** is during the backlight condition, **Figure 36(f**-**g)** is during the cloudy weather, **Figure 36(h**-**i)** is during the snow, **Figure 36(j**-**l)** is during the rains and **Figure 36(m**-**o)** is during the nighttime.

**Figure 37** shows the experimental results of the speed regulatory road sign detection and recognition method for the triangular speed regulatory road signs, of which **Figure 37(a**-**c)** is the result during the daytime, **Figure 37(d**-**g)** is during the backlight condition, **Figure 37(h**-**i)** is during the cloudy weather and **Figure 37(j**-**l)** is during the nighttime.

Additionally, the proposed CV based method is capable of detecting and recognizing the speed limit signs ending with the digit "5" as in **Figure 38**.

#### **3.2 The CNN based algorithms**

The CNN based object detection algorithms such as SSD [51], Faster R-CNN [54], YOLO [55] and the proposed CSPJacinto-SSD are implemented in Python.

#### **Figure 36.**

*The overall results for circular speed limit signs detection (a-c) during the daytime, (d-e) during the backlight condition, (f-g) during the cloudy weather, (h-i) during the snow (j-l) during rains, (m-o) during nighttime.*

**Figure 37.**

*The overall results for triangular speed regulatory road signs detection (a-c) during the daytime, (d-g) during the backlight condition, (h-i) during the cloudy weather (j-l) during the night.*

**Figure 38.** *The detection and recognition result of speed limit ending with the digit '5'.*

In order to carry out road sign detection and recognition, the models are trained and tested using a dataset dedicated to traffic signs, titled 'Tsinghua-Tencent 100K' [9].

## *3.2.1 System specifications*

The CNN based algorithms discussed in Section 2.1.2 for road signs are realized on a standard desktop machine with an Intel® Core™ i7-3770 CPU operating at 4.2 GHz and 16 GB of DDR3-1600 memory, running 64-bit Windows 10, with a GeForce GTX Titan X GPU.


#### *3.2.2 Performance: speed*

On the desktop computer, the images are used at the sizes available in the dataset. The processing speeds of the SSD, Faster R-CNN, YOLO and CSPJacinto-SSD object detection algorithms are around 20 fps, 5 fps, 21 fps, and 22 fps, respectively.

#### *3.2.3 Performance: accuracy and comparison*

The performance of the CNN models is mostly measured using mAP, AP and IoU as per [61]. Average precision (AP) is the most commonly used metric for the accuracy of object detection by various CNNs and image processing methods; it averages the precision over recall values. Precision measures how accurate the predictions are, i.e., the percentage of correct predictions, whereas recall measures how completely the actual positives are found. Eq. (30) is employed here to estimate the AP, where *r* refers to the recall rate and *r̂* to a recall value at which the interpolated precision is taken. The interpolated average precision [62] was used to evaluate both classification and detection; the intention of interpolating the precision/recall curve in this way is to reduce the impact of the wiggles in the precision/recall values caused by small variations in the ranking. Similarly, the mean average precision (mAP) is the average of the AP. The accuracy of these models [51, 54, 55] in detecting and recognizing road signs from the dataset [9] is tabulated in **Table 9**.

$$AP = \sum_{n} (r_{n+1} - r_n)\, p_{\text{interp}}(r_{n+1}), \quad p_{\text{interp}}(r_{n+1}) = \max_{\hat{r} \ge r_{n+1}} p(\hat{r}) \tag{30}$$
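The all-point interpolated AP of Eq. (30) can be computed with a short sketch; the inputs are assumed to be recall/precision values already sorted by increasing recall.

```python
import numpy as np

def interpolated_ap(recalls, precisions):
    """Interpolated AP per Eq. (30): p_interp(r) is the maximum precision at
    any recall >= r, and AP sums (r_{n+1} - r_n) * p_interp(r_{n+1})."""
    r = np.concatenate(([0.0], recalls))
    p = np.asarray(precisions, dtype=float)
    # interpolate: precision at each recall = max precision to its right
    p_interp = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p_interp))
```

For example, a detector that reaches recall 1.0 with precision dropping from 1.0 to 0.5 over two operating points gets AP = 0.5·1.0 + 0.5·0.5 = 0.75.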

## **4. The comparison of DIP and CNN based methods**

The traditional DIP based methods are popular CV techniques, namely SIFT, SURF and BRIEF, to list a few, employed for object detection. A feature extraction process is carried out for image classification tasks: the features describe the "interesting" regions in images. Various CV algorithms, such as edge detection, corner detection and/or threshold segmentation, are involved in this step. The features thus extracted from images form the basis of the definition of an object to be detected for a respective class. When such algorithms are deployed, these definitions are sought in other images; if a significant number of the features defined for a class are found in another image, that image is classified accordingly.

In contrast, CNNs introduced end-to-end learning, in which a machine learns about objects from classes of annotated images, a process termed 'training' on a given dataset. During training, the CNN perceives the fundamental patterns in those


**Table 9.**

*Performance efficiency of CNN models in detection and recognition of road signs.*

classes of images and consistently establishes descriptive salient features for each specific class of objects.

With almost all research and industries involving CV now employing CNN based methods, the role of a CV professional has changed exceptionally in terms of knowledge, skills and expertise, as in **Figure 39**.

A comparison between the DIP and CNN based methods is tabulated in **Table 10**. The DIP based methods are more compatible with and suited to the real-time applications of ADAS, as they are of low complexity and do not require any data for pre-training of the system, in contrast to the data-hungry neural network based systems. Apart from pre-training and complexity, there is power consumption: the Freescale iMX6 consumes a total of 1.17 W [64] in the video-playback idle mode, and other embedded systems lie in a similar range, whereas a minimum of about 300 W [65] is required by a basic GPU, which makes GPU-based systems power-hungry as well. Additionally, the DIP based methods can perform object detection in any scene irrespective of having seen the same or a similar scene before, but the CNN models can only detect the kinds of objects that the models have seen

**Figure 39.** *Comparison of DIP and CNN based workflows. Figure from [63].*


#### **Table 10.**

*Comparison of the proposed system with that of CNN based systems.*

during the training process. Hence, the CNN models require a good amount of time spent on training, followed by testing, before being employed in real-time applications, unlike the DIP based methods. Moreover, CNN models exhibit high flexibility and perform better in inclement weather than DIP based methods.

## **5. The conclusion**

This chapter discussed traditional image processing methods and a few CNN based methods for the detection and recognition of road signs for ADAS. It is conclusive that DNNs perform better than the traditional algorithms, with certain trade-offs with respect to computing requirements and training time. While there are pros and cons to both traditional DIP and CNN based methods, many DIP based CV methods invented over the last 2-3 decades have now become obsolete because newer and much more efficient CNN methods have replaced them. However, the knowledge and skills gained from them remain invaluable and are not bounded by newer inventions; instead, the knowledge of traditional methods forms a strong foundation from which a professional can explore and widen his or her view of a problem. Additionally, some of the traditional methods are still being used in hybrid approaches, improvising and innovating toward incredible results.

## **Acknowledgements**

The authors thank the partial support by the *"Center for mmWave Smart Radar Systems and Technologies"* under the *"Featured Areas Research Center Program"* within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE), Taiwan R.O.C. We also thank the partial support from the Ministry of Science and Technology (MOST), Taiwan R.O.C. projects with grants MOST 108-3017-F-009-001, MOST 110-2221-E-A49-145-MY3, and MOST 109-2634-F-009-017 through Pervasive Artificial Intelligence Research Labs (PAIR Labs) in Taiwan, R.O.C. as well as the partial support from the Qualcomm Technologies under the research collaboration agreement 408929.

## **Conflict of interest**

The authors declare no conflict of interest.

## **Author details**

Vinay M. Shivanna<sup>1</sup>\*, Kuan-Chou Chen<sup>1</sup>, Bo-Xun Wu<sup>1</sup> and Jiun-In Guo<sup>1,2,3</sup>

1 Department of Electronics Engineering and Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

2 Pervasive Artificial Intelligence Research (PAIR) Labs, Hsinchu City, Taiwan

3 Wistron-NCTU Embedded Artificial Intelligence Research Center, Hsinchu City, Taiwan

\*Address all correspondence to: vinay.ms23@gmail.com

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*A Study on Traditional and CNN Based Computer Vision Sensors for Detection… DOI: http://dx.doi.org/10.5772/intechopen.99416*

## **References**

[1] J. Urry, "The 'System' of Automobility", Theory, Culture & Society, vol. 21, no. 4-5, pp. 25-39, October 2004.

[2] E. Eckermann, World History of the Automobile, Society of Automotive Engineers, Warrendale, PA, 2001.

[3] "Transportation: Motor Vehicle Accidents and Fatalities", The 2012 Statistical Abstract. U.S. Census Bureau, September. 2011.

[4] What is Machine Learning? [Internet]. Ibm.com. Available from: https://www.ibm.com/cloud/learn/machine-learning

[5] What is Digital Image Processing (DIP)? | IGI Global [Internet]. Igi-global.com. Available from: https://www.igi-global.com/dictionary/digital-image-processing-dip/48620

[6] Neural Network Definition [Internet]. Investopedia. Available from: https://www.investopedia.com/terms/n/neuralnetwork.asp

[7] How Artificial Intelligence Works [Internet]. Investopedia. Available from: https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp

[8] Houben S, Stallkamp J, Salmen J, Schlipsing M, Igel C. Detection of traffic signs in real-world images: The German traffic sign detection benchmark. The 2013 International Joint Conference on Neural Networks (IJCNN). IEEE; 2013. p. 1-8. DOI: 10.1109/IJCNN.2013.6706807

[9] Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S. Traffic-Sign Detection and Classification in the Wild. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. p. 2110-2118. DOI: 10.1109/CVPR.2016.232

[10] Deng J, Dong W, Socher R, Li L, Li K, Li F. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. p. 248-255. DOI: 10.1109/CVPR.2009.5206848

[11] Everingham M, Van Gool L, Williams C, Winn J, Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision. 2009;88(2): 303-338.

[12] Mechanical Simulation [Internet]. Carsim.com. Available from: https://www.carsim.com/

[13] Jim Torresen, Jorgen W. Bakke and Lukas Sekanina, "Efficient Recognition of Speed Limit Signs," Proc. 2004 IEEE Intelligent Transportation Systems Conference, Washington, D.C., USA, October 3-6, 2004.

[14] Fabien Moutarde, Alexandre Bargeton, Anne Herbin, and Lowik Chanussot, "Robust on-vehicle real-time visual detection of American and European speed limit signs, with a modular Traffic Signs Recognition system," Proc. 2007 IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, June 13-15, 2007.

[15] Christoph Gustav Keller, Christoph Sprunk, Claus Bahlmann, Jan Giebel and Gregory Baratoff, "Real-time Recognition of U.S. Speed Signs," Proc. 2008 IEEE Intelligent Vehicles Symposium, June 4-6, 2008, The Netherlands.

[16] Wei Liu,Jin Lv, Haihua Gao, Bobo Duan, Huai Yuan and Hong Zhao, "An Efficient Real-time Speed Limit Signs Recognition Based on Rotation Invariant Feature", Proc. 2011 IEEE Intelligent Vehicles Symposium (IV) Baden-Baden, Germany, June 5-9, 2011.

[17] Zumra Malik and Imran Siddiqi, "Detection and Recognition of Traffic Sign Road Scene Images", 12th International Conference on Frontiers of Information Technology, pp 330-335, Dec 17-19, 2014.

[18] Vavilin Andrey and Kang Hyun Jo, "Automatic Detection and Recognition of Traffic Signs using Geometric Structure Analysis", International Joint Conference on SICE-ICASE, Oct 18-21, 2006.

[19] Lipu Zhou, Zhidong Deng, "LIDAR and Vision-Based Real-Time Traffic Sign Detection and Recognition Algorithm for Intelligent Vehicle", IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), Oct 8-11, 2014.

[20] Sebastian Houben, Johannes Stallkamp, Jan Salmen, Marc Schlipsing, and Christian Igel, "Detection of Traffic Signs in Real-World Images: The German Traffic Sign Detection Benchmark" The International Joint Conference on Neural Networks (IJCNN), 2013.

[21] M. liang, M. Yuan, X. Hu, J. Li, and H. Liu, "Traffic sign detection by supervised learning of color and shape," in Proceedings of IEEE International Joint Conference on Neural Networks, 2013.

[22] M. Mathias, R. Timofte, R. Benenson, and L. V. Gool, "Traffic sign recognition - how far are we from the solution?" in Proceedings of IEEE International Joint Conference on Neural Networks, 2013.

[23] G. Wang, G. Ren, Z. Wu, Y. Zhao, and L. Jiang, "A robust, coarse-to-fine traffic sign detection method," in Proceedings of IEEE International Joint Conference on Neural Networks, 2013.

[24] Supreeth H.S.G, Chandrashekar M Patil, "An Approach Towards Efficient Detection and Recognition of Traffic Signs in Videos using Neural Networks" International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2016, pp 456-459

[25] Nadra Ben Romdhane, Hazar Mliki, Mohamed Hammami, "An Improved Traffic Signs Recognition and Tracking Method for Driver Assistance System", IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016.

[26] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision", The International Joint Conference on Artificial Intelligent, 1981, pp. 674–679.

[27] Y. Zhang, C. Hong and W. Charles, "An efficient real time rectangle speed limit sign recognition system," 2010 IEEE Intelligent Vehicles Symposium, San Diego, CA, 2010, pp. 34-38. DOI: 10.1109/IVS.2010.5548140

[28] A. Mammeri, A. Boukerche, J. Feng and R. Wang, "North-American speed limit sign detection and recognition for smart cars," 38th Annual IEEE Conference on Local Computer Networks - Workshops, Sydney, NSW, 2013, pp. 154-161


[29] C. Wang, "Research and Application of Traffic Sign Detection and Recognition Based on Deep Learning," 2018 International Conference on Robots & Intelligent System (ICRIS), 2018, pp. 150-152, doi: 10.1109/ ICRIS.2018.00047.

[30] R. Hasegawa, Y. Iwamoto and Y. Chen, "Robust Detection and Recognition of Japanese Traffic Sign in the Complex Scenes Based on Deep Learning," 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), 2019, pp. 575-578, doi: 10.1109/ GCCE46687.2019.9015419.

[31] Y. Sun, P. Ge and D. Liu, "Traffic Sign Detection and Recognition Based on Convolutional Neural Network," 2019 Chinese Automation Congress (CAC), 2019, pp. 2851-2854, doi: 10.1109/ CAC48633.2019.8997240.

[32] Y. Yang, H. Luo, H. Xu and F. Wu, "Towards Real-Time Traffic Sign Detection and Classification," in IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 7, pp. 2022-2031, July 2016, doi: 10.1109/ TITS.2015.2482461.

[33] R. Jain and D. Gianchandani, "A Hybrid Approach for Detection and Recognition of Traffic Text Sign using MSER and OCR," 2018 2nd International Conference on I-SMAC, 2018, pp. 775-778, doi: 10.1109/I-SMAC.2018.8653761.

[34] M. Z. Abedin, P. Dhar and K. Deb, "Traffic sign recognition using hybrid features descriptor and artificial neural network classifier," 2016 19th International Conference on Computer and Information Technology (ICCIT), 2016, pp. 457-462, doi: 10.1109/ ICCITECHN.2016.7860241.

[35] Lin Y, Chou T, Vinay M, Guo J. Algorithm derivation and its embedded system realization of speed limit detection for multiple countries. 2016 IEEE International Symposium on Circuits and Systems (ISCAS). Montreal, QC: IEEE; 2016. p. 2555-2558. DOI: 10.1109/ISCAS.2016.7539114

[36] Chou T, Chang S, Vinay M, Guo J. Triangular Road Signs Detection and Recognition Algorithm and its Embedded System Implementation. The 21st Int'l Conference on Image Processing, Computer Vision and Pattern Recognition. CSREA Press; 2017. p. 71-76. ISBN: 1-60132-464-2

[37] Gareth Loy and Nick Bames, "Fast Shape-based Road Sign Detection for a Driver Assistance System," Proc. IEEE/ RSl International Conference on Intelligent Robots and Systems, September 28 - October 2, 2004.

[38] Nick Barnes and Gareth Loy, "Realtime regular polygonal sign detection", Springer Tracts in Advanced Robotics Volume 25, pp 55-66, 2006.

[39] Sebastian Houben, "A single target voting scheme for traffic sign detection," Proc. 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, June 5-9, 2011.

[40] Feature Detectors - Sobel Edge Detector [Internet]. Homepages.inf.ed.ac.uk. Available from: https://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm

[41] Fatin Zaklouta and Bogdan Stanciulescu, "Real-time traffic sign recognition in three stages," Robotics and Autonomous Systems, Volume 62, Issue 1, January 2014.

[42] Derek Bradley and Gerhard Roth, "Adaptive Thresholding using the Integral Image," Journal of Graphics, GPU, and Game Tools, Volume 12, Issue 2, 2007.

[43] Alexandre Bargeton, Fabien Moutarde, Fawzi Nashashibi, and Benazouz Bradai, "Improving pan-European speed-limit signs recognition with a new "global number segmentation" before digit recognition," Proc. 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, June 4-6, 2008.

[44] Dang Khanh Hoa, Le Dung, and Nguyen Tien Dzung, "Efficient determination of disparity map from stereo images with modified Sum of Absolute Differences (SAD) algorithm", 2013 International Conference on Advanced Technologies for Communications (ATC 2013)

[45] J. R. Parker, "Vector Templates and Handprinted Digit Recognition," Proc. 12th IAPR International Conference on Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Jerusalem, 9-13 Oct 1994.

[46] Phalgun Pandya and Mandeep Singh, "Morphology Based Approach To Recognize Number Plates in India," International Journal of Soft Computing and Engineering (IJSCE), Volume-1, Issue-3, July 2011.

[47] Kamaljit Kaur and Balpreet Kaur, "Character Recognition of High Security Number Plates Using Morphological Operator," International Journal of Computer Science & Engineering Technology (IJCSET), 2011 IEEE Intelligent Vehicles Symposium (IV), Vol. 4, May, 2013.

[48] Lifeng He and Yuyan Chao, "A Very Fast Algorithm for Simultaneously Performing Connected-Component Labeling and Euler Number Computing", IEEE Transactions on Image Processing, Vol. 24, No. 9, September 2015.

[49] Rachid Belaroussi and Jean Philippe Tarel, "Angle Vertex and Bisector Geometric Model for Triangular Road Sign Detection", IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1-7, 2009.

[50] H. Bay, T. Tuytelaars, and L. Van Gool, "Surf: Speeded up robust features", European Conference on Computer Vision, May 2006.

[51] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C et al. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV). 2016. p. 21-37. arXiv:1512.02325

[52] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014

[53] Girshick R. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). IEEE; 2015. DOI: 10.1109/ICCV.2015.169

[54] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672– 2680.

[55] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017; 39(6):1137-1149.

[56] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.

[57] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv:1701.07875v2 [stat.ML], 9 Mar 2017.

[58] C. Wang, H. M. Liao, Y. Wu, *P. Chen*, J. Hsieh, and I. Yeh, "CSPNet: A New Backbone that can Enhance Learning Capability of CNN," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 14-19 June 2020 2020, pp. 1571-1580.

[59] M. Mathew, K. Desappan, P. K. Swami, and S. Nagori, "Sparse, Quantized, Full Frame CNN for Low Power Embedded Devices," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 21-26 July 2017 2017, pp. 328-336

[60] C. Y. Lai, B. X. Wu, T. H. Lee, V. M. Shivanna, and J. I. Guo, "A Light Weight Multi-Head SSD Model For ADAS Applications," in 2020 International Conference on Pervasive Artificial Intelligence (ICPAI), 3-5 Dec. 2020 2020, pp. 1-6

[61] Hui J. mAP (mean Average Precision) for Object Detection [Internet]. Available from: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173 (accessed on 12 July 2018).

[62] Salton G, McGill MJ. *Introduction to Modern Information Retrieval*. McGraw-Hill: New York, NY, USA; 1986.

[63] Wang J, Ma Y, Zhang L, Gao RX. Deep learning for smart manufacturing: Methods and applications. J Manuf Syst. 2018;48:144-156. https://doi.org/10.1016/j.jmsy.2018.01.003

[64] Freescale Semiconductor, "i.MX 6Dual/6Quad Power Consumption Measurement". Available from: https://bit.ly/2ATVcWk

[65] GEFORCE. "Desktop GPUs - Specifications". Available from: https://www.geforce.co.uk/hardware/desktop-gpus/geforce-gt-1030/specifications

## **Chapter 7**

## *Smart-Road*: Road Damage Estimation Using a Mobile Device

*Izyalith E. Álvarez-Cisneros, Blanca E. Carvajal-Gámez, David Araujo-Díaz, Miguel A. Castillo-Martínez and L. Méndez-Segundo*

## **Abstract**

Mexico is located on five tectonic plates whose movements generate telluric activity. These movements, depending on their intensity, affect the telecommunications infrastructure. Earthquakes tend to cause landslides, subsidence, and damage to structures in houses, buildings, and roads. In the case of roads, the damage appears as cracks in the pavement, which are classified according to their size, shape, and depth. The methods currently implemented to inspect roads rely mainly on human perception and are limited to a superficial inspection of the terrain, making this process ineffective for the timely detection of damage. This work presents a road-analysis method that uses a drone to acquire images. A mobile device is used for the processing and recognition of damage, making it possible to determine the type of damage on the road. Artificial intelligence techniques are implemented to classify the cracks as linear or zig-zag.

**Keywords:** convolutional neural networks, computational vision, descriptors, cracks road, earthquakes

## **1. Introduction**

A country endowed with good road infrastructure can generate the basic elements of competitiveness and provide opportunities for better economic development, while at the same time promoting its social and cultural development [1]. There are several factors by which access roads can be affected, for example: time, use, excessive weight, the quality of materials, location, natural disasters, etc. We can highlight the damage to roads caused by earthquakes: the movement of the tectonic plates can cause fissures on the surface. Road deterioration can range from small cracks to wide ruptures or separations in the road, and these types of incidents tend to occur mainly in the seismic areas of the country.

#### **1.1 Seismicity in Mexico**

The Mexican Republic is located in one of the most seismically active regions in the world and is immersed within the area known as the Circumpacific Belt (or Pacific

Ring of Fire), where the greatest seismic and volcanic activity on the planet is concentrated [2], **Figure 1**.

The Global Seismic Hazard Assessment Program (GSHAP) was a project sponsored by the United Nations (UN) that assembled the first worldwide map of earthquake zones [3]. In Mexico, an earthquake hotspot follows the route through the Sierra Madre Occidental that reaches south of Puerto Vallarta to the Pacific coast on the border with Guatemala [3]. **Figure 2** shows the seismic regionalization of the Mexican Republic, marking zone A as the one with the lowest risk, followed by zones B, C, and D. The last three generate the greatest concern to the scientific community and the inhabitants of these areas, due to the structural damage that occurs on the roads and in their localities.

#### **Figure 1.**

*Circumpacific Belt Zone. Source: National Institute of Statistic and Geography.*

**Figure 2.** *Areas with the highest propensity to earthquakes in Mexico [3].*

## **1.2 Road network in Mexico**

In Mexico, as in other countries, the road network is the most widely used transport infrastructure. The national network has 378,923 km, which is made up of avenues, streets, highways, and rural roads that allow connectivity between practically all the populations of the country. **Figure 3** shows the main roads that interconnect the Mexican Republic [4].

## *1.2.1 Damage to land access roads*

Seismic activity is recurrent in certain areas, causing damage to road infrastructure. These damages are identified as fissures, cracks in the asphalt, landslides, separation of road sections, subsidence, and other damages in different access roads that interconnect the country. The road must be inspected, and the damages detected must be reported and repaired. **Figure 4** shows some examples of damage caused by seismicity in Mexican territory on different roads [5].

**Figure 3.** *Mexico's major highways 2009 [4].*

#### **Figure 4.**

*Damage to roads caused by earthquakes; left: Chiapas, magnitude 5.6 earthquake; right: Oaxaca, magnitude 5.2. Source: Google.*

In Mexico, most of the road inspections after an earthquake are carried out in person, which can generate further conflicts at some critical points. At these conflict points, semi-autonomous surveillance systems are required that implement mobile technology to detect damage to access roads. Therefore, it is proposed to develop a methodology that, through image processing techniques and neural networks, allows the identification of damage to roads, classifying two types of cracks: linear and zig-zag. This chapter is divided into six sections: Section 2 reviews the related work, Section 3 explains the methods and materials, Section 4 provides tests and results, Section 5 presents the conclusion, and Section 6 the discussion.

## **2. Related work**

There is research in the field of artificial intelligence on techniques and practices used to automate the detection of road defects. Some related works are described below. In [6], an automatic system for identifying cracks in roads through a camera is developed. It scans the roads by zone and inspects the condition of cracks and fissures. The authors propose the following stages: i) smooth, adjust, and binarize the image using the threshold-value method; ii) perform morphological operations such as dilation and erosion; iii) eliminate false cracks in the image with smoothing filters; iv) clean and connect the cracks in the image; and finally v) estimate the shape of the crack using geometric characteristics and shape description. In [7], the authors address texture-based descriptors. Through these techniques, it is possible to detect color and texture changes in an image and thus identify edges by extracting a set of characteristics generated from histograms. For each frame of a pavement video analyzed, the method extracts the characteristics and creates a binary version to classify each region.
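Stages i), ii) and iv) described above can be sketched in plain NumPy as follows; the cross-shaped 3 × 3 structuring element, the 'cracks are darker than asphalt' polarity, and all function names are illustrative assumptions rather than the implementation of [6].

```python
import numpy as np

def binarize(img, thresh):
    """Stage i): threshold-value binarization (cracks assumed darker)."""
    return (img < thresh).astype(np.uint8)

def dilate(mask, it=1):
    """Stage ii): binary dilation with a cross-shaped 3x3 element."""
    for _ in range(it):
        p = np.pad(mask, 1)
        mask = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:]).astype(np.uint8)
    return mask

def erode(mask, it=1):
    """Binary erosion, implemented as the dual of dilation."""
    for _ in range(it):
        mask = 1 - dilate(1 - mask, 1)
    return mask

def close_cracks(mask):
    """Stage iv): closing (dilate then erode) reconnects broken segments."""
    return erode(dilate(mask, 1), 1)

crack = np.zeros((3, 5), dtype=np.uint8)
crack[1] = [1, 1, 0, 1, 1]    # a thin crack with a one-pixel gap
bridged = dilate(crack)       # dilation bridges the gap
closed = close_cracks(crack)  # closing never removes original pixels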

In [8], the authors apply morphological operations to the images and segment them using the K-nearest-neighbor (KNN) method. The proposed algorithm highlights the texture information of the image, and the results are classified using the standard deviation to define regions delimited by gray-level intensity; these techniques allow patches on the roads to be detected through images processed on a Smartphone. **Figure 5** shows the results presented by the authors.

In [9], a system for identifying cracks in buildings from an unmanned aerial vehicle (UAV) equipped with a camera is presented. The UAV flies around the building to acquire images, which are transmitted remotely *via* Wi-Fi to a computer for processing. The images are segmented with techniques that change the Red, Green, Blue (RGB) color space to grayscale. The threshold is calculated with statistical methods (mean and standard deviation) to categorize the black and white pixels and identify the cracks in the building.
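The grayscale conversion and mean/standard-deviation thresholding described above can be sketched as follows; the BT.601 luminance weights, the factor k, and the 'darker than threshold' polarity are assumptions for illustration, not the exact rule of [9].

```python
import numpy as np

def rgb2gray(rgb):
    """RGB -> grayscale via luminance weights (ITU-R BT.601)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def stat_threshold(gray, k=1.0):
    """Mark pixels darker than mean - k*std as crack candidates."""
    t = gray.mean() - k * gray.std()
    return (gray < t).astype(np.uint8)

# A bright wall with a single dark crack pixel.
wall = np.full((10, 10), 200.0)
wall[5, 5] = 0.0
mask = stat_threshold(wall)
```

On this toy image the threshold lands well below the wall intensity, so only the dark pixel is flagged.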

Maeda et al. [9] developed a system for identifying cracks in the pavement, where images are captured from a Smartphone mounted on a cell-phone holder on the dashboard of a car. They developed an application that analyzes the images obtained from the Smartphone through a deep neural network that identifies cracks in the road. In this work, they use deep neural networks such as Region-based Convolutional Neural Networks (R-CNN), You Only Look Once (YOLO), and the Single Shot MultiBox Detector (SSD) for the extraction of characteristics from the region of interest (cracks). In 2019, Zhang et al. [10] proposed an intelligent monitoring system to evaluate pavement damage; this methodology proposes the use of a

Smart-Road*: Road Damage Estimation Using a Mobile Device DOI: http://dx.doi.org/10.5772/intechopen.100289*

#### **Figure 5.**

*Detection of the mobile damage system. a) Hole in the pavement, b) longitudinal crack, c) transverse crack, and d) horizontal crack [8].*

set of points of an image obtained from a UAV, making use of the Harris detector and performing the processing in the cloud for the identification of cracks in the pavement.

## **3. Methods and materials**

In **Figure 6**, the general architecture of the proposed methodology for the classification and identification of linear and zig-zag cracks is shown.

**Figure 6.** *Proposed architecture for the identification and classification of cracks.*

As **Figure 6** shows, the methodology is composed of different stages: image acquisition, pre-processing, descriptors, classification, and result. Each of these stages is detailed below.

## **3.1 Image acquisition**

In this step, the image is taken with the camera of the PARROT BEBOP 2 FPV drone, which has the following characteristics: 14-megapixel camera with wide-angle lens, a digital image stabilization system, live video to a Smartphone or tablet with a viewing angle of 180°, photo formats RAW, JPEG and DNG, and an image resolution of 3800 × 3188 pixels. To automate the drone's route, a function implemented to trace the flight path is used. In **Figure 7**, the programmed route map is shown.

## **3.2 Pre-processing**

## *3.2.1 Dimension reduction*

In this section, the image scaling is performed by implementing the Discrete Wavelet Transform "Haar" (DWT-H). In **Figure 8**, three levels of decomposition are shown.
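A single decomposition level of the DWT-H can be sketched in plain NumPy as below; the averaging normalization (rather than the orthonormal 1/√2 scaling) and the even-size assumption are illustrative choices.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar DWT ('DWT-H').

    Returns (LL, LH, HL, HH); LL is the half-resolution approximation
    used for dimension reduction. Assumes even height and width.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # row-pair average
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # row-pair detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0     # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0     # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0     # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0     # diagonal detail
    return ll, lh, hl, hh

img = np.full((4, 4), 8.0)       # constant test image
ll, lh, hl, hh = haar_dwt2(img)  # LL keeps the value, details vanish
```

Applying `haar_dwt2` to its own LL output gives the second and third decomposition levels of **Figure 8**.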

## *3.2.2 Edge enhancement*

To obtain the edge enhancement of the image obtained from point 3.2.1, the use of a Laplacian filter is proposed. The Laplacian of an image highlights the regions of rapid intensity change and is an example of a second-order or a second derivative method of enhancement. It is particularly good at finding the fine details of an image. Any feature with a sharp discontinuity will be enhanced by a Laplacian operator [11]. The Laplacian is a well-known linear differential operator approximating the second derivative given by Eq. (1).

**Figure 7.** *Simulation of a programmed route at 15 m.*

**Figure 8.** *DWT-H decomposition.*

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{1}$$

where *f* denotes the image. The following process is performed: a 3 × 3 kernel is convolved with the image, **Figure 9**.

In **Figure 10**, the Laplacian filtering process is shown. This process consists of the following steps: from the image obtained by the DWT-H (**Figure 10a**), the convolution is performed with the proposed 3 × 3 kernel (**Figure 10b**). Finally, the sub-image of the crack is obtained with the edges highlighted, as seen in **Figure 10d**.
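Eq. (1) is typically discretized as a 3 × 3 kernel; the chapter's exact kernel appears only in **Figure 9**, so the common 4-neighbour Laplacian used below is an assumption.

```python
import numpy as np

# 4-neighbour discrete Laplacian (an assumed stand-in for Figure 9's kernel).
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_filter(img):
    """Convolve img with the 3x3 Laplacian; zero padding keeps the size."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += LAPLACIAN[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

flat = np.full((5, 5), 7.0)
lap_flat = laplacian_filter(flat)   # zero away from the border
spike = np.zeros((5, 5))
spike[2, 2] = 1.0
lap_spike = laplacian_filter(spike)  # -4 at the spike, +1 at its neighbours
```

The zero response on flat regions and the strong response at intensity discontinuities are exactly why the filter highlights crack edges.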

#### **3.3 Feature extraction**

One of the main objectives of this work is to implement the methodology on a mobile device, which will perform the image processing offline, obtaining the result

**Figure 9.** *Convolution of the 3 × 3 kernel at a point (x, y) in the image.*

**Figure 10.** *The result obtained by the Laplacian Filter.*

on the site. It is therefore essential to extract only the key points that provide information about the salient features of the image and thus make their classification in a LeNet convolutional neural network efficient. To do this, it is proposed to extract the characteristics through the scale-invariant feature transform (SIFT) and to rearrange the pixels of the points produced by the Laplacian filter through statistical moments. The extraction of the characteristics is described below:

### *3.3.1 Statistical central moments*

Central moments, also referred to as moments about the mean, are calculated as [12], Eq. (2),

$$\mu\_m = \sum\_{n=0}^{L-1} \left(X\_n - \bar{X}\right)^m t(X\_n) \tag{2}$$

where '*m*' is the order of the moment, '*L*' is the number of possible intensity values, '*Xn*' is the discrete variable that represents the intensity level in the image, $\bar{X}$ is the mean of the values, and *t*(*Xn*) is the probability estimate of the occurrence of '*Xn*', Eq. (3).

$$\bar{X} = \sum\_{n=0}^{L-1} X\_n \, t(X\_n) \tag{3}$$

The mean is the first-order moment, followed by the variance, skewness, and kurtosis as the second, third, and fourth moments. The mean is used to measure the average intensity value of the pixel distribution. The variance (μ2) is used to measure how widely the pixels spread from the mean value, Eq. (4).

$$\mu\_2 = \sum\_{n=0}^{L-1} \left(X\_n - \bar{X}\right)^2 t(X\_n) \tag{4}$$

To know the dispersion of the values located as key points by SIFT, the second central moment is implemented to group the pixels of the image processed by the Laplacian filter. The texture smoothness *R* is defined by Eq. (5),

$$R = 1 - \frac{1}{1 + \mu\_2(x)} \tag{5}$$

where 'μ2' is the variance and '*x*' is an intensity level. Then, the following condition is established by Eq. (6),

$$I\_{\text{Fissure}}(i,j) = \begin{cases} 1, & R - \bar{X} = 1 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
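The histogram moments of Eqs. (2)-(5) can be sketched as follows; the two-level toy histogram is illustrative, and the un-normalized variance in *R* follows the chapter's Eq. (5) (some texts first normalize μ2 by (L-1)²).

```python
import numpy as np

def central_moment(levels, probs, m):
    """m-th central moment of an intensity histogram, Eq. (2)."""
    mean = np.sum(levels * probs)                  # Eq. (3): the mean
    return np.sum(((levels - mean) ** m) * probs)  # Eq. (2)

def smoothness(levels, probs):
    """Texture smoothness R = 1 - 1/(1 + variance), Eq. (5)."""
    return 1.0 - 1.0 / (1.0 + central_moment(levels, probs, 2))

levels = np.array([0.0, 10.0])  # two intensity levels
probs = np.array([0.5, 0.5])    # equal occurrence probability t(Xn)
var = central_moment(levels, probs, 2)  # variance, Eq. (4)
R = smoothness(levels, probs)
```

For this histogram the mean is 5, the variance 25, and R = 1 − 1/26 ≈ 0.962, i.e., a rough (non-smooth) texture.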

*3.3.2 SIFT*

According to the SIFT methodology [13], the first step is scale detection. For the particular case of the crack contour, this step is very useful for the identification of the crack, since the taking of the images can vary depending on the shooting distance. The formal description of this step is detailed below.

#### *3.3.2.1 Scale detection*

The scale space L(x, y, σ) of an image is obtained from the convolution of an input image *IFissure* with a Gaussian filter G(x, y, σ) at different scales, with σ = 0.5 [13], Eq. (7):

$$L(x, y, \sigma) = G(x, y, \sigma) \* I\_{\text{Fissure}}(x, y) \tag{7}$$

where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$ is the Gaussian filter function; it is applied in both dimensions (x, y) of the *IFissure* image plane.

To obtain the different scale versions of the *IFissure* image, σ is multiplied by different values of a constant k (with k > 1) to obtain the projections of the contiguous scales; each scale's projection is subtracted from the original scale, yielding the differences from the original image *IFissure*, Eq. (8):

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{8}$$
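Eqs. (7) and (8) amount to blurring the image at successive scales and subtracting neighboring scales. A minimal sketch follows, assuming SciPy's Gaussian filter for G(x, y, σ), σ = 0.5 as in the text, and k = √2 (the text only requires k > 1, so this value is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma=0.5, k=np.sqrt(2.0), n_scales=4):
    """Scale space L (Eq. 7) and difference-of-Gaussian images D (Eq. 8):
    D_i = L(x, y, k^(i+1) * sigma) - L(x, y, k^i * sigma)."""
    image = image.astype(float)
    L = [gaussian_filter(image, sigma * k ** i) for i in range(n_scales + 1)]
    return [L[i + 1] - L[i] for i in range(n_scales)]

img = np.random.default_rng(0).random((64, 64))
dogs = dog_stack(img)
print(len(dogs), dogs[0].shape)  # 4 (64, 64)
```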

The search for extrema in scale space produces multiple candidates; low-contrast points are discarded since they are not stable to changes in lighting and noise. Eq. (9) shows how the points of interest are located within the image, and these locations are given by [13]:

$$z = -\left(\frac{\partial^2 D(x, y, \sigma)}{\partial x^2}\right)^{-1} \frac{\partial D(x, y, \sigma)}{\partial x} \tag{9}$$

Subsequently, the vectors are arranged according to the orientation of the points obtained from Eq. (9), and it is explained below.

#### *3.3.2.2 Orientation mapping*

This step assigns a constant orientation to the key points based on the properties of the image obtained in the previous steps. The key point descriptor can be represented with this orientation, achieving the invariance to rotation, which is important to highlight because the image can be taken at different shooting angles. The procedure to find the orientation of the points, is as follows [13]:

Using the scale value of the points of interest selected in Eq. (9):

a. Calculation of the magnitude value, M.

$$M(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2} \tag{10}$$

b. Calculation of the orientation, θ

$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \tag{11}$$

Finally, the descriptors of the characteristic points obtained in the previous steps identify the points of interest, **Figure 11**.
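The two calculations above can be sketched with central differences over the whole image at once. This is an illustrative NumPy version of Eqs. (10) and (11), assuming the first array axis plays the role of x; `arctan2` is used as a quadrant-aware form of the tan⁻¹ in Eq. (11):

```python
import numpy as np

def magnitude_and_orientation(L):
    """Gradient magnitude M (Eq. 10) and orientation theta (Eq. 11) from
    central differences of the scale-space image L; borders are left at 0."""
    dx = np.zeros_like(L, dtype=float)
    dy = np.zeros_like(L, dtype=float)
    dx[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x+1, y) - L(x-1, y)
    dy[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x, y+1) - L(x, y-1)
    M = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)           # quadrant-aware tan^-1
    return M, theta

# A linear ramp along x has |gradient| 2 (central difference) and angle 0.
ramp = np.tile(np.arange(8.0).reshape(-1, 1), (1, 8))
M, theta = magnitude_and_orientation(ramp)
print(M[3, 3], theta[3, 3])  # 2.0 0.0
```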

## *3.3.3 Convolutional neural network LeNet*

Based on the characteristics obtained, the neural network is trained to identify the cracks that appear in the image. The network used in this research is the convolutional neural network LeNet, whose architecture is made up of five layers of neurons, with an input of 1024 × 1024 × 3 values and an output of two possible classes [14]. LeNet is a network that is optimized for mobile devices, which allows greater efficiency in detection and in the performance of the processes on the mobile device. The network architecture is presented in **Figure 12** [14]:

For the training of the network, approximately 500 images were collected in various areas of Tecámac, State of Mexico. Therefore, in this investigation,

#### **Figure 11.**

*Results obtained from the proposed methodology: a) original image, b) image obtained with the DWT-H, c) image with the Laplacian filter, d) image obtained with the descriptors, and e) final image.*

**Figure 12.** *LeNet neural network architecture [14].*

**Figure 13.** *Zig-zag cracks.*

**Figure 14.** *Linear cracks.*

cracks with different intensities will be detected, so these will be identified and classified in the following categories [15]:

Erratic or zigzag cracks (ZZC): cracks in the pavement with erratic longitudinal patterns, caused by extreme changes in temperature, a defective base, or seismic movements.

Significant cracks (LC): cracks with a length greater than 30 centimeters.

Very significant cracks (VSC): cracks that appear in the pavement with a length greater than 60 centimeters; because of their size they are a risk, and they are the most visible.

Non-significant cracks (NSC): cracks that appear in the pavement with a fine shape and a length of less than 30 centimeters. **Figures 13** and **14** show images referring to the classifications delimited for identification; they illustrate the two classes to detect, zig-zag crack (ZZC) and linear crack (LC), respectively.
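The four categories can be expressed as a small labeling rule. The 30 cm and 60 cm limits come from the text; the behavior exactly at the boundaries, and the separate zig-zag flag, are assumptions for illustration:

```python
def crack_category(length_cm, zigzag=False):
    """Label a detected crack following the categories of [15]:
    ZZC (erratic pattern), NSC (< 30 cm), LC (> 30 cm), VSC (> 60 cm).
    Exact-boundary handling is an assumption."""
    if zigzag:
        return "ZZC"
    if length_cm < 30:
        return "NSC"
    if length_cm <= 60:
        return "LC"
    return "VSC"

print(crack_category(45), crack_category(70), crack_category(50, zigzag=True))
# LC VSC ZZC
```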

## **4. Tests and results**

To perform the tests, they were divided into phases to estimate the time of each phase and thus detect which one consumes the most time compared to the others. The processing was carried out on a Motorola X4 mobile device with a 2.2 GHz processor and 3 GB of RAM. To select the optimum distance for taking the images, tests were carried out between 10 and 30 meters above ground level. At each distance, it was verified that the images were clear and that the crack could be visualized. **Figure 15** shows the range in height and image visibility for the 500 sample images. From **Figure 15**, we can see that at a height of 10 meters the drone has a visibility range of 26 meters in radius; similarly, heights of 15, 20, 25, and 30 meters correspond to visibility radii of 40, 53, 67, and 80 meters. For our case we consider a height between 15 and 20 meters.
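The height-to-radius figures quoted above are consistent with a simple pinhole-style model r = h · tan(θ). The half-angle of roughly 69.3° below is fitted here so the model approximately reproduces the reported numbers; it is an assumption, not a published specification of the Parrot drone camera:

```python
import math

def visibility_radius(height_m, half_angle_deg=69.3):
    """Ground visibility radius (m) for a camera at the given height,
    assuming a cone of view with the stated (fitted) half-angle."""
    return height_m * math.tan(math.radians(half_angle_deg))

for h in (10, 15, 20, 25, 30):
    print(h, visibility_radius(h))
```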

**Figure 15.** *Parrot Drone viewing distances in meters.*

## **4.1 Phase 1. Distance estimation**

To validate the distances shown in **Figure 15** and their visibility range, four consecutive objects were placed on the crack in the road. In **Figure 16**, only three of them can be observed, enclosed in circles. The dimensions of the objects placed on the crack are 10 × 10 cm, and they were used to estimate the field of view of the drone camera. Based on these tests, a height of 15 meters is proposed for clear detection of the object by the drone, coupled with its stability in the air currents present during the tests.

## **4.2 Phase 2. Estimation of the pre-processing stage**

**Table 1** shows the average times calculated for the number of samples acquired: as the DWT-H decomposition level increases, the average processing time increases. From **Table 1**, we observe the processing times of the feature extraction and classification stages for each decomposition scale of the DWT-H. The dimension of the initial image is 2048 × 2048; we observe that decomposition level 4 gives a processing time of 14.445 ms for the two proposed stages.
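The size-reduction role of the DWT-H (halving each dimension per decomposition level, so level 4 turns 2048 × 2048 into 128 × 128) can be sketched with the approximation (LL) sub-band below; this is an illustrative normalized Haar averaging, not the authors' exact implementation:

```python
import numpy as np

def haar_ll(img, levels=1):
    """Approximation (LL) sub-band of a 2-D Haar DWT, repeated `levels`
    times; each level averages 2x2 blocks, halving both dimensions.
    (A normalized variant: the true Haar LL carries a factor of 2 per level.)"""
    a = img.astype(float)
    for _ in range(levels):
        a = (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
    return a

print(haar_ll(np.ones((2048, 2048)), levels=4).shape)  # (128, 128)
```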

## **4.3 Phase 3. Descriptors**

**Table 2** shows the average results obtained for the 500 images in the feature extraction stage. From **Table 1**, it is concluded that the optimal wavelet decomposition size for this estimation is at the fourth wavelet decomposition level.

## **4.4 Phase 4. Classification of images**

The tests to validate the proposed methodology were carried out with 150 images acquired at a height of 15 meters. During the development of the test scenarios, four cases were considered: two correct classifications and two incorrect ones. The correct classifications are true positive (TP) and true negative (TN); the misclassifications are false positive (FP) and false negative (FN). Using these counts, different performance measures can be obtained [13], such as the specificity in Eq. (12),

$$\text{Sp} = \frac{\text{TN}}{(\text{TN} + \text{FP})} \tag{12}$$



#### **Table 1.**

*Result of pre-processing stage time.*


#### **Table 2.**

*Descriptor stage processing time result.*

$$\text{Se} = \frac{\text{TP}}{(\text{TP} + \text{FN})} \tag{13}$$

$$Acc = \frac{(TP + TN)}{cracks \text{ detected in the image}} \tag{14}$$
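Eqs. (13) and (14), together with the specificity Sp = TN/(TN + FP) discussed with the results, can be gathered into one helper. The counts below are hypothetical, and Acc is computed with the standard total-count denominator, which is an assumption about how "cracks detected in the image" in Eq. (14) is counted:

```python
def performance_metrics(tp, tn, fp, fn):
    """Sensitivity (Eq. 13), specificity, and accuracy from
    confusion-matrix counts (standard definitions assumed)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return se, sp, acc

# Hypothetical counts, not the chapter's results.
se, sp, acc = performance_metrics(tp=90, tn=5, fp=3, fn=2)
print(round(se, 4), round(sp, 4), round(acc, 4))  # 0.9783 0.625 0.95
```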


#### **Table 3.**

*Confusion matrix.*


#### **Table 4.**

*The obtained results from the acquired images.*

#### **Figure 17.**

*Results obtained by the proposed methodology: a), c), and e) original image, b) and d) processed image (ZZC), finally f) processed image (LC).*

Smart-Road*: Road Damage Estimation Using a Mobile Device DOI: http://dx.doi.org/10.5772/intechopen.100289*

**Figure 18.** *Graphical user interface for interaction with the proposed methodology: a) LC detected and b) ZZC detected.*

where specificity (Sp) is the ability to detect non-crack pixels, sensitivity (Se) reflects the ability of the algorithm to detect the edge of the crack, and accuracy (Acc) measures the proportion of the total number of pixels obtained correctly (the sum of true positives and true negatives) over the total number of pixels that constitute the image of the cracks [13]; this is the probability that a pixel belonging to the crack image will be correctly identified. **Table 3** shows the results obtained from the 150 test images acquired in flight: a total of 140 images (TP), 1 (FN), 5 (FP), and 4 (TN).

In **Table 4**, the Acc, Sp, and Se results of the 150 acquired test images are shown. The results show that an Acc of 99.29% was obtained, which indicates the percentage of cracks detected and classified positively. In addition, the Sp of 96.55% represents how often the result of "no crack" is true, and the Se value reached 80%.

In **Figure 17**, some images obtained through the proposed methodology and the result obtained from the classification are shown.

Finally, the mobile application that serves as the development and user interface is shown in **Figure 18**.

## **5. Discussion**

Based on the tests carried out monitoring the roads with the Parrot drone, we observed that a height between 15 and 20 meters gives satisfactory results. Within the development of the proposal, the size-reduction stage made it possible to speed up the extraction of the characteristics, as did the proposal to reduce the key points obtained by the statistical descriptors and SIFT through Eq. (6). These stages are fundamental because all the crack detection and identification processing is carried out internally on a mid-range mobile device. The selection of the LeNet neural network was streamlined through the preprocessing stage, and the precision results obtained were not affected, remaining at 99%, even while limiting the data entered into the neural network.

## **6. Conclusion**

In conclusion, it can be emphasized that the objectives sought with the identification of cracks in roads, streets, highways, and avenues were achieved. The proposal had the specific characteristics that allowed the processes to run on a mobile device, and it was possible to demonstrate that the proposed methodology could be developed on the Android platform, which to date is one of the most widely used mobile platforms worldwide. The preprocessing results show a clear trend in the time required to adapt an image and perform the crack identification process, a time that does not exceed 14.79 ms, thanks to the use of the DWT-H instead of other processes that require greater computational complexity for image size reduction. On the other hand, the results show that the proposed operations are 99% accurate in finding cracks. It was also found that the times of certain stages of the process can be improved by changing some processes, such as the scaling of the images, which reduces the time by up to 200 milliseconds, among other possible improvements that can be implemented.

## **Acknowledgements**

The work team is grateful for the support provided to perform this research to the Secretaria de Educación, Ciencia, Tecnología e Innovación de la Ciudad de México with the project SECITI/072/2016 and SECTEI/226/2019. We also thank the Instituto Politécnico Nacional for the research project SIP 20210178.

## **Conflict of interest**

The authors declare no conflict of interest.


## **Author details**

Izyalith E. Álvarez-Cisneros<sup>1</sup> , Blanca E. Carvajal-Gámez<sup>2</sup> \*, David Araujo-Díaz<sup>1</sup> , Miguel A. Castillo-Martínez<sup>3</sup> and L. Méndez-Segundo<sup>1</sup>

1 SEPI-ESCOM, Instituto Politécnico Nacional, Unidad Profesional Adolfo López Mateos, Ciudad de México, México

2 SEPI-UPIITA, Instituto Politécnico Nacional, La Laguna Ticoman, Gustavo A. Madero, Ciudad de México, México

3 SEPI-ESIME Culhuacan, Instituto Politécnico Nacional, San Francisco Culhuacan, Culhuacan CTM V, Coyoacán, Ciudad de México, México

\*Address all correspondence to: drabecarvajalg@gmail.com

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Economic Commission for Latin America and the Caribbean. Local Economic Development and Decentralization in Latin America: Comparative Analysis. Economic Commission for Latin America and the Caribbean. 2001. Available from: https://www.cepal.org/es/publicaciones/2691-desarrollo-economico-local-descentralizacion-america-latina-analisis-comparativo [Accessed: 27 July 2021]

[2] Mexican Geological Service. Evolution of Tectonics in Mexico. Mexican Geological Service. 2017. Available from: https://www.sgm.gob.mx/Web/MuseoVirtual/Riesgos-geologicos/Evolucion-tectonica-Mexico.html [Accessed: 27 July 2021]

[3] Alden A. The World's Major Earthquake Zones. 2020. Available from: https://www.thoughtco.com/seismic-hazard-maps-of-the-world-1441205 [Accessed: 06 August 2021]

[4] Geo-México. Geo-México, the Geography and Dynamics of Modern México. 2015. Available from: https://geo-mexico.com/?p=12955 [Accessed: 27 July 2021]

[5] Secretary of Communications and Transportation. Federal Roads and Bridges Package for Income and Related Services. Secretary of Communications and Transportation. 2018. Available from: https://www.sct.gob.mx/fileadmin/Transparencia/rendiciondecuentas/MD/34\_MD.pdf [Accessed: 27 July 2021]

[6] Porras Díaz H, Castañeda Pinzón E, Sanabria Echeverry D, Medina Pérez G. Detección automática de grietas de pavimento asfáltico aplicando características geométricas y descriptores de forma. Dialnet. 2012;**8**:261-280

[7] Radopoulou S, Brilakis I. Patch detection for pavement assessment. Automation in Construction. 2015;**53**: 95-104. DOI: 10.1016/j.autcon. 2015.03.010

[8] Tedeschi A, Benedetto F. A real-time automatic pavement crack and pothole recognition system for mobile Androidbased devices. Advanced Engineering Informatics. 2017;**32**:11-25. DOI: 10.1016/j.aei.2016.12.004

[9] Maeda H, Sekimoto Y, Seto T, Kashiyama T, Omata H. Road damage detection using deep neural networks with images captured through a smartphone. Computer-Aided Civil and Infrastructure Engineering. 2018;**2018**: 1-14. DOI: 10.1111/mice.12387

[10] Zhang B, Liu X. Intelligent pavement damage monitoring research in China. IEEE Access. 2019;**7**: 45891-45897. DOI: 10.1109/ ACCESS.2019.2905845

[11] Bhairannawar S. Efficient medical image enhancement technique using transform HSV space and adaptive histogram equalization. In: Soft Computing Based Medical Image Analysis. EEUU: Science Direct; 2018. pp. 51-60. DOI: 10.1016/B978-0- 12-813087-2.00003-8

[12] Prabha D, Kumar J. Assessment of banana fruit maturity by image processing technique. Journal of Food Science and Technology. 2013;**2013**:1-13. DOI: 10.1007/s13197-013-1188-3

[13] Ramos-Arredondo RI, Carvajal-Gámez BE, Gendron D, Gallegos-Funes FJ, Mújica-Vargas D, Rosas-Fernández JB. PhotoId-Whale: Blue whale dorsal fin classification for mobile devices. PLoS One. 2020;**15**(10):


e0237570. DOI: 10.1371/journal. pone.0237570

[14] PyImageSearch. LeNet – Convolutional Neural Network in Python. 2016. Available from: https://www.pyimagesearch.com/2016/08/01/lenet-convolutional-neural-network-in-python/ [Accessed: 27 July 2021]

[15] Secretary of Communications and Transport. Catalog of Deterioration in Flexible Pavements of Mexican Highways. Secretary of Communications and Transport. 1991. Available from: https://imt.mx/archivos/Publicaciones/ PublicacionTecnica/pt21.pdf [Accessed: 27 July 2021]

