**5. Transfer learning for defect detection**

The first layers of a CNN have been observed to learn kernels that act as color blob detectors or Gabor filters. This property appears to be very general: the learned features do not seem strictly tied to the particular training set used. Just as humans learn from experience and transfer notions across application domains, a DL architecture can transfer the features learned on one dataset to another CNN trained on a different one [51]. This technique, called *transfer learning*, is a valuable tool for addressing the scarcity of defective samples. The paradigm involves pretraining the network on a (usually larger) dataset to learn the feature-extraction layers, and afterward fine-tuning the classification pipeline on the dataset relevant to the specific task. Knowledge transfer relaxes the fundamental assumption that the data presented to the network during the training phase must lie in the same feature space as the data presented at inference time. Feature-extraction layers obtained through the *transfer learning* paradigm can extract generic convolutional features that may be exploited in different tasks.
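The pretrain-then-fine-tune paradigm can be sketched in a few lines. The following is a minimal, illustrative numpy example, not the paper's implementation: the "pretrained" feature extractor is a stand-in with frozen random weights, the dataset is synthetic, and only the classification head is updated, mirroring how transfer learning reuses feature layers while fine-tuning the classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature-extraction layer. In a real system these weights would
# come from training on a large source dataset (e.g., MINC); here frozen
# random weights stand in, purely to illustrate the paradigm.
W_feat = rng.normal(size=(16, 8)) / 4.0

def extract_features(x):
    """Frozen layers: reused on the new task, never updated while fine-tuning."""
    return np.maximum(x @ W_feat, 0.0)  # ReLU features

# Tiny synthetic target-domain dataset (e.g., good vs. scrap injectors).
X = rng.normal(size=(64, 16))
y = (X[:, 0] > 0).astype(float)

# Fine-tune only the classification head on the target task.
w_head = np.zeros(8)
for _ in range(2000):
    feats = extract_features(X)
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head)))  # sigmoid
    w_head -= 0.1 * feats.T @ (p - y) / len(y)   # update the head; W_feat stays frozen

p = 1.0 / (1.0 + np.exp(-(extract_features(X) @ w_head)))
acc = float(np.mean((p > 0.5) == (y == 1)))
```

In practice the frozen extractor would be the convolutional backbone of a deep network and the head a small trainable classifier, but the division of roles is the same.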

Following the *transfer learning* approach, the work by Sassi et al. [36] yielded a 97% *accuracy* during testing in the laboratory and proved successful during operation in a real production line, reaching an *accuracy* of 99% after subsequent training.

The work combines a traditional computer vision pipeline with a DL architecture. This pipeline was necessary to maintain compatibility with classical production lines and to provide a correct input to the welding defect detection stage. The algorithm receives the raw image as input, converts it from Bayer format to grayscale, and improves edge detection by equalizing the intensity levels and applying a Gaussian blur. In a subsequent step, since different kinds of injectors can be analyzed by the same system, the type of injector is identified and the position of its center is obtained. The algorithm then detects the outer shell of the injector head by estimating an external radius that approximates the detected blob. Using the extracted information, the algorithm performs an area search for welding points and fits a welding circle on the joint. It then collects statistics about the number of welding points found and their positions. In traditional industrial systems, a set of thresholds decided by the manufacturing company is used to evaluate the welding quality from the measured quantities.
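The first preprocessing steps can be sketched with plain numpy. This is a simplified, assumed implementation (a production system would presumably use an optimized library such as OpenCV): Bayer-to-grayscale is approximated by averaging each 2×2 mosaic cell, followed by histogram equalization and a separable Gaussian blur.

```python
import numpy as np

def bayer_to_gray(raw):
    """Approximate grayscale by averaging each 2x2 Bayer cell (R, G, G, B)."""
    h, w = raw.shape
    return raw.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def equalize(img):
    """Histogram equalization: spread the intensity levels to enhance edges."""
    flat = img.astype(np.uint8).ravel()
    hist = np.bincount(flat, minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[flat].reshape(img.shape)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian filter to suppress noise before edge detection."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, tmp)

raw = np.random.default_rng(1).random((64, 64)) * 255  # stand-in Bayer frame
gray = bayer_to_gray(raw)                              # half-resolution grayscale
enhanced = gaussian_blur(equalize(gray))               # input to the detection steps
```

The later geometric steps (center localization, radius estimation, welding-circle fitting) depend on the specific injector geometry and are omitted here.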

A schematic overview of the algorithm is shown in **Figure 4**. The algorithm's output gives quantitative information about the welding and produces a processed image to be given as input to the second analysis stage. The extracted information allows evaluating the continuity of the welding in a certain area of the injector's head, verifying the centering of the inner part of the injector with respect to the outer one, and measuring the welding thickness. This information is also useful for cleaning the image of data unnecessary for the subsequent analysis and for centering the injector images, which yields more controlled conditions on the input of the successive stage.

**Figure 4.** *Schematics of the components of the geometrical analysis pipeline.*

*Welding Defect Detection with Deep Learning Architectures DOI: http://dx.doi.org/10.5772/intechopen.101951*

**Figure 5.** *Schematic representation of the layers and blocks in the DenseNet-121 deep learning architecture.*

The DL architecture chosen in that work is DenseNet-121. **Figure 5** depicts the structure of the network. DenseNet simplifies the connectivity pattern between layers while guaranteeing maximum information flow by reusing features throughout the network. During training, every layer has direct access to the gradients from the original input image and the *loss function*. Unlike early feedforward neural networks, which connect the output of each layer only to the subsequent layer after applying a composite of operations, DenseNet concatenates the output feature maps of all preceding layers, so that *x<sub>l</sub>* = *H<sub>l</sub>*([*x*<sub>0</sub>, *x*<sub>1</sub>, … , *x*<sub>*l*−1</sub>]). The network is formed by dense blocks, within which the feature maps keep a constant spatial size while the number of filters varies, and by transition layers that connect the blocks, combining batch normalization, a 1×1 convolution, and 2×2 pooling.
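The dense connectivity equation can be made concrete with a small numpy sketch. This is not DenseNet-121 itself: the composite function *H<sub>l</sub>* (batch normalization, ReLU, 3×3 convolution in the real network) is replaced by an assumed per-pixel linear map plus ReLU, just to show how each layer consumes the concatenation of all earlier feature maps and contributes a fixed number of new ones (the growth rate).

```python
import numpy as np

rng = np.random.default_rng(0)

def H(x, out_channels):
    """Stand-in for the composite function H_l (BN + ReLU + 3x3 conv):
    maps the concatenated input to `out_channels` new feature maps."""
    w = rng.normal(size=(x.shape[-1], out_channels))
    return np.maximum(x @ w, 0.0)

def dense_block(x0, num_layers=4, growth_rate=8):
    """x_l = H_l([x_0, x_1, ..., x_{l-1}]): every layer sees all earlier maps."""
    features = [x0]
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=-1)  # [x_0, ..., x_{l-1}]
        features.append(H(concat, growth_rate))     # x_l adds growth_rate maps
    return np.concatenate(features, axis=-1)

x0 = rng.normal(size=(16, 16, 8))  # HxW feature map with 8 input channels
out = dense_block(x0)              # channels: 8 + 4 * 8 = 40
```

Note how the channel count grows linearly with depth (input channels plus layers × growth rate), which is exactly why the transition layers with their 1×1 convolution are needed between blocks to compress the feature maps.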

In the approach presented by Sassi et al. [36], the *transfer learning* technique has been employed and evaluated by comparing the results achieved when the features are transferred from a model pretrained on the *Material in Context* (MINC) [52] dataset. This dataset contains 2,996,674 patches obtained from 436,749 images labeled according to 23 material classes. A binary classification problem has been set up in which scrap injectors are the positive samples and good injectors the negative ones. The results of the classification are summarized in a confusion matrix with four possible values: *true negative* (*tn*), *true positive* (*tp*), *false negative* (*fn*), and *false positive* (*fp*). The metrics that best estimate the quality of the defect analysis are the *recall*, *tp*/(*tp* + *fn*), which describes the ability to detect faulty pieces, and the *accuracy*, (*tp* + *tn*)/(*tp* + *tn* + *fp* + *fn*), which describes the overall quality of the analysis. The *precision*, *tp*/(*tp* + *fp*), is important to avoid discarding too many injectors, but it is not as crucial as the *recall*, since an undetected defect could be dangerous if the piece proceeds through the assembly line.
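The three metrics follow directly from the confusion-matrix counts. The counts below are illustrative only, not results from the paper:

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics used to judge the defect classifier."""
    recall = tp / (tp + fn)                     # ability to catch faulty pieces
    precision = tp / (tp + fp)                  # avoid scrapping good injectors
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall quality of the analysis
    return recall, precision, accuracy

# Hypothetical counts: 95 defects caught, 5 missed,
# 890 good pieces passed, 10 good pieces wrongly scrapped.
r, p, a = metrics(tp=95, tn=890, fp=10, fn=5)
# recall = 0.95, precision ~ 0.905, accuracy = 0.985
```

With these numbers the 5 false negatives dominate the risk assessment even though accuracy is high, which is why recall is the metric the analysis prioritizes.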

Unfortunately, the MINC dataset is highly unbalanced. Therefore, a subset of three classes, namely *plastic*, *metal*, and *others*, has been selected for the pretraining stage to alleviate training problems. The idea behind using such a dataset for *transfer learning* was to exploit metallic features that could resemble those in the welding images, and the class reduction does not affect the features learned on metallic materials.
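One plausible way to obtain such a three-class subset is to keep the classes of interest and remap every other label to *others*. The labels and data layout below are hypothetical (MINC's real on-disk format differs); this only sketches the relabeling idea:

```python
# Hypothetical (patch_id, material_label) pairs standing in for MINC entries.
patches = [
    ("p_001", "metal"), ("p_002", "plastic"), ("p_003", "wood"),
    ("p_004", "metal"), ("p_005", "fabric"), ("p_006", "glass"),
]

keep = {"metal", "plastic"}  # classes kept as-is; everything else collapses
subset = [(pid, lbl if lbl in keep else "others") for pid, lbl in patches]
```

Collapsing the remaining 21 material classes into a single *others* class keeps the pretraining task balanced enough to be tractable while preserving the metal-related features the welding task cares about.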
