**6. Managing production variability**

Sometimes, during production line maintenance or upgrades, the replacement of a machine, a change of supplier, or a change in a manufacturing process can lead to a significant variation in the usual production procedure in terms of the visual quality of the products. Such situations can prevent a previously trained vision model from returning the expected results.

In this context, continuing with the problem of detecting welding defects on injector heads, the work presented by Tripicchio et al. [48] proposes possible solutions to this issue without requiring a change in the learning architecture. The new case had to handle modifications to the parameters of the welding process, which produced input samples with specific artifacts that the previously designed and trained network had never encountered. In particular, the new inputs were correlated with a variation in the substance used for the soldering, which generated gold-violet spots at random positions on the injector head. Such noise introduces a new complexity in defect detection because the spots can hide, or visually resemble, bumps and holes in the welding layer. The chosen approach was to make as few changes as possible to the network architecture, relying instead on smart preprocessing and filtering techniques.

The results show that a network with almost 7 million parameters can be trained on just 306 training images of the new production variant, achieving a *recall* of 100.00% and an *accuracy* of 97.22%.

This result was achieved by leveraging two important aspects: the design of a custom preprocessing and filtering stage, and the adoption of a novel data balancing strategy.

A preprocessing stage is applied to the input images with the aim of erasing or smoothing the chromatic nuances that could confuse the feature learning process. In particular, three filtering approaches have been proposed and tested (**Figure 6**). The first filter (*constant filter*) detects image regions in the gold and violet ranges of the HSV space and fills them with a constant RGB color resembling the chromatic value of the injector contour. The second filter (*median filter*) fills the selected regions with the median RGB value of each channel. In the third approach (*patch filter*), a 4 × 4 patch is synthetically generated to resemble a defect-free area and is used to fill the detected gold and violet regions: every pixel in those regions is replaced with the value of the corresponding pixel of the synthetic image, plus the median value of the original image.
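As an illustration, the three fills could be sketched in NumPy as follows. The hue windows, the saturation threshold, and the patch content below are hypothetical placeholders chosen for readability, not the values used in [48]; a production pipeline would likely tune them (or use OpenCV's `cv2.inRange` on HSV images):

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

# Hypothetical hue windows (H in [0, 1]) for the gold and violet spots.
GOLD = (0.10, 0.16)
VIOLET = (0.72, 0.85)

def spot_mask(img_rgb):
    """Boolean mask of pixels whose hue falls in the gold or violet window."""
    hsv = rgb_to_hsv(img_rgb.astype(np.float64) / 255.0)
    h, s = hsv[..., 0], hsv[..., 1]
    in_gold = (h >= GOLD[0]) & (h <= GOLD[1])
    in_violet = (h >= VIOLET[0]) & (h <= VIOLET[1])
    return (in_gold | in_violet) & (s > 0.3)  # ignore near-gray pixels

def constant_fill(img, mask, color):
    """Constant filter: replace masked pixels with a fixed RGB color."""
    out = img.copy()
    out[mask] = color
    return out

def median_fill(img, mask):
    """Median filter: replace masked pixels with the per-channel median."""
    out = img.copy()
    out[mask] = np.median(img.reshape(-1, 3), axis=0)
    return out

def patch_fill(img, mask, patch4x4):
    """Patch filter: tile a 4x4 defect-free patch over the image and replace
    masked pixels with the tiled value plus the per-channel median."""
    h, w = img.shape[:2]
    tiled = np.tile(patch4x4, (-(-h // 4), -(-w // 4), 1))[:h, :w]
    med = np.median(img.reshape(-1, 3), axis=0)
    out = img.astype(np.int32)
    out[mask] = tiled[mask] + med
    return np.clip(out, 0, 255).astype(np.uint8)
```

The saturation check is one simple way to avoid flagging gray regions whose hue is numerically undefined; other heuristics are possible.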

Different analyses were carried out to assess the performance improvement provided by each filter. As a result, the *patch filter* was selected for the subsequent tuning of the network.

Concerning data imbalance, different unbalanced splits were explored. To prevent overfitting and steer the learning process toward generalization, the authors propose computing the performance metrics at each evaluation step while taking the input imbalance into account. In particular, metrics such as *specificity* and *recall* are not affected by data imbalance, unlike metrics such as *accuracy*, which must be revised.

#### **Figure 6.** *Different filters applied on a sector of the same injector contour image. (a) No filter. (b) Median fill filter. (c) Patch filter.*

Defective injectors were chosen as positive samples, and the *false negative* and *true positive* counts were weighted according to the imbalance, since the defective class was the smaller one. The imbalance is compensated by multiplying these values by the class proportion of the input dataset. Consequently, the confusion matrix gives a more balanced indication of the training performance.
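A minimal sketch of this compensation, assuming the weighting factor is simply the good-to-defective ratio (the exact proportion used in [48] is not stated here), could look like:

```python
def balanced_confusion(tp, fn, fp, tn, n_good, n_defective):
    """Scale the positive-class counts (defective = positive) by the
    good/defective ratio so that accuracy reflects a balanced dataset.
    Illustrative version of the compensation described in the text."""
    w = n_good / n_defective        # imbalance ratio (>1 when defective is minority)
    tp_w, fn_w = tp * w, fn * w     # weighted true positives / false negatives
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # unchanged by the scaling
    accuracy = (tp_w + tn) / (tp_w + fn_w + fp + tn)
    return tp_w, fn_w, recall, accuracy
```

Note that recall is invariant under this scaling (both numerator and denominator are multiplied by the same factor), which is consistent with the observation above that recall is unaffected by imbalance while accuracy must be revised.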

Cross-validation was applied to improve generalization with respect to the stochastic gradient descent optimization. The network was trained multiple times, combining different proportions of defective and good samples and varying the number of epochs. During training, each epoch is compared with all previous epochs to select the one with the highest performance in terms of *recall* and *accuracy*.

The *F-score* was chosen as the most suitable aggregate metric to evaluate the tests performed on the different training variations. Concerning the imbalance, the results indicate that an *unbalanced dataset* can provide better results if the imbalance is taken into account during training.
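For reference, the F-score combines precision and recall as their (weighted) harmonic mean; with β = 1 it reduces to the familiar F1:

```python
def f_score(precision, recall, beta=1.0):
    """F_beta score: the weighted harmonic mean of precision and recall.
    beta > 1 favors recall, beta < 1 favors precision; beta = 1 gives F1."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Being a harmonic mean, it is dominated by the weaker of the two components, which is why it is a reasonable single number for ranking training variations whose precision and recall trade off against each other.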
