#### *3.1.3 Autoencoders*


Benefitting from recent advancements in both algorithms and processing technology, neural networks and their derivatives have experienced rapid development over the past decade. Another method for dimensionality reduction and feature extraction is based on a particular type of neural network called an autoencoder. An autoencoder network contains input (e.g., spectral measurement) and output layers of the same size but includes hidden layers in between with gradually decreasing numbers of nodes (see **Figure 9**). During training, the network weights are updated until the output matches the input within an acceptable tolerance. The layers from the input to the bottleneck center thus effectively encode a compressed version of the input signal. This set of layers is referred to as the "encoder" section. The compressed signal can then be uncompressed in the later layers (called the "decoder") to form a copy of the signal in the output layer. This process has the added benefit of removing noise in the input during encoding such that the decoded copy is more representative of the true response. In 2021, Vasafi et al. [41] made an initial application of an autoencoder to food production process control, using it to detect anomalies such as changes in fat, temperature, added water, and cleaning solution during milk processing. Anomalies were found to result in significantly higher reconstruction error at the autoencoder output layer as compared with the control (i.e., "normal") data.

**Figure 9.**
*Autoencoder neural network showing the bottleneck which separates the encoder and decoder portions.*
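As a rough sketch of this idea, the Python snippet below trains scikit-learn's `MLPRegressor` to reproduce its own input through a narrow bottleneck layer and uses the reconstruction error as an anomaly score. The synthetic spectra, layer sizes, and scoring are illustrative assumptions, not the configuration used by Vasafi et al. [41].

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic "spectra": 200 samples x 64 bands built from two smooth
# Gaussian components plus noise (a stand-in for real measurements).
bands = np.linspace(0.0, 1.0, 64)
mix = rng.random((200, 2))
X = (mix[:, :1] * np.exp(-((bands - 0.3) ** 2) / 0.01)
     + mix[:, 1:] * np.exp(-((bands - 0.7) ** 2) / 0.02)
     + 0.02 * rng.standard_normal((200, 64)))
X = StandardScaler().fit_transform(X)

# Autoencoder stand-in: hidden layers taper to a 4-node bottleneck
# (the "encoder") and widen again (the "decoder"); the training
# target is the input itself.
ae = MLPRegressor(hidden_layer_sizes=(32, 4, 32), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(X, X)

# Per-sample reconstruction error; anomalous spectra (e.g., shifted
# or contaminated measurements) reconstruct poorly and score high.
score = np.mean((X - ae.predict(X)) ** 2, axis=1)
print("mean reconstruction error:", score.mean())
```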

#### *3.1.4 Partial least squares regression (PLSR)*

PLSR is a well-known and frequently used means of conducting regression in the presence of noise. Regression provides a function that predicts a response from a data input (as opposed to classification, which assigns the input to a class). While both PCA and PLSR are derived from experimental data, PCA is more qualitative by nature, often used in an exploratory manner, and is an unsupervised learning method. PLSR, on the other hand, is more quantitative and is a linear, multi-dimensional evaluation. Both methods rely on computing a maximum covariance: PCA within the original data, and PLSR between the data and the response.

PLSR works best when the observed variables are highly correlated and noisy [42], which is a benefit in hyperspectral analysis where data at nearby wavelengths can be highly correlated. PLSR also assumes that the data set is linear and that this linear projection holds in the new subspace. However, if linearity does not hold for the data, the model's predictions will suffer.
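A minimal sketch of PLSR in Python, assuming scikit-learn; the synthetic, highly correlated bands stand in for real spectra, and three latent components is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# 100 noisy "spectra" of 50 highly correlated bands whose shape is
# driven by a latent concentration; the response is that concentration.
concentration = rng.random(100)
bands = np.linspace(0.0, 1.0, 50)
X = (concentration[:, None] * np.exp(-((bands - 0.5) ** 2) / 0.05)
     + 0.05 * rng.standard_normal((100, 50)))
y = concentration

# PLSR projects X onto a few latent components chosen to maximize
# covariance with y, then performs linear regression on them.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
print("held-out R^2:", pls.score(X_te, y_te))
```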


#### *3.1.5 Wavelets*

Wavelets are mathematical functions that, like Fourier analysis, transform data into its constituent spectral components. However, unlike the standard Fourier transform, wavelet transforms can provide frequency information for specific locations in the temporal or spatial domain. Wavelets of different shapes (called mother wavelets) focus on different portions of the frequency spectrum and are typically used in combination to analyze the full spectral bandwidth of concern. Each mother wavelet can also be rescaled to form daughter wavelets to change the resolution in the temporal or spatial domain and thus examine higher or lower portions of the frequency spectrum in more detail. Some wavelet examples are shown in **Figure 10**.

**Figure 10.**
*Wavelet examples from different wavelet families.*

Wavelet analysis has been used in spectroscopic applications as a means of extracting useful features from specific regions of the spectra. For example, Qi et al. [47] applied a single wavelet form at seven different scales to extract features from shortwave infrared (SWIR) hyperspectral reflectance images from peanuts. Using these features, they were able to distinguish moldy portions of peanuts from healthy portions. Wavelet analysis can also be applied in the image domain to extract 2D features. Ji et al. [48] applied wavelet transforms to hyperspectral visible-NIR images of potatoes (after first applying PCA) to decompose the original images into sub-band images at different scales to extract textural features that would enable the identification of bruising in the potatoes.
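The sketch below, which assumes the PyWavelets (`pywt`) package, extracts simple per-scale energy features from a synthetic spectrum using a Daubechies mother wavelet; the wavelet family and five decomposition levels are illustrative choices, not the seven-scale setup of Qi et al. [47].

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

rng = np.random.default_rng(2)

# Toy reflectance spectrum: decaying baseline, one narrow absorption
# feature, and detector noise.
x = np.linspace(0.0, 1.0, 256)
spectrum = (np.exp(-x) + 0.3 * np.exp(-((x - 0.6) ** 2) / 1e-4)
            + 0.01 * rng.standard_normal(256))

# Multi-level discrete wavelet decomposition; each level isolates a
# different portion of the frequency spectrum.
coeffs = pywt.wavedec(spectrum, "db4", level=5)

# A simple feature vector: coefficient energy at each scale. Narrow
# features show up as energy in the fine-scale (high-frequency) levels.
features = [float(np.sum(c ** 2)) for c in coeffs]
print(features)
```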

#### *3.1.6 Splines*

Splines are piecewise linear or polynomial functions that are combined to approximate a given set of data. Splines are often used as smoothing functions to approximate data curves while eliminating the "roughness" caused by noise. Like the other techniques discussed above, splines benefit the feature extraction process by focusing on the true signal within the data.
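As a brief illustration, the following SciPy sketch fits a cubic smoothing spline to noisy samples of a smooth curve; the smoothing factor `s` is an assumed, tunable trade-off between closeness to the data and roughness.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)

# Noisy samples of a smooth underlying curve.
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.2 * rng.standard_normal(200)

# Cubic smoothing spline (k=3); s bounds the allowed sum of squared
# residuals, so a larger s yields a smoother fit that ignores more noise.
spline = UnivariateSpline(x, y, k=3, s=8.0)
y_smooth = spline(x)
print("RMS error vs. true curve:",
      np.sqrt(np.mean((y_smooth - np.sin(x)) ** 2)))
```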

One application of splines common to chemometric analysis is the regression analysis method of multivariate adaptive regression splines (MARS) [49]. MARS models nonlinearities in data by fitting splines to specific regions of the input variable range. The regions are separated by "hinge" functions that have a value of zero for all locations except within the region of applicability. The transition points that link consecutive splines are called "knots." In a forward process, knots and splines are added to yield a close fit to the data. In a backward process, the least contributing terms are pruned to minimize overfitting. Garre et al. [50] compared a model developed using MARS to similar regression models developed to predict the amount of waste in food production and quantify model uncertainties. The MARS model achieved a precision comparable to that of more sophisticated machine learning models such as random forest methods, which were developed to deal with the high variability in decision trees while maintaining low bias [51].

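The hinge-and-knot construction can be written out directly in a few lines of Python. Here the knot location is assumed known for illustration; MARS itself searches candidate knots during the forward process and prunes terms during the backward process.

```python
import numpy as np

rng = np.random.default_rng(4)

# Data with a kink at x = 0.5: slope 1 below the knot, slope 3 above.
x = rng.random(300)
y = (np.where(x < 0.5, x, 0.5 + 3.0 * (x - 0.5))
     + 0.05 * rng.standard_normal(300))

# MARS-style basis: an intercept plus a mirrored pair of hinge
# functions, each zero outside its region of applicability.
knot = 0.5
B = np.column_stack([np.ones_like(x),
                     np.maximum(0.0, x - knot),    # active right of the knot
                     np.maximum(0.0, knot - x)])   # active left of the knot
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print("fitted coefficients:", coef)  # approx. [0.5, 3.0, -1.0]
```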

Closely related to spline regression is Savitzky–Golay filtering in which the data points are convolved with a set of filter weights, much like a weighted moving average. However, as the filter moves to each successive data point, a polynomial of degree *p* is fit to the data within the filter window, and the point in the center of the window is replaced by the polynomial value at that point [52]. One key benefit of Savitzky–Golay filtering is that it tends to preserve high frequency signal components while rejecting high frequency noise (often found in CCD or InGaAs arrays or photon-starved detection systems) whereas standard finite impulse response (FIR) filters tend to remove these signal components [53].
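A short SciPy sketch comparing Savitzky–Golay filtering with a plain moving average on a synthetic peak; the 15-point window and degree-3 polynomial are illustrative parameter choices.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(5)

# A sharp peak contaminated with detector-like noise.
x = np.linspace(0.0, 1.0, 200)
clean = np.exp(-((x - 0.5) ** 2) / 8e-4)
noisy = clean + 0.05 * rng.standard_normal(200)

# Savitzky-Golay: fit a degree-3 polynomial in each 15-point window
# and keep the center value; compare with a simple moving average.
sg = savgol_filter(noisy, window_length=15, polyorder=3)
ma = np.convolve(noisy, np.ones(15) / 15, mode="same")

# The moving average flattens the peak; Savitzky-Golay preserves it.
print(f"clean peak: {clean.max():.2f}  "
      f"SG: {sg.max():.2f}  moving average: {ma.max():.2f}")
```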

Savitzky–Golay filtering is a popular pre-processing technique that has been used extensively for food spectral analysis. Examples since 2020 include pre-processing NIR spectra to improve classification performance in the identification of allergens in powdered food materials [54], filtering noise from FTIR spectra of instant freeze-dried coffee and MIR spectra of fruit puree samples [53], and smoothing NIR reflectance spectra of Indonesian rice flour-based foods to enable accurate classification and level estimation of added sweeteners [55].

### **3.2 Classification**

Automatic detection of food spoilage requires an algorithm that can successfully classify a food product (or part of a food product) as spoiled or healthy, either by detecting the presence of contaminants or by classifying physical changes to the product. A variety of sophisticated machine learning algorithms have been developed over the past few decades to provide accurate classification, and many of these have been used in spectroscopic and food quality applications. Here, we discuss two of the most popular classification algorithms, the support vector machine (SVM) and the artificial neural network, both of which take as input a set of features that are typically generated using the methods described in the previous section. We also discuss deep learning methods, which have advanced rapidly since 2012 and are being used in a wide variety of applications including food analysis. Unlike more conventional machine learning methods, deep learning methods include their own automatic feature extraction process [56].

#### *3.2.1 Support vector machines (SVM)*

An SVM is a supervised learning algorithm that seeks to find the separating hyperplane between data points of different classes that minimizes classification error. The position of the hyperplane is determined by the set of points (called "support vectors") that are closest to it. The basic concept of the SVM is intuitive when the hyperplane is linear and the classification is binary (see **Figure 11**).


**Figure 11.**
*Separating hyperplane determination for a Support Vector Machine. The hyperplane is positioned to maximize the margin between the support vectors.*

However, SVMs can also be applied to data whose classes are not linearly separable by transforming the data from the original space into one in which they are linearly separable. This is often accomplished using the so-called "kernel trick," in which a kernel function compares vectors in the new space without performing the actual transformation, thus minimizing computation cost. Common kernels include linear, radial (i.e., Gaussian), and polynomial.
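As a compact illustration of the kernel trick, the scikit-learn sketch below compares linear and radial kernels on a synthetic two-class problem that is not linearly separable; the data set and parameters are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric classes: not separable by any hyperplane in the
# original 2-D space.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The radial (Gaussian) kernel compares points in an implicit
# higher-dimensional space without computing the mapping explicitly.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)
    print(f"{kernel:6s} test accuracy: {clf.score(X_te, y_te):.2f}")
```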

A reliable and robust machine learning classifier, the SVM has been used in many hyperspectral imaging food analysis applications. A few examples since 2020 include the detection of spoilage in visible-NIR imagery of baked goods [38], detection of bacterial foodborne pathogens in visible-NIR imagery [57], and detection of fish fillet substitution and mislabeling through accurate classification of fillet species from imagery collected in visible-NIR, fluorescence with UV excitation, SWIR, and Raman spectral bands [58].

#### *3.2.2 Artificial neural networks*

Artificial neural networks are another popular supervised classification method whose use has surged with advances in processing technology over the past few decades. Conventional artificial neural networks are based on the multilayer perceptron (MLP) architecture (see **Figure 12**), which was designed to resemble neurons in the brain. Such neurons accept some number of input values and remain dormant until the sum of inputs rises above a certain threshold value, at which point the neurons "fire." This nonlinear thresholding effect is enabled in artificial neural network nodes by nonlinear activation functions that determine each node's output value. Common activation functions include the sigmoid, hyperbolic tangent, and rectified linear unit functions.

**Figure 12.**
*A simple illustration of a multi-layer perceptron neural network architecture. Each circle represents a neural network node and each arrow represents the weight that connects a node in one layer to a node in the subsequent layer.*
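The thresholding behavior of a single node can be written out directly. In this sketch the inputs, weights, and bias are arbitrary illustrative values; each activation squashes the same weighted sum differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# One node: a weighted sum of inputs plus a bias, passed through a
# nonlinear activation. The bias sets the effective firing threshold.
x = np.array([0.2, 0.7, 0.1])    # inputs from the previous layer
w = np.array([0.4, -0.6, 0.9])   # learned weights
b = 0.1                          # learned bias
z = w @ x + b
print("sigmoid:", sigmoid(z), " tanh:", np.tanh(z), " ReLU:", relu(z))
```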

Artificial neural networks are trained by initializing the network weights (usually with random values) and comparing the predicted results at the output layer to known target values. An error metric is calculated based on the difference between the prediction and target values, and the network weights are updated by calculating partial derivatives of the error with respect to each weight, starting with the output layer and moving backward toward the input layer in a process known as backpropagation. This entire process is repeated until the error is brought below an acceptable tolerance or other stopping criteria are met.
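A compact version of this loop, assuming PyTorch, in which automatic differentiation computes the partial derivatives during the backward pass; the toy data, network size, and learning rate are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy two-feature data with labels that are not linearly separable.
X = torch.randn(256, 2)
y = (X[:, 0] * X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()                    # the error metric
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # weight-update rule

for epoch in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)  # compare predictions to targets
    loss.backward()              # backpropagation: d(error)/d(weight)
    opt.step()                   # move each weight down its gradient

print("final training loss:", float(loss))
```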


Like SVMs, artificial neural networks have been widely used for spectral classification applications due to their ability to achieve high accuracy. In 2020, Balabanov et al. [59] developed a vision-based system with an artificial neural network to detect defects in apples passing on a conveyor belt by analyzing HSI in the visible and NIR spectral ranges. Although once popular, these MLP-based neural networks are rapidly being replaced by deep learning neural networks, which not only offer superior performance but also ease the data processing pipeline by eliminating the need for manual feature selection and extraction.

Prior to the advent of modern sophisticated processing technology, neural networks were limited in size due to their computational loads which grew with the number of layers and the number of nodes within each layer. As this processing technology advanced, more and more layers could be added to neural networks to improve their performance (although the theoretical reason for why this is the case is still poorly understood). Furthermore, as shown in **Figure 13**, layers could be added to perform different operations on the data, such as convolution and averaging (often called "pooling" in this context). The network then performs feature extraction by learning the weights in the convolutional layers which yield accurate classifications. In essence, the network learns which filters should be applied to the data to best extract the signal within. Pooling layers following the convolutional layers then apply averaging to help prevent overfitting. Following the successive convolutional and pooling (and possibly other) layers, the results are concatenated into a single-dimensional vector and fed into an MLP neural network to combine these features for classification.

**Figure 13.**
*Basic CNN architecture. Data at the input layer is passed through a convolutional layer which generates feature sets. These are then reduced in size through averaging in the pooling layer and the resulting features are concatenated. The final layers form an MLP-based neural network to yield the final classifications.*
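A minimal 1D convolutional network for spectra, assuming PyTorch, that mirrors the convolution, pooling, flattening, and MLP stages of Figure 13; every layer size here is an illustrative assumption rather than a published architecture.

```python
import torch
from torch import nn

class SpectraCNN(nn.Module):
    def __init__(self, n_bands=128, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3),   # learned filters
            nn.ReLU(),
            nn.AvgPool1d(2),                             # pooling by averaging
            nn.Conv1d(8, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AvgPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # concatenate features
            nn.Linear(16 * (n_bands // 4), 32),          # MLP head
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_bands)
        return self.classifier(self.features(x))

model = SpectraCNN()
print(model(torch.randn(4, 1, 128)).shape)  # torch.Size([4, 2])
```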

In 2021 alone, deep learning neural networks have been used to classify beef freshness from visible-NIR reflectance spectra [60], to analyze NIR HSI to detect the presence of contamination during food packing [61], and to conduct a series of different food quality analyses from NIR spectra [62].


**4. Food traceability and dynamic pricing**


**4.1 Inadequacies of existing traceability technologies**

Traceability and real-time analysis of food products that can help minimize waste will require new tools for quality assurance, authentication, and digital supply chain management that can track products from harvest to market. Such tools must be objective and verifiable, provide data on quality, provenance, and freshness, and be easy to incorporate at multiple nodes in the supply chain. Current technologies are inadequate to address most of these challenges and address only some components of the most difficult problems.


State-of-the-art seafood traceability platforms provide tools for establishing chain of custody but lack dynamic pricing features and verifiable, trusted freshness and authenticity data. These platforms rely on estimates of shelf life based on catch date and storage conditions. Such inputs are insufficiently verifiable and quantifiable for digital tools based on them to be broadly trusted and accepted for dynamic pricing, and they do not address authentication and quality metrics or capabilities at all.

As of 2021, dynamic pricing software solutions also lack high-quality, verifiable, and trusted freshness and authenticity data and are primarily designed for final retail discounting, often integrated only into broader retailer systems. This makes them less effective for upstream supply chain tasks and for adding value at each node in the supply chain.


Products for quantitative measurement of fish freshness rely primarily on destructive laboratory-based methods that are not capable of accurate spot checks of individual fish or fish portions and cannot be realistically and easily repeated at low cost at multiple points along the supply chain. Tools are available that measure tissue conductivity through fish skin, primarily to assess moisture, but they are not designed to address broader nutrient content, species, and traceability.


**4.2 SafetySpect's quality, adulteration, and traceability (QAT) technology**

One approach to addressing the problem of traceability and rapid detection of spoilage is SafetySpect's newly developed handheld QAT scanner that optically

