#### *2.2.2 Applications of handheld MIR spectrometers for food analysis*

MIR spectrometers have long been used for food analysis, but most such analyses have been conducted in a laboratory setting. Examples include the detection of food spoilage bacteria in meat and dairy products, brand authentication of a range of Trappist beers, and detection of adulteration in milk and in beef burgers [10]. More recently, portable MIR devices have been used for the simultaneous analysis of sugar and amino acid concentrations in raw potato tubers, the measurement of quality factors in tomato juices, and the measurement of the fatty acid content of marine oil dietary supplements [30].

#### **2.3 Raman spectroscopy**

Raman spectroscopy is often seen as complementary to infrared spectroscopy given the related nature of the phenomena involved. While infrared spectroscopy measures the absorption of energy, Raman spectroscopy measures the exchange of energy with radiation provided by a monochromatic light source (usually a laser with a wavelength in the ultraviolet to NIR range). This exchange causes a shift in the source's wavelength. Molecules are infrared active only if the vibration induced by the source results in a change to the dipole moment, whereas the Raman shift is caused by changes in the molecules' polarizability [10]. Thus, the two methods provide complementary information. Raman peaks tend to be much sharper than infrared peaks and data collection tends to be faster, but the Raman effect is inherently weaker. Furthermore, Raman spectrometers tend to be more expensive to manufacture than their infrared counterparts.
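
A minimal sketch of how this shift is quantified: Raman shifts are conventionally reported in wavenumbers (cm⁻¹) computed from the laser and scattered wavelengths. The 885 nm scattered wavelength below is an arbitrary illustrative value.

```python
def raman_shift_cm1(laser_nm: float, scattered_nm: float) -> float:
    """Raman shift in wavenumbers (cm^-1) from wavelengths given in nm."""
    # 1 cm = 1e7 nm, so reciprocal wavelengths in nm^-1 scale by 1e7.
    return (1.0 / laser_nm - 1.0 / scattered_nm) * 1e7

# Example: a 785 nm excitation laser with Stokes-scattered light observed
# at 885 nm (scattered wavelength chosen arbitrarily for illustration).
print(raman_shift_cm1(785.0, 885.0))  # ~1439 cm^-1
```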

#### *2.3.1 Raman spectrometers*

**Figure 8** shows an example design for a Raman spectrometer. Light from the laser is directed to the sample, and the scattered output is passed through a notch filter that blocks the laser line while passing the Raman-scattered light. A spectrograph grating then disperses this light into its constituent wavelengths and onto a detector. Metrohm's Mira M-1 is an example of a portable Raman spectrometer with a 785 nm laser [32]. Laser wavelengths for other Raman spectrometers can range from the ultraviolet (UV) to the NIR bands. Since spectral sensitivity and resolution increase with decreasing laser wavelength, UV lasers tend to be optimal for applications featuring biomolecules [33].

**Figure 8.** *Basic diagram of a Raman spectrometer [31].*

#### *2.3.2 Applications of handheld Raman spectrometers for food analysis*

Recent applications of portable and handheld Raman spectrometers for food analysis include the detection of organophosphate and organothiophosphate pesticides on apple skins, the detection of fungicides and parasiticides on citrus fruits and bananas, the assessment of the authenticity and origin of vegetable and essential oils, the detection of marker compounds for illegal alcoholic beverages, the detection of adulteration in beef burgers, the identification of rapid meat spoilage, and the prediction of pork quality on a slaughterhouse line [10].

## **3. Artificial intelligence and machine learning techniques for spectral analysis**

Once the spectroscopic data has been collected, sophisticated algorithms, and capable processors to host those algorithms, are needed to convert the data into useful information. The microchip revolution that started in the 1960s has continued unabated [34], and silicon vendors continue to innovate even with mature technologies such as field programmable gate arrays (FPGAs) [35], whose transistor counts now exceed one billion. This growth in computing power, coupled with advances in spectroscopy, has enabled the implementation of modern machine learning algorithms that can lead to significant improvements in food safety and in the detection of adulteration and fraud.

In this section, we discuss key machine learning algorithms that have been applied to spectroscopy in general and to food valorization applications specifically. We first examine methods used to extract non-redundant, information-rich features from the data that can be used for accurate classification and quantification of food spoilage and food quality.

#### **3.1 Feature extraction**

Most spectral datasets contain subsets of features that are highly redundant or subject to high amounts of noise. The inclusion of such features in a classification or regression algorithm generally leads to suboptimal performance. Feature extraction is the process by which redundant or noisy features are removed from the dataset, leaving a smaller set of features with a high amount of signal content. Here, we discuss popular methods for feature extraction that have been used for spectroscopic applications.

#### *3.1.1 Principal component analysis (PCA)*

PCA is a common method of feature extraction enabled through dimensionality reduction. PCA provides dimensionality reduction by representing the variance in the data within the smallest number of components possible. Each principal component is a linear combination of the original components and is calculated in an iterative fashion by identifying the weight vector that, when applied to the original data components, captures the largest amount of the remaining variance and is orthogonal to the previously calculated principal components. As a result, the majority of the variance (typically ~99% or more) is contained within the first few (typically 3–5) principal components, meaning the others can be safely ignored with negligible loss of information. These principal components are also eigenvectors of the data's covariance matrix and can be computed by eigendecomposition. The corresponding eigenvalues are proportional to the variance represented within each principal component and can be used to identify the principal components which are considered "significant." According to the Kaiser criterion [36], eigenvectors with eigenvalues less than 1 can be considered insignificant.
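
As a concrete sketch of the eigendecomposition route just described, the following Python/NumPy fragment computes principal components from a synthetic stand-in for a spectral dataset and applies the Kaiser criterion; the data, shapes, and threshold usage are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a spectral dataset: 100 samples x 50 wavelengths.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# Center the data, then form the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Principal components are eigenvectors of the covariance matrix; the
# eigenvalues are proportional to the variance each component captures.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser criterion [36]: treat components with eigenvalue <= 1 as
# insignificant (conventionally applied to standardized data).
keep = eigvals > 1.0
scores = Xc @ eigvecs[:, keep]              # project onto retained PCs
print(f"retained {keep.sum()} of {eigvals.size} components")
```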

In addition to its dimensionality reduction benefit, PCA tends to yield principal components that provide good separability between data collected from different classes. It is this property that makes PCA such an effective tool for feature extraction. Principal components can also provide qualitative clues about key underlying differences in molecular constituents and their relative abundances, since these spectral characteristics often appear as the key features in the second, third, and higher principal components.

PCA is widely used in food chemistry studies [37] and specifically for the analysis of food spoilage. For example, in 2020 Saleem et al. [38] presented a new method for predicting microbial spoilage and detecting its location in bakery goods using HSI. HSI cameras monitored baked goods over a period of time as they were allowed to spoil. PCA was applied to difference images created by subtracting images collected at the beginning of the monitoring period, when the goods were fresh, from images collected at later times. The researchers then used PCA to separate pixels representing spoiled portions of the goods from unspoiled portions.

#### *3.1.2 Sparse representation*

Similar in concept to PCA, sparse representation methods are mathematical processes applied to data with the goal of transforming the data into a new representation containing as few non-zero elements as possible. This is achieved by conducting a trade-off between goodness-of-fit and sparsity: the transformation attempts to produce an accurate reproduction of the original data but is regularized with a cost penalty that increases with the number of non-zero components. Example sparse representation algorithms include basis pursuit, sparse dictionary learning, L1-regularization, and non-negative matrix factorization (NMF) [39].
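
The goodness-of-fit versus sparsity trade-off can be sketched with L1-regularization (the lasso) from scikit-learn; the data below is synthetic and the penalty values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic problem: 80 samples, 200 features, only 3 truly informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 200))
true_coef = np.zeros(200)
true_coef[[5, 42, 117]] = [2.0, -1.5, 3.0]
y = X @ true_coef + rng.normal(scale=0.1, size=80)

# The L1 penalty trades goodness-of-fit against the number of non-zero
# coefficients: a larger alpha buys more sparsity at some cost in fit.
for alpha in (0.01, 0.1, 1.0):
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    print(f"alpha={alpha}: {np.count_nonzero(model.coef_)} non-zero coefficients")
```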

With respect to HSI, sparsity can also be enforced through wavelength selection processes that identify a small number of information-rich wavelengths and discard all others. Lei and Sun [40] developed a sparse coefficients wavelength selection and regression (SCWR) method for NIR spectral calibration to select the wavelengths that contribute most to the determination of the spectral response. They applied this method to a dataset of NIR spectra from potatoes with dehydration loss as the response variable. A model based on 23 selected wavelengths (from an original set of 200) predicted dehydration loss with an accuracy that exceeded those yielded by common competing methods.
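
The specifics of SCWR are given in [40]; the sketch below only illustrates the general idea of selecting wavelengths by the magnitude of sparse regression coefficients and refitting on the selected subset. It uses synthetic data and illustrative parameters, and is not the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic stand-in: NIR spectra at 200 wavelengths, scalar response.
rng = np.random.default_rng(2)
spectra = rng.normal(size=(120, 200))
weights = np.array([1.0, -0.8, 0.5])
response = spectra[:, [30, 75, 150]] @ weights + rng.normal(scale=0.05, size=120)

# Wavelengths whose coefficients survive the L1 penalty are treated
# as the information-rich subset; all others are discarded.
lasso = Lasso(alpha=0.05, max_iter=10_000).fit(spectra, response)
selected = np.flatnonzero(lasso.coef_)
print(f"selected {selected.size} of 200 wavelengths")

# A calibration model is then refit on the selected wavelengths only.
reduced_model = LinearRegression().fit(spectra[:, selected], response)
```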

**Figure 9.** *Autoencoder neural network showing bottleneck which separates the encoder and decoder portions.*

#### *3.1.3 Autoencoders*

Benefitting from recent advancements in both algorithms and processing technology, neural networks and their derivatives have experienced rapid development over the past decade. One neural-network-based method for dimensionality reduction and feature extraction is the autoencoder. An autoencoder network contains input (e.g., a spectral measurement) and output layers of the same size but includes hidden layers in between with gradually decreasing numbers of nodes (see **Figure 9**). During training, the network weights are updated until the output reproduces the input within an acceptable tolerance. The layers from the input to the bottleneck at the center thus effectively encode a compressed version of the input signal; this set of layers is referred to as the "encoder" section. The compressed signal can then be decompressed in the later layers (called the "decoder") to form a copy of the signal in the output layer. This process has the added benefit of removing noise from the input during encoding, so the decoded copy is more representative of the true response. In 2021, Vasafi et al. [41] made an initial application of an autoencoder to food production process control, using it to detect anomalies such as changes in fat, temperature, added water, and cleaning solution during milk processing. Anomalies were found to produce significantly higher reconstruction error at the autoencoder output layer than the control (i.e., "normal") data.
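
A minimal autoencoder sketch in PyTorch, with a bottleneck separating the encoder and decoder and reconstruction error serving as an anomaly score. The layer sizes, training loop, and data are assumptions for illustration, not the configuration used in [41].

```python
import torch
from torch import nn

n_wavelengths = 256  # illustrative spectrum length

# Encoder compresses each spectrum to a small bottleneck code;
# the decoder reconstructs the spectrum from that code.
autoencoder = nn.Sequential(
    nn.Linear(n_wavelengths, 64), nn.ReLU(),
    nn.Linear(64, 8),                         # bottleneck
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, n_wavelengths),
)

# Train the network to reproduce "normal" spectra (random stand-in data).
normal_spectra = torch.randn(500, n_wavelengths)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(normal_spectra), normal_spectra)
    loss.backward()
    optimizer.step()

def reconstruction_error(x: torch.Tensor) -> torch.Tensor:
    """Per-sample reconstruction error; unusually high values flag anomalies."""
    with torch.no_grad():
        return ((autoencoder(x) - x) ** 2).mean(dim=1)
```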

#### *3.1.4 Partial least squares regression (PLSR)*

PLSR is a well-known and often-used means of conducting regression in the presence of noise. Regression provides a function that predicts a response from a data input (as opposed to classification, which assigns the input to a class). While both PCA and PLSR are derived from experimental data, PCA is more qualitative in nature, often used in an exploratory manner, and is an unsupervised learning method. PLSR, on the other hand, is more quantitative and performs a linear, multi-dimensional evaluation. Both methods rely on maximizing covariance: PCA within the original data, and PLSR between the data and the response.

PLSR works best when the observed variables are highly correlated and noisy [42], which is a benefit in hyperspectral analysis, where data at nearby wavelengths can be highly correlated. PLSR also assumes that the data set is linear and that this projection holds in the new subspace. If linearity does not hold for the model, there are several ways to deal with the problem, including polynomials, splines, or small neural networks [43]. There is also a simple way to deal with non-linearity [44]: the basic idea is to expand the data matrix with square, cubic, or cross-product terms.
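
A brief sketch using scikit-learn's PLSRegression, including the square and cross-product expansion of the data matrix mentioned above as a simple accommodation for non-linearity; the data is synthetic and the number of latent components is an illustrative choice.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for noisy, highly correlated spectral data.
rng = np.random.default_rng(3)
base = rng.normal(size=(150, 1))
X = base + 0.05 * rng.normal(size=(150, 20))          # 20 correlated "wavelengths"
y = np.sin(base[:, 0]) + 0.01 * rng.normal(size=150)  # non-linear response

# Expand the data matrix with square and cross-product terms [44].
X_expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# PLSR finds latent components that maximize covariance between
# the (expanded) data and the response.
pls = PLSRegression(n_components=5).fit(X_expanded, y)
print("training R^2:", pls.score(X_expanded, y))
```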

PLSR continues to be a popular tool for food analysis and quality control applications. Jiang et al. [45] used PLSR to model the relationship between NIR spectra from hyperspectral images of chicken and total *Pseudomonas* spp. and *Enterobacteriaceae* counts (PEC), enabling rapid PEC prediction. Cavaglia et al. [46] similarly applied PLSR to predict density and pH from ATR-MIR spectra of alcoholic fermentation samples.

#### *3.1.5 Wavelets*

Wavelets are mathematical functions that, like Fourier analysis, transform data into its constituent spectral components. However, unlike the standard Fourier transform, wavelet transforms can provide frequency information for specific locations in the temporal or spatial domain. Wavelets of different shapes (called mother wavelets) focus on different portions of the frequency spectrum and are typically used in combination to analyze the full spectral bandwidth of concern. Each mother wavelet can also be rescaled to form daughter wavelets, changing the resolution in the temporal or spatial domain and thus examining higher or lower portions of the frequency spectrum in more detail. Some wavelet examples are shown in **Figure 10**. Wavelet analysis has been used in spectroscopic applications as a means of extracting useful features from specific regions of the spectra. For example, Qi et al. [47] applied a single wavelet form at seven different scales to extract features from shortwave infrared (SWIR) hyperspectral reflectance images of peanuts. Using these features, they were able to distinguish moldy portions of peanuts from healthy portions. Wavelet analysis can also be applied in the image domain to extract 2D features. Ji et al. [48] applied wavelet transforms to hyperspectral visible-NIR images of potatoes (after first applying PCA), decomposing the original images into sub-band images at different scales to extract textural features that enabled the identification of bruising in the potatoes.

**Figure 10.** *Wavelet examples from different wavelet families.*
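
A short sketch of scale-wise feature extraction from a single spectrum using the PyWavelets package; the Daubechies mother wavelet, decomposition depth, and energy features are illustrative choices, not those used in [47] or [48].

```python
import numpy as np
import pywt

# Stand-in for a reflectance spectrum (512 points).
rng = np.random.default_rng(4)
spectrum = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * rng.normal(size=512)

# Multi-level discrete wavelet decomposition: one approximation band
# plus one detail band per scale.
coeffs = pywt.wavedec(spectrum, "db4", level=5)

# Simple scale-wise features: the energy of the coefficients per band.
features = [float(np.sum(c**2)) for c in coeffs]
print([round(f, 2) for f in features])
```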

#### *3.1.6 Splines*

Splines are piecewise linear or polynomial functions that are combined to approximate a given set of data. Splines are often used as smoothing functions that approximate data curves while eliminating the "roughness" caused by noise. Like the other techniques discussed above, splines benefit the feature extraction process by focusing on the true signal within the data.
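
A minimal smoothing-spline sketch with SciPy, in which the smoothing factor controls the trade-off between fidelity to the data and roughness; the curve and smoothing factor are illustrative.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy stand-in for a measured data curve.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 200)
y = np.exp(-0.3 * x) * np.cos(2.0 * x) + 0.05 * rng.normal(size=200)

# A smoothing spline approximates the curve while suppressing the
# "roughness" caused by noise; a larger s yields a smoother fit.
spline = UnivariateSpline(x, y, s=0.5)
y_smooth = spline(x)  # de-noised curve, usable for feature extraction
```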

One application of splines common to chemometric analysis is the regression analysis method of multivariate adaptive regression splines (MARS) [49].
