*1.4.10 Multiple Linear Regression*

Multiple linear regression (MLR) is a commonly used machine learning algorithm that determines a mathematical relationship among a number of random variables by analyzing how multiple independent variables relate to one dependent variable. Because each independent variable is assumed to contribute to predicting the dependent variable, information from all of them is combined to estimate the effect each has on the outcome variable. The model expresses this relationship as a linear function that best approximates the individual data points. The most important advantage of MLR is that it helps us understand the relationships among the variables present in the dataset. This further helps in understanding the correlation between dependent

#### **Figure 5.**

*A comparative study of artificial neural network (MLP, RBF) models for rice biochemical parameters prediction. Simple configuration of (a) MLP and; (b) RBF neural networks [86].*

*Near-Infrared Spectroscopy and Machine Learning: Analysis and Classification Methods of Rice DOI: http://dx.doi.org/10.5772/intechopen.99017*

and independent variables. MLR is one of the oldest regression methods and is used to establish linear relationships between several independent variables (*x*<sub>i</sub>) and the dependent variable (the sample property, *y*) that depends on them. The resulting model can be written as Eq. (6):

$$y_j = b_0 + \sum_{i=1}^{N} b_i x_i + e_{i,j} \tag{6}$$

where *y*<sub>j</sub> represents the sample property, *b*<sub>i</sub> represents the computed coefficient for each variable *x*<sub>i</sub>, and *e*<sub>i,j</sub> is the error. Each independent variable is analyzed and correlated with the specific property *y*<sub>j</sub>. The regression coefficients *b*<sub>i</sub> represent the effect of each term. After the MLR model has been developed, its accuracy in predicting the dependent variable is evaluated by computing the correlation coefficient between true and predicted values. The coefficient of determination R<sup>2</sup> is not specific to MLR; it is one of the most frequently used statistics for assessing the validity of a model regardless of the model type (Eq. (7)).

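$$R^2 = 1 - \frac{\sum \left( y_i - \hat{y}_i \right)^2}{\sum \left( y_i - \overline{y} \right)^2} \tag{7}$$

Here *y*<sub>i</sub> is the measured value for sample i, *ŷ*<sub>i</sub> is the corresponding predicted value, and *ȳ* is the mean of the measured values.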
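As an illustration of Eqs. (6) and (7), the following minimal sketch fits an MLR model by ordinary least squares and computes R<sup>2</sup>. The data are synthetic and the function names (`fit_mlr`, `predict_mlr`, `r_squared`) are hypothetical; none of them come from the studies cited in this chapter.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of b_0 and b_1..b_N in Eq. (6)."""
    # Prepend a column of ones so the intercept b_0 is estimated jointly.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # coef[0] is b_0, coef[1:] are the b_i

def predict_mlr(coef, X):
    """Evaluate the fitted linear model for new samples."""
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    """Coefficient of determination as in Eq. (7)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (an assumption for illustration, not measured spectra).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_b = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_b + rng.normal(scale=0.1, size=50)

coef = fit_mlr(X, y)
r2 = r_squared(y, predict_mlr(coef, X))
print(coef, r2)
```

In practice R<sup>2</sup> would be reported on validation samples that were not used for fitting; the sketch evaluates it on the calibration data only to keep the example short.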

#### **2. Practical applications of NIR spectroscopy and chemometrics**

#### **2.1 NIR spectroscopy in rice analysis: identification and classification**

Several studies describe quantitative analysis by NIR spectroscopy in different types of food, where it provides an exceptional method for evaluating chemical composition (*i.e.*, protein, starch, lipid, amylose, and moisture contents) in raw pork and beef [94] and in cheese and other dairy products [95]. However, it is most widely used for grains and cereal products. In some cases, such measurements are important for achieving the end-use objectives of a plant breeding program. The use of NIR spectroscopy for the quality assessment of processed foods has generated a lot of interest during the review period. Access to high-quality food is essential to human health. Thus, the accurate, real-time collection of quality data for agricultural foods such as grains and flours is of the utmost importance. NIR spectroscopy has proven beneficial for the analysis of various cereals, grains, flours, and baked goods, including specific quality parameters that influence classification, safety, grading, and price. By analyzing numerous factors and properties of crops at different stages of their development, crop quality can be predicted early on. To maximize efficiency and reduce waste of produce, it is important that these data collection methods be non-invasive, non-destructive, and economical. Gas chromatography (GC), high-pressure liquid chromatography (HPLC), and mass spectrometry (MS) are quantitative instrumental techniques used for the quality assessment of foods. However, these techniques are not applicable to real-time measurements. Spectroscopic instrumentation has recently been adopted in agricultural industries for quality analysis. NIR spectroscopy allows food analysts to examine the quality, composition, and authenticity of agricultural and food products quickly and accurately, based on the physicochemical properties of crops.

Machine learning methodologies have been coupled with NIR spectroscopy for the prediction of rice quality factors [96] and the quantitative determination of amylose values [51]. In recent years there have been numerous applications of portable NIR instruments for specific analyses, such as detecting adulteration in rice and determining other food quality parameters. NIR spectroscopy is highly useful for analyzing the shelf life and maturity of agricultural products such as rice. However, data collection and modeling are still too time consuming for portable spectrometers to be efficient in some applications. This limitation can potentially be overcome by combining NIR spectroscopy with other analytical methods. The study in [97] developed a tandem approach for monitoring rice germ shelf life during storage using NIR and a portable *e*-nose. Le et al. proposed combining deep learning with NIR to provide a much faster method of cereal analysis than traditional NIR models [98]. The deep learning algorithm removes interference from the spectral signal, making the modeling significantly more efficient. Jiang et al. developed a portable NIR spectrometer system to dynamically evaluate the fatty acid content of rice during storage [99]. Another challenge in NIR spectroscopy is determining the authenticity and geographical origin of agricultural products such as grains. Sampaio et al. developed a strong and accurate classification model based on machine learning methods and NIR spectroscopy that sorted two genotypes of rice with high accuracy based on these characteristics [100]. Barnaby et al. correlated the grain chalk of rice to the genomic regions of NIR spectra. These spectral regions can be applied in the automation of grain chalk quantification, or for other grain products as well [101].

Several NIR-based studies predict the viscosity properties of rice. Delwiche et al. developed calibration models on whole-grain milled rice using PLS regression to predict the viscosity properties of a flour-water paste as recorded by the RVA, which determine the cooking and processing characteristics of rice [102]. Meadows and Barton later used NIR to predict RVA data in rice flour [103]. A PLS regression of NIR spectra *vs.* RVA viscosity showed that the highest correlation with NIR (R = 0.961–0.903) occurred at 212–228 s, which lies between the initial pasting time and peak viscosity. Furthermore, the pasting parameters of setback and breakdown, and the gelatinization peak temperature of rice flour, were predicted successfully using NIR [104]. The texture of cooked rice was also predicted by NIR analysis of whole-grain rice [105]. Five of seven sensory texture attributes were predicted by NIR using PLS analysis, with calibration models based on second-derivative spectra. RVA peak viscosity and breakdown were also successfully predicted from NIR spectra and PLS regression models. Calibrations were developed using PLS and ANN analyses. The results showed limited precision of this method; however, it can be used as a rough screening method for starch amylose content. Xie et al. later reported that NIR spectra correlated strongly with differential scanning calorimetry (DSC) for measuring amylopectin retrogradation in bread staling [106]. Nowadays, quality-control requirements in grain milling and food processing increasingly call for on-line analyses [41]. Sampaio et al. explored NIR spectroscopy combined with PCA, PLS-DA, and SVM for the discrimination and classification of rice varieties (Indica and Japonica) after different spectral processing steps such as MSC, first derivative, and second derivative [100].
PCA revealed the pattern and relationship of each variety, and the chemical similarities were effectively distinguished by PLS-DA and SVM according to their specific properties. The SVM model showed high accuracy in fitting (97%), cross-validation (93%), and prediction (91%). These data support the strength of the model for efficient classification of rice types. The principal differences between the two rice types were found in the ranges 7476–7095 cm<sup>−1</sup>, 7046 cm<sup>−1</sup>, and 4264–4153 cm<sup>−1</sup>, which can be used for their discrimination, making it possible to develop a robust classification model for rice samples based on their specific physicochemical properties. The classification models developed using SVM were very robust compared with the PLS-DA models, classifying both rice varieties with high confidence. Machine learning tools can facilitate the classification and identification of different types of grains. In the near future, it may be possible to discriminate their origin, harvest season, and state of conservation, as well as the presence of contaminants and adulteration, using robust classification methods, which would allow a rice database to be created and the types and origins of rice to be classified *in situ* and in real time.

Osborne et al. used near-infrared transmission spectroscopy to discriminate between Basmati and other long-grain rice samples. A discriminant rule was derived using the Fisher linear discriminant function calculated from the first few principal component scores of the NIR spectra [107]. The discriminant rule was assessed by cross-validation. In this study, nine Basmati varieties and 53 other rice samples were classified correctly from NIR spectra, but 8% of the Basmatis and 14% of the others were misclassified on the basis of spectra of individual grains. NIR spectroscopy also offers effective quantitative capability for moisture, fat, protein, and gluten content in rice cookies [108].
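The idea of a Fisher linear discriminant computed on the first few principal component scores can be sketched as follows. This is a minimal two-class illustration on synthetic spectra, not the procedure or data of [107]: the function names, the number of components, and the simulated absorption band are all assumptions, and the cross-validation step is omitted.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centred spectra onto the first principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes (loadings).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T
    return Xc @ loadings, mean, loadings

def fisher_discriminant(scores, labels):
    """Two-class Fisher direction w and midpoint decision threshold."""
    a, b = scores[labels == 0], scores[labels == 1]
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (a - mean_a).T @ (a - mean_a) + (b - mean_b).T @ (b - mean_b)
    w = np.linalg.solve(Sw, mean_b - mean_a)
    threshold = 0.5 * (mean_a + mean_b) @ w
    return w, threshold

def classify(scores, w, threshold):
    """Assign class 1 when the projection exceeds the threshold."""
    return (scores @ w > threshold).astype(int)

# Synthetic spectra (an assumption): class 1 has one extra absorption band.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
base = np.exp(-((x - 0.4) / 0.2) ** 2)
band = 0.5 * np.exp(-((x - 0.8) / 0.05) ** 2)
labels = np.repeat([0, 1], 30)
X = base + labels[:, None] * band + rng.normal(scale=0.05, size=(60, 100))

scores, mean, loadings = pca_scores(X, n_components=3)
w, threshold = fisher_discriminant(scores, labels)
accuracy = np.mean(classify(scores, w, threshold) == labels)
print(accuracy)
```

Working on a handful of scores rather than on the full spectra keeps the within-class scatter matrix small and invertible, which is the usual motivation for applying PCA before a discriminant function.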

Chen et al. used NIR diffuse reflectance spectroscopy of multi-grain seeds to develop a spectral discriminant analysis method, based on PLS-DA, for the variety identification of multi-grain rice seed [109]. Because the spectra of seeds from different varieties differ only slightly, novel and valid methods are needed. In this study, SNV pretreatment combined with wavelength-screening methods improved the accuracy of the discriminant models. The optimal wavelength model selected was a combination of 54 discrete wavelengths within the NIR region. The total recognition accuracy of NIR spectral discrimination reached 94.3% in a study that involved identifying one type of differentiation (negative and excellent hybrid variety) and several interference groups (positive, four pure groups, and four mixed groups).

The hyperspectral imaging (HSI) technique coupled with visible (vis) and/or NIR spectroscopy is generally used to identify or inspect different substances in seeds by recognizing the molecular bonds in the sample. Combining the technologies of spectroscopy and digital imaging, it is considered among the most feasible methods for rapidly and non-destructively detecting the substances of agricultural products. He et al. used an NIR-HSI system combined with multiple data preprocessing methods [110]. This approach obtained spectral and spatial information from the test samples simultaneously, in the form of a hypercube made up of two spatial dimensions and one spectral dimension. The HSI technique can collect hyperspectral information from samples of different sizes and shapes based on the spatial data. Detection with HSI is faster than with point-based techniques, as many samples can be scanned and analyzed at the same time using an HSI camera [111]. The classification models were developed to identify the vitality of rice seeds and showed great potential for identifying the vitality and vigor of rice seeds. When detecting the seed vitality of three different years, the extreme learning machine model with Savitzky–Golay preprocessing reached a classification accuracy of 93.67% using spectral data. For identifying non-viable seeds among viable seeds of different years, the least squares support vector machine model coupled with raw data and selected wavelengths achieved 94.38% accuracy and can be adopted as an optimal combination for this task. In another study, carried out by Barnaby et al., the NIR hyperspectral image consisted of numerous bands with small spectral gaps (every 4 nm in that study) and could assess grain traits such as fat, starch, protein, moisture, color, and many other physicochemical compounds at once [101]. A genome-wide association study confirmed known genes and identified new genes that can affect grain quality traits based on the hyperspectral imaging technique. The PLS-DA models of hyperspectral data identified spectral ranges that distinguished genetic and production-environment differences, and these data can help resolve the genetics of complex traits such as rice grain quality.
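The hypercube structure described above (two spatial dimensions and one spectral dimension) maps naturally onto a three-dimensional array. The sketch below, with hypothetical dimensions, random values, and a hypothetical rectangular region of interest, shows how such a cube can be unfolded into a matrix of pixel spectra and how a mean spectrum for one seed might be extracted before modeling.

```python
import numpy as np

# Hypothetical hypercube: 2 spatial dimensions (rows, cols) and 1 spectral one.
rows, cols, bands = 4, 5, 8
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))

def unfold(cube):
    """Reshape (rows, cols, bands) into a (pixels, bands) matrix of spectra."""
    r, c, b = cube.shape
    return cube.reshape(r * c, b)

def mean_spectrum(cube, mask):
    """Average the spectra of the pixels selected by a boolean (rows, cols) mask."""
    return cube[mask].mean(axis=0)

spectra = unfold(cube)

# Hypothetical region of interest covering one seed in the image.
mask = np.zeros((rows, cols), dtype=bool)
mask[1:3, 1:4] = True
roi_mean = mean_spectrum(cube, mask)
print(spectra.shape, roi_mean.shape)
```

Because every pixel carries a full spectrum, many seeds in one image can be segmented with separate masks and analyzed together, which is the basis of the speed advantage over point-based measurements.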

Nitrogen content is an important chemical indicator for the monitoring and management of plants because of its role in photosynthesis and productivity, as well as its effect on the carbon and oxygen cycles. Nitrogen content can be measured by laboratory analysis, while the NIR spectral reflectance (700–1075 nm) of the crop can be measured in the field using a hand-held spectroradiometer. Afandia et al. evaluated the nitrogen content of rice crops from NIR reflectance using an ANN [111]. The study concluded that organic molecules (nitrogen, water, etc.) present a specific absorption pattern in the NIR region, and the comparison between measured and model-estimated nitrogen content gave an RMSE of 0.32.

Lin et al. developed an imaging-based system, consisting of an NIR camera, filters, an automatic filter-exchange device, and image processing techniques, to detect rice protein content from spectral absorption. The NIR data were used to establish calibration models based on MLR, PLS, and ANN. In the MLR model, the NIR imaging system used a calibration that takes into account 5 wavelengths (880 nm, 910 nm, 920 nm, 1000 nm, and 1014 nm) to predict rice protein content, with a validation R<sup>2</sup> of 0.782 and a standard error of prediction (SEP) of 0.274%. In the PLS model, the NIR imaging system used 15 filters ranging from 870 to 1014 nm, and the predictive results showed a significant performance (R<sup>2</sup> val = 0.782 and SEP = 0.274%) compared with the MLR model. In the ANN model, the 5 wavelengths selected by the MLR were used as the network input, which simplified the model, and the predicted results (R<sup>2</sup> val = 0.806 and SEP = 0.266%) were similar to those of the PLS model. The prediction results indicate that the developed NIR imaging system offers simple and convenient operation and high detection accuracy, and that it has commercial potential for non-destructive, highly accurate prediction of rice protein content [112].
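The validation statistics quoted above can be computed from reference and predicted values as sketched below. The sketch assumes one common convention in which SEP is the bias-corrected standard deviation of the prediction residuals; the cited study may define it differently. The function name and the protein values are hypothetical.

```python
import numpy as np

def prediction_errors(y_true, y_pred):
    """Return bias, RMSEP and bias-corrected SEP for a validation set."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    bias = residuals.mean()
    rmsep = np.sqrt(np.mean(residuals ** 2))
    # SEP: standard deviation of the residuals after removing the bias.
    sep = np.sqrt(np.sum((residuals - bias) ** 2) / (len(residuals) - 1))
    return bias, rmsep, sep

# Hypothetical protein contents (%), not data from the cited study.
reference = [6.8, 7.1, 7.5, 8.0, 8.4]
predicted = [6.9, 7.2, 7.6, 8.1, 8.5]
bias, rmsep, sep = prediction_errors(reference, predicted)
print(bias, rmsep, sep)
```

In this toy case every prediction is offset by the same amount, so the whole error appears as bias and RMSEP while the SEP is essentially zero, which shows why the two statistics are worth reporting separately.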

NIR spectroscopy was used to develop a new method for discriminating rice varieties. The variables compressed by PCA were used as inputs for multiple discriminant analysis (MDA). The study showed that the combination of spectroscopy and computer data processing based on PCA and MDA correctly identified rice from different areas in about 98% of cases for the calibration process and 100% for the prediction process. These results show that the proposed method is a feasible way to identify the specific production areas of rice [113].

#### **2.2 NIR spectroscopy in rice authentication**

NIR spectroscopy has been widely used in the evaluation of agricultural products because of its many advantages: it is easy to use, non-destructive, fast, and accurate; it provides highly reproducible results; it requires minimal or, often, no sample preparation; and it allows several constituents to be analyzed from a single measurement. As a consequence of the global importance of rice, the literature contains several studies aimed at its analysis and characterization. For environmental reasons and because of the rice market, non-destructive approaches are generally preferred. NIR spectroscopy has emerged as an important tool for detecting fraud, adulteration, and contamination in grains and flours. Substantial instrumental improvements (e.g., hyperspectral imaging, FT-NIR) and advances in data analysis (e.g., deep learning) have allowed the development of screening methods for detecting the presence of pests (e.g., rice weevil) across a range of stored grains [114–116].

Direct spectroscopic measurements have been widely applied to several foods and commodities, especially grains and cereal products, for example for the classification of rice [117–121]. Furthermore, within the framework of rice quality evaluation, NIR spectroscopy has been used for the discrimination of rice [122, 123]; variety classification and transgenic rice detection [124]; quantification of physico-chemical properties (such as moisture content, sound whole kernel, whiteness, translucency, color, and amylogram characteristics) [125]; cultivar classification [126]; protein and amylose content prediction [127, 128]; wax rice detection [129]; and eating quality prediction [130]. Barnaby et al. correlated the grain chalk of rice to the genomic regions of NIR spectra [101]. These spectral regions can be applied in the automation of grain chalk quantification and potentially for other grain products as well [131].

Rapid and non-destructive detection of rice authenticity and quality was performed using a hand-held NIR spectrometer coupled with appropriate chemometrics. Different preprocessing methods were combined with PCA, and KNN and SVM were used as multivariate models; MSC + PCA followed by KNN performed best in this study, with a classification rate above 90% for all categories of rice samples studied. Based on these results, a hand-held spectrometer combined with an appropriate multivariate calibration model could be used for quick and non-destructive detection of rice quality and authenticity [132].
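A processing chain of the MSC + PCA + KNN type can be sketched as follows. This is an illustrative implementation on synthetic spectra, not the pipeline of [132]: the function names, the number of principal components, the value of k, and the simulated scatter effects are all assumptions.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, spectrum in enumerate(X):
        # Fit spectrum = offset + slope * ref, then invert that line.
        slope, offset = np.polyfit(ref, spectrum, 1)
        corrected[i] = (spectrum - offset) / slope
    return corrected, ref

def pca_fit(X, n_components):
    """Return the mean spectrum and the first principal axes (loadings)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def knn_predict(train_scores, train_labels, test_scores, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    preds = []
    for s in test_scores:
        dist = np.linalg.norm(train_scores - s, axis=1)
        nearest = train_labels[np.argsort(dist)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Synthetic spectra (an assumption): two classes that differ by one band,
# with random multiplicative and additive scatter on every sample.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
base = 1.0 + 0.5 * np.sin(2 * np.pi * x)
band = 0.3 * np.exp(-((x - 0.7) / 0.05) ** 2)
labels = np.repeat([0, 1], 40)
pure = base + labels[:, None] * band
scale = rng.uniform(0.8, 1.2, size=(80, 1))
shift = rng.uniform(-0.1, 0.1, size=(80, 1))
X = scale * pure + shift + rng.normal(scale=0.01, size=pure.shape)

# Alternate samples between the training and test sets.
train = np.arange(80) % 2 == 0
X_train, ref = msc(X[train])
X_test, _ = msc(X[~train], reference=ref)  # reuse the training reference
mean, loadings = pca_fit(X_train, n_components=2)
pred = knn_predict((X_train - mean) @ loadings, labels[train],
                   (X_test - mean) @ loadings, k=3)
accuracy = np.mean(pred == labels[~train])
print(accuracy)
```

The test spectra are corrected against the reference spectrum and projected with the loadings obtained from the training set, so that no information from the test samples leaks into the model.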

Food fraud remains a significant problem for food regulators, importers, merchants, law enforcement personnel, and consumers. A key feature of food fraud is the use of a lower-value ingredient to imitate an authentic product. NIR analysis, PLS-DA, and SVM have been used to detect whether high-quality rice was mixed with other varieties of rice. Analysis of NIR spectral data using PLS-DA and an SVM algorithm was shown to be a feasible method (5% detection limit) for the rapid identification of fraudulent rice varieties blended with authentic Wuchang rice samples [133].

Liu et al. showed that these techniques provide significant support for qualitative discrimination [133]. PLS was used to establish a quantitative analysis model to help recognize the degree of fraud. Because the results of NIR analysis correlate directly with the homogeneity of the samples, four groups of samples with different physical forms (full granules, 40 mesh, 70 mesh, and 100 mesh) were prepared. In the qualitative analysis, the performance of the model had no obvious relationship with the physical state of the sample, and the qualitative PLS-DA and SVM models could detect fraudulent rice with a 5% detection limit. The determination coefficient and root mean square error of the optimal prediction result were 0.96 and 2.93, respectively. Based on this study, NIR analysis can be considered a reliable and fast strategy for determining whether premium high-quality rice is adulterated with inferior categories of rice.

Different preprocessing approaches were used for NIR signal pretreatment. Besides the raw data, the first derivative (Savitzky–Golay approach, 15-point window, 2nd-order polynomial), second derivative (Savitzky–Golay approach, 15-point window, 3rd-order polynomial), and standard normal variate (SNV) were also evaluated (**Figure 6**). NIR data were further mean-centered prior to the creation of any calibration model. The most suitable preprocessing approach, together with the optimal complexity (number of LVs or PCs to be extracted) of any classification model, was defined based on a cross-validation procedure. PLS-DA

**Figure 6.**

*NIR spectra (a) raw spectra of samples, (b) mean spectra of authentic (red line) and adulterated samples (blue line). (Adapted from [134]).*

selection, specifically, was based on the combination of preprocessing and model complexity leading to the lowest mean classification error, whereas for SIMCA the maximum efficiency was sought. A study by Duy Le Nguyen Doan investigated the possibility of combining NIR spectroscopy and chemometric classifiers to detect adulterated rice samples [134]. Two different strategies were exploited: a discriminant classifier (PLS-DA) and a class-modelling technique (SIMCA). The two strategies gave different results; in particular, SIMCA appeared unable to solve the investigated problem, whereas PLS-DA proved to be a suitable approach. These results indicate that high within-class variability can affect the possibility of detecting low levels of adulteration; at the same time, they suggest that the proposed approach could be useful for detecting adulterated samples. This study therefore demonstrates that the combination of NIR spectroscopy and PLS-DA can be an effective, rapid, and non-destructive tool for determining adulteration in jasmine rice [134].
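The pretreatments evaluated above (Savitzky–Golay derivatives, SNV, and mean-centering) can be sketched with NumPy alone. The window lengths and polynomial orders below follow the values quoted in the text, but the function names, the handling of the spectrum edges (dropped here), and the test signals are assumptions rather than details taken from [134].

```python
import numpy as np
from math import factorial

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def savgol_coefficients(window, polyorder, deriv):
    """Savitzky-Golay weights from a local polynomial least-squares fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient of that
    # order; multiplying by deriv! turns it into the derivative at offset 0.
    return factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(X, window=15, polyorder=2, deriv=1):
    """Filter each spectrum; the window//2 points at each edge are dropped."""
    c = savgol_coefficients(window, polyorder, deriv)
    return np.array([np.correlate(row, c, mode="valid") for row in X])

def mean_center(X):
    """Subtract the column-wise mean spectrum and return it for reuse."""
    mean = X.mean(axis=0)
    return X - mean, mean

# Hypothetical example: two synthetic signals sampled at 40 points.
idx = np.arange(40.0)
X = np.vstack([2.0 * idx + 1.0, 0.5 * idx ** 2])
d1 = savgol_derivative(X, window=15, polyorder=2, deriv=1)
d2 = savgol_derivative(X, window=15, polyorder=3, deriv=2)
X_snv = snv(X)
X_centered, mean_spectrum = mean_center(X_snv)
print(d1.shape, d2.shape)
```

The derivatives are expressed per sampling step; dividing by the wavelength spacing raised to the derivative order converts them to physical units. The mean spectrum returned by `mean_center` would be stored with the calibration model so that new samples can be centered in the same way.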

#### **2.3 NIR spectroscopy in rice contamination**

Fast determination of heavy metals is necessary to ensure the safety of crops. The potential of NIR spectroscopy coupled with chemometrics for the quantitative analysis of cadmium in rice has been investigated. The spectra were preprocessed using the first derivative to reduce baseline shift, and several chemometric techniques, such as iPLS, mwPLS, siPLS, and biPLS, were proposed to extract and optimize spectral intervals from the full-spectrum data. The PLS models based on these four algorithms outperformed the full-spectrum PLS model. Among the techniques, biPLS performed best, with the optimal subinterval selection [135].
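The simplest of these interval methods, interval PLS (iPLS), splits the spectrum into equal-width intervals, builds a PLS model on each, and keeps the interval with the lowest cross-validated error. The sketch below is a simplified illustration of that idea only; it is not the implementation used in [135], and it does not cover the moving-window, synergy, or backward variants. The PLS1 routine, the fold assignment, the number of latent variables, and the synthetic data are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by the NIPALS algorithm; returns coefficients and means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                      # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = yr @ t / tt                 # y loading
        Xr = Xr - np.outer(t, p)        # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def cv_rmse(X, y, n_components, n_folds=5):
    """Root mean square error of cross-validation with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    errors = np.empty(len(y))
    for f in range(n_folds):
        test = folds == f
        beta, x_mean, y_mean = pls1_fit(X[~test], y[~test], n_components)
        errors[test] = (X[test] - x_mean) @ beta + y_mean - y[test]
    return np.sqrt(np.mean(errors ** 2))

def ipls(X, y, n_intervals, n_components=2):
    """Return the index of the best interval and the error of every interval."""
    edges = np.linspace(0, X.shape[1], n_intervals + 1).astype(int)
    scores = [cv_rmse(X[:, a:b], y, n_components)
              for a, b in zip(edges[:-1], edges[1:])]
    return int(np.argmin(scores)), scores

# Synthetic data (an assumption): only wavelengths 20..29 respond to the analyte.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 60, 50
y = rng.uniform(0.0, 5.0, size=n_samples)
X = rng.normal(size=(n_samples, n_wavelengths))
X[:, 20:30] += y[:, None]

best, scores = ipls(X, y, n_intervals=5)
print(best, scores)
```

Restricting the model to informative intervals removes wavelengths that contribute mostly noise, which is one plausible reason why interval-based models can outperform a full-spectrum PLS model.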

Heavy metals are spectrally featureless, so spectral responses cannot be used directly to assess heavy metals in rice. Because they bind closely to protein, crude fiber, and other constituents, heavy metals show a significant correlation with protein in rice [136]. The detection of heavy metal concentration in grain is mostly carried out by direct physical and chemical methods, which can determine the residual levels of heavy metals exactly; however, these methods are time consuming, cumbersome, and inefficient. On the hypothesis that heavy metal concentration could be estimated spectrally through its correlation with protein content, the objectives of the cited study were to: (1) build a quantitative model for the quick prediction of both heavy metal and protein content, and (2) evaluate the feasibility of near-infrared spectroscopy for assessing heavy metal concentration in coarse rice.

Protecting people from heavy metal contamination is an important public-health concern and a major national environmental issue. The NIR spectral technique has been used to identify the concentration of heavy metals such as lead (Pb) and copper (Cu) in rice. The NIR spectral data were treated by several methods, including logarithm, baseline correction, standard normal variate, multiplicative scatter correction, first derivative, and continuum removal. Lead (Pb) accumulated in rice at a high level (17.05) compared with the other heavy metals. MSC-PLSR models were developed for Pb (R<sup>2</sup> = 0.49, RMSE = 2.01 mg/kg) and Cu (R<sup>2</sup> = 0.29, RMSE = 0.75 mg/kg). It is feasible to identify Pb and Cu content in rice using the NIR spectral technique. However, further studies should be performed on the application of spectral techniques to discriminating other heavy metals in rice, given the limitations of the small number of samples and particle size interference.
