**2.5 Spectral processing and development of calibration models for soil N and C**

### **2.5.1 Principal components analysis (PCA)**

PCA is a data compression technique (a bilinear modelling process) that reduces a complex multidimensional dataset (e.g. spectra) to a smaller number of principal components (PCs) which reflect the underlying structure of the original dataset. The first PC typically explains most of the variation in the dataset; each further PC is orthogonal to the preceding ones and explains less of the variation. By plotting the PCs in two- or three-dimensional data space, interrelationships between the samples and variables can be examined (www.clw.csiro.au).
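The decomposition described above can be illustrated with a minimal sketch. It is not the software used in this study: the function names `pca` and `make_spectra`, the simulated spectra, and the choice of power iteration with deflation are all assumptions made for illustration only.

```python
import math
import random


def pca(X, n_components=2, n_iter=200):
    """PCA by power iteration with deflation (illustrative sketch).

    X is a list of N spectra, each a list of M absorbance values.
    Returns (scores, loadings, explained); explained[k] is the fraction
    of the total variance captured by component k.
    """
    n, m = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - col_means[j] for j in range(m)] for row in X]  # mean-centred data
    total_ss = sum(v * v for row in E for v in row)

    scores, loadings, explained = [], [], []
    for _ in range(n_components):
        p = [1.0 / math.sqrt(m)] * m  # arbitrary unit-length start vector
        for _ in range(n_iter):
            t = [sum(e * w for e, w in zip(row, p)) for row in E]          # t = E p
            p = [sum(E[i][j] * t[i] for i in range(n)) for j in range(m)]  # p = E^T t
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
        t = [sum(e * w for e, w in zip(row, p)) for row in E]  # final scores
        # Deflate: remove this component so the next PC is orthogonal to it.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


def make_spectra(n=40, m=15, seed=0):
    """Simulated spectra (hypothetical): baseline shift + one weak band + noise."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        a = rng.uniform(0.2, 1.0)    # baseline offset of this sample
        b = rng.uniform(-1.0, 1.0)   # strength of a weak absorption feature
        X.append([a + 0.1 * b * math.sin(j / 2.0) + rng.gauss(0.0, 0.005)
                  for j in range(m)])
    return X


scores, loadings, explained = pca(make_spectra())
print("explained variance fractions:", [round(v, 3) for v in explained])
```

Plotting `scores[0]` against `scores[1]` would give the kind of two-dimensional scores plot referred to above.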

### **2.5.2 Partial least-squares regression (PLSR) analysis**

For PLSR, spectral information is arranged as an N × M matrix, which consists of N spectra with absorbance values for M wavelengths, and the calibration data are expressed as a single vector of measured values for these spectra. The PLSR algorithm decomposes the M-dimensional spectral space into a few factors termed latent variables (LVs), which represent the best projections of the calibration vector onto the N × M matrix. One advantage of PLSR over other chemometric methods such as PCA is that the first few LVs can be interpreted, because they show the correlations between the property values and the spectral features. Furthermore, PLSR takes into account variation in both the absorbance and the calibration data. PLSR is rapid, can handle collinear data, and can provide useful qualitative information.
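As a rough illustration of how LVs are extracted from the N × M matrix and the property vector, the sketch below implements a single-response PLS (PLS1) in NIPALS style. It is an assumption-laden example, not the algorithm as implemented in any particular package: the names `pls1_fit`, `pls1_predict` and `simulate`, and the simulated two-component spectra, are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model with n_lv latent variables (illustrative sketch)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                                 # centred property
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: direction in spectral space that covaries most with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        w_norm = math.sqrt(dot(w, w))
        w = [v / w_norm for v in w]
        t = [dot(row, w) for row in E]   # LV scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt               # y loading
        # Deflate both blocks before extracting the next LV.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict the property for one spectrum by replaying the deflation steps."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat


def simulate(n, rng, m=16):
    """Hypothetical spectra: an analyte band overlapping an interferent band."""
    X, y = [], []
    for _ in range(n):
        c = rng.uniform(1.0, 3.0)   # property of interest
        d = rng.uniform(0.0, 2.0)   # interfering constituent
        X.append([c * math.exp(-((j - 5) / 2.0) ** 2)
                  + d * math.exp(-((j - 9) / 3.0) ** 2)
                  + rng.gauss(0.0, 0.002) for j in range(m)])
        y.append(c + rng.gauss(0.0, 0.01))
    return X, y


rng = random.Random(1)
X_cal, y_cal = simulate(30, rng)
X_new, y_new = simulate(10, rng)
model = pls1_fit(X_cal, y_cal, n_lv=2)
errors = [pls1_predict(model, x) - y for x, y in zip(X_new, y_new)]
rmse_new = math.sqrt(sum(e * e for e in errors) / len(errors))
print("RMSE on new samples:", rmse_new)
```

The weight vectors in `model["W"]` are the quantities one would inspect to see which spectral regions covary with the property values.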

### **2.5.3 Procedure of spectral processing and model calibration**

Before the absorbance spectra were calibrated to predict soil properties, PCA was conducted to detect sample outliers in the raw Vis-NIR, ATR and DRIFT spectral datasets. Any identified outliers were excluded from further investigation. The remaining spectra were then subjected to various spectral pre-processing algorithms to reduce or eliminate noise, offset and bias in the raw spectra. The pre-processing techniques investigated included Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction (MSC), baseline offset correction (BOC), centre & scale, 1st- and 2nd-order detrending, and 1st and 2nd derivatives. Several spectral normalizations, based on maximum, range, mean and quantile values, were also included. Details of these algorithms are available at www.camo.com.

The PLSR algorithm was used to decompose both the raw and the transformed spectral matrices into 10 LVs. All PLSR models were validated by full cross-validation, in which each spectrum was in turn excluded from the calibration sample set and predicted by the PLSR model calibrated on the remaining spectra. With 10 LVs, the PLSR model was assumed to be over-fitted, because signal noise in the spectral measurements could also be correlated with the property vector. The optimal number of LVs was therefore determined by minimizing the predicted residual error sum of squares (PRESS).

To better understand the importance of different wavelength ranges in the prediction of soil N and C, PLSR models were also developed for the combined Vis-NIR-ATR and Vis-NIR-DRIFT spectra. Spectral transformation and model calibration were conducted using the Unscrambler X 10.1® (CAMO, Oslo, Norway).
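Two of the pre-processing steps listed in this section, SNV and MSC, can be sketched in a few lines. This is only an illustration under common textbook definitions, not the implementation in the software named above: the function names `snv` and `msc`, the use of the sample standard deviation (N − 1 denominator) in SNV, and the use of the mean spectrum as the MSC reference are assumptions.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum s is fitted as s ≈ a + b * reference by least squares,
    then corrected to (s - a) / b. The reference defaults to the mean spectrum.
    """
    m = len(spectra[0])
    if reference is None:
        reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(m)]
    ref_mean = sum(reference) / m
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


# Two hypothetical spectra that differ only by an offset and a scale factor.
base = [0.1, 0.4, 0.9, 0.3, 0.2]
shifted = [2.0 * v + 0.5 for v in base]

snv_base, snv_shifted = snv(base), snv(shifted)
msc_base, msc_shifted = msc([base, shifted])
```

In this toy case both transforms remove the offset and scale difference, so the two processed spectra coincide; real scatter effects are only approximately additive and multiplicative.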

190 Infrared Spectroscopy – Life and Biomedical Sciences

reflectance of the sample (www.uksaf.org). DRIFT analysis of powders is conducted by focusing infrared light onto the powder (sometimes diluted in a non-absorbing matrix, e.g. KBr) and the scattered light is collected and relayed to the IR detector.

In practice, DRIFT is most conveniently and rapidly used for soil analysis in diffuse reflection mode, where the incoming radiation is focused onto the soil sample surface, often in the form of a dry powder or <2 mm micro-aggregates, and the reflected radiation is passed back into the spectrophotometer (Fig. 3, www.clw.csiro.au). In this study, infrared spectra were recorded in diffuse reflection mode with an Alpha spectrometer with DRIFT accessory. Bulk soil samples were scanned 20 times in the range from 4000 to 400 cm⁻¹. DRIFT spectra were corrected against atmospheric CO₂ and water vapour. Finally, the infrared reflectance spectra were transformed into absorbance spectra using Log(1/R).

Fig. 3. A description of the method of acquiring a DRIFT spectrum (www.clw.csiro.au)

**2.6 Model assessment criteria**

The validation accuracy of PLSR models is given by the root mean squared error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - Y_i)^2}$$

where $X_i$ is the predicted value, $Y_i$ the measured (reference) value and $N$ the number of soil samples. To compare model performance, we recorded the residual predictive deviation (RPD), which is the ratio of the standard deviation of the reference values to the RMSE of the calibration set during cross-validation. The criteria adopted for RPD classification (Mouazen *et al.*, 2006) were that an RPD value below 1.5 indicates very poor model predictions, and that such a value could not be useful; an RPD value between 1.5 and 2.0 indicates a possibility to distinguish between high and low values, while a value between 2.0 and 2.5 makes approximate quantitative predictions possible. For RPD values between 2.5 and 3.0 and above 3.0, the prediction is classified as good and excellent, respectively. We also compared the coefficient of determination (*R*²) in cross-validation of the calibration models. Generally, a good model has high values of *R*² and RPD for cross-validation.
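The assessment criteria above translate directly into a short calculation. The sketch below is illustrative only: the function names, the use of the sample standard deviation (N − 1 denominator) for the reference values, the handling of values falling exactly on a class boundary, and the example numbers are assumptions not stated in the text.

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(predicted, measured)) / n)


def rpd(predicted, measured):
    """Residual predictive deviation: SD of reference values divided by RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((y - mean) ** 2 for y in measured) / (n - 1))
    return sd / rmse(predicted, measured)


def classify_rpd(value):
    """Map an RPD value to the classes described in the text.

    Boundary values are assigned to the higher class up to 2.5 and a value
    of exactly 3.0 is treated as "good"; this is an assumption.
    """
    if value < 1.5:
        return "very poor"
    if value < 2.0:
        return "distinguishes high and low values"
    if value < 2.5:
        return "approximate quantitative prediction"
    if value <= 3.0:
        return "good"
    return "excellent"


# Hypothetical cross-validated predictions and reference values.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

model_rmse = rmse(predicted, measured)
model_rpd = rpd(predicted, measured)
model_class = classify_rpd(model_rpd)
```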

Vis/Near- and Mid- Infrared Spectroscopy for Predicting Soil N and C at a Farm Scale 193

**3.2 Vis-NIR spectral analysis and model calibrations**

Different wavelength bands respond to different chemical compositions or molecular groups in soil. However, this response is strongly influenced by soil texture class. Figure 4 shows representative Vis-NIR absorption spectra of samples from each soil texture class, namely sandy loam (field #D), clay loam (field #B) and clay (fields #C and #E), with high and low TC content. The shift in overall baseline in the Vis-NIR spectra is likely caused by the overall difference in particle size distribution (Madari *et al.*, 2006). The clay soil has a finer texture than the others, which results in a higher baseline. In general, smaller particle size results in higher reflectance or lower absorbance, but in our case the higher absorption coefficients of the clay fraction apparently dominate the particle size effect, resulting in higher absorbance. Within the clay texture class, the samples with higher TC content tend to exhibit stronger absorption in the Vis-NIR spectra than those with lower TC content (Fig. 4). This observation seems true for sandy loam soils but not for clay loam soils, which might be attributed to the effect of soil colour. In Fig. 4, the sample from the clay loam class with a low TC content of 1.59% shows higher absorbance than that with a high TC content of 2.44%. This is probably due to the particle size effect. The Vis-NIR spectra are characterised by absorption bands associated with colour (400-760 nm), the bending (1413 nm) and stretching (1916 nm) of the O-H bonds of free water, and lattice minerals at around 2210 nm (Madari *et al.*, 2006).

Fig. 4. Vis-NIR absorption spectra of samples from each soil texture class with high and low TC content.

### **3.2.1 PCA analysis for Vis-NIR spectra**

Figure 5 shows all raw Vis-NIR spectra, the PCA scores plot for the spectra, and the residual X-variance for PC-1 and PC-2. PC-1 and PC-2 explain 96% and 3% of the total variance, respectively. PC-1 may explain variation related to the SOM of the samples, as PC-1 was more strongly correlated with TN, TC and OC (*r* = 0.74 to 0.76) than PC-2 was (*r* = -0.46 to -0.48) (Table 3). Samples originating from different fields can be divided into two clusters: one for Luvisol soils (Showground Field, #D) and another for Cambisol soils (Orchard, #B; Ivy Ground, #C;
