310 Biofuels - Status and Perspective

Savitzky-Golay can still preserve the overall profile (OriginLab Corporation, Northampton, MA, 2012).

*2.6.4. Peak finding settings*

Manual peak editing was performed to select the required peaks effectively. The second derivative was used to search for all the hidden peaks and heavily overlapped bands in the spectral data. Differentiation is used to resolve and locate peaks in an envelope. Sharp bands are enhanced at the expense of broad ones, and this may allow easier selection of a peak, even when there is a broad band beneath it [35].

*2.6.5. Characteristic peak assignment*

The characteristic wavenumbers for pure cellulose, hemicellulose, and lignin listed in Table 3 were used for the peak assignment. Five characteristic peaks were identified for pure cellulose, six for pure hemicellulose, and six for pure lignin (Figure 1a-c). The number of peaks identified for the respective reference materials depends on the mixture of the reference materials. All seventeen peaks were identified for the treated and non-treated biomass samples.

*2.6.6. Peak integration*

To obtain quantitative values from the area under the manipulated spectrum/peaks, the areas under the respective peaks were integrated and output to an Excel file.

*2.6.7. Normalization of photoacoustic infrared spectra*

Prior to the spectrum manipulation, the FTIR-PAS biomass sample spectra were corrected for wavenumber-dependent instrumental effects by dividing by the reference carbon black ("background") spectrum intensity. This strategy implicitly assumes that the stability of the instrumentation used is adequate to ensure reliable results, even though the sample and reference spectra were collected at different times [30]. Carbon black is featureless, in the sense that it does not show any major characteristic peaks [30]. Photoacoustic (PA) cell intensities varied with sample packing in the PA cell [36]. Stuart (1997) [35] also reported that absorbance varies linearly with the sample thickness. Therefore, the effect of bulk density of the reference materials and biomass samples was corrected by dividing the integrated areas by the respective masses of the reference materials and biomass samples contained in the PA sample cup.

The model was further standardized by normalizing the corrected integrated area data from 0 to 1. This was performed by dividing the corrected integrated area data of each reference material and biomass sample by the corresponding maximum corrected integrated area value. These steps were performed for the three major components of lignocellulosic biomass (cellulose, hemicellulose, and lignin). This normalization approach therefore ensures that the predictive model is adaptable for quantitative analysis of FTIR-PAS spectra obtained for any lignocellulosic biomass.

**3. Regression analysis**

This portion of the analysis is divided into two parts: the training analysis and the verification analysis.

### **3.1. Training analysis**

The normalized data of the cellulose, hemicellulose, and lignin components in the reference materials and the combined biomass samples (i.e., the combination of data from the RF and SE analyses) were correlated to their percentage compositions. The RF and SE normalized data were combined to develop a general model that can be applied to a barley straw sample, irrespective of the pretreatment method applied to the biomass. The percentage compositions of the reference materials (Table 2) and of the combined biomass samples (Tables 4 & 6-8), obtained from the NREL Standard, were used as the dependent variables, while two out of the three replicates from the combined normalized data (obtained from the PA spectra) were randomly selected and used as the independent variables. Five independent variables were used for cellulose (because it has five characteristic wavenumbers), while six independent variables were used for each of hemicellulose and lignin (each has six characteristic wavenumbers). Multiple linear regression analysis was conducted at the 5% significance level using IBM SPSS Statistics (Superior Performing Statistical Software, version 20 for Windows, 2012; IBM, Armonk, NY). Regression models (equations 1-3) for cellulose, hemicellulose, and lignin were developed based on the generated regression coefficients. The developed models are intended to predict quantitatively the percentage compositions of cellulose, hemicellulose, and lignin in any lignocellulosic biomass.
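The training step can be illustrated with a minimal sketch of multiple linear regression by ordinary least squares. This is not the SPSS procedure used in the study and it performs no significance testing; the function names `fit_mlr` and `predict_mlr`, the sample count, and all numerical values are hypothetical placeholders, not data from Tables 2, 4, or 6-8. The example assumes five normalized peak areas per sample, as for cellulose.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    X: (n_samples, n_peaks) normalized peak areas (independent variables).
    y: (n_samples,) percentage compositions (dependent variable).
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted term is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def predict_mlr(intercept, coefs, X):
    """Evaluate the fitted linear model on new normalized peak areas."""
    return intercept + np.asarray(X, dtype=float) @ coefs

# Hypothetical, noise-free training data: 12 samples x 5 cellulose peaks,
# with normalized areas in the range 0 to 1.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(12, 5))
true_coefs = np.array([10.0, 5.0, 8.0, 3.0, 6.0])  # placeholder values
y_train = 20.0 + X_train @ true_coefs              # placeholder compositions

b0, b = fit_mlr(X_train, y_train)
```

Because the placeholder data are noise-free, the fit recovers the coefficients used to generate them; with measured spectra the residuals and the significance of each coefficient would need to be examined, as was done in SPSS.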

### **3.2. Verification analysis (Validation)**

Subsequently, the normalized data (based on the respective characteristic peaks) from the third replicate of each biomass sample were substituted into the developed predictive models to predict the percentage compositions of the lignocellulosic components present in the non-treated, RF-alkaline pretreated, and SE pretreated biomass samples (Tables 4 & 6-8).
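The substitution described above can be sketched as follows. The intercept, coefficients, replicate values, and measured composition are hypothetical placeholders chosen for easy arithmetic; they are not the coefficients of equations 1-3 or values from the tables, and `predict_composition` is an illustrative name.

```python
import numpy as np

def predict_composition(intercept, coefs, peaks):
    """Substitute normalized peak areas into a fitted linear model.

    Returns the predicted percentage composition for one sample.
    """
    peaks = np.asarray(peaks, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    if peaks.shape[-1] != coefs.shape[0]:
        raise ValueError("one normalized peak area is needed per coefficient")
    return float(intercept + peaks @ coefs)

# Hypothetical cellulose model with five characteristic peaks.
intercept = 5.0
coefs = [10.0, 20.0, 5.0, 0.0, 15.0]

# Hypothetical normalized peak areas from the held-out third replicate.
third_replicate = [0.5, 0.2, 1.0, 0.3, 0.4]

# 5 + 10*0.5 + 20*0.2 + 5*1.0 + 0*0.3 + 15*0.4, i.e. about 25 percent.
predicted = predict_composition(intercept, coefs, third_replicate)

# Compare against a hypothetical measured composition for the same sample.
measured = 26.0
abs_error = abs(predicted - measured)
```

Keeping the third replicate out of the fit means the comparison between predicted and measured compositions is made on data the model has not seen.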
