### **Figure 1.**

*Untargeted cheminformatics workflow for analysis of lignocellulosic materials by Py-GC/MS.*

## **1.3 Common problems in Py-GC/MS and contribution of cheminformatics for their solution**

Some apparent methodological problems attributed to pyrolysis are associated with the conditions necessary for the analysis of specific materials. Lourenço *et al*. [12] point out that care must be taken with the pyrolysis temperature when analyzing materials rich in suberin, such as barks. The main problem is that suberin decomposes at temperatures in the range of 550–600°C [31]. Therefore, this aspect must be taken into account when the composition of this polymer within lignocellulosic samples is of interest [12]. Another problem mentioned in various works is that Py-GC/MS cannot guarantee an entirely quantitative determination. However, some authors have successfully carried out quantitative analyses in the optimization of aromatic hydrocarbon production from biomass [29], as well as in the quantification of small amounts of aromatic hydrocarbons by applying the external calibration method [3, 30].
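As a rough illustration of the external calibration idea mentioned above, the following sketch fits a straight line to peak areas measured for standards of known amount and then inverts it for an unknown sample. It is not taken from the cited works: the function names, the amounts and the peak areas are hypothetical values chosen only to make the arithmetic easy to follow, and a linear detector response is assumed.

```python
# Minimal sketch of external calibration (assumption: linear response).
# All numbers below are hypothetical, not data from the cited studies.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to estimate the amount of analyte."""
    return (peak_area - intercept) / slope


# Hypothetical standards: injected amount (µg) and measured peak area.
standard_amounts = [1.0, 2.0, 4.0, 8.0]
standard_areas = [2050.0, 4050.0, 8050.0, 16050.0]

slope, intercept = fit_line(standard_amounts, standard_areas)

# Hypothetical peak area measured for the same compound in a sample.
estimated_amount = quantify(10050.0, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, amount={estimated_amount:.2f} µg")
```

The estimate is only meaningful inside the range covered by the standards, which is one reason such calibrations are typically restricted to a few target compounds.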

The amount of information generated by the entire process can be a challenging aspect. A 45-minute Py-GC/MS analysis of lignocellulosic samples can generate up to 2,729 mass spectra [21]. However, after cheminformatics and manual curation of the datasets, the authors were able to unambiguously recognize 451 compounds, including some putative isomers. Another common problem is the displacement of peaks in the chromatograms of samples with different chemical composition. An example is the displacement of the peak corresponding to levoglucosan in Py-GC/MS chromatograms of syringyl-rich wood [18]. The displacement is due to the absence of acetovanillone in the samples. As a result, the levoglucosan peak appears at a retention time (RT) of 22.72 min, while in species that produce acetovanillone it is observed at 23.55 min (**Figure 2**). This effect is problematic when a batch of several samples with different compositions has to be processed directly. There are two reasons: 1) the process would be very time consuming if several species are analyzed and all the peaks identified by Py-GC/MS are compared one by one (about 40 compounds per sample, using

### **Figure 2.**

*Displacement of the peaks. Py-GC/MS chromatograms from extractives-free wood in cacti: A) Pilosocereus chrysacanthus and B) Ferocactus hamatacanthus. Displacement of levoglucosan (black arrows) is due to the absence of acetovanillone (gray arrows) in samples with 94% of syringyl units [18]. The origin of the compounds is marked with letters: Ch, carbohydrates; G, guaiacyl subunits; S, syringyl subunits; Fa, ferulates.*

### *Cheminformatics Applied to Analytical Pyrolysis of Lignocellulosic Materials DOI: http://dx.doi.org/10.5772/intechopen.100147*

the native GC/MS software). This implies that the analysis has to be limited to differences in the relative abundance, or the presence/absence, of only certain compounds. 2) If the raw datasets from the chromatograms are compared directly with any multivariate method, the peak displacement causes methodological bias because equivalent compounds are not being compared. Cheminformatics analysis solves this problem by automating the alignment of mass spectra and the identification of compounds for a batch of samples.
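The alignment problem described above can be sketched in a few lines. The example below is an illustrative assumption, not the procedure of the cited works: it pairs peaks between two samples by the cosine similarity of their mass spectra within an RT tolerance, instead of by RT alone. The levoglucosan RTs (22.72 and 23.55 min) come from the text; the acetovanillone RT, all ion intensities, the thresholds and the function names are hypothetical placeholders.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_by_rt(peak, candidates):
    """Naive matching: pick the candidate with the closest retention time."""
    return min(candidates, key=lambda c: abs(c["rt"] - peak["rt"]))


def match_peaks(sample_a, sample_b, min_similarity=0.9, max_rt_shift=1.5):
    """Pair each peak of sample_a with its most similar spectrum in sample_b."""
    matches = []
    for peak_a in sample_a:
        best, best_score = None, min_similarity
        for peak_b in sample_b:
            if abs(peak_a["rt"] - peak_b["rt"]) > max_rt_shift:
                continue  # too far apart to be the same compound
            score = cosine_similarity(peak_a["spectrum"], peak_b["spectrum"])
            if score >= best_score:
                best, best_score = peak_b, score
        if best is not None:
            matches.append((peak_a["rt"], best["rt"], best_score))
    return matches


# Hypothetical, simplified spectra (intensities are placeholders).
sample_without_acetovanillone = [
    {"name": "levoglucosan", "rt": 22.72, "spectrum": {60: 100, 57: 60, 73: 45}},
]
sample_with_acetovanillone = [
    {"name": "acetovanillone", "rt": 22.80, "spectrum": {151: 100, 166: 45, 123: 20}},
    {"name": "levoglucosan", "rt": 23.55, "spectrum": {60: 100, 57: 55, 73: 50}},
]

naive = nearest_by_rt(sample_without_acetovanillone[0], sample_with_acetovanillone)
matches = match_peaks(sample_without_acetovanillone, sample_with_acetovanillone)
print("nearest by RT:", naive["name"])
print("matched by spectrum (RT a, RT b, score):", matches)
```

With these toy values, matching by RT alone pairs levoglucosan with the wrong compound, whereas the spectrum-based match links the two levoglucosan peaks despite the shift. Dedicated tools handle many more cases (co-elution, noise, deconvolution) than this sketch does.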

On the other hand, the high degree of degradation caused by the high temperatures used in pyrolysis represents, by far, the main problem of this technique. For this reason, it is considered to be of little use for characterizing molecules larger than monomers or dimers in biopolymers such as lignin [6]. In addition, the large number of derivatives is considered to make the description of the chemical composition of a sample difficult. Consequently, the detailed interpretation of the results is difficult and probably not necessary [3]. For example, when analyzing carbohydrate samples, low molecular weight derivatives can originate from hexoses or pentoses [12, 32]. The reason is that cellulose and hemicelluloses follow similar thermal degradation pathways, so a large part of the derivatives produced are the same [33, 34]. Cellulose pyrolysis causes the heterolytic cleavage of the glycosidic C–O bonds. In addition, it involves complex reactions and different pathways that give rise to anhydro sugars and numerous low molecular weight compounds, e.g., acetic acid, 1-hydroxybutan-2-one, hydroxyacetaldehyde, 1-hydroxypropan-2-one and 2-furaldehyde [15, 35, 36]. A large part of these small compounds can also originate from the decomposition of hemicelluloses. For example, 2-furaldehyde and acetic acid can be produced from the degradation of xylan [12, 37, 38]. On the other hand, there are contrary cases, but they also

### **Figure 3.**

*Complete profile of the compounds identified for eight samples of lignocellulosic materials. A) Cluster corresponding to guaiacyl lignin derivatives. B) Abundance patterns for carbohydrate derivatives. sMS, similar mass spectra; qiMS, quasi identical mass spectra.*

contribute to the ambiguity in the identification of the compounds and their origin, particularly when different ions are produced by the same class of compounds. Pyrans and furans are an example of compounds with ambiguous origin; both, with different molecular ions, can derive from the degradation of cellulose or hemicelluloses [12]. In this sense, the use of cheminformatics makes it possible to identify the abundance patterns of the compounds in a batch of samples. Based on these patterns, it can be inferred whether the pyrolysis products behave in the same way (**Figure 3**). In this way, it is possible to infer whether different compounds have the same origin, or to rule out differences due to the operating conditions of the method or the characteristics of the samples [21].
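One simple way to make the idea of "coincidences in abundance patterns" concrete is to correlate the abundance of each compound across the batch. The sketch below is an assumption for illustration only: it uses the Pearson correlation and an arbitrary threshold, which are not stated in the source, and the compound labels and abundances for eight samples are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long abundance vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant pattern carries no information
    return cov / math.sqrt(var_x * var_y)


def related_pairs(abundances, threshold=0.95):
    """Return the compound pairs whose patterns correlate above the threshold."""
    pairs = []
    for name_a, name_b in combinations(abundances, 2):
        if pearson(abundances[name_a], abundances[name_b]) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical relative abundances of three compounds across eight samples.
abundances = {
    "guaiacol_derivative_1": [12.0, 3.0, 9.0, 1.0, 6.0, 2.0, 10.0, 4.0],
    "guaiacol_derivative_2": [6.1, 1.4, 4.6, 0.5, 3.1, 0.9, 5.0, 2.1],
    "carbohydrate_derivative_1": [2.0, 8.0, 3.0, 9.0, 5.0, 7.0, 1.0, 6.0],
}

pairs = related_pairs(abundances)
print("compounds with matching abundance patterns:", pairs)
```

In this toy batch, the two guaiacol derivatives rise and fall together across the samples and are paired, while the carbohydrate derivative follows a different pattern and is left out.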

For example, 2,5-dimethylfuran and 4-methyl-2*H*-pyran correspond to different molecular ions but have the same average mass (96.13 Da) and similar RT, 4.64 min and 4.74 min, respectively (*see* Supplementary Materials of [21]). Based on the observed abundance patterns, it can be deduced that they are related to

### **Figure 4.**

*Representation of the importance of using standardized data for the interpretation of the results. Nonstandardized data: A) just ordered alphabetically; it is not possible to identify abundance patterns. B) Data arranged based on the HCA; trace compounds are overshadowed by the most abundant ones. C) Standardized data; compounds with the same origin share patterns of abundance and high similarity.*


two different groups of compounds. Another example includes guaiacols, which are derived from guaiacyl (G) units. Under the same pyrolysis conditions and sample composition, their abundance patterns should be the same. In the clustering analysis (CA) of **Figure 3A**, the guaiacols appear together, forming a single group. For carbohydrate derivatives, abundance patterns with high similarity can also be identified for related compounds or putative isomers. **Figure 3B** shows the abundance patterns for ethylene glycol diacetate and compounds with *quasi* identical (qiMS) or similar (sMS) mass spectra. A related example is the independent origin of catechols and guaiacols in some lignocellulosic samples [21]. Catechols can be produced from guaiacols by secondary reactions at high temperatures [12, 21, 36]. However, as seen in **Figure 4**, the catechol abundance patterns across the samples, under the same experimental conditions, are clearly different from the patterns seen in samples with a predominance of G lignin. Therefore, catechols can be considered to originate independently of the G lignin derivatives.
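The effect of standardization illustrated in **Figure 4** can be reproduced with a small numerical sketch. It assumes that "standardized" means a per-compound z-score across samples, which is a common choice but is not confirmed by the text; the compound labels and abundances are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Z-score one compound's abundances across samples (assumed scaling)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def euclidean(x, y):
    """Euclidean distance between two abundance patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical abundances across four samples.
abundant = [40.0, 10.0, 30.0, 20.0]    # major compound
trace = [0.4, 0.1, 0.3, 0.2]           # trace compound, same pattern
unrelated_trace = [0.1, 0.4, 0.2, 0.3]  # trace compound, opposite pattern

# Raw data: distances are dominated by absolute abundance.
raw_same_origin = euclidean(trace, abundant)
raw_unrelated = euclidean(trace, unrelated_trace)

# Standardized data: distances reflect the shape of the pattern.
std_same_origin = euclidean(standardize(trace), standardize(abundant))
std_unrelated = euclidean(standardize(trace), standardize(unrelated_trace))

print(f"raw:          same pattern {raw_same_origin:.2f}, unrelated {raw_unrelated:.2f}")
print(f"standardized: same pattern {std_same_origin:.2f}, unrelated {std_unrelated:.2f}")
```

On the raw values the trace compound looks closer to the unrelated trace compound than to the abundant one that shares its pattern; after standardization the ordering is reversed, so a distance-based clustering would group the compounds by pattern rather than by magnitude.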
