204 Multivariate Analysis in Management, Engineering and the Sciences

First, we have to define a measure of similarity or dissimilarity, also called a distance function. The most common distance functions are: i) the Euclidean distance; ii) the Manhattan distance; iii) the Mahalanobis distance; iv) the maximum norm.

Based on the procedure they use, clustering algorithms can be divided into three main groups: hierarchical, partitional and density-based clustering. None of these algorithms is intrinsically better than the others: the choice of the clustering method strongly depends on the structure of the data and on the kind of results one expects.

Hierarchical clustering algorithms can be further subdivided into agglomerative and divisive. Agglomerative clustering starts with all observations placed in different clusters and, at each step, an observation or a cluster of observations is merged into another cluster. The most commonly employed agglomerative strategies are complete-linkage, average-linkage, single-linkage and centroid-linkage. The drawback of agglomerative clustering algorithms is that observations cannot be moved among clusters once a cluster is made.

The divisive method starts with one single cluster containing all observations and then divides a cluster into two sub-clusters at each step. Divisive methods have the same drawback as agglomerative clustering, that is, once a cluster is made, an observation cannot be moved to another cluster. Divisive methods are suited when large clusters are searched for.

The partitional algorithm assigns the observations to a set of clusters without using hierarchical approaches. One of the most used non-hierarchical approaches is k-means clustering.

Density-based clustering searches for regions of high density without any assumption about the shape of the cluster.

**4.4. Artificial Neural Networks (ANN)**

Artificial neural networks are mathematical models that were developed in analogy to a network of biological neurons [64]. Mathematically, a neuron can be modeled as a switch that receives, as input, a series of values and produces an output consisting of a weighted sum of the inputs, possibly transformed by a function f. Many neurons can be combined to create more complex networks. Depending on the type of neurons and on how the neurons are connected to each other, different kinds of neural networks can be created. The most common type of neural network is the feed-forward neural network, in which neurons are grouped into layers, each neuron of a layer is connected to all the neurons of the next layer, and the information flows from the input to the output without loops. For a comprehensive description of neural networks and their applications see [54, 65].

**5. Applications of multivariate analysis to spectroscopic data of complex biological systems**

In the following, we provide a few selected examples of the application of FTIR microspectroscopy coupled with multivariate analysis to biomedically relevant studies, with the aim of highlighting the importance of linking the two approaches to extract the most significant spectral information from highly informative systems.

In some cases, PCA alone is a powerful method for the analysis of multidimensional FTIR spectra. Indeed, several interesting works in the literature employ this approach to support the spectroscopic investigation of complex biological systems and processes. For instance, synchrotron-based FTIR microspectroscopy coupled with PCA has been applied to the characterization of human corneal stem cells [27, 66], to the screening of cervical cancer [14], and to disclose the effects induced by a surface glycoprotein in colon carcinoma cells [67].

In particular, Matthew German and colleagues [68] coupled high-resolution synchrotron radiation-based FTIR (SR-FTIR) microspectroscopy with PCA to investigate the characteristics of putative adult stem cell (SC), transiently amplified (TA) cell, and terminally differentiated (TD) cell populations of the corneal epithelium. Using PCA, each spectrum, composed of many variables (the wavenumbers), is reduced to a point in a low-dimensional space. Each observation can then be visualized in a two- or three-dimensional score plot. By choosing the appropriate principal components, the authors were able to clearly distinguish the three cell populations, confirming the ability of SR-FTIR microspectroscopy to identify SC, TA cell, and TD cell populations.
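The reduction of each spectrum to a point in a score plot can be illustrated with a minimal sketch. It is not the analysis performed in [68]: the spectra are synthetic, the three populations `A`, `B` and `C`, their band intensities and the band positions (1650 and 1080 cm⁻¹) are assumptions chosen only for illustration, and PCA is computed with a plain NumPy singular value decomposition.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Project spectra (rows = observations, columns = wavenumbers) onto the first PCs."""
    centered = spectra - spectra.mean(axis=0)
    # SVD of the mean-centered matrix: the rows of vt are the PC loadings
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
wavenumbers = np.linspace(900, 1800, 300)

def band(center, width):
    """Gaussian band used as a stand-in for a real absorption band."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

# Hypothetical populations: (intensity near 1650, intensity near 1080)
populations = {"A": (1.0, 0.4), "B": (0.8, 0.7), "C": (0.6, 1.0)}
spectra, labels = [], []
for name, (amide, phosphate) in populations.items():
    for _ in range(20):
        spectrum = amide * band(1650, 25) + phosphate * band(1080, 30)
        spectra.append(spectrum + rng.normal(0, 0.02, wavenumbers.size))
        labels.append(name)
spectra = np.array(spectra)
labels = np.array(labels)

scores, loadings, explained = pca_scores(spectra, n_components=2)

# Each spectrum is now one point (PC1, PC2); print the centroid of each group
for name in populations:
    print(name, scores[labels == name].mean(axis=0).round(2))
```

Plotting the two columns of `scores` against each other, coloured by `labels`, gives the score plot; the rows of `loadings` indicate which wavenumbers drive each component.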

PCA alone is extremely powerful for reducing the number of variables; however, it is not a clustering algorithm, and the grouping of the observations into clusters must be done with other techniques.
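A minimal sketch of this two-step idea, under stated assumptions, is given here: PCA scores are computed first and then grouped by a naive average-linkage agglomerative clustering. The data are synthetic (two made-up groups of ten observations with 50 variables each), and the helper names `pca_scores` and `average_linkage` are hypothetical, written for this example rather than taken from any library.

```python
import numpy as np

def pca_scores(data, n_components):
    """Scores of the mean-centered data on the first principal components."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage (illustration only)."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per observation
    # Pairwise Euclidean distances between all observations
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between the members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

rng = np.random.default_rng(1)
profile = np.zeros(50)
shifted = profile.copy()
shifted[:10] += 1.0  # the second group differs in the first ten variables
data = np.vstack([
    profile + rng.normal(0, 0.1, (10, 50)),
    shifted + rng.normal(0, 0.1, (10, 50)),
])

scores = pca_scores(data, n_components=2)       # step 1: dimensionality reduction
cluster_labels = average_linkage(scores, 2)     # step 2: grouping into clusters
print(cluster_labels)
```

The clustering step never sees the group membership; it only uses distances between the score points, which is why it can complement PCA in an unsupervised analysis.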

Tanthanuch and colleagues, for example, applied FTIR microspectroscopy, supported by PCA and unsupervised hierarchical cluster analysis (UHCA), to identify specific spectral markers of the differentiation of murine embryonic stem cells (mESCs) and to distinguish them into different neural cell types [25]. In particular, focal plane array (FPA)-FTIR and SR-FTIR microspectroscopy measurements, performed on cell clumps and single cells respectively, made it possible to obtain a biochemical fingerprint of different mESC developmental stages, namely embryoid bodies (EBs), neural progenitor cells (NPCs) and embryonic stem-derived neural cells (ESNCs). Interestingly, the results obtained on cell clumps and on single cells were found to be comparable, corroborating the FPA-FTIR results on cell clumps. The analysis of second derivative spectra highlighted important spectral changes occurring during ES cell differentiation, mainly in the lipid CH₂ and CH₃ stretching region and in the protein amide I band. Notably, these results overall indicated that during neural differentiation the cell lipid content increased significantly, likely reflecting modifications in cell membranes, whose lipid content is known to have a key role in neural cell differentiation and signal transduction. Moreover, changes in the profile of the amide I band, mainly involving the alpha-helix component around 1650-1652 cm⁻¹, indicated an increased expression of alpha-helix-rich proteins in ESNCs compared with their progenitor cells, a result that could reflect the expression of cytoskeleton proteins, crucial for the establishment of neural structure and function. These results were then strongly supported by PCA, which made it possible to disclose the regions of the IR spectrum that contributed most to the spectral variance, namely the amide I band and the C-H stretching region. Furthermore, the application of UHCA made it possible to successfully discriminate and classify each stage of ESNC differentiation, again considering the spectral range mainly due to acyl chain vibrations and the extended region between 1750 and 900 cm⁻¹.
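Why second derivative spectra help to resolve overlapping components can be shown with a small numerical sketch. The two Gaussian bands, their positions (1635 and 1655 cm⁻¹) and their width are assumptions made for illustration and are not taken from [25]; the derivative is a plain finite difference on noise-free data, whereas measured spectra are usually smoothed first (for example with a Savitzky-Golay filter), since differentiation amplifies noise.

```python
import numpy as np

wavenumbers = np.arange(1550.0, 1751.0, 1.0)  # cm^-1, 1 cm^-1 spacing

def gaussian_band(center, sigma):
    """Gaussian band used as a stand-in for a real absorption component."""
    return np.exp(-0.5 * ((wavenumbers - center) / sigma) ** 2)

# Two strongly overlapping, hypothetical components
absorbance = gaussian_band(1635.0, 12.0) + gaussian_band(1655.0, 12.0)

# Second derivative by applying a finite-difference gradient twice
second_derivative = np.gradient(np.gradient(absorbance, wavenumbers), wavenumbers)

def local_extrema(y, kind):
    """Indices of strict local maxima or minima of a sampled curve."""
    inner = y[1:-1]
    if kind == "max":
        mask = (inner > y[:-2]) & (inner > y[2:])
    else:
        mask = (inner < y[:-2]) & (inner < y[2:])
    return np.where(mask)[0] + 1

# The absorbance shows a single unresolved maximum...
absorbance_maxima = wavenumbers[local_extrema(absorbance, "max")]

# ...while the second derivative shows one deep minimum per component
minima = local_extrema(second_derivative, "min")
minima = minima[second_derivative[minima] < 0.5 * second_derivative.min()]
band_positions = wavenumbers[minima]

print("absorbance maxima:", absorbance_maxima)
print("second-derivative minima:", band_positions)
```

In this sketch the minima fall close to, but not exactly at, the centres of the two components, because each band slightly distorts the derivative of its neighbour.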

Multivariate Analysis for Fourier Transform Infrared Spectra of Complex Biological Systems and Processes 207

As discussed previously, PCA is frequently used for preliminary dimensionality reduction before further analyses, such as LDA [21]. Indeed, a limitation of using PCA alone is that it does not provide an unambiguous grouping of the data into clusters; it therefore requires a further analysis step able to reduce the intra-category variation while maximizing the inter-category variation [69]. The coupling of PCA with LDA, for instance, is a well-established procedure that makes it possible not only to classify the observations into groups but also to quantify the importance of the single variables for this group separation. In this view, the advantage of LDA is that it reveals clusters while also objectively identifying the wavenumbers that contribute most to the discrimination of the spectra [21, 58]. In particular, the application of PCA-LDA to the spectroscopic investigation of complex biological systems has proved to be a useful tool for the identification of spectral biomarkers of the process under investigation [7, 35, 69, 70, 71].

One outstanding work worth mentioning here is that of Kelly and colleagues [70], who showed how infrared spectroscopy and multivariate techniques can be used as a novel diagnostic approach for endometrial cancer screening. They first demonstrated that SR-FTIR microspectroscopy with subsequent PCA-LDA allows the clear segregation of different subtypes of endometrial carcinoma. However, the requirement of a particle accelerator hampers the use of endometrial spectroscopy as a practical diagnostic application.

Recently, Taylor and colleagues applied ATR-FTIR spectroscopy supported by PCA-LDA to interrogate endometrial tissues, employing a conventional IR radiation source [72]. They showed that this approach, which can be applied directly to liquid or solid samples without further preparation, could provide a useful and simple objective test for endometrial cancer diagnosis.

Furthermore, in the work of Walsh and colleagues [69], ATR microspectroscopy was successfully applied to the characterization of exfoliative cervical cytology samples of different categories, with increasing severity of atypia. The spectral analysis was supported by PCA, with or without subsequent LDA, to verify whether it was possible to discriminate among normal, low-grade and high-grade exfoliative cytology. Indeed, important differences were found in the spectral range between 1500 and 1000 cm⁻¹, mainly due to proteins, glycoproteins, phosphates and carbohydrates. Notably, the authors stressed that only the combined PCA-LDA made it possible to maximize the inter-category variance while reducing the intra-category variance. In particular, they found that the glycogen content strongly influenced the intra-category variance, while the inter-category variance was mainly due to protein and DNA conformational changes. In this view, FTIR microspectroscopy coupled with PCA-LDA could provide an objective approach to classifying cervical cytology.

We should note that a delicate point of PCA-LDA is the choice of the principal components to be used as LDA input and, as described in the previous section on PCA, several ways have been developed to perform this task. Alternatively, the PLS method can be used instead of PCA [6, 73, 74]. For instance, Sandt and colleagues, using synchrotron infrared microspectroscopy coupled with PLS-DA, were able to characterize the metabolic fingerprint of induced pluripotent stem cells (iPSCs). In particular, they found that iPSCs are characterized by a chemical composition that leads to a spectral signature indistinguishable from that of embryonic stem cells (ESCs), but entirely different from that of the original somatic cells [6].

**5.1. FTIR microspectroscopy supported by PCA-LDA for the characterization of SN and NSN murine oocytes**

Recently, we applied FTIR microspectroscopy supported by PCA-LDA to the study of murine oocytes characterized by two different types of chromatin organization, namely surrounded nucleolus (SN) oocytes, in which the chromatin is highly condensed and forms a ring around the nucleolus, and not surrounded nucleolus (NSN) oocytes, in which the chromatin is dispersed and less condensed around the nucleolus [7, 75]. Interestingly, only SN oocytes are capable of completing embryonic development after fertilization, while the NSN type, if fertilized, arrests at the two-cell stage. To gain new insights into the mechanisms that drive the different chromatin organization in the two kinds of oocytes, crucial for their embryonic development after fertilization, we studied the infrared absorption of single intact cells at different maturation stages, namely antral germinal vesicle (GV), metaphase I (MI, matured for 10 hours in vitro), and metaphase II (MII, matured for 20 hours in vitro).

Indeed, as we will show in the following, the FTIR spectra of the oocytes taken at the different maturation stages are very complex, since they carry information on different processes taking place simultaneously within the cells. For this reason, besides a fundamental visual inspection of the data, enabling the identification and assignment of the different spectral bands, the application of PCA-LDA was crucial, since it made it possible to draw out the most significant spectral information responsible for the different cell behavior. Moreover, PCA-LDA made it possible to identify the stage at which the separation between the SN and NSN oocytes took place, leading to their well-distinct cell destinies.

As discussed in paragraph 2, since the FTIR spectrum of cells is due to the overlapping contributions of the main biomolecules (see Figure 2), we analysed the second derivative spectra to identify the band peak positions and to assign them to the vibrational modes of the different biomolecules. The spectral analysis, strongly supported by PCA-LDA, allowed us to disclose the most important spectral differences between the two types of oocytes, at each maturation stage, which were found to occur mainly in the lipid and nucleic acid absorption regions, as we will discuss below. For a full discussion of the results see [7].
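The PCA-LDA workflow discussed above can be sketched in a few lines, under several assumptions: the spectra are synthetic, the two classes differ only in the intensity of a made-up band near 1080 cm⁻¹, the number of retained principal components (`n_pcs = 5`) is an arbitrary choice, and the LDA step is a plain two-class Fisher discriminant written with NumPy. It is an illustration of the idea, not the procedure used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(42)
wavenumbers = np.linspace(900, 1800, 200)

def band(center, width):
    """Gaussian band used as a stand-in for a real absorption band."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

def simulate(n, level):
    """Hypothetical spectra: fixed band near 1650 plus a variable band near 1080."""
    base = band(1650, 25) + level * band(1080, 30)
    return base + rng.normal(0, 0.02, (n, wavenumbers.size))

X = np.vstack([simulate(40, 0.5), simulate(40, 0.7)])
y = np.repeat([0, 1], 40)

# Alternate observations between training and test sets
train = np.arange(len(y)) % 2 == 0
test = ~train

# Step 1, PCA on the training spectra (loadings = rows of vt)
mean = X[train].mean(axis=0)
_, _, vt = np.linalg.svd(X[train] - mean, full_matrices=False)
n_pcs = 5  # assumption: the choice of PCs is the delicate point noted in the text
loadings = vt[:n_pcs]
scores_train = (X[train] - mean) @ loadings.T
scores_test = (X[test] - mean) @ loadings.T

# Step 2, two-class Fisher LDA on the PC scores
s0 = scores_train[y[train] == 0]
s1 = scores_train[y[train] == 1]
m0, m1 = s0.mean(axis=0), s1.mean(axis=0)
centered = np.vstack([s0 - m0, s1 - m1])
s_within = centered.T @ centered / (len(centered) - 2)  # pooled within-class covariance
w = np.linalg.solve(s_within, m1 - m0)                  # discriminant direction
threshold = w @ (m0 + m1) / 2                           # midpoint, equal priors assumed

predicted = (scores_test @ w > threshold).astype(int)
accuracy = (predicted == y[test]).mean()

# Map the discriminant back to wavenumber space to see which variables matter
contribution = loadings.T @ w
top_wavenumber = wavenumbers[np.argmax(np.abs(contribution))]

print("test accuracy:", accuracy)
print("most contributory wavenumber:", round(float(top_wavenumber), 1))
```

Projecting the discriminant vector back through the PCA loadings, as in `contribution`, is one simple way to rank the wavenumbers responsible for the class separation; changing `n_pcs` shows how sensitive the result is to the choice of components.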

