Chemometrics of Cells and Tissues Using IR Spectroscopy – Relevance in Biomedical Research

**7. Advances in data analysis**

Unique spectral fingerprints of biological entities in the mid-IR region arise from several components that absorb at different, partly overlapping wavenumbers. Thus, both the quantity and the type of the individual components can alter the fingerprint. As shown in Figure 3, during carcinogenesis there is a depletion of carbohydrates in both colonic and cervical tissues, likely due to increased metabolism. However, the disappearance of carbohydrates, evident from the vanishing of the triads between 900 and 1185 cm-1, is slightly different in cervical tissues than in colonic tissues. This is likely because glycogen is known to accumulate in cervical tissues, increasing in concentration from the basal to the superficial layer, whereas the absorbance associated with colonic tissues is more likely to arise from glycoproteins than from pure glycogen. Thus, although similar functional groups may contribute to the absorbance, they can manifest as spectral variations. This also necessitates that each tissue or cell type be investigated independently, even when common biomarkers are used. Such differences, and the gradual variation at specific wavenumbers as tissues or cells pass from one type to another, require that the contribution of individual metabolites such as carbohydrates, nucleic acids and proteins be clearly evaluated.

Fig. 3. Baseline-corrected spectra from (top) normal and cancerous colonic tissues and (bottom) normal and cancerous cervical tissues, indicating changes in the region 900-1200 cm-1. The circles in the histological sections depict the measurement sites. Note that the spectra in the upper panel are further baseline corrected using a rubber-band baseline in the region 900-1200 cm-1, while the data in the lower panel are presented after normalization to the amide II peak (not shown). Note also the similarity in the trends in peak intensity on carcinogenesis, irrespective of tissue origin. [Plot axes: absorbance (A.U.) versus wavenumber (cm-1). The labels a and b appear in the panels, and the marked band positions include 1162, 1154, 1128, 1081, 1078, 1046, 1040, 1023, 970, 967, 966, 936, 933 and 917 cm-1.]

Figure 4a shows the variation in the integrated intensity in colonic biopsies with different diagnostic features. While normal and hyperplastic samples show similar quantitative traits, the other biopsies, with varying degrees of malignancy or premalignancy, display a decrease in the levels. Samples with the worst prognosis also show similar values (for example, C2 grade cancer and a severe level of hyperplasia have similar values). This indicates that systems with similar spectral features pertain to a class of conditions with similar outcomes. The feature can also be used to monitor the progression of the disease over time, indicating its potential as a biomarker (Figure 4b).

Fig. 4. (A) Variation in the levels of integrated absorbance between 1000 and 1480 cm-1 of colonic biopsies with different levels of malignancy or premalignancy. (B) Variation in the parameter over time. The biopsies of the same patients are joined by solid lines. The dotted arrow points to the progression of the disease. N=normal; C1, B2, C2=cancer grades; Hp=hyperplastic; Mi=mild; Mo=moderate; Se=severe.
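The integrated absorbance used as the diagnostic parameter here is an area under the spectrum over a wavenumber window. The following is a minimal sketch of that calculation, not the authors' procedure: it assumes the trapezoidal rule, uses only measured points inside the window (no interpolation at the edges) and applies no baseline correction. The function name `integrated_absorbance` and the flat test spectrum are hypothetical.

```python
def integrated_absorbance(wavenumbers, absorbance, lo=1000.0, hi=1480.0):
    """Area under the absorbance curve between lo and hi (cm-1), trapezoidal rule.

    Only measured points inside [lo, hi] are used; the window edges are not
    interpolated, and no baseline correction is applied here.
    """
    # Sort by wavenumber so the routine accepts ascending or descending spectra.
    points = sorted(zip(wavenumbers, absorbance))
    window = [(x, y) for x, y in points if lo <= x <= hi]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(window, window[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)  # one trapezoid per sampling interval
    return area


# Hypothetical flat spectrum sampled every 10 cm-1 (illustration only).
wn = list(range(1600, 890, -10))  # 1600, 1590, ..., 900
ab = [0.1] * len(wn)
print(integrated_absorbance(wn, ab))
```

For this flat spectrum the result should be close to 0.1 × 480 = 48 (in AU·cm-1). With real data, the choice of baseline and window limits will change the value, so comparisons are only meaningful between spectra processed identically.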

However, from a clinical perspective the two groups of tissues, i.e. the cancerous biopsies and the hyperplastic biopsies, need separate, independent analysis. To differentiate further between these two subgroups, additional biomarkers become necessary. The availability of several biomarkers suitable for disease diagnosis can complicate the selection, though they can increase the sensitivity of the technique. Therefore, rather than using only simple ratios, cluster analysis using several biomarkers, or spectral data from entire regions, can be undertaken. This helps to minimize false negatives and false positives. Mathematical and computational methods that identify suitable wavenumbers for diagnosis, focusing on the differences between normal and abnormal tissues, can make use of artificial neural networks and of databases that serve as a reference. Usually a part of the study sample is used to train the system before the blind samples are analyzed. Setting up a reference database with spectra from clearly identified histopathological systems is a primary step in building good diagnostic software.
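The workflow of training on part of the samples and then analyzing blind samples can be illustrated with a deliberately simple classifier. The sketch below is an assumption-laden stand-in: it uses a nearest-centroid rule instead of an artificial neural network, and the two-biomarker feature vectors, class labels and function names are all hypothetical. It shows only how false positives and false negatives would be counted on held-out samples.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(vectors, labels):
    """Mean feature vector of each class in the training subset."""
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(centroids, vec):
    """Assign vec to the class whose training centroid is nearest."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


def false_calls(centroids, vectors, labels, abnormal="abnormal"):
    """Count (false positives, false negatives) on held-out samples."""
    fp = fn = 0
    for vec, truth in zip(vectors, labels):
        predicted = classify(centroids, vec)
        if predicted == abnormal and truth != abnormal:
            fp += 1
        elif predicted != abnormal and truth == abnormal:
            fn += 1
    return fp, fn


# Hypothetical two-biomarker feature vectors (illustration only).
train_x = [[1.0, 1.0], [1.2, 0.8], [0.2, 0.3], [0.4, 0.1]]
train_y = ["normal", "normal", "abnormal", "abnormal"]
blind_x = [[1.0, 0.9], [0.3, 0.3], [0.6, 0.5]]
blind_y = ["normal", "abnormal", "normal"]

model = train_centroids(train_x, train_y)
print(false_calls(model, blind_x, blind_y))
```

Keeping the blind samples out of `train_centroids` is the essential point: the error counts are only informative if the classifier has never seen those spectra.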

The potential of FTIR, increased several fold by its combination with computational methods, can thus also be used to overcome spatial and temporal variation in samples. Setting up such a database becomes more crucial when microorganisms (which tend to mutate and change rapidly) are being studied, as the older database can be used to monitor such evolution by studying spectral variations. Cluster analysis, in which different groups are separated by a distance proportional to their heterogeneity, is another way to study the evolutionary relation between species and subspecies of microorganisms. However, clustering may not be sensitive enough to discriminate among closely related samples.
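The cluster analysis described in this section, where groups are separated by a distance that reflects their heterogeneity, can be sketched with a small agglomerative routine. This is an illustrative sketch only: it assumes a Ward-type merge cost (the increase in within-cluster sum of squares, computed from Euclidean distances between centroids), and the feature vectors and function names are hypothetical rather than taken from the cited studies.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2


def ward_linkage(vectors):
    """Merge clusters pairwise, cheapest merge first.

    Returns a list of (members_a, members_b, cost), one entry per merge.
    """
    clusters = [{"members": (i,), "centroid": list(v)} for i, v in enumerate(vectors)]
    merges = []
    while len(clusters) > 1:
        # Naive search over all pairs; adequate for small illustrative sets.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
        cost = ward_cost(a, b)
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": tuple(sorted(a["members"] + b["members"])),
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
        merges.append((a["members"], b["members"], cost))
    return merges


# Hypothetical feature vectors forming two well separated groups.
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
for step in ward_linkage(samples):
    print(step)
```

Closely related samples merge at low cost, while dissimilar groups merge last and at a much higher cost, which is the separation a dendrogram displays. When two groups differ only slightly, their merge costs become hard to tell apart, which illustrates the limited sensitivity noted in the text.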

Where clustering is not sensitive enough, further methods of analysis, such as ANNs, are resorted to (Goodacre et al 1996). Artificial neural networks make it possible to examine samples over time by setting up reference databases. These systems often work on the principle of classifying the tissue in a binary progression mode, as depicted in Figure 5a. The final results are displayed as a confusion matrix, in which the probability of classifying a particular biopsy into one of the diagnostic groups is expressed as a percentage.

Fig. 5. A model displaying a possible methodology to classify cervical biopsies using an ANN. (Boxes in the diagram: FTIR Sample; Normal, Malignant Classifier; Cancer, CIN Classifier; Cancer Classifier; CIN Classifier.)

Data analysis involving FTIR spectra focuses on using intensities or integrated absorbance at individual wavenumbers, over wavenumber regions, or in various combinations of these, that separate the different classes of samples under study. Often, normal and abnormal tissues are distinguished by monitoring the absorbance at selected wavenumbers after routine mathematical manipulation of the spectra (Sahu et al 2004, Mordechai et al 2004). Later, entire regions of spectra or their derivatives were used to classify tissues or samples (supervised or unsupervised) into clusters to determine their hierarchy or their relation to one another. Such data, conventionally presented as clusters, were later used in advanced methods like FPA to organize areas with similar spectral characteristics into pseudocolor maps that establish tissue patterns from FTIR spectra.

Cluster analysis based on Ward's algorithm separates samples by a distance proportional to their heterogeneity, and difference spectra of colonic crypts have been used successfully for their classification (Sahu et al 2004). Similarly, cluster analysis has been used for the classification of microorganisms (Sahu et al 2006). Figure 5b displays a schematic diagram of how different biopsies are classified and how closely related conditions tend to group together. Cluster analysis of FEWS spectra of human skin samples using chemical factor analysis has been shown to differentiate melanoma from basaloma (Sukuta and Bruch 1999).

Fig. 5B. Schematic representation of a cluster analysis of cervical biopsies evaluated using hierarchical clustering to show the distance of separation between similar and dissimilar diagnoses. Note that the distance between progressively malignant groups decreases. (Axis: Euclidean distance; leaf labels: Normal, CIN1, CIN2, CIN3, Cancer.)

Data analysis methods like DCF are more useful than the others when classifying systems using morphological features like crypts (Sahu et al 2010), as each of the potential biomarkers is used with its weighted contribution, which helps to discriminate between normal and abnormal biopsies by providing an adequate quantitative follow-up of transformations over time. DCF can therefore be used to classify biopsies using a classification score that is a linear combination of several potential biomarkers, taking advantage of the variation at several wavenumbers. Usually the normal or abnormal class is given a final value of 100 and its reverse condition a value of zero by using a combination of weighted coefficients. The different intermediary stages lie between these two extreme values (Hammody et al 2007, Bogomonly et al 2009). One or more of the statistical and computational methods can be resorted to for classifying samples when an ambiguity is encountered.

Cluster analysis may be performed not only on normalized spectral intensities or spectral regions but also on spectral derivatives or difference spectra. ANN and PNN analyses yield confusion matrices, which help to classify tissues or biopsies based on their probability of falling into a particular class. The LDA and PCA techniques classify biopsies into several groups, presenting them in standard graphical formats and assigning the grade. The DCF analysis improves discrimination between different stages by using a weighted value for each biomarker and enables proper assignment in studies that follow progressive changes in cells and tissues (Hammody et al 2007, Bogomonly et al 2009).

**8. Recent trends and future perspectives**

Though basic research, including pilot-scale clinical trials, has been carried out extensively on the use of FTIR measurements for disease diagnosis, the potential has not been exploited in practice because FTIR-based diagnosis cannot yet serve as an independent technique for the classification of diseased tissues. The role of a pathologist has been indispensable and pivotal in the preliminary process of sample selection. Advances made through the development of FPA-based techniques partially overcome this requirement: automated identification of diseased regions in tissues using programs like cluster analysis or ANN/PNN provides pseudocolored images depicting tissue morphology. The rapidity of the technique is, however, compromised in these cases. Moreover, while a pathologist can quickly look at areas of interest, the automation necessarily examines the entire section, making it a time-consuming affair. In case of complex material like melanoma
