**1. Introduction**

Principal Component Analysis (PCA) has been widely used in different scientific areas and for different purposes. The versatility and potential of this unsupervised method for data analysis have allowed the scientific community to explore its applications in many fields. Although the algorithms and fundamentals of PCA are always the same, the strategies employed to extract information from a specific data set (experimental and/or theoretical) depend mainly on the expertise and needs of each researcher.

In this chapter, we describe how PCA has been used in three different theoretical and experimental applications to explain the relevant information in the data sets. These applications provide a broad overview of the versatility of PCA in data analysis and interpretation. Our main goal is to outline the capabilities and strengths of PCA for elucidating specific information. The examples reported include the analysis of matured distilled beverages, the determination of heavy metals attached to bacterial surfaces, and the interpretation of quantum chemical calculations. They were chosen as representative examples of three different approaches to data analysis: the influence of data pre-treatments on the scores and loadings values; the use of specific optical, chemical and/or physical properties to qualitatively discriminate samples; and the use of spatial orientations to group conformers, correlating structures and relative energies. This is the reason for their selection as case studies. The chapter is also intended as a reference for researchers outside the field who may use these methodologies to take maximum advantage of their experimental results.

<sup>\*</sup> Claudio Frausto-Reyes<sup>2</sup>, Esteban Gerbino<sup>3</sup>, Pablo Mobili<sup>3</sup>, Elizabeth Tymczyszyn<sup>3</sup>, Edgar L. Esparza-Ibarra<sup>1</sup>, Rumen Ivanov-Tsonchev<sup>1</sup> and Andrea Gómez-Zavaglia<sup>3</sup>

*<sup>1</sup>Unidad Académica de Física, Universidad Autónoma de Zacatecas* 

*<sup>2</sup>Centro de Investigaciones en Óptica, A.C. Unidad Aguascalientes* 

*<sup>3</sup>Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA)* 

*<sup>1,2</sup>México* 

*<sup>3</sup>Argentina*

Application of Principal Component Analysis to Elucidate Experimental and Theoretical Information

(Abbott & Andrews, 1970). On the other hand, spectroscopic techniques such as infrared (NIR and FTIR), Raman and ultraviolet/visible, together with multivariate methods, have already been used for the quantification of the different components of distilled beverages (*e.g.*, ethanol, methanol and sugar, among others). This approach allows the quality and authenticity of these alcoholic products to be evaluated in a non-invasive, easy, fast, portable and reliable way (Dobrinas et al., 2009; Nagarajan et al., 2006). However, to our knowledge, none of these reports has focused on evaluating the quality and authenticity of distilled beverages in terms of their maturation process.

Mezcal is a Mexican distilled alcoholic beverage produced from agave plants grown in certain regions of Mexico (NOM-070-SCFI-1994), and it holds a denomination of origin. Like many other matured distilled beverages, mezcal can be adulterated in flavour and appearance (colour), with the aim of imitating the sensorial and visual characteristics of the authentic matured beverage (Wiley, 1919). Because the maturation process in distilled beverages has a strong impact on their taste and price, adulteration of mezcal seeks to obtain the product in less time; the resulting product, however, is of lower quality. Our group has proposed a methodology based on UV-absorption and fluorescence spectroscopy for evaluating the authenticity of matured distilled beverages, with a focus on mezcal. We took advantage of the absorbance/emission properties of the wood extracts and molecules added to the distillate during maturation in wood casks. In this context, principal component analysis is a suitable option for analysing spectral data in order to elucidate chemical information, thus allowing authentic matured beverages to be discriminated from those that are non-matured or artificially matured.

In this section, we present the PCA results obtained from two sets of spectroscopic data (UV absorption and fluorescence spectra) collected from authentic mezcal samples at different stages of maturation: *white or young* (non-matured), *rested* (matured 2 months in wood casks), and *aged* (1 year in wood casks). Samples of falsely matured (artificially matured) mezcals are labelled *abocado* (white or young mezcal artificially coloured and flavoured) and *distilled* (coloured white mezcal). These samples were included in order to discriminate authentic matured mezcals from artificially matured ones. The discussion focuses on the influence of spectral pre-treatments on the scores and loadings values. The criteria used for interpreting the scores and loadings are also discussed.

**2.1 Spectra pre-treatment**

Prior to PCA, spectra were smoothed. Additionally, both spectral data sets were mean centred (MC) prior to the analysis as a default procedure. To evaluate the effect of the standardization pre-treatment (1/Std) on the scores and loadings values, PCA was also conducted on the standardized spectra. Multivariate spectral analysis and data pre-treatment were carried out using The Unscrambler® software, version 9.8, from CAMO.

**2.2 Collection of UV absorption spectra**

Spectra were collected in the 285–450 nm spectral range, using a UV/Vis spectrometer (model USB4000, Ocean Optics) coupled to the Deuterium tungsten
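As an illustration of the pre-treatments named in Section 2.1 (smoothing, mean centring and the optional 1/Std weighting, followed by PCA), the following is a minimal sketch in Python with NumPy. It is not the procedure implemented in The Unscrambler. The moving-average smoother, the window size, the function names `smooth` and `pca`, and the synthetic spectra are all assumptions made for the example, since the text does not state the smoothing algorithm and no real data are reproduced here.

```python
import numpy as np


def smooth(spectra, window=5):
    """Moving-average smoothing along the wavelength axis.

    Assumed method: the chapter does not say which smoother was used.
    Edges are zero-padded by np.convolve(mode="same").
    """
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, spectra
    )


def pca(spectra, standardize=False, n_components=2):
    """PCA via SVD on a (n_samples, n_wavelengths) array.

    Returns scores, loadings and the explained-variance fraction
    of the retained components.
    """
    X = spectra - spectra.mean(axis=0)        # mean centring (MC)
    if standardize:
        std = X.std(axis=0, ddof=1)
        std[std == 0] = 1.0                   # guard against constant variables
        X = X / std                           # 1/Std weighting
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Synthetic stand-in for real spectra (hypothetical data, not from the study):
# one Gaussian absorption band whose intensity differs between three groups.
rng = np.random.default_rng(0)
wavelengths = np.linspace(285, 450, 166)
band = np.exp(-((wavelengths - 320) / 15) ** 2)
intensity = np.repeat([0.0, 0.5, 1.0], 5)     # three groups of five samples
spectra = intensity[:, None] * band + 0.01 * rng.standard_normal((15, 166))

spectra = smooth(spectra)
scores_mc, loadings_mc, var_mc = pca(spectra)                      # MC only
scores_std, loadings_std, var_std = pca(spectra, standardize=True)  # MC + 1/Std
```

Comparing `loadings_mc` with `loadings_std` shows the kind of effect discussed in this section: after 1/Std weighting every wavelength has unit variance, so wavelengths with little or no absorption carry the same weight as those inside the band.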
