**PCA: The Basic Building Block of Chemometrics**

Christophe B.Y. Cordella

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51429

## **1. Introduction**

Modern analytical instruments generate large amounts of data. An infrared (IR) spectrum may include several thousand data points (one per wavenumber). In a GC-MS (gas chromatography-mass spectrometry) analysis, a single run commonly yields 600,000 digital values, amounting to about 2.5 megabytes or more. There are different ways of dealing with this huge quantity of information. The simplest is to ignore the bulk of the available data. For example, in a spectroscopic analysis, the spectrum can be reduced to the intensity maxima of a few characteristic bands. Similarly, in a GC-MS analysis, the signal may be recorded for a single selected mass unit rather than for the full mass range. Until recently, it was indeed impossible to fully explore a large data set, and much potentially useful information remained unrevealed. Nowadays, the systematic use of computers makes it possible to process huge data collections in full, with minimal loss of information. The intensive use of chemometric tools makes it possible to gain deeper insight into, and a more complete interpretation of, these data.

The main objectives of multivariate methods in analytical chemistry include data reduction, the grouping and classification of observations, and the modelling of relationships that may exist between variables. Prediction is also an important component of some multivariate methods: one often needs to predict whether a new observation belongs to a pre-defined qualitative group, or to estimate a quantitative feature such as a chemical concentration.

This chapter presents an essential multivariate method, namely *principal component analysis* (PCA). To better understand its fundamentals, we first return to the historical origins of the technique. We then show, through pedagogical examples, the importance of PCA in comparison with traditional univariate data processing methods. With PCA, we move from a one-dimensional view of a problem to a multidimensional one. Multiway extensions of PCA, namely the PARAFAC and Tucker3 models, are presented in the second part of this chapter, with brief historical and bibliographical notes. A PARAFAC example on real data illustrates the value of this powerful technique for handling high-dimensional data.
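The data-reduction idea described above can be illustrated with a minimal sketch. It is not the procedure used later in this chapter: the spectra are synthetic, and the helper `pca_svd`, the band positions and the noise level are all hypothetical choices made for illustration. The sketch assumes only NumPy and computes PCA through a singular value decomposition of the mean-centred data matrix.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of X (rows = samples, columns = variables) via SVD of mean-centred data."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable contributions
    explained = (s**2 / np.sum(s**2))[:n_components]  # fraction of variance
    return scores, loadings, explained

# Synthetic "spectra" (assumption): 30 samples, 2000 points, two Gaussian bands
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 2000)
band_a = np.exp(-0.5 * ((wavenumbers - 1700) / 40) ** 2)
band_b = np.exp(-0.5 * ((wavenumbers - 2900) / 60) ** 2)
conc = rng.uniform(0, 1, size=(30, 2))           # hypothetical concentrations
spectra = conc @ np.vstack([band_a, band_b]) + 0.01 * rng.normal(size=(30, 2000))

scores, loadings, explained = pca_svd(spectra, n_components=2)
print(scores.shape, loadings.shape)
print(explained)
```

In this toy setting, each 2000-point spectrum is summarised by two scores while all variables still contribute to the model, in contrast to keeping only the maxima of a few selected bands.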

© 2012 Cordella, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
