**Chemometrics: Theory and Application**

Hilton Túlio Lima dos Santos, André Maurício de Oliveira, Patrícia Gontijo de Melo, Wagner Freitas and Ana Paula Rodrigues de Freitas

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/53866

## **1. Introduction**

This chapter presents chemometrics as an important area of chemistry that helps in handling the large amounts of data obtained in chemical analysis. The term *chemometrics* was introduced in the early 1970s by Svante Wold (Sweden) and Bruce Kowalski (USA). According to the International Chemometrics Society, founded in 1974, the accepted definition of chemometrics is (i) the chemical discipline that uses mathematical and statistical methods to design or select optimal measurement procedures and experiments and (ii) to provide maximum chemical information by analyzing chemical data [1].

When a study involves many variables, it becomes a multivariate analysis: a data matrix has to be built, and it is normal to pre-process it. Pre-processing adjusts factors measured in different units and on different scales so that each factor has the same chance to contribute to the model. The next step is usually a pattern recognition method, used to find similarities in the data. These methods are commonly divided into an unsupervised group, which includes HCA and PCA, and a supervised group, which includes KNN. HCA (Hierarchical Cluster Analysis) examines the distances among the samples, groups similar samples into clusters, and displays the result as a two-dimensional plot called a dendrogram (Figure 1). PCA (Principal Component Analysis) reduces the dimensionality of the data set while losing as little information about the samples as possible (Figure 2), and KNN classifies samples using previously known clusters [2].
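The paragraph above describes pre-processing and PCA only in words. The sketch below makes both steps concrete under stated assumptions: autoscaling (mean-centring each column and dividing it by its standard deviation) is assumed as the pre-processing step, the data matrix is a plain Python list of rows, and the principal components are found by power iteration with deflation on the correlation matrix. This is a swapped-in teaching method, not necessarily the authors' procedure; real work would normally use an SVD routine from a numerical library. All function names and the example data are hypothetical.

```python
import math


def autoscale(X):
    """Mean-centre each column of X (a list of rows) and divide it by its sample
    standard deviation. Assumes at least two rows and no constant column."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]


def covariance(Z):
    """Covariance matrix of column-centred data; for autoscaled data this is
    the correlation matrix, whose trace equals the number of variables."""
    n, m = len(Z), len(Z[0])
    return [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(m)]
            for i in range(m)]


def leading_eigenpair(C, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric positive
    semi-definite matrix C, found by power iteration."""
    m = len(C)
    v = [1.0 + 0.1 * j for j in range(m)]  # arbitrary, non-symmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # C is (numerically) zero: no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T C v gives the eigenvalue for the converged vector.
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(m)) for i in range(m))
    return lam, v


def pca(X, n_components=2):
    """Return (scores, loadings, explained-variance ratios) for autoscaled X."""
    Z = autoscale(X)
    C = covariance(Z)
    m = len(C)
    total = sum(C[i][i] for i in range(m))  # total variance of the autoscaled data
    loadings, ratios = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        loadings.append(v)
        ratios.append(lam / total)
        # Deflation: remove this component so the next pass finds the next PC.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    # Scores: projection of each autoscaled sample (row) onto each loading vector.
    scores = [[sum(z[j] * p[j] for j in range(m)) for p in loadings] for z in Z]
    return scores, loadings, ratios


# Hypothetical data matrix: 4 samples (rows) x 3 variables (columns).
X = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.9], [4.0, 8.2, 0.7]]
scores, loadings, ratios = pca(X, n_components=2)
print([round(r, 3) for r in ratios])
```

Here `ratios` holds the fraction of the total variance explained by each PC, `loadings` relate each PC to the original variables (columns), and `scores` place each sample (row) in the new PC space.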

© 2012 Santos et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


$$\text{Distance} = \sum_{i=1}^{n} \lvert X_i - Y_i \rvert \tag{3}$$
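To make Equation (3) concrete, here is a small sketch, assuming each sample is a sequence of numeric variable values. It uses the Minkowski distance, which for p = 1 gives the sum of absolute differences written in Equation (3) (the Manhattan or city-block distance) and for p = 2 gives the Euclidean distance, so the same samples can be compared under both measures. The helper `closest_pair` returns the two most similar samples, which is the pair any agglomerative HCA merges first. Function names and samples are hypothetical.

```python
def minkowski(x, y, p=1.0):
    """Minkowski distance between two samples given as equal-length sequences.

    p = 1 gives the sum of absolute differences (Manhattan distance),
    p = 2 gives the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same number of variables")
    if p < 1:
        raise ValueError("p must be >= 1 for a proper distance")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def distance_matrix(samples, p=1.0):
    """Symmetric matrix of pairwise distances between all samples."""
    n = len(samples)
    return [[minkowski(samples[i], samples[j], p) for j in range(n)] for i in range(n)]


def closest_pair(D):
    """Indices (i, j), with i < j, of the two closest samples in D."""
    n = len(D)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            if best is None or D[i][j] < D[best[0]][best[1]]:
                best = (i, j)
    return best


# Hypothetical samples A, B and C, each described by two variables.
samples = [(0.0, 0.0), (3.0, 0.0), (2.0, 2.5)]
print(closest_pair(distance_matrix(samples, p=1)),
      closest_pair(distance_matrix(samples, p=2)))  # → (0, 1) (1, 2)
```

With p = 1 the closest pair is A and B, whereas with p = 2 it is B and C, so the first cluster formed, and therefore the dendrogram, depends on the distance measure chosen.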

In Equation (3), $X_i$ and $Y_i$ are the values of the $i$-th variable for the two samples being compared (each sample is a vector, that is, a row of the data matrix), and the sum runs over the $n$ variables.

Once the distances have been estimated, the dendrogram can be plotted. A general dendrogram is shown in Figure 3, in which the samples are labelled with letters and the distances with numbers. The samples belonging to cluster A are at a distance of 0.2 from one another, while sample B is at a distance of 0.5 from cluster A. The value of the distance can change according to the distance measure used to calculate it.

**Figure 3.** A general dendrogram, with the distances shown at the top and the samples on the right-hand side

*2.1.2. Principal Components Analysis (PCA)*

The goal of Principal Components Analysis (PCA) is to represent the distances between the points using only a few axes. Each row of the data matrix is a point in the plot (Figure 2), and the aim is to study the relationships between these samples to find their similarities and differences. This general example uses two principal components (PC1 and PC2). The first principal component (PC1) describes the maximum amount of variance in the data, while PC2 describes the largest part of the remaining variance. The percentages of variance described by the retained PCs should add up to close to 100%, so that little information is lost. Another property of the PCs concerns their position: the PCs are always perpendicular (orthogonal) to one another.

**Figure 2.** Clustering by PCA

The PCA technique can also be used to determine which variables are the most important in a process. This analysis uses both the factors (the columns of the matrix) and the objects (the rows of the matrix): when the aim is to determine which variables are most important for the process, the *loadings* are used, and when the aim is to study the relationships between objects, the *scores* are used.

**2.2. Supervised methods**

Supervised methods are used when the known class membership of a set of samples is used to construct a model that classifies future samples. In this group, KNN (k-nearest neighbours) is a widely used technique for this purpose: a new sample is assigned to the class that is most common among its k nearest samples of known class.
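The KNN idea can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a simple majority vote among the k nearest training samples; the chapter does not specify the distance or the value of k, so both are assumptions here, and `knn_classify` and the training data are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, query, k=3):
    """Assign `query` to the class most common among its k nearest training
    samples, using Euclidean distance (math.dist, Python 3.8+)."""
    if len(train_X) != len(train_y):
        raise ValueError("each training sample needs exactly one class label")
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples from nearest to farthest (sorted() is stable).
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # For equal counts, most_common(1) returns the label seen first, i.e. the
    # tied class whose member is nearest to the query.
    return votes.most_common(1)[0][0]


# Hypothetical training set: two groups of samples with known classes.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (0.5, 0.5), k=3))  # → A
```

In practice the samples to be classified would be pre-processed in the same way as the training samples, because KNN depends directly on the distances between them.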

Thus, chemometrics proves to be a broad discipline that can be applied in several areas of knowledge.
