**2.2. Multivariate data analysis**

A large number of multivariate data analysis techniques are available; the appropriate method is chosen according to the question to be answered. The methods presented here belong either to the group of "pattern recognition" methods or to the group of "multivariate calibration" methods. Pattern recognition comprises exploratory data evaluation, such as principal component analysis (PCA), and classification methods (SIMCA). PCA visualises the inherent data structure in the scores matrix and thus reveals hidden phenomena; the influence of the variables on this structure is illustrated by the loadings plot.

Classification aims at the separation of groups of data. A prerequisite for classification is that the class characteristics are known prior to the analysis. Classification is therefore called a "supervised" method, in contrast to unsupervised methods, by which groups of data are distinguished only after the data analysis, without previous knowledge. From supervised methods a model can be derived in order to discriminate between the groups [21]. Classification is thus a predictive method based on category variables, e.g. material type, age or degree of degradation. A library of spectra or thermograms offers the opportunity to evaluate the data according to different aspects expressed by such category variables, and so allows a multiple evaluation of the spectra and thermograms.

Soft independent modelling of class analogy (SIMCA) is a classification procedure based on PCA class modelling. "Soft modelling", a term often used in chemical pattern recognition, means that two classes may overlap: samples can have characteristics of both defined classes, or of neither of them. Samples are assigned to a defined class if they show similar characteristics, where similarity means conformity with a particular class pattern.
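The scores and loadings mentioned above can be illustrated with a minimal sketch, assuming scikit-learn is available; the simulated "spectra", dimensions and seed are all hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulated "spectra": 20 samples x 50 variables, driven by two latent factors
latent = rng.normal(size=(20, 2))
directions = rng.normal(size=(2, 50))
X = latent @ directions + 0.05 * rng.normal(size=(20, 50))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)      # scores matrix: reveals the sample structure
loadings = pca.components_         # loadings: influence of each variable

print(scores.shape)                # (20, 2)
print(loadings.shape)              # (2, 50)
print(pca.explained_variance_ratio_.sum())
```

Because the simulated data contain only two latent factors plus small noise, the two components account for nearly all of the variance; plotting the scores against each other would correspond to the scores plot discussed in the text.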
This approach allows the samples to have individual properties besides the common features of the class, which are the decisive factor for membership. In order to find out to what degree the class models really differ, the model (class) distance is determined by fitting the members of two defined classes both to their own model and to the other model. It is calculated on the basis of pooled residual standard deviations; the distance from a model to itself is 1. According to Esbensen [21], distances of more than 3 indicate a significant separation between the defined classes. The results can be visualised in the Coomans plot. The crossing horizontal and vertical lines that divide the plot area into four quadrants mark the significance level. Two of the quadrants hold the defined classes. Samples featuring properties of both classes are assigned to the overlapping quadrant ("both"). New samples outside the limits do not belong to either model; they are located in the "neither-nor" quadrant. A 5% significance level means that 95% of the samples in the corresponding quadrants truly belong to the defined classes; new samples in these quadrants are therefore identified as members of those classes.
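The model distance can be sketched as follows. This is a simplified reading of the pooled-residual formulation, not the full SIMCA procedure: each class gets its own PCA model, the samples of both classes are fitted to both models, and the distance is formed from the pooled residual standard deviations so that a model's distance to itself is 1. All data, sizes and names are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_class_model(X, n_components=2):
    """Fit a PCA class model (class mean plus principal components)."""
    mean = X.mean(axis=0)
    return PCA(n_components=n_components).fit(X - mean), mean

def residual_std(model, mean, X):
    """Residual standard deviation of samples X fitted to a class model."""
    Xc = X - mean
    scores = Xc @ model.components_.T          # project onto the model
    resid = Xc - scores @ model.components_    # reconstruction residuals
    return np.sqrt(np.mean(resid ** 2))

rng = np.random.default_rng(1)
class_a = rng.normal(0.0, 1.0, size=(30, 10))  # hypothetical class A
class_b = rng.normal(3.0, 1.0, size=(30, 10))  # hypothetical class B, shifted

model_a, mean_a = fit_class_model(class_a)
model_b, mean_b = fit_class_model(class_b)

s_aa = residual_std(model_a, mean_a, class_a)  # A fitted to its own model
s_bb = residual_std(model_b, mean_b, class_b)  # B fitted to its own model
s_ab = residual_std(model_b, mean_b, class_a)  # A fitted to B's model
s_ba = residual_std(model_a, mean_a, class_b)  # B fitted to A's model

# Pooled model distance: identical models give 1, well-separated classes
# give clearly larger values
distance = np.sqrt((s_ab ** 2 + s_ba ** 2) / (s_aa ** 2 + s_bb ** 2))
print(distance)
```

With the mean shift chosen here, cross-fitted residuals are much larger than the within-class residuals and the distance comes out well above 1.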

Partial least squares discriminant analysis (PLS-DA) uses PLS regression to model the differences between classes. For the separation of two classes, the PLS algorithm is applied with a dummy response variable (e.g. -1/+1) that distinguishes the two defined groups.

Methods of the "calibration" group allow models to be developed for parameter prediction, provided that the parameter is adequately reflected by the collected data, in this study by the spectral and thermal patterns. In contrast to classification models, by which samples are assigned to classes according to defined properties, a prediction model provides distinct values of the parameter in question. Prediction models focus on the determination of dependent Y-variables for new samples characterised by independent X-variables. Once a validated X-Y model has been established, the Y-variable can be derived from the X-variables, so that only X-measurements are necessary. This procedure is advantageous if expensive and time-consuming reference methods for the determination of the Y-variables can be replaced by fast X-measurements.
