**2.3 Data analysis**

The data matrices, consisting of the ¹H-NMR buckets (variables) arranged in columns and the VOO samples in rows, were first analyzed by univariate procedures (ANOVA, Fisher index and box-and-whisker plots). They were then analyzed by the following multivariate techniques, which are described in the literature (Berrueta et al., 2007): the unsupervised technique principal component analysis (PCA), and the supervised techniques linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA). Statistical and chemometric data analyses were performed with the software packages Statistica 6.1 (StatSoft Inc., Tulsa, OK, USA, 1984-2004), The Unscrambler 9.1 (Camo Process AS, Oslo, Norway, 1986-2004) and SIMCA-P 11.0 (Umetrics AB, Umeå, Sweden, 1992-2005). The strategies used for variable selection in LDA and for selecting the optimum number of PLS components in PLS-DA are described elsewhere (Alonso-Salces et al., 2010b).
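To illustrate the univariate screening of the buckets, the following minimal sketch computes two per-bucket statistics with NumPy: a one-way ANOVA F statistic across all classes, and a two-class Fisher ratio. The paragraph above does not specify how the packages define the Fisher index. The ratio (m_a - m_b)^2 / (s_a^2 + s_b^2) used here is an assumption, and the function names `anova_f` and `fisher_ratio` are hypothetical.

```python
import numpy as np


def anova_f(X, y):
    """One-way ANOVA F statistic for each column (bucket) of X, grouped by class label y.

    F = [SS_between / (k - 1)] / [SS_within / (n - k)]. A bucket that is constant
    within every class gives SS_within = 0 (inf or nan, with a NumPy warning).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mc - grand_mean) ** 2
        ss_within += ((Xc - mc) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def fisher_ratio(X, y, class_a, class_b):
    """Assumed two-class Fisher ratio (m_a - m_b)^2 / (s_a^2 + s_b^2) for each column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xa, Xb = X[y == class_a], X[y == class_b]
    num = (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2
    den = Xa.var(axis=0, ddof=1) + Xb.var(axis=0, ddof=1)
    return num / den


# Toy example: bucket 0 separates the two classes, bucket 1 does not.
X = [[1, 4], [2, 5], [3, 6], [7, 4], [8, 5], [9, 6]]
y = ["A", "A", "A", "B", "B", "B"]
print(anova_f(X, y))                 # [54.  0.]
print(fisher_ratio(X, y, "A", "B"))  # [18.  0.]
```

Buckets with large values of either statistic are the most discriminant candidates. They are the natural input to a variable-selection step such as the one used for LDA.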

For the geographical characterization of the VOOs, the supervised techniques were applied to the autoscaled (standardized) or Pareto-scaled data matrix of the VOO profiles in the following steps: (*i*) the data set was divided into a training-test set and an external set; (*ii*) the training-test set was repeatedly divided into a training set and a test set in order to perform cross-validation; (*iii*) the parameters characteristic of each multivariate technique, for instance the selected variables in LDA or the number of PLS components in PLS-DA, were optimized on the training-test set by cross-validation; (*iv*) a final model was built using all the samples of the training-test set and the optimized parameters; and (*v*) this model was validated on the external set, an independent set of samples, i.e. by external validation. During the parameter optimization step, the models were validated by 3-fold cross-validation (3-fold CV) or leave-one-out cross-validation (LOO).

In cross-validation, the reliability of the classification models was assessed in terms of recognition ability (the percentage of training-set samples correctly classified during the modeling step) and prediction ability (the percentage of test-set samples correctly classified by the models developed in the training step). The reliability of the final model was evaluated in terms of classification ability (the percentage of training-test set samples correctly classified by the optimized model) and prediction ability in the external validation (the percentage of external-set samples correctly classified by the optimized model) (Berrueta et al., 2007).
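The validation scheme above can be sketched in a few lines of NumPy. Several assumptions apply. A plain LDA with a pooled covariance matrix (pseudo-inverse) stands in for the LDA and PLS-DA models built with the commercial packages, and the sketch performs no variable selection and no choice of PLS components. The sample splits are random rather than stratified. Scaling parameters are estimated on each training set only, which is a design choice not stated in the text. All function names are hypothetical.

```python
import numpy as np


def fit_scaler(X, method="auto"):
    """Column means and divisors: autoscaling divides by s_j, Pareto scaling by sqrt(s_j).
    Constant columns (s_j = 0) must be removed beforehand."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    if method == "auto":
        return mean, sd
    if method == "pareto":
        return mean, np.sqrt(sd)
    raise ValueError("method must be 'auto' or 'pareto'")


def lda_fit(X, y):
    """LDA with a pooled within-class covariance matrix (pseudo-inverse for stability)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([X[y == c] - m for c, m in zip(classes, means)])
    cov = resid.T @ resid / (X.shape[0] - len(classes))
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(cov), log_priors


def lda_predict(model, X):
    """Assign each row to the class with the largest discriminant score
    x' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(prior_c)."""
    classes, means, cov_inv, log_priors = model
    scores = X @ cov_inv @ means.T - 0.5 * np.sum(means @ cov_inv * means, axis=1) + log_priors
    return classes[np.argmax(scores, axis=1)]


def percent_correct(y_true, y_pred):
    return 100.0 * np.mean(y_true == y_pred)


def train_and_score(X_tr, y_tr, X_te, y_te, method):
    """Scale with training statistics only, fit LDA, and return
    (% correct on the training samples, % correct on the held-out samples)."""
    mean, scale = fit_scaler(X_tr, method)
    model = lda_fit((X_tr - mean) / scale, y_tr)
    on_train = percent_correct(y_tr, lda_predict(model, (X_tr - mean) / scale))
    on_test = percent_correct(y_te, lda_predict(model, (X_te - mean) / scale))
    return on_train, on_test


def validate(X, y, method="auto", n_folds=3, external_frac=0.25, seed=0):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))  # random, not stratified
    n_ext = int(round(external_frac * len(y)))
    ext, tt = idx[:n_ext], idx[n_ext:]  # (i) external set vs training-test set
    # (ii) k-fold CV inside the training-test set; n_folds = len(tt) gives LOO.
    # No parameter optimization (step iii) is done here.
    # Each row of cv: (recognition ability, prediction ability) for one fold.
    cv = []
    for test in np.array_split(tt, n_folds):
        train = np.setdiff1d(tt, test)
        cv.append(train_and_score(X[train], y[train], X[test], y[test], method))
    # (iv)-(v) final model on the whole training-test set, checked on the external set:
    # returns the classification ability and the external prediction ability.
    classification, external_prediction = train_and_score(X[tt], y[tt], X[ext], y[ext], method)
    return np.array(cv), classification, external_prediction
```

With `n_folds` equal to the number of training-test samples, the 3-fold CV becomes LOO. The only difference between the autoscaled and Pareto-scaled runs is the `method` argument. Autoscaling gives every bucket unit variance, whereas Pareto scaling only partially reduces the weight of large, high-variance signals.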
