**2. PCA and traceability**

PCA is widely used to characterize foodstuffs according to their geographical origin (Alonso-Salces et al., 2010; Diaz et al., 2005; Gonzalvez et al., 2009; Marini et al., 2006). This requirement is becoming prominent in the control field, especially in the marketing of products with PDO (Protected Designation of Origin) or PGI (Protected Geographical Indication) markings. The PDO marking is awarded to products strictly linked to a typical area: both the production of the raw materials and their transformation into the final product must be carried out in the region that lends its name to the product. Consequently, analytical methods whose results can be directly linked to the origin of a sample would be extremely useful in the legal battle against the fraudulent use of PDO or PGI markings.

The local nature of a food product, strongly associated with its geographical location, can be correlated to the quality of the raw material used and to its production techniques. Environmental conditions in a specific geographical area give the raw material set characteristics, and so become a factor of primary importance in determining the "typicality" of the final product. The production technique is of primary importance for both agricultural products and so-called transformed products, where culture, the instruments used, the ability and experience of the operator and the addition of particular ingredients create a unique product.

Brescia et al. (2005) characterized buffalo milk mozzarella samples with reference to their geographical origin (two provinces were considered: Foggia, in Apulia, and Caserta, in Campania) by comparing several analytical and spectroscopic techniques. Some analyses were also performed on the raw milk from which the mozzarella had been obtained, in order to evaluate how the differences among milk samples had transferred to the final product. In this study, a further PCA was applied only to those analytical variables measured on both milk and mozzarella samples: fat, ash, Li, Na, K, Mg, Ca, δ15N/14N and δ13C/12C. Analyses carried out only on mozzarella samples, for which no comparison with milk samples could be made, were disregarded, and vice versa. The biplots for the PCA carried out on milk and mozzarella samples are reported in figures 1 and 2 respectively. The milk samples are completely separated according to their origin on PC1 (figure 1), whilst the mozzarella samples lose such a strong separation, even though they maintain a good trend in their differentiation.

As already stated by Brescia et al., milk samples from Campania have a higher 13C content, whilst samples from Apulia have a greater Li, Na and K content. Comparing the PCA results for mozzarella samples with those for milk samples shows that geographical differences, very clearly defined in the raw material, become slightly weaker in the final product. One variable (K content) has its distribution inverted between the raw material and the final product (positive loading on PC1 for milk samples, negative loading on PC1 for mozzarella samples). Another variable (Na content) was a discriminator for the raw milk (high positive loading on PC1), but in mozzarella samples its loading rises on PC2 (the direction perpendicular to the geographical separation) and becomes negative on PC1. As Na content is known to be linked to the salting process of a cheese, the production technique is thought to reduce some of the differences originating from the raw materials. In other words, the differences between buffalo mozzarella from Campania and Apulia are determined mainly by the differences between the two types of raw milk rather than by differences between manufacturing processes.
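A comparison of loading signs between two separate PCA models, such as the one described above for K, is only meaningful if both models are oriented in the same way, because the sign of a principal component is arbitrary. The following is a minimal sketch of one way to do this, not the procedure used by Brescia et al.: it assumes autoscaled data, orients PC1 so that one origin always scores higher than the other, and uses hypothetical values for a generic marker variable and for K. The function name `oriented_pc1_loadings` and all numbers are illustrative assumptions.

```python
import numpy as np

def oriented_pc1_loadings(X, groups):
    """PC1 loadings of autoscaled data, signed so that group 1 scores above group 0."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale each variable
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)   # rows of Vt are loading vectors
    p1 = Vt[0]
    scores = Z @ p1                                     # PC1 scores of the samples
    if scores[groups == 1].mean() < scores[groups == 0].mean():
        p1 = -p1                                        # PC signs are arbitrary: fix one
    return p1

# Hypothetical values (not from the study); columns = (marker, K).
# Rows 0-3 belong to origin 0, rows 4-7 to origin 1.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
raw = [[1.0, 3.0], [1.2, 3.1], [0.9, 2.9], [1.1, 3.2],
       [2.0, 4.0], [2.1, 4.2], [1.9, 3.9], [2.2, 4.1]]
product = [[1.0, 4.0], [1.2, 4.2], [0.9, 3.9], [1.1, 4.1],
           [2.0, 3.0], [2.1, 3.1], [1.9, 2.9], [2.2, 3.2]]

raw_k = oriented_pc1_loadings(raw, groups)[1]          # K loading in the raw material
product_k = oriented_pc1_loadings(product, groups)[1]  # K loading in the final product
```

With the orientation fixed, `raw_k` and `product_k` have opposite signs in this toy data, which is the kind of inversion discussed for K between milk and mozzarella.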


Fig. 1. Score plot of PC2 versus PC1 for milk samples.

Tables 1 and 2 show the variances and cumulative variances associated with the principal components with eigenvalues greater than 1 for milk and mozzarella samples respectively. Four PCs were extracted for each data set, explaining 86% of the variance for milk samples and 83% of the variance for mozzarella samples.
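The retention rule used here (keep the components with eigenvalues greater than 1) can be sketched as follows. The sketch assumes that PCA is performed on the correlation matrix, i.e. on autoscaled variables, in which case an eigenvalue above 1 means that a component carries more variance than a single original variable. The function name `kaiser_pcs` and the synthetic data are hypothetical and are not the milk or mozzarella data.

```python
import numpy as np

def kaiser_pcs(X):
    """Eigenvalues > 1 of the correlation matrix, with variance % and cumulative %."""
    corr = np.corrcoef(X, rowvar=False)        # correlation matrix of the variables
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    pct = 100.0 * eigvals / eigvals.sum()      # variance % explained by each PC
    cum = np.cumsum(pct)                       # cumulative variance %
    keep = eigvals > 1.0                       # retention rule: eigenvalue > 1
    return eigvals[keep], pct[keep], cum[keep]

# Hypothetical data: 30 samples, 9 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 9)) + 0.3 * rng.normal(size=(30, 9))

eigvals, pct, cum = kaiser_pcs(X)
for i, (ev, p, c) in enumerate(zip(eigvals, pct, cum), start=1):
    print(f"PC{i}: eigenvalue {ev:.2f}, variance {p:.2f}%, cumulative {c:.2f}%")
```

The last value of `cum` plays the role of the 86% and 83% figures quoted above: the share of total variance explained by the retained components.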


Table 1. PCs with eigenvalues greater than 1, extracted applying PCA to milk samples.


Table 2. PCs with eigenvalues greater than 1, extracted applying PCA to mozzarella samples.


Fig. 2. Score plot of PC2 versus PC1 for mozzarella samples.

From this example, it can be deduced that applying PCA to the results of chemical analyses of the raw material from which a transformed product is obtained allows the raw material to be characterized in relation to its geographical origin. Characterizing the transformed product then shows how the geographical differences among the raw materials have carried over to the final product. In particular, it can be seen whether production techniques amplified or reduced the pre-existing differences among the classes of raw material. In other words, applying PCA to the chemical analyses of a food product, as well as of the raw material from which it was made, makes it possible to understand which elements mainly characterize a product in relation to its origin: the quality of the raw material, the production techniques, or a combination of both.

The characterization of products in relation to their origin is not, however, important only for food products. In forensic investigations, for example, it is becoming increasingly essential to identify associations among accelerants according to their source. Petroleum-based fuels (such as gasoline, kerosene and diesel), which are often used as accelerants because they increase the rate and spread of fire, are themselves products transformed from a raw material (petroleum). Differentiation of such products in relation to their source (brand or refinery) depends both on the origin of the petroleum and on the specific production techniques used during the refining process. Monfreda and Gregori (2011) differentiated 50 gasoline samples belonging to 5 brands (indicated with the letters A, B, C, D and E) according to their refinery. Samples were analyzed by solid-phase microextraction (SPME) and gas chromatography-mass spectrometry (GC-MS). Some information on the origin of the crude oil was available, but only for two of the brands: A samples were obtained from crude oil coming from a single country, whilst D samples were produced from crude oil coming from several countries. In addition, A samples were tightly clustered in the score plots, while D samples were fairly well spread out in the same plots. This was explained by considering that crude oil coming from a single place is likely to have more consistent chemical properties than crude oils coming from several countries. The differences between the raw materials had therefore been transferred to the final products, giving tightly clustered samples with consistent chemical properties (brand A) and samples with greater variability within the class (brand D). The score plot of PC2 versus PC1 obtained by Monfreda and Gregori (2011) is shown in figure 3.

Fig. 3. Score plot of PC2 versus PC1 for gasoline samples (obtained by Monfreda & Gregori, 2011).

In the study presented here, 25 diesel samples belonging to the same 5 brands studied by Monfreda and Gregori were analyzed using the same analytical procedure, SPME-GC-MS. As in the previous work, chromatograms were examined using the TCC approach (Keto & Wineman, 1991, 1994; Lennard et al., 1995). Peak areas were normalized to the area of the base peak (set to 10000), which was either tridecane, tetradecane or pentadecane, depending on the sample. Three independent portions of each diesel sample were analyzed and the peak areas were averaged. Analysis of variance was carried out before the multivariate statistical analysis in order to eliminate those variables whose between-class variance was not significantly higher than the within-class variance. Tetradecane, heptadecane, octadecane and hexadecane tetramethyl were consequently excluded from the multivariate statistical analysis. PCA was finally applied to a data set of 25 samples and 33 variables, as listed in table 3.
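The pre-treatment described above (normalization to the base peak, averaging of the three replicates and variable screening by analysis of variance) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the base peak is taken as the largest peak in each chromatogram, the function names are hypothetical, and the peak areas are invented. The sketch only computes the one-way ANOVA F ratio. The significance level used to decide which variables to discard is not given in the text, so no threshold is applied.

```python
from statistics import mean

BASE_PEAK_VALUE = 10000  # the base peak is set to 10000, as in the text

def normalize(areas):
    """Scale raw peak areas so that the largest (base) peak equals 10000."""
    base = max(areas.values())
    return {name: BASE_PEAK_VALUE * a / base for name, a in areas.items()}

def average_replicates(replicates):
    """Average the normalized peak areas over the replicate analyses of one sample."""
    normed = [normalize(r) for r in replicates]
    return {name: mean(n[name] for n in normed) for name in normed[0]}

def anova_f(groups):
    """One-way ANOVA F ratio: between-class mean square over within-class mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical raw peak areas for three portions of one diesel sample.
replicates = [
    {"tridecane": 500.0, "nonane": 100.0},
    {"tridecane": 400.0, "nonane": 100.0},
    {"tridecane": 1000.0, "nonane": 150.0},
]
sample = average_replicates(replicates)

# Hypothetical values of one variable in three classes (brands).
f_discriminating = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [7.0, 8.0, 9.0]])
f_flat = anova_f([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

A variable such as the second one, whose F ratio does not exceed the critical value at the chosen significance level, would be dropped before PCA.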

| Variable | Compound |
|---|---|
| 1 | Nonane |
| 2 | Octane, 2,6-dimethyl |
| 3 | Benzene, 1-ethyl-2-methyl |
| 4 | Decane |
| 5 | Benzene, 1,2,3-trimethyl |
| 6 | Benzene, 1-methylpropyl |
| 7 | Nonane, 2,6-dimethyl |
| 8 | Benzene, 1-methyl-2-(1-methylethyl) |
| 9 | Benzene, 1,2,3-trimethyl |
| 10 | Cyclohexane, butyl |
| 11 | Benzene, 1-methyl-3-propyl |
| 12 | Benzene, 4-ethyl-1,2-dimethyl |
| 13 | Benzene, 1-methyl-2-propyl |
| 14 | Benzene, 1-methyl-4-(1-methylethyl) |
| 15 | Benzene, 4-ethyl-1,2-dimethyl |
| 16 | Undecane |
| 17 | Benzene, 1-ethyl-2,3-dimethyl |
| 18 | Benzene, 1,2,3,5-tetramethyl |
| 19 | Benzene, 1,2,3,4-tetramethyl |
| 20 | Cyclohexane, pentyl |
| 21 | Dodecane |
| 22 | Undecane, 3,6-dimethyl |
| 23 | Cyclohexane, hexyl |
| 24 | Tridecane |
| 25 | Naphthalene, 2-methyl |
| 26 | Naphthalene, 1-methyl |
| 27 | Pentadecane |
| 28 | Hexadecane |
| 29 | Pentadecane tetramethyl |
| 30 | Nonadecane |
| 31 | Eicosane |
| 32 | Heneicosane |
| 33 | Docosane |

Table 3. Target compounds used as variables in multivariate statistical analysis of diesel samples.

Three PCs with eigenvalues greater than 1 were extracted, accounting for 92.16% of the total variance, as shown in table 4. The score plot of PC2 versus PC1 (figure 4) shows that the samples were separated according to their refinery, because each group occupies a definite area in the plane of PC1 and PC2. A samples are more clustered than D samples, in agreement with the results obtained for gasoline samples.

| PC | Variance % | Cumulative % |
|---|---|---|
| 1 | 59.48 | 59.48 |
| 2 | 20.70 | 80.18 |
| 3 | 11.98 | 92.16 |

Table 4. PCs with eigenvalues greater than 1, extracted applying PCA to diesel samples.

Fig. 4. Score plot of PC2 versus PC1 for diesel samples.

The results of both studies, carried out on gasoline and diesel samples coming from the same five refineries, show that these products can be traced according to their brands; that is to say, production techniques give well-defined features to these products. The properties of the crude oil, on the other hand, strongly influence how homogeneously samples are distributed within their class, as far as can be judged from the available information (which covers only two of the five refineries).

**3. The PCA role in classification studies**

**3.1 Case 1**

The gasoline data matrix has been used in real cases of arson to link a sample of unevaporated gasoline, found at a fire scene in an unburned can, to its brand or refinery. This helped to answer, for example, questions posed by a military body about the origin of an unevaporated gasoline sample taken from a suspected arsonist. The gasoline sample
