1984). The optimized structures of all conformers were confirmed to correspond to true minimum energy conformations on the PES by inspection of the corresponding Hessian matrix. Vibrational frequencies were calculated at the same level of theory. PCA was performed using The Unscrambler® software (v9.8).

**4.2 Theoretical data set and pre-treatment**

The Cartesian coordinates of the 35 atoms of arbutin (see Fig. 9), for each of the 130 conformers found in the conformational analysis, were used as the data set in this study. In other words, the data set consisted of a 130 x 105 matrix, whose rows correspond to the arbutin conformers and whose columns correspond to the x, y, z coordinates of each atom of the molecule. Data were mean centred prior to PCA.

**4.3 Data analysis**

In order to provide a general and fast procedure to perform PCA on conformational data sets, the following strategy was adopted: *1)* In the Cartesian reference system, all conformers were oriented in such a way that the structurally rigid fragment of arbutin (the glucopyranoside ring) was placed as close as possible to the origin of the axes; *2)* All Cartesian coordinates of the 130 conformers of arbutin were then used to perform the PCA. The data matrix was built as follows: each row corresponds to a conformer and each column to a Cartesian coordinate; the first 35 columns hold the x coordinates of the 35 atoms of arbutin, the next 35 columns the y coordinates, and the last 35 columns the z coordinates.

**4.3.1 Scores and loadings analysis**

Fig. 10 depicts the distribution of the arbutin conformers in the PC-space. Three well separated groups were observed. Group A (squares) appears well separated from groups B (circles) and C (triangles) along the PC1 axis, which explains 77% of the total variance. Group B is separated from Group C along the PC2 axis. In order to elucidate the main structural parameters responsible for this separation, the one-dimensional loadings were analyzed. The loading values of PC1 and PC2 indicate the atomic coordinates that contribute the most to structurally distinguishing the conformers of arbutin in the chosen reference system [Fig. 10 (b) and (c), respectively].

According to these values, the orientation of atoms 25 to 35, which reflects the spatial orientation of the phenol ring relative to the reference glucopyranoside fragment, is the main factor allowing for the differentiation among conformers. The relative spatial orientation of the phenol ring is determined by the dihedrals interconnecting the glucopyranoside and phenol rings, C2C1O23C24 and C1O23C24C25, which are thus shown to be of primary importance in structural terms. Fig. 10 (d) shows the means and standard errors of the means (standard deviation of the sample divided by the square root of the sample size) of the 8 conformationally relevant dihedral angles of arbutin (the angles were first converted to the 0-360° range). From this graph, it can be clearly observed that the C2C1O23C24 and C1O23C24C25 dihedral angles describe the distribution of the three groups along the PC1 and PC2 axes, respectively.

In other words, PC1 is related to the C2C1O23C24 dihedral angle, and allows the discrimination of conformers belonging to group A. In these conformers, the phenol ring is placed above and nearly perpendicular to the glucopyranoside ring. On the contrary, in all conformers belonging to groups B and C, the phenol ring points away from the glucopyranoside moiety and is oriented towards the side of the oxygen atom of the glucopyranoside ring. PC2 allows a specific discrimination among the three groups of conformers. This specificity factor is given by the values of C1O23C24C25.
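The data layout and the scores/loadings analysis described above can be illustrated with a minimal NumPy sketch. It is not the procedure used in the original work, which relied on The Unscrambler® software; here PCA is obtained from a singular value decomposition of the mean-centred matrix. The function names (`build_data_matrix`, `pca`, `loadings_per_atom`) are hypothetical, and random numbers stand in for the real conformer geometries.

```python
import numpy as np

def build_data_matrix(coords):
    """Flatten (n_conformers, n_atoms, 3) coordinates into the layout
    [x of all atoms | y of all atoms | z of all atoms], one row per conformer."""
    coords = np.asarray(coords, dtype=float)
    n_conf, n_atoms, _ = coords.shape
    # Move the x/y/z axis in front of the atom axis before flattening.
    return coords.transpose(0, 2, 1).reshape(n_conf, 3 * n_atoms)

def pca(X, n_components=2):
    """PCA by SVD of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)                      # mean centring
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                 # one row per component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

def loadings_per_atom(loading_vector, n_atoms):
    """Reshape one loading vector to (3, n_atoms): rows are x, y, z."""
    return np.asarray(loading_vector).reshape(3, n_atoms)

# Synthetic stand-in for 130 conformers of a 35-atom molecule (assumption).
rng = np.random.default_rng(0)
coords = rng.normal(size=(130, 35, 3))

X = build_data_matrix(coords)                    # shape (130, 105)
scores, loadings, explained = pca(X, n_components=2)
pc1_by_atom = loadings_per_atom(loadings[0], 35)
# Atoms with the largest absolute PC1 loadings along any axis (1-based numbers).
top_atoms = np.argsort(-np.abs(pc1_by_atom).max(axis=0))[:5] + 1
```

With real geometries, plotting `scores[:, 0]` against `scores[:, 1]` would give a scores plot of the kind shown in Fig. 10 (a), and `pc1_by_atom` corresponds to the one-dimensional loadings read atom by atom along x, y and z.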

Fig. 10. (a) PCA scores and (b, c) the corresponding loadings, grouping arbutin conformers in terms of structural similarity. (d) Total average values and standard deviations of the 8 conformationally relevant dihedral angles of arbutin in the 3 groups of conformers. (copyrighted from Araujo-Andrade et al., 2010)
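The dihedral statistics referred to in the text (angles converted to the 0-360° range, then mean and standard error of the mean) can be sketched as follows. This is an illustrative sketch only: the helper names `dihedral_deg` and `mean_and_sem` are hypothetical, the sign convention of the dihedral is the one produced by this particular formula, and the mean is the plain arithmetic mean of the wrapped angles, as described in the text, rather than a circular mean.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle defined by four points, in degrees within [0, 360)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x)) % 360.0

def mean_and_sem(angles_deg):
    """Arithmetic mean and standard error of the mean of angles
    after wrapping them to the 0-360 degree range."""
    a = np.asarray(angles_deg, dtype=float) % 360.0
    sem = a.std(ddof=1) / np.sqrt(a.size)   # sample std / sqrt(n)
    return a.mean(), sem

# Two simple geometries differing only in the position of the fourth point.
plus = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
minus = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, -1, 1])

# Hypothetical dihedral values in degrees; negative values wrap to 270-280.
mean, sem = mean_and_sem([-85.0, -80.0, -90.0])
```

Wrapping before averaging keeps values such as -85° and 275° together, but an arithmetic mean can still be misleading for a group of angles scattered around 0°/360°, so it is worth inspecting the distribution before relying on it.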

The relationship between the energetic and conformational parameters of each of the three groups identified in the scores plot was also investigated. Fig. 11 depicts the relative energy values (taking the energy of the conformational ground state as reference) for each conformer, according to the group to which it belongs. From an energetic point of view, groups B and C are equivalent. However, no conformer with relative energy below 15 kJ mol⁻¹ belongs to Group A. This trend can be correlated with the orientations adopted by the phenol ring relative to the glucopyranoside ring, as described above.
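As a small illustration of how such a comparison can be set up, the sketch below computes relative energies with respect to the lowest-energy conformer and reports the lowest relative energy found in each group. The energies and group labels are invented for the example and are not the values of the arbutin conformers; the function names are hypothetical.

```python
import numpy as np

def relative_energies(energies):
    """Energies relative to the lowest-energy (ground state) conformer."""
    e = np.asarray(energies, dtype=float)
    return e - e.min()

def lowest_per_group(rel, groups):
    """Lowest relative energy found in each group of conformers."""
    groups = np.asarray(groups)
    return {str(g): float(rel[groups == g].min()) for g in np.unique(groups)}

# Hypothetical energies (kJ/mol) and group labels, for illustration only.
energies = np.array([-100.0, -95.0, -80.0, -99.0, -78.0, -96.0])
groups = np.array(["B", "B", "A", "C", "A", "C"])

rel = relative_energies(energies)
lowest = lowest_per_group(rel, groups)
```

In this toy example the lowest relative energy in group A is well above those of groups B and C, which is the kind of pattern the text describes for Fig. 11.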


Fig. 11. Relative energies of the 130 lowest energy conformers of arbutin (the energy of the conformational ground state was taken as reference). (copyrighted from Araujo-Andrade et al., 2010)

Once the influence of the relative position of the two rings in arbutin on the relative energy of the conformers had been evaluated, the preferred conformations assumed by the substituents of the glucopyranoside ring and their influence on the energies were investigated in more detail. To this end, PCA was conducted on each of the previously determined groups of conformers (A, B, C), excluding the x, y, z coordinates corresponding to the phenol ring (atoms 23-35). This strategy eliminated information that is not relevant for a conformational analysis within the glucopyranoside ring. The analysis and interpretation of the PCA scores and loadings were carried out using the methodology described above for the whole arbutin molecule. The results of this analysis are shown in Figs. 12-15. The PCA scores plot for Group A [Fig. 12 (a)] shows four well defined groups, labeled as subgroups A1, A2.a, A2.b and A3. If all elements of these four subgroups are projected onto the PC1 axis, three groups can be distinguished: one constituted by subgroups A1 and A3, another formed by subgroup A2.a, and the third by subgroup A2.b. On the other hand, projecting the elements onto the PC2 axis also allows three groups to be distinguished, but this time corresponding to A1, (A2.a, A2.b) and A3.
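The column selection implied by this strategy can be sketched as follows, assuming the data matrix layout described earlier (x coordinates of all 35 atoms, then y, then z). The helper names, the synthetic data and the random group labels are assumptions made for the example, and re-centring each group on its own mean before PCA is likewise an assumption, since the text does not state how the sub-matrices were pre-treated.

```python
import numpy as np

def atom_columns(atom_numbers, n_atoms=35):
    """Column indices of the x, y and z coordinates of the given
    1-based atom numbers in the [x block | y block | z block] layout."""
    idx = np.asarray(atom_numbers) - 1
    return np.concatenate([idx, idx + n_atoms, idx + 2 * n_atoms])

def pca(X, n_components=2):
    """PCA by SVD of the mean-centred matrix: returns scores and loadings."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components]

# Synthetic stand-ins for the 130 x 105 matrix and the group labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(130, 105))
groups = rng.choice(["A", "B", "C"], size=130)

kept_atoms = np.arange(1, 23)            # atoms 1-22; atoms 23-35 are excluded
cols = atom_columns(kept_atoms)          # 66 columns remain

results = {}
for g in ("A", "B", "C"):
    X_group = X[groups == g][:, cols]    # conformers of one group, ring atoms only
    results[g] = pca(X_group)            # (scores, loadings) for that group
```

Each entry of `results` then plays the role of one of the per-group analyses: the scores reveal possible subgroups, and the loadings, reshaped to 3 x 22, indicate which of the remaining atoms drive the separation.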

The one-dimensional loadings plots for PC1 and PC2 [Fig. 12 (b) & (c)] allow us to conclude that the positions of atoms C7, H21, H22, O11 and H15 are strongly related to the distribution of the conformers in the PCA scores plot, *i.e.*, the conformation exhibited by the CH2OH substituent at C5 is the main discriminating factor among subgroups. In agreement with this observation, when the mean values of the conformationally relevant dihedral angles associated with the substituted glucopyranoside ring in each subgroup are plotted [Fig. 12 (d)], it can be seen that the dihedral angles associated with the CH2OH substituent (O6-C5-C7-O11 and C5-C7-O11-H15) are those allowing the discrimination among the four subgroups. In the case of the C5-C7-O11-H15 dihedral, one can readily correlate the three groups corresponding to the projection of the PCA subgroups onto the PC1 axis with the three dihedral mean values shown in the plot: *ca*. 60, 160 and 275° (-85°), respectively for A2.b, A2.a and (A1, A3).

Fig. 12. (a) PCA scores [axes PC1 (35%) and PC2 (28%)] and (b, c) the corresponding loadings for the conformers belonging to Group A, in terms of structural similarity in the conformations of the substituents of the glucopyranoside ring. (d) Total average values and standard errors of the means of the 5 conformationally relevant dihedral angles of the glucopyranoside ring of arbutin in the 4 subgroups. (copyrighted from Araujo-Andrade et al., 2010)

The PCA scores plot for the conformers belonging to Group B [Fig. 13 (a)] shows only two clear groupings of conformers, with PC2 being the component that best separates them. The loadings plot of PC2 [Fig. 13 (b)] shows that the clusters are also determined by the positions of atoms C7, H21, H22, O11 and H15, *i.e.*, by the conformation of the CH2OH fragment. As expected, these observations agree with the plot of the mean values of the dihedral angles [Fig. 13 (d)], which clearly reveals that the O6-C5-C7-O11 and C5-C7-O11-H15 dihedral angles are the internal coordinates that mainly discriminate between the conformers belonging to subgroups B1 and B2.

Fig. 13. (a) PCA scores [axes PC2 (30%) and PC3 (15%)] and (b, c) the corresponding loadings for the conformers belonging to Group B, in terms of structural similarity in the conformations of the substituents of the glucopyranoside ring. (d) Total average values and standard errors of the means of the 5 conformationally relevant dihedral angles of the glucopyranoside ring of arbutin in the 2 subgroups of conformers. (copyrighted from Araujo-Andrade et al., 2010)

A similar analysis for the conformers belonging to Group C leads to the conclusion that three subgroups (C1, C2 and C3) can be defined [Fig. 14 (a)], once again resulting mainly from the different conformations assumed by the CH2OH substituent [Fig. 14 (b) & (d)]. Regarding the energies of the conformers, the subgroups are not strongly discriminative. However, subgroups A3, C2 and, to a lesser extent, B1 include conformers that are progressively less stable than those of the remaining subgroups of each main group (data not shown).

Fig. 14. (a) PCA scores [axes PC1 (38%) and PC3 (17%)] and (b, c) the corresponding loadings for the conformers belonging to Group C. (d) Total average values and standard errors of the means of the 5 relevant dihedral angles of the glucopyranoside ring of arbutin in the 3 subgroups. (copyrighted from Araujo-Andrade et al., 2010)

**4.4 Final remarks**

PCA based on the atomic Cartesian coordinates of the arbutin conformers, properly oriented in the Cartesian system, allowed these conformers to be grouped by structural analogies, which could be related to the conformationally relevant dihedral angles. Among them, the dihedrals interconnecting the glucopyranoside and phenol rings and those associated with the CH2OH fragment were found to be the most relevant ones.

In summary, this work represents a new, simple approach for the structural analysis of complex molecules, and it also aimed to show another application of PCA.

**5. Conclusion**

The results reported in this chapter for each experimental and theoretical application of PCA demonstrate the versatility and capabilities of this unsupervised method to analyse samples from different origins. Three different examples were selected to show the
