3. What number and how to select a panel of SSR marker loci according to their linkage map position and polymorphism information content

More than 800 SSR markers have been developed in apple (Malus domestica Borkh., 2n = 2x = 34), and nearly all of them have been mapped on a consensus map built from five different genetic maps [127]. These markers are distributed across all 17 linkage groups, with an average of 49 microsatellites per linkage group. Moreover, the Genome Database for Rosaceae [128] is a long-standing community resource providing hundreds of microsatellite loci, in most cases accompanied by a wealth of information on map position, repeat motif, primers, PCR conditions, amplicon length, and publication source. A discriminating set of markers should consist of microsatellite loci uniformly distributed across the genome, so that each linkage group, and thus the genome in its entirety, is adequately represented [91]. Assessing genetic diversity by focusing only on restricted regions of the genome may distort the results. Nevertheless, excluding the most ambitious study on Malus domestica Borkh., carried out by Patocchi et al. [105] with an exceptionally high number of SSR markers (82), the number of genomic loci selected and analyzed varies from 4 to 19, with an average of 12 ± 6 SSR markers, i.e., fewer than one microsatellite locus per linkage group. Extending this reasoning to the other crops reviewed, the emerging picture is often the same, even though very detailed genetic maps [129–132] as well as dedicated databases for SSR markers (Table 4) are available for all the plant species.

Olea europaea L. (2n = 2x = 46) has 23 chromosome pairs, and the average number of microsatellite markers used in the reviewed articles is 11 ± 5, far fewer than one microsatellite locus per linkage group. The same holds for Vitis vinifera L. (2n = 2x = 38), for which the average number of microsatellite markers used to genotype cultivars is 15 ± 11, although this species has 19 chromosome pairs. Even the varietal identification of their respective derived products (olive oil and wine) has been accomplished with, on average, 8 ± 3 and 10 ± 4 SSR markers, respectively. By contrast, in wheat, varieties of both Triticum durum Desf. (2n = 4x = 28) and Triticum aestivum L. (2n = 6x = 42) have been characterized by SSR genotyping of, on average, 18 ± 3 and 21 ± 6 microsatellite loci, respectively, that is, at least one microsatellite per linkage group. This choice may be related to the high complexity and large size of the Triticum aestivum L. genome, approximately 17 Gb/1C [148]. Indeed, for a correct representation of the entire genome, not only the number of homologous chromosomes but also their size (i.e., total amount of DNA) should be considered when choosing the optimal panel of microsatellite loci to investigate. Finally, in tomato (Solanum lycopersicum L., 2n = 2x = 24), the average number of SSR markers employed for genotyping varieties is 14 ± 7 (Table 4).
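The loci-per-linkage-group comparison above reduces to a simple ratio. The minimal Python sketch below divides each average panel size quoted in this section by the haploid chromosome number, which is assumed here to equal the number of linkage groups; the apple figure excludes the 82-marker study [105], and the dictionary layout and function name are illustrative only.

```python
# Average number of SSR loci per study (averages quoted in the text) and the
# number of linkage groups, assumed equal to the haploid chromosome number n.
PANELS = {
    "Malus domestica": (12, 17),
    "Olea europaea": (11, 23),
    "Vitis vinifera": (15, 19),
    "Triticum durum": (18, 14),
    "Triticum aestivum": (21, 21),
    "Solanum lycopersicum": (14, 12),
}


def loci_per_linkage_group(mean_loci: float, linkage_groups: int) -> float:
    """Average number of SSR loci available to represent each linkage group."""
    return mean_loci / linkage_groups


if __name__ == "__main__":
    for species, (loci, groups) in PANELS.items():
        ratio = loci_per_linkage_group(loci, groups)
        print(f"{species:<22}{loci:>3} loci / {groups:>2} groups = {ratio:.2f}")
```

On these averages, only the wheat and tomato panels reach one locus per linkage group (about 1.29 for durum wheat, 1.00 for bread wheat, and 1.17 for tomato), while the olive panel stays below 0.5. Dividing by genome size rather than chromosome number, as suggested above for wheat, would be a straightforward variant.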

Article searches were performed using the three most popular sources of scientific information: Scopus, Web of Science, and Google Scholar. PubMed was excluded from the queried databases because it focuses mainly on medicine and the biomedical sciences and because Google Scholar already includes its index [126]. A total of 90 articles based on SSR genotyping analysis, published in the international literature over the last 15 years, were selected, covering all the plant species/food products in the reference list. Only articles published from 2000 onward were reviewed, on the assumption that earlier research would no longer guide current plant DNA genotyping practice, given how rapidly new, large marker datasets and technologically advanced, automated protocols have been developed in the last 15 years.


136 Rediscovery of Landraces as a Resource for the Future

Table 4. Information on the five species analyzed in this book chapter, including genome size, ploidy, available SSR database and number of microsatellite regions included, average number of SSRs employed in the articles reviewed, number of cultivars, and microsatellites used as reference.

Only a few studies [65, 74, 96, 106] evaluated the position within linkage groups of the selected microsatellites: the choice often falls on SSR markers whose position is unknown or not specified, or that are mapped on only a few chromosomes, resulting in a poor representation of the entire genome. In this regard, the work of Cipriani et al. [74] and van Treuren et al. [106] provides a good model for choosing molecular markers to investigate genetic diversity in germplasm collections and to resolve synonymy/homonymy cases as well as paternity and kinship issues. The former group selected microsatellite sequences from scaffolds anchored to the 19 linkage groups of Vitis vinifera L. in order to analyze 38 well-distributed SSR markers, ideally two loci per linkage group, whereas the latter group also considered the specific map position of the loci and their genetic association with traits of agricultural interest.
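The "two loci per linkage group" strategy can be expressed as a simple selection rule. The Python sketch below is an illustration under stated assumptions, not the procedure used by Cipriani et al. [74]: each candidate marker is assumed to carry a known linkage group and a PIC value, and the k most informative mapped markers are kept per group. Marker names, PIC values, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Marker(NamedTuple):
    name: str
    linkage_group: Optional[int]  # None when the map position is unknown
    pic: float


def select_panel(markers: List[Marker], per_group: int = 2) -> Dict[int, List[Marker]]:
    """Keep the `per_group` highest-PIC mapped markers in each linkage group."""
    by_group: Dict[int, List[Marker]] = defaultdict(list)
    for marker in markers:
        if marker.linkage_group is not None:  # unmapped loci cannot document coverage
            by_group[marker.linkage_group].append(marker)
    return {
        lg: sorted(group, key=lambda m: m.pic, reverse=True)[:per_group]
        for lg, group in sorted(by_group.items())
    }


def uncovered_groups(panel: Dict[int, List[Marker]], n_groups: int) -> List[int]:
    """Linkage groups numbered 1..n_groups for which no marker was selected."""
    return [lg for lg in range(1, n_groups + 1) if not panel.get(lg)]


# Hypothetical candidates for a species with three linkage groups.
candidates = [
    Marker("SSR_A", 1, 0.81),
    Marker("SSR_B", 1, 0.62),
    Marker("SSR_C", 1, 0.75),
    Marker("SSR_D", 2, 0.40),
    Marker("SSR_E", None, 0.90),  # highly polymorphic but unmapped
]
panel = select_panel(candidates, per_group=2)
print({lg: [m.name for m in group] for lg, group in panel.items()})
# {1: ['SSR_A', 'SSR_C'], 2: ['SSR_D']}
print("uncovered:", uncovered_groups(panel, n_groups=3))
# uncovered: [3]
```

Ranking by PIC within each group is only one possible criterion; the distance between the chosen loci on the map, or association with traits of agricultural interest as considered by van Treuren et al. [106], could be added as further filters.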


Critical Aspects on the Use of Microsatellite Markers for Assessing Genetic Identity of Crop Plant Varieties…


http://dx.doi.org/10.5772/intechopen.70756


Two important issues must be pointed out. First, the number of SSRs to employ should also be evaluated according to the type of analysis. For example, the EU-Project Genres CT96 No81 [139] selected six highly discriminating microsatellites, i.e., fewer than one marker per linkage group, which could be sufficient to differentiate among hundreds of grape cultivars. The same microsatellite set, however, could be inadequate to discriminate among clones. Second, in some cases increasing the number of marker loci does not necessarily improve the resolution of cultivar characterization and identification. For example, Baric et al. [107] reported that extending the set of microsatellite markers from an initial 14 SSR loci to 48 did not improve the genetic discrimination among the 28 accessions of Malus domestica Borkh. analyzed.
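Whether a larger panel actually improves resolution can be checked directly by listing the accession pairs that remain indistinguishable. The Python sketch below is illustrative and assumes diploid genotypes with no missing data; accession names, locus names, and allele sizes are hypothetical. Adding a monomorphic locus (L4) leaves the unresolved pair unchanged, while adding an informative one (L3) separates it.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# accession -> locus -> diploid genotype as a pair of allele sizes (bp)
Profiles = Dict[str, Dict[str, Tuple[int, int]]]


def unresolved_pairs(profiles: Profiles, loci: Sequence[str]) -> List[Tuple[str, str]]:
    """Pairs of accessions whose genotypes are identical at every locus in `loci`."""

    def profile_key(accession: str) -> Tuple[Tuple[int, ...], ...]:
        # Sort each allele pair so that 200/204 and 204/200 compare as equal.
        return tuple(tuple(sorted(profiles[accession][locus])) for locus in loci)

    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if profile_key(a) == profile_key(b)
    ]


# Hypothetical accessions and loci, for illustration only.
profiles: Profiles = {
    "Acc1": {"L1": (200, 204), "L2": (150, 150), "L3": (98, 102), "L4": (120, 120)},
    "Acc2": {"L1": (204, 200), "L2": (150, 150), "L3": (98, 98), "L4": (120, 120)},
    "Acc3": {"L1": (200, 208), "L2": (150, 152), "L3": (98, 102), "L4": (120, 120)},
}
print(unresolved_pairs(profiles, ["L1", "L2"]))        # [('Acc1', 'Acc2')]
print(unresolved_pairs(profiles, ["L1", "L2", "L4"]))  # [('Acc1', 'Acc2')]
print(unresolved_pairs(profiles, ["L1", "L2", "L3"]))  # []
```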

Related to the distribution and position of the microsatellite loci within a genome is the choice between genomic SSR (gSSR) and EST-derived SSR (EST-SSR) markers. Generally, EST-SSR markers are less polymorphic than genomic SSR markers, as reported for Triticum spp. [93, 95] and Solanum lycopersicum L. [68], because the former lie in more selectively constrained regions of the genome. Of particular interest is the comparison by Leigh et al. [93] of a set of 20 EST-SSR markers and a set of 12 genomic SSR markers in terms of their ability to discriminate among 66 varieties of Triticum spp. The results indicate that the EST-SSR panel was slightly less efficient than the genomic SSR panel at discriminating between hexaploid Triticum aestivum L. varieties. EST-SSR markers also have the disadvantage that amplicon sizes can differ from expectations because of the undetected presence of introns in the flanking regions [39]. Nevertheless, these findings support the possibility that EST-SSR markers may in the near future complement and outnumber genomic SSR markers. Indeed, EST-SSR markers are expected to have some important advantages over genomic SSR markers. In particular, they are easily obtained by bioinformatic querying of EST databases, whereas the development of genomic SSR markers is long and expensive; moreover, EST-SSR markers could be functionally more informative than genomic SSR markers because they are associated with the transcribed regions of the genome and thus reflect the genetic diversity inside or adjacent to genes [149]. Finally, SSR flanking regions evolve more slowly in expressed than in nonexpressed sequences, so primers designed on these sequences are more likely to be conserved across species, resulting in high levels of SSR transferability [150].
A suitable combination of EST-SSR and genomic SSR markers could be optimal for distinctness, uniformity, and stability (DUS) testing of crop plant varieties [93]. Overall, the vast majority of studies are based on genomic SSR markers, and only three of the 90 articles consider the possibility of employing EST-SSR markers.

In terms of location, nuclear SSR (nSSR) markers are far more widely used than plastid and mitochondrial SSR (cpSSR and mtSSR, respectively) markers. First, the development of extranuclear SSR markers is complicated: high-purity chloroplast or mitochondrial DNA is typically very hard to extract because of contamination with nuclear DNA [151]. Moreover, comparing nuclear, chloroplast, and mitochondrial genomes, Wolfe et al. [152] showed that the rate of silent and replacement substitutions in the chloroplast genome was half that of the nuclear genome and three times that of the mitochondrial genome, indicating that the mitochondrial genome has evolved more slowly and implying lower levels of polymorphism. Nevertheless, markers located in mitochondrial or chloroplast sequences may be useful because of their haploid nature, relative abundance, and stability compared with nuclear sequences. For instance, Borgo et al. [153] suggested that the circular form of these genomes increases their stability and resistance to heat degradation. Boccacci et al. [113] analyzed musts and wine samples using a set of nine nSSR and seven cpSSR markers in order to identify cultivars. Findings from these studies confirm a low level of polymorphism for extranuclear markers, owing to their lower mutation rate. Baleiras-Couto and Eiras-Dias [45] and Pérez-Jiménez et al. [125] also used this kind of SSR marker, with similar results.


The choice of the number of SSR loci usually depends on their degree of polymorphism. Excluding the studies that do not report this information, the average number of marker alleles per SSR locus is 7.1 for Olea europaea L., 3.5 for Solanum lycopersicum L., 8.2 for Vitis vinifera L., 6.9 for Triticum spp., 9.4 for Malus × domestica Borkh., 6.5 for olive oil, and 5.2 for wine. Both EST-SSR and cpSSR markers were found to be less polymorphic than genomic SSR markers, with a lower average number of alleles per locus [45, 68, 93, 113, 125]. The degree of polymorphism may depend on several factors, including the SSR motif length and the location of the SSR in coding or noncoding regions.
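The per-locus allele counts summarized above, and the allele frequencies used by the indices that follow, can be obtained from genotype calls as in this minimal Python sketch. It assumes diploid calls recorded as pairs of allele sizes and complete data; polyploid or missing-data cases would need additional handling, and the loci, sizes, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # diploid call as two allele sizes (bp)


def allele_frequencies(genotypes: List[Genotype]) -> Dict[int, float]:
    """Allele frequencies at one locus, counting both alleles of each individual."""
    counts = Counter(allele for call in genotypes for allele in call)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def mean_alleles_per_locus(data: Dict[str, List[Genotype]]) -> float:
    """Average number of distinct alleles across loci (locus name -> genotype calls)."""
    return sum(len(allele_frequencies(calls)) for calls in data.values()) / len(data)


# Hypothetical calls for three accessions at two loci.
data = {
    "L1": [(200, 204), (200, 200), (204, 208)],
    "L2": [(150, 150), (150, 152), (152, 152)],
}
print(allele_frequencies(data["L1"]))  # 200 -> 3/6, 204 -> 2/6, 208 -> 1/6
print(mean_alleles_per_locus(data))    # 2.5
```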

To estimate the level of genetic diversity detected by each microsatellite, marker allele frequencies are widely used to compute polymorphism information content (PIC, Table 5) values, following the method of Botstein et al. [154]. The authors reported the following formula for the PIC value of a marker with n alleles:

$$\text{PIC} = 1 - \sum\_{i=1}^{n} \mathbf{p}\_i^2 - \sum\_{i=1}^{n-1} \sum\_{j=i+1}^{n} 2\mathbf{p}\_i^2 \mathbf{p}\_j^2 \tag{1}$$

where pi and pj are the population frequencies of the ith and jth marker alleles, respectively. A marker with PIC > 0.5 is considered highly informative, one with 0.5 > PIC > 0.25 informative, and one with PIC < 0.25 slightly informative. As reported by Nagy et al. [155], PIC can be defined as the probability that the marker genotype of a given offspring will allow deduction, in the absence of crossing-over, of which of the two marker alleles of the affected parents it received. In other words, this parameter is a modification of the heterozygosity measure that subtracts from the H value an additional probability that an individual in a linkage analysis does not contribute information to the study. On this point, there is no full agreement among authors. Some studies on olive oil [58, 122] and Malus × domestica Borkh. [101, 103], citing Anderson et al. [156], contend that rare marker alleles have less impact than common ones on PIC estimates and consider that this index can be equated with the expected heterozygosity (He), calculated by the following simplified formula:

$$\text{PIC} = 1 - \left(\sum\_{i=1}^{n} \mathbf{p}\_i^2\right) \tag{2}$$

where pi is the population frequency of the ith marker allele.
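Equations (1) and (2) translate directly into code. The Python sketch below takes the allele frequencies of one locus (assumed to sum to 1) and returns both values, plus the informativeness class quoted above; the function names are illustrative, and assigning exact boundary values to the lower class is an assumption, since the thresholds are stated with strict inequalities.

```python
from itertools import combinations
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Equation (2): He = 1 - sum(p_i^2), the simplified PIC used by some authors."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs: Sequence[float]) -> float:
    """Equation (1): He minus the sum of 2 p_i^2 p_j^2 over allele pairs i < j."""
    correction = sum(2 * p**2 * q**2 for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction


def informativeness(pic: float) -> str:
    """Classes quoted in the text; exact boundary values go to the lower class (assumption)."""
    if pic > 0.5:
        return "highly informative"
    if pic > 0.25:
        return "informative"
    return "slightly informative"


for freqs in ([0.5, 0.5], [0.25, 0.25, 0.25, 0.25]):
    he, pic = expected_heterozygosity(freqs), pic_botstein(freqs)
    print(f"He = {he:.4f}, PIC = {pic:.4f} ({informativeness(pic)})")
# He = 0.5000, PIC = 0.3750 (informative)
# He = 0.7500, PIC = 0.7031 (highly informative)
```

Because the subtracted term is never negative, equation (1) never exceeds equation (2); in these two examples the gap shrinks from 0.125 with two equifrequent alleles to about 0.047 with four.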



In addition to the PIC value, which is calculated from allele frequencies, several indices are based on genotype frequencies. For example, as reported by Aranzana et al. [157], two other important indices that should be evaluated are the power of discrimination (usually PD), also called the diversity index (D) by Zulini et al. [71] and Martínez et al. [24], and the confusion probability (C). The former estimates the probability that two randomly sampled accessions in the study would be differentiated by their marker allele profiles:

$$\text{PD} = 1 - \sum\_{i=1}^{n} \mathbf{p}\_{i}^{2}, \tag{3}$$


where pi is the frequency of the ith marker genotype. As with the PIC, authors differ in how they interpret and calculate the PD index. Pasqualone et al. [25], in their study on Olea europaea L. genotyping, reported that "the power of discrimination, sometimes referred to as polymorphism information content, or diversity index, was calculated […]," thereby treating PD and PIC as the same parameter.
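The distinction between allele and genotype frequencies matters in practice. The minimal Python sketch below computes PD as in equation (3), using genotype frequencies as stated in the text; the calls are hypothetical and diploid, and the function names are illustrative. Feeding allele frequencies into the same expression would instead return He, as in equation (2).

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # diploid call as two allele sizes (bp)


def genotype_frequencies(genotypes: List[Genotype]) -> Dict[Genotype, float]:
    """Relative frequency of each unordered diploid genotype at one locus."""
    counts = Counter(tuple(sorted(call)) for call in genotypes)
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def power_of_discrimination(genotypes: List[Genotype]) -> float:
    """Equation (3) with p_i taken as genotype frequencies, not allele frequencies."""
    return 1.0 - sum(f * f for f in genotype_frequencies(genotypes).values())


# Hypothetical calls at one locus for four accessions.
calls = [(200, 204), (204, 200), (200, 200), (208, 208)]
print(power_of_discrimination(calls))  # 0.625
```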

The confusion probability (C) index, also defined as the combined power of discrimination over all loci [23], is the probability that any two cultivars are identical in their genotypes at all SSR loci by chance alone; it depends on PD and can be estimated as follows:

$$\mathbf{C} = \prod\_{i=1}^{n} (1 - \mathbf{PD}\_i), \tag{4}$$

where PDi is the power of discrimination value of the ith locus. Despite its informativeness, only three of the 90 articles reviewed take this value into account (Table 5). Martínez et al. [24], in their attempt to assess the genetic diversity of Vitis vinifera L. varieties, calculated the power of discrimination index as follows:

$$\text{PD} = 1 - \mathbf{C} \quad \text{with} \quad \mathbf{C} = \sum\_{i=1}^{n} \mathbf{p}\_{i}^{2} \tag{5}$$

where pi is the frequency of different marker genotypes for a given locus. In this case, C is the probability of coincidence, corresponding to the probability that two varieties match by chance at one locus.
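Equation (4) combines the per-locus PD values of a whole panel, whereas the C of equation (5) is a single-locus coincidence probability. The Python sketch below follows equation (4) under an assumption the equation leaves implicit: multiplying per-locus probabilities is valid only if the loci segregate independently, which is one more reason to prefer markers spread across different linkage groups. The PD values are hypothetical.

```python
from math import prod
from typing import Sequence


def confusion_probability(pd_values: Sequence[float]) -> float:
    """Equation (4): product over loci of (1 - PD_i).

    Multiplying per-locus probabilities assumes the loci are independent,
    e.g., unlinked markers in linkage equilibrium.
    """
    return prod(1.0 - pd for pd in pd_values)


# Hypothetical per-locus PD values for a three-locus panel.
pd_values = [0.625, 0.8, 0.9]
c = confusion_probability(pd_values)
print(f"C = {c:.4f}; probability that at least one locus differs = {1 - c:.4f}")
# C = 0.0075; probability that at least one locus differs = 0.9925
```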

About 21 articles, mainly focused on Vitis vinifera L. and on oil from Olea europaea L., also report the probability of identity (PI) index of each individual SSR marker locus, either in addition to or instead of the PD value (Table 5). This index can be estimated as follows:

$$\text{PI} = \sum\_{\text{i}} \left( \mathbf{p}\_{\text{i}} \right)^{4} + \sum\_{\text{i}} \sum\_{\text{j}} \left( 2 \mathbf{p}\_{\text{i}} \mathbf{p}\_{\text{j}} \right)^{2} \tag{6}$$

where pi and pj are the frequencies of the ith and jth marker alleles, respectively. It represents the probability that two individuals drawn at random from a population will have the same genotype at one marker locus. For example, in their studies on the genetic traceability of monovarietal olive oils, Vietina et al. [122] and Corrado et al. [58] use this value to assess how well their SSR marker pool discriminates among cultivars. Martínez et al. [24] adopted the following formula to calculate the same value:

$$\text{PI} = \sum\_{\text{i}} \left( \mathbf{p}\_{\text{i}} \right)^{4} - \sum\_{\text{i}} \sum\_{\text{j}} \left( 2 \mathbf{p}\_{\text{i}} \mathbf{p}\_{\text{j}} \right)^{2}. \tag{7}$$

Equally interesting is the total probability of identity (PIt) that represents a compound probability defined as the probability of two cultivars sharing the same marker genotype by chance and calculated as follows:

$$\text{PI}\_{\text{t}} = \prod\_{i=1}^{n} \text{PI}\_{\text{i}}, \tag{8}$$

where PIi is the probability of identity value of the ith marker locus.
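Equations (6) and (8) can be sketched as follows. Equation (6) does not state the range of j; the Python code assumes j > i, so that each heterozygous genotype is counted once and every term is the squared Hardy-Weinberg frequency of one genotype. The minus sign of equation (7) is not reproduced here. Allele frequencies are hypothetical, and the product in equation (8) again presumes independent loci.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def probability_of_identity(freqs: Sequence[float]) -> float:
    """Equation (6) for one locus, summing heterozygote terms over allele pairs i < j."""
    homozygotes = sum(p**4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def total_probability_of_identity(loci: Sequence[Sequence[float]]) -> float:
    """Equation (8): product of per-locus PI values (assumes independent loci)."""
    return prod(probability_of_identity(freqs) for freqs in loci)


# Hypothetical allele frequencies at two loci.
loci = [[0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
print([probability_of_identity(f) for f in loci])  # [0.375, 0.109375]
print(total_probability_of_identity(loci))          # 0.041015625
```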


Table 5. Summary information on the main parameters assessed by the 90 papers reviewed.

| Index | Full name | Formula | Definition | No. of papers that account for it |
|---|---|---|---|---|
| PIC\* | Polymorphism information content | $1 - \sum\_{i=1}^{n} p\_i^2 - \sum\_{i=1}^{n-1} \sum\_{j=i+1}^{n} 2 p\_i^2 p\_j^2$ | Probability that the marker genotype of a given offspring will allow deduction, in the absence of crossing-over, of which of the two marker alleles of the affected parents it received [155] | 36 |
| PD\*\* | Power of discrimination | $1 - \sum\_{i=1}^{n} p\_i^2$ | Probability that two randomly sampled accessions would be differentiated by their marker allele profiles [157] | 14 |
| C\*\* | Confusion probability | $\prod\_{i=1}^{n} (1 - \text{PD}\_i)$ | Probability that any two individuals are identical in their genotypes at all SSR loci by chance alone [157] | 3 |
| PI\* | Probability of identity | $\sum\_i (p\_i)^4 + \sum\_i \sum\_j (2 p\_i p\_j)^2$ | Probability that two individuals drawn at random from a population will have the same genotype at one marker locus [122] | 21 |
| PIt\* | Total probability of identity | $\prod\_{i=1}^{n} \text{PI}\_i$ | Probability of two individuals sharing the same marker genotype by chance [122] | 2 |

\* pi and pj are the frequencies of the ith and jth marker alleles.

\*\* pi is the frequency of the ith marker genotype.

Finally, Qanbari et al. [158] reported that PD and PI are complementary parameters:

$$\text{PD} = 1 - \text{PI} \tag{9}$$

The use of standardized parameters is essential to make SSR data comparable across species and laboratories, and it can be especially beneficial for the preliminary evaluation of the discriminant ability and applicability of SSR marker loci.
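Relation (9) can be checked numerically. Assuming that genotype frequencies equal their Hardy-Weinberg expectations, every term of equation (6) is the squared frequency of one genotype, so PI coincides with the sum in equation (3) and PD = 1 − PI holds exactly. The Python sketch below verifies this for hypothetical allele frequencies; with observed genotype frequencies from a real collection, the identity will generally not hold exactly. Function names are illustrative.

```python
from itertools import combinations_with_replacement
from typing import Dict, Sequence, Tuple


def hwe_genotype_frequencies(freqs: Sequence[float]) -> Dict[Tuple[int, int], float]:
    """Hardy-Weinberg expected frequency of every unordered genotype (i, j), i <= j."""
    return {
        (i, j): freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        for i, j in combinations_with_replacement(range(len(freqs)), 2)
    }


def pd_expected(freqs: Sequence[float]) -> float:
    """Equation (3) evaluated on the expected genotype frequencies."""
    return 1.0 - sum(g * g for g in hwe_genotype_frequencies(freqs).values())


def pi_expected(freqs: Sequence[float]) -> float:
    """Equation (6), with heterozygote terms summed over allele pairs i < j."""
    n = len(freqs)
    homozygotes = sum(p**4 for p in freqs)
    heterozygotes = sum(
        (2 * freqs[i] * freqs[j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )
    return homozygotes + heterozygotes


freqs = [0.5, 0.3, 0.2]  # hypothetical allele frequencies at one locus
print(round(pd_expected(freqs), 4), round(pi_expected(freqs), 4))  # 0.7834 0.2166
print(abs(pd_expected(freqs) - (1 - pi_expected(freqs))) < 1e-12)   # True
```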
