**5. DArTseq™: an effective tool for genome diversity in** *C. arabica*

The DArTseq™ technology, developed by the company Diversity Arrays Technology (DArT, https://www.diversityarrays.com), has received increasing interest worldwide because it can generate thousands of high-quality SNPs quickly and cost-effectively [59, 60]. DArTseq™ is a variant of GBS. It uses complexity-reduction methods that effectively target low-copy sequences of the genome [61]. The process is also optimized for each organism and type of study: combinations of restriction enzymes (REs) are tested, and the combination that most effectively reduces genome complexity is selected [59].
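The idea of choosing an enzyme combination can be made concrete with a toy in-silico double digest. The sketch counts the fragments that a rare-cutting enzyme paired with a frequent-cutting enzyme would leave within a size window. It is an illustration under stated assumptions, not the DArTseq™ protocol. The following choices are hypothetical and made only for this example: the enzyme panel (PstI `CTGCAG`, MseI `TTAA`, HpaII `CCGG`), the 100–500 bp window, the random test sequence, and the function names `site_positions` and `double_digest`. The model also ignores cut offsets, methylation and sequencing.

```python
import random
import re

# Hypothetical enzyme panel (assumption for illustration): recognition sites
# on the top strand. All three sites are palindromic, so scanning one strand
# finds every site.
ENZYMES = {"PstI": "CTGCAG", "MseI": "TTAA", "HpaII": "CCGG"}


def site_positions(seq, site):
    """Start coordinates of every occurrence of `site` (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, rare, frequent, min_len=100, max_len=500):
    """Count fragments with a `rare` site at one end and a `frequent` site at
    the other whose length (site-to-site distance) lies in [min_len, max_len].
    Cut offsets inside the recognition sites are ignored in this toy model."""
    cuts = sorted(
        [(p, rare) for p in site_positions(seq, ENZYMES[rare])]
        + [(p, frequent) for p in site_positions(seq, ENZYMES[frequent])]
    )
    # Only adjacent sites from different enzymes delimit a kept fragment.
    return sum(
        1
        for (p1, e1), (p2, e2) in zip(cuts, cuts[1:])
        if e1 != e2 and min_len <= p2 - p1 <= max_len
    )


if __name__ == "__main__":
    random.seed(1)
    # Random stand-in for a genome; real genomes are far from random.
    genome = "".join(random.choice("ACGT") for _ in range(200_000))
    for frequent in ("MseI", "HpaII"):
        n = double_digest(genome, "PstI", frequent)
        print(f"PstI + {frequent}: {n} fragments in the size window")
```

Comparing fragment counts across candidate pairs mimics, in a very simplified way, the selection of the RE combination that best reduces genome complexity. In practice, that choice is made on real data from the target organism.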

The DArTseq™ technology has been used in diploid plant species such as rice (*Oryza sativa*; [62]), barley (*Hordeum vulgare*; [63]) and maize (*Zea mays*; [64]). It has been used even more often in polyploid species. This is because SNP detection relies on high-fidelity REs rather than on the annealing of primers to genomic targets, which is complicated by the presence of homologous annealing sequences [65].

In coffee, we have reported a genetic diversity study of 87 accessions of *Coffea* spp. The accessions were selected from the National Coffee Germplasm Bank in Huatusco, Veracruz, Mexico (19°10′27″ N, 96°57′50″ W; 1,345 m a.s.l.). They were previously characterized with the DArTseq™ method and SNP markers by Spinoso-Castillo et al. [66].

DArTseq™ genotyping of the 87 *Coffea* spp. accessions yielded 16,995 SNP markers derived from 34,000 unique sequences. Markers with more than 10% missing data or a minor allele frequency (MAF) below 5% were removed, leaving 1,739 polymorphic SNP markers for analysis. After imputation and MAF-based elimination of markers, a heat map of the 87 accessions was obtained from the genomic relationship matrix *G* (**Figure 1**).
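The filtering, imputation and relationship-matrix steps described above can be sketched in NumPy. This minimal sketch rests on several assumptions that the chapter does not state:

- Genotypes are coded additively as 0/1/2 copies of the alternate allele, with `np.nan` for missing calls. DArTseq™ deliverables use their own coding, which would first need converting.
- Missing calls are replaced by the marker mean. The imputation method actually used is not specified here.
- The helper names `filter_markers`, `mean_impute` and `genomic_relationship` are hypothetical.

The last helper follows the expression for *G* in Eq. (1), with *Z* obtained by centering and standardizing each marker column.

```python
import numpy as np


def filter_markers(X, max_missing=0.10, min_maf=0.05):
    """Drop markers (columns) with more than `max_missing` missing calls or a
    minor allele frequency below `min_maf`.

    X: individuals x markers, assumed additive 0/1/2 coding, np.nan = missing.
    """
    missing_rate = np.isnan(X).mean(axis=0)
    alt_freq = np.nanmean(X, axis=0) / 2.0  # alternate-allele frequency
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep = (missing_rate <= max_missing) & (maf >= min_maf)
    return X[:, keep]


def mean_impute(X):
    """Replace each missing call with its marker's mean genotype
    (an assumed imputation method, chosen only for simplicity)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def genomic_relationship(X):
    """G = Z Z' / p, with Z = X after centering and standardizing each column.
    Zero-variance columns (e.g. all heterozygous) cannot be standardized and
    are dropped, so p is the number of markers actually used."""
    sd = X.std(axis=0)
    X = X[:, sd > 0]
    Z = (X - X.mean(axis=0)) / sd[sd > 0]
    return Z @ Z.T / Z.shape[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in for a call matrix: 87 accessions x 2,000 SNPs.
    X = rng.integers(0, 3, size=(87, 2000)).astype(float)
    X[rng.random(X.shape) < 0.05] = np.nan  # roughly 5% missing calls
    Xf = filter_markers(X)
    G = genomic_relationship(mean_impute(Xf))
    print(Xf.shape[1], "markers kept; G has shape", G.shape)
```

Every column of *Z* has unit variance, so the diagonal of *G* averages exactly 1. This gives a convenient sanity check before *G* is drawn as a heat map with any plotting library.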

For the heat map, the genomic relationship matrix *G* is calculated with the following expression:

$$\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}^{\prime}}{p},\tag{1}$$

where *Z* is the marker matrix of dimension n = 87 rows (individuals) by p = 1,739 columns (markers). It is obtained by centering and standardizing the columns of the original marker matrix.

**Figure 1.**

*Heat map of the 87 accessions of* Coffea *spp. from the National Coffee Germplasm Bank in Mexico, based on DArTseq™ technology. Small red squares indicate each individual's genetic relatedness to itself; dark orange represents strong kinship, while lighter colors (yellow) represent weaker kinship.*

The model-based Bayesian cluster analysis in STRUCTURE was used to visualize the population structure of the collection (**Figure 2**). Five distinct sub-populations were found across cultivars.

**Figure 2.**

*Bar graph from the STRUCTURE software showing the diversity of the 87 coffee accessions based on SNP marker data. The 87 genotypes, shown below the graph, were divided into five groups (K = 5).*

The sub-populations were denoted Pop1, Pop2, Pop3, Pop4 and Pop5:

- **Pop1** clustered the *C. liberica* (84) and *C. canephora* (85, 86 and 87) accessions.
- **Pop2** clustered mostly *C. arabica* accessions of the central collection. This grouping shows the greater dissimilarity of these accessions from *C. liberica* and *C. canephora*.
- **Pop3** clustered CIRAD's F1 hybrids (74–79).
- **Pop4 and Pop5** grouped other *C. arabica* accessions.

Using AFLP markers, Steiger et al. [67] also showed that *C. canephora* and *C. arabica* were more genetically similar. This revealed inter-species diversity even though *C. arabica* resulted from a recent hybridization between *C. canephora* and *C. eugenioides* [13].

The results obtained from this *Coffea* spp. central collection are similar to those reported by Sant'Ana et al. [10]. Their population structure analyses revealed two to three groups (K = 2 and K = 3): two groups corresponding to the east and west sides of the Great Rift Valley, plus an additional group formed by wild *C. arabica* accessions collected in the western forests. Sousa et al. [29] analyzed the population structure of coffee genotypes of interest for breeding and obtained two groups (K = 2) using 11,187 SNP markers.

**6. Advantages and disadvantages of NGS techniques in** *C. arabica*

High-quality reference genome assemblies accelerate plant breeding by selecting desirable genes with improved agronomic traits, including high yield, tolerance to

*Genetic Diversity of* Coffea arabica *L.: A Genomic Approach*

*DOI: http://dx.doi.org/10.5772/intechopen.96640*


*Landraces - Traditional Variety and Natural Breed*


Despite significant progress, the number of reports on genomic resources in *Coffea* is still low, even for a species of commercial importance such as *C. arabica*. SNP genotyping profiles have already been identified and tested in *C. arabica* by Moncada et al. [57], Sousa et al. [29], Sant'Ana et al. [10] and Merot-L'anthoene et al. [4]. High-throughput genotyping assays are still needed to characterize coffee genetic diversity rapidly and to evaluate the introgression of different cultivars cost-effectively. Efforts are also needed to construct high-density genetic maps in *Coffea* [57, 58]. However, SNP markers are still rarely used to generate denser maps.

