**Abstract**

*Coffea arabica* L. produces a high-quality beverage with pleasant aroma and flavor, but diseases, pests, and abiotic stresses often affect its yield. Therefore, improving important agronomic traits of this commercial species remains a target for most coffee improvement programs. With advances in genomic and sequencing technology, it is now feasible to understand the coffee genome and the molecular inheritance underlying coffee traits, thereby helping to improve the efficiency of breeding programs. Thanks to the rapid development of genomic resources and the publication of the *C. canephora* reference genome, third-generation markers based on single-nucleotide polymorphisms (SNPs) have gradually been identified and assayed in *Coffea*, particularly in *C. arabica*. However, high-throughput genotyping assays are still needed to rapidly characterize coffee genetic diversity and to evaluate introgression in different cultivars in a cost-effective way. The DArTseq™ platform, developed by Diversity Arrays Technology, is one such approach and has attracted increasing interest worldwide because it can generate thousands of high-quality SNPs in a timely and cost-effective manner. These validated SNP markers will be useful for molecular genetics and for innovative approaches in coffee breeding.

**Keywords:** *Coffea* spp., high-throughput genotyping, molecular markers, plant breeding, DArTseq

#### **1. Introduction**

Coffee is an important crop and the second most traded commodity in the world (after petroleum), providing a living for more than 125 million people. Commercial coffee production relies on only two species of the *Coffea* genus: *Coffea arabica* L. (Arabica coffee) and *Coffea canephora* Pierre ex A. Froehner (Robusta coffee), which supplied 60% and 40% of world coffee production in 2018/19, respectively [1]. Although *C. canephora* does not have the cup quality of the more popular *C. arabica*, it continues to be widely grown, especially in regions where farming is low-intensity, because of its tolerance to diseases, pests, and abiotic stresses [2].

*C. arabica* produces a high-quality beverage with pleasant aroma and flavor, but a range of biotic and abiotic stresses often affect its yield [3]. Therefore, improving important agronomic traits of both commercial species remains a target for most coffee breeding programs. Advances in genomic and sequencing technology make it possible to understand the coffee genome and the molecular inheritance underlying coffee traits, thereby helping to improve the efficiency of coffee breeding [3].

The development of new genomic tools can help us explore, more deeply and more precisely, genomic diversity at the intra- and interspecific levels [4]. Two examples of high-throughput platforms are next-generation sequencing (NGS) [5] and DNA microarrays [6]. Compared with whole-genome sequencing, an SNP array approach provides a faster, lower-cost, and more straightforward genotyping technology for germplasm screening [7, 8].

Thanks to the rapid development of genomic resources and the publication of the *C. canephora* reference genome [9], third-generation markers based on single-nucleotide polymorphisms (SNPs) have gradually been identified and assayed in *Coffea*, particularly in *C. arabica* [10, 11].

#### **2. Genetic diversity of** *Coffea arabica* **L.**

The *Coffea* genus, which belongs to the Rubiaceae family, includes around 124 species, most of which are diploid (2n = 2x = 22). The only allotetraploid is *C. arabica* L., with 2n = 4x = 44 [12], which originated from a natural cross between *Coffea eugenioides* S. Moore and *C. canephora* Pierre ex A. Froehner [13]. *C. arabica* is also the only self-fertile species among the cultivated coffee species. This species is genetically less diverse than the diploid species [14, 15], a situation that has been associated with its susceptibility to the common coffee diseases [16].

*C. arabica* is native mainly to the highlands of southwestern Ethiopia, South Sudan (Boma plateau), and northern Kenya (Mount Marsabit). *C. arabica* cultivars grown around the world are derived from either the 'Typica' or the 'Bourbon' genetic base [17]. Studies report wide agronomic diversity among Arabica coffee accessions collected in these regions of Ethiopia with regard to leaf size, height, tolerance to biotic and abiotic stresses, and yield [18, 19]. In addition, studies using molecular markers have indicated higher genetic variability in Ethiopian (ET) accessions than in cultivars, demonstrating the potential of these accessions for breeding purposes [10, 20–23]. These accessions also showed great variability in the metabolite profiles of their coffee beans, which is of interest for cup quality improvement [10, 24].

Assessing the population structure and genetic relationships of these ET accessions, both among themselves and in relation to traditional cultivars, is fundamental for the efficient use of their genetic diversity in Arabica coffee breeding programs [25]. However, selecting genetically diverse parental lines on the basis of morphological and agronomic traits is often difficult because of a high degree of morphological similarity [26].

During the past 30 years, molecular markers have been increasingly used to assess germplasm diversity in various crops [27, 28]. Molecular information provides insight into the genetic structure of individual genotypes and ultimately helps in the accurate selection of superior genotypes to maximize selection gains [29].
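As a rough illustration of how marker data can be turned into a diversity estimate, the sketch below computes Nei's expected heterozygosity (gene diversity, 2pq) per locus and averages it over loci. This is only one common summary statistic and is not claimed to be the measure used in the studies cited here. The function names, the diploid-style 0/1/2 allele-dosage coding, and the toy genotype matrices are all assumptions made for illustration; they are not real data, and because *C. arabica* is a largely self-fertile allotetraploid, the value is better read as a gene diversity index than as an expected frequency of heterozygotes.

```python
from typing import List, Optional, Sequence

Locus = Sequence[Optional[int]]


def expected_heterozygosity(dosages: Locus) -> float:
    """Gene diversity (2pq) at one biallelic SNP.

    dosages holds the alternate-allele count per sample (0, 1 or 2,
    an assumed diploid-style coding); None marks a missing call.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    p = sum(called) / (2 * len(called))  # alternate-allele frequency
    return 2 * p * (1 - p)


def mean_diversity(matrix: Sequence[Locus]) -> float:
    """Average gene diversity over loci (rows = loci, columns = samples)."""
    values: List[float] = [expected_heterozygosity(locus) for locus in matrix]
    return sum(values) / len(values)


# Hypothetical toy panels: three SNP loci scored in four samples each.
cultivars = [
    [0, 0, 0, 0],
    [2, 2, 2, 2],
    [0, 0, 0, 1],
]
accessions = [
    [0, 1, 2, 0],
    [2, 0, 1, None],  # one missing call
    [0, 2, 0, 2],
]

if __name__ == "__main__":
    print("cultivars :", round(mean_diversity(cultivars), 4))
    print("accessions:", round(mean_diversity(accessions), 4))
```

With toy panels like these, a group of near-identical cultivars gives a much lower mean value than a more variable set of accessions, which is the kind of contrast the marker studies discussed in this chapter aim to quantify.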

#### **3.** *C. arabica* **diversity assessment by molecular markers**

Several studies have assessed the genetic diversity of Arabica coffee, with differing results. In general, across different types of material (cultivars, accessions, hybrids, and spontaneous genotypes), practically all studies show very low genetic variation regardless of the marker system used [3]. Arabica's genetic diversity has been evaluated with a range of molecular markers, such as Random Amplified Polymorphic DNA (RAPD) [30, 31], Inter Simple Sequence Repeat (ISSR) [32], Simple Sequence Repeat (SSR) [23, 29, 33, 34], and SSR combined with Amplified Fragment Length Polymorphism (AFLP) [35, 36].

In a recent study presented in the World Coffee Research annual report, a genetic diversity assessment of 800 Arabica accessions from the collection at CATIE, Costa Rica, showed that *C. arabica* has the lowest genetic diversity when compared with other major crops [37]. This study also found that coffee cultivars contain almost 45% of the genetic diversity found in these 800 accessions, indicating limited variability for breeding programs [3]. Therefore, it is crucial to assess population structure and genetic diversity in the *Coffea* genus.

Of course, the *C. arabica* germplasm available in ex situ collections may represent only a fraction of the total genetic diversity of the remaining wild and semi-wild forest coffees in southwestern Ethiopia [38]. However, Arabica breeders already have an idea of the potential and limits of ET germplasm, particularly with regard to host resistance to diseases and pests. For example, none of the modern Arabica cultivars with host resistance to coffee leaf rust (CLR) derives from ET germplasm [39]. Likewise, cultivars resistant to coffee berry disease (CBD) outside Ethiopia do not have ET germplasm as progenitors [40], while the nematode resistance found in ET accessions provides only limited protection against the severe nematode problems in Central America [41].

In contrast, ET germplasm may be a good source of sensory (cup) quality traits. The cup quality profile of the new Arabica F1 hybrids developed for Central America is said to derive largely from one of the two progenitors, a selected ET accession from the FAO-1964 pool [42]. Silvarolla et al. found three nearly caffeine-free coffee plants among the offspring of ET germplasm [43]. Male sterility, a character useful for F1-hybrid seed production, has been detected in a few ET accessions [44].

#### **4. Next generation sequencing techniques in** *C. arabica*

NGS comprises technologies that produce millions of short DNA sequences at low cost and in a short time. The platforms most commonly used for high-throughput genomic research, especially in non-model plant species, are the second-generation sequencing techniques (SGseqTs): Illumina/Solexa, 454/Roche, ABI/SOLiD, and Helicos, with read lengths mostly in the range of 25 to 700 bp [45]. Results from such research indicate that NGS techniques (NGSTs) should not be restricted to the genomes of model organisms, as non-model plants have also provided useful resources for genomic studies [45].

In contrast to classical molecular markers, SNPs are the most abundant markers, particularly in the non-coding regions of the genome [46]. NGS used together with complexity reduction methods, such as genotyping by sequencing (GBS) and DArTseq™ (sequencing-based Diversity Arrays Technology), enables large-scale discovery of SNPs in a wide variety of non-model organisms [47–49]. These techniques provide measures of genetic divergence and diversity within the major genetic clusters that comprise crop germplasm [50].

SNP genotyping profiles can be compared across laboratories and sequencing platforms. These benefits have led to the increasing use of SNPs as high-quality markers for genotype identification in a wide range of crops [51], as recently demonstrated in cacao (*Theobroma cacao*; [52]), pummelo (*Citrus maxima*; [53]), tea (*Camellia sinensis*; [54]), longan (*Dimocarpus longan*; [55]), and litchi (*Litchi chinensis*; [56]).

*Genetic Diversity of* Coffea arabica *L.: A Genomic Approach*

*DOI: http://dx.doi.org/10.5772/intechopen.96640*

*Landraces - Traditional Variety and Natural Breed*

