## **6. Advantages and disadvantages of NGS techniques in** *C. arabica* **genomics**

High-quality reference genome assemblies accelerate plant breeding by enabling the selection of genes associated with improved agronomic traits, including high yield, tolerance to various abiotic and biotic stresses, and resistance to pathogens [68]. However, draft genomes suffer from unknown sequences and ambiguous assemblies caused by homologous sequences, whereas high-quality genomes are required for comparative genomics and functional annotation aimed at crop improvement [68, 69].

These NGSTs are classified as second and third generation. Their success is mainly due to advances in nanofluidics and automated single-molecule imaging [69]. SGseqTs are those methods that require a PCR step for signal amplification prior to sequencing, whereas third generation sequencing techniques (TGSeqTs) are those that can perform single-molecule sequencing (SMS) [70].

SGseqTs vary in their sequencing chemistry, cost, accuracy, speed and read length. As an advantage over the first generation sequencing method, they produce thousands to billions of reads, each 25–800 nucleotides long [69, 70]. As a disadvantage, their accuracy varies because library preparation depends on several amplification steps, and each manipulation can introduce artifacts into the DNA measurements; in addition, the short reads produced by these procedures are not suitable for de novo genome assembly [69, 70].

Therefore, novel technologies are being designed to involve minimal or no manipulation of the natural DNA molecule. TGSeqTs can analyze natural DNA/RNA molecules without any manipulation or amplification [70]. They have average read lengths longer than 10 kb, and the availability of such long reads constitutes a great advantage.

The first SMS technology was developed by Quake and commercialized in 2009 by Helicos BioSciences; it worked similarly to Illumina sequencers, but without any bridge amplification [70, 71]. However, it was slow and expensive, and it produced relatively short reads of around 35 bp; therefore, two single-molecule approaches were developed to overcome these disadvantages [72].

The first approach, Single Molecule Real-Time (SMRT) sequencing, was developed by Craighead, Korlach, Turner and Webb and has been further refined and commercialized by Pacific Biosciences (PacBio) since 2011 [73]. The second approach, Nanopore sequencing, was first hypothesized in the 1990s and has been further developed and commercialized by Oxford Nanopore Technologies (ONT) since 2005. The advantages of SMRT sequencing over NGS have come at the price of higher per-base sequencing costs [70].

Finally, the DArTseq™ technique is based on genomic complexity reduction. This technique has benefitted from developments in NGSTs, and DArTseq™ markers are now being replaced by NGS-DArT markers. Sansaloni et al. [60] found that combining DArTseq™ with NGS yields more markers than the conventional DArT method. DArTseq™ markers, in combination with other molecular techniques, have been used to build denser genetic maps of *C. arabica* and to perform association studies [4, 74, 75].

## **7. A future in genomic resources of** *C. arabica*

Arabica cultivars and landraces are generally propagated by seed. The mating system is primarily based on self-fertilization, and this autogamy leads to high levels of inbreeding. In addition, an effective clonal propagation system is being adopted, but it is limited to F1 Arabica hybrids. Molecular analyses of genetic diversity are therefore needed to support this scenario [74, 75].
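To make the link between autogamy and inbreeding concrete, the classical recurrence for complete self-fertilization can be written out. This is a textbook idealization rather than a result from the cited studies: it assumes 100% selfing, a single diploid-like locus and a non-inbred starting population ($F_0 = 0$). The symbols $H_t$ (heterozygosity in generation $t$) and $F_t$ (inbreeding coefficient in generation $t$) are introduced here only for illustration.

$$
H_t = \tfrac{1}{2}\,H_{t-1} \;\Rightarrow\; H_t = \left(\tfrac{1}{2}\right)^{t} H_0,
\qquad
F_t = 1 - \frac{H_t}{H_0} = 1 - \left(\tfrac{1}{2}\right)^{t}
$$

Under these assumptions, five generations of selfing give $F_5 = 1 - 1/32 \approx 0.97$. Because Arabica is primarily, not exclusively, self-fertilizing, real values would be expected to differ.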

The development of a new coffee variety takes about 25 years. Selection can be made more efficient when sequencing approaches are adopted in the variety development process [66, 76]. In the 1990s, Marker-Assisted Selection (MAS) was proposed, which enabled the selection of individuals with specific alleles. However, MAS has proven inefficient for polygenic and/or low-heritability traits [77]. Because of its potential and importance, genome-wide selection (GS) was developed by Meuwissen et al. [78].

*Genetic Diversity of* Coffea arabica *L.: A Genomic Approach DOI: http://dx.doi.org/10.5772/intechopen.96640*

With the development of NGSTs, GS has become a reality for several economically important species. However, the procedure requires caution in polyploid species such as *C. arabica*, whose subgenomes contain duplicated or highly similar regions [77]. Despite the economic importance of *C. arabica*, GS studies in Arabica coffee are scarce. Coffee trees have been selected through biometric analyses of phenotypic data on yield and resistance to biotic and abiotic stresses. However, given the complexity and number of genes that control most agronomic traits in this *Coffea* species, GS studies are promising because they allow estimation of the effects of all loci that explain the genetic variation and of the genomic estimated breeding value (GEBV) [74, 75, 77].
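The idea of estimating all marker effects jointly and summing them into a GEBV can be sketched as follows. This is a minimal illustration using ridge-regression BLUP, one common GS estimator, and is not necessarily the method used in the studies cited above. The data are simulated, the 0/1/2 genotype coding is a diploid simplification that ignores the allotetraploid complications just described, and the function names `rrblup_fit` and `predict_gebv` are hypothetical.

```python
import numpy as np


def rrblup_fit(Z, y, lam):
    """Estimate all marker effects jointly by ridge regression.

    Z   : (n_individuals, n_markers) genotypes coded 0/1/2 (assumed coding)
    y   : (n_individuals,) phenotypes
    lam : shrinkage parameter, taken as known in this sketch
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # centre each marker column
    mu = y.mean()                           # overall mean of the phenotype
    lhs = Zc.T @ Zc + lam * np.eye(Zc.shape[1])
    effects = np.linalg.solve(lhs, Zc.T @ (y - mu))
    return mu, col_means, effects


def predict_gebv(Z_new, col_means, effects):
    """GEBV = sum over markers of (centred genotype x estimated effect)."""
    return (np.asarray(Z_new, dtype=float) - col_means) @ effects


# Simulated toy data set (not real coffee data).
rng = np.random.default_rng(0)
n_train, n_test, n_markers = 300, 100, 100
Z_all = rng.integers(0, 3, size=(n_train + n_test, n_markers))
sigma_u = 0.1                               # s.d. of simulated marker effects
true_effects = rng.normal(0.0, sigma_u, size=n_markers)
g_true = Z_all @ true_effects               # simulated true genetic values
sigma_e = g_true.std()                      # residual s.d., so heritability is about 0.5
y_all = 10.0 + g_true + rng.normal(0.0, sigma_e, size=g_true.shape)

Z_train, Z_test = Z_all[:n_train], Z_all[n_train:]
lam = sigma_e**2 / sigma_u**2               # ratio of residual to marker variance
mu, col_means, effects = rrblup_fit(Z_train, y_all[:n_train], lam)

# Predict breeding values for individuals that were never phenotyped.
gebv_test = predict_gebv(Z_test, col_means, effects)
accuracy = np.corrcoef(gebv_test, g_true[n_train:])[0, 1]
print(f"predictive correlation on held-out individuals: {accuracy:.2f}")
```

In practice the shrinkage parameter would be estimated from the data (for example by REML or cross-validation) rather than taken from known simulation settings as it is here.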

Genome sequencing initiatives for Arabica accessions have been launched by several research groups (https://coffeegenome.ucdavis.edu/, among others), but an open-access genome assembly with a reliable sorting of homologous sequences is not yet available [77, 79]. Decoding the allotetraploid genome of *C. arabica* is therefore required for accurate GS studies in this species.

## **8. Conclusions**

*Landraces - Traditional Variety and Natural Breed*

DArTseq™ technology identifies thousands of high-quality polymorphic SNP markers in a timely and cost-effective manner. Our study confirmed that DArTseq™ genotyping can be used successfully in studies of genetic diversity, especially in coffee. In addition, trait-associated SNPs identified by GWAS may help to develop strategies for improving the biochemical quality of coffee or other important traits. These SNP markers may be useful for marker-assisted selection (MAS) and genomic selection in Arabica coffee breeding programs.

#### **Acknowledgements**

This work was supported by the FONDO SECTORIAL SAGARPA-CONACYT [2016-2101-277838].

#### **Conflict of interest**

The authors have no conflicting interests, and all authors have approved the manuscript and agree with its submission to IntechOpen.
