**5. Evolution and review of beneficial genes of the D genomes**

The evolution of the *Gossypium* genus began around 10-20 million years ago [32-33]. The initial step of this process may have been the origin of the American diploids, or New World cottons, estimated at around 6.7 million years ago. The diploid cottons were followed by the allopolyploid formation that gave rise to the New World tetraploid cottons, which include *G. hirsutum* and *G. barbadense*, around 1-2 million years ago [34]. The origin of the allotetraploids is still not well understood. However, it is well established that the allotetraploids combine one genome derived from an A-genome ancestor and a second genome from a D-genome ancestor [9,33-36]. There is no evidence of any A-genome species in the New World, and no evidence of any D-genome species outside the New World. There has been considerable speculation over the years as to which D-genome species is the closest living relative of the ancestor of the Dt subgenome of the allotetraploid cottons. Based on molecular data, the best species model for the Dt subgenome of the allotetraploids (AD) is *G. raimondii*. However, recent molecular data based on chloroplast (cpDNA) genes also revealed that *G. gossypioides* may be closer to *G. raimondii* than originally thought, despite the geographical separation of these two species [13]. In terms of haploid nuclear DNA content (1C), the *Gossypium* genomes range from 1 to 3.8 pg (picograms), or 980 Mbp to 3425 Mbp. The D model genome is the smaller, with a 2C amount of 1 pg and a haploid length of 980 Mbp, while the A-genome diploid nuclear genome contains about 3.8 pg of DNA (2C) and a single copy of the genome is approximately 1860 Mbp long. Genome size in the AD tetraploids is for the most part additive, with 5.8 pg (2C) and a haploid length of 3835 Mbp [35,37-40].
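The genome sizes above are quoted both as DNA mass (pg) and as length (Mbp). A minimal sketch of how the two units relate is given below. It assumes the commonly used conversion of roughly 978 Mbp per picogram, a factor that is not stated in this chapter, and the function names are purely illustrative.

```python
# Assumption (not from the chapter): 1 pg of double-stranded DNA ~ 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a DNA mass in picograms to a length in megabase pairs."""
    return picograms * MBP_PER_PG


def mbp_to_pg(mbp: float) -> float:
    """Convert a length in megabase pairs back to a DNA mass in picograms."""
    return mbp / MBP_PER_PG


def haploid_mbp_from_2c(two_c_pg: float) -> float:
    """Haploid (1C) genome length in Mbp, given a 2C DNA amount in pg."""
    return pg_to_mbp(two_c_pg / 2.0)


if __name__ == "__main__":
    # A-genome diploid: about 3.8 pg (2C) as quoted in the text.
    a_haploid = haploid_mbp_from_2c(3.8)
    print(f"A genome: 3.8 pg (2C) -> about {a_haploid:.0f} Mbp per haploid copy")

    # D model genome: a 980 Mbp haploid length expressed as DNA mass.
    print(f"D genome: 980 Mbp -> about {mbp_to_pg(980):.2f} pg")
```

Under this assumed factor, the A-genome value of 3.8 pg (2C) corresponds to a haploid length close to the 1860 Mbp quoted above, and a 980 Mbp genome corresponds to roughly 1 pg of DNA.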

As previously established, each genomic designation (A, B, C, D, etc.) represents a functional group of chromosomes that share similar sizes and structures, as well as the success of interspecific crosses. These designations also help breeders to find sources of genetic variability for the introgression of beneficial genes into elite cultivars and to gauge the likely success of such introgression. Within the same genome designation, the chromosomes of hybrids recombine during meiosis and the hybrids tend to be fertile. However, in crosses between genomes with different designations, even when the basic chromosome number is similar, hybrids are generally infertile, with few stable bivalents at meiosis [39]. Breeders first turn to sources of genetic variability in the primary germplasm pool, or within the same species, which includes wild or exotic and landrace germplasm of *G. hirsutum* and *G. barbadense*. The elite public and private cultivars of these species [*G. hirsutum* (Upland) and *G. barbadense* (Pima)] contain a number of traits that originated in the primary germplasm pool, e.g., the blight resistance genes [35], the nectariless trait from *G. tomentosum* [41], root-knot nematode resistance from landraces [42-43], and resistance to Fusarium (*Fusarium oxysporum* f.sp. *vasinfectum* Atk. Sny & Hans) and Verticillium (*Verticillium dahliae* Kleb.) wilt from landraces of *G. hirsutum* and from *G. darwinii* [44].

genetically distant geographical accession-ecotypes of *G. aridum*. In addition to US-72 (newly identified taxon) [10,27], five newly collected accessions [D4-10-O (US-41), D4-2-P (US-05), D4-12-G (US-76), D4-19-C (US-122), and D4-32-N (US-150)] from five different ecotypes and states of Mexico were proposed by Ulloa et al. [9] to be recognized as new species based on GD. These accessions had a larger GD than any other recognized *Gossypium* species of the D genome (0.28 < GD ≤ 0.41) [9]. Based on the most recent explorations and collections in Mexico [8-9], the existing taxonomic classification of the D4 diploid species of *Gossypium* made by Fryxell [31] and Fryxell et al. [7] needs to be revised.


220 World Cotton Germplasm Resources

The diploid species of the A and D genomes belong to the secondary germplasm pool and have contributed to improving Upland and Pima cultivars [4]. Bacterial blight resistance genes from species such as *G. arboreum*, *G. herbaceum* and *G. anomalum* have been introgressed into Upland cultivars [35]. Cytoplasm and restorer factors from *G. harknessii* [45] and *G. trilobum* [46] conditioning cytoplasmic male sterility, and D2 smoothness [47], have also been introgressed into Upland cultivars using these diploid species. Moreover, fiber quality has been improved via the triple hybrid (*G. hirsutum* x *G. arboreum* x *G. thurberi*) [48]. High fiber strength and improved fiber quality parameters were obtained using progeny from these hybrid combinations. In addition, similar triple hybrid combinations that include *G. thurberi* [49] and *G. aridum* [50] have provided progeny used to develop germplasm and cultivars resistant to root-knot nematode (*rkn*, *Meloidogyne incognita* Kofoid and White) and reniform nematode (Renari, *Rotylenchulus reniformis* Linford and Oliveira). Resistance to several pests and diseases has been found in diploid cottons. However, in nature, the hybridization of diploid species with allotetraploid (Upland or Pima) species produces sterile hybrids because of the uneven number of genomes or basic chromosome sets and the resulting irregular pairing during meiosis. Colchicine is one of the mutagenic agents that breeders have used satisfactorily to induce chromosome doubling and to balance chromosome pairing in hybrids. Obtaining agronomically suitable introgressed progeny through this type of interspecific hybridization is difficult. The most successful method of introgression has been via hexaploid bridging.
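The chromosome arithmetic behind the sterile hybrids and the hexaploid bridge can be illustrated with a short sketch. It assumes a base chromosome number of x = 13 for *Gossypium*, a value taken from general cotton cytogenetics and not stated in this chapter. It also treats an even number of genome sets as a rough proxy for balanced pairing, which is a simplification and not a prediction of fertility. All function names are hypothetical.

```python
# Assumption (not from the chapter): base chromosome number x = 13.
BASE_NUMBER = 13


def somatic_chromosomes(ploidy: int) -> int:
    """Somatic chromosome count for a plant carrying `ploidy` genome sets."""
    return ploidy * BASE_NUMBER


def hybrid_ploidy(parent_a: int, parent_b: int) -> int:
    """Genome sets in an F1 hybrid: each parent contributes half of its sets."""
    if parent_a % 2 or parent_b % 2:
        raise ValueError("parents must carry an even number of genome sets")
    return parent_a // 2 + parent_b // 2


def colchicine_doubling(ploidy: int) -> int:
    """Chromosome doubling, as induced with colchicine, doubles the set count."""
    return ploidy * 2


def has_even_sets(ploidy: int) -> bool:
    """Simplified proxy: an even set count lets every chromosome find a partner."""
    return ploidy % 2 == 0


if __name__ == "__main__":
    diploid, tetraploid = 2, 4
    f1 = hybrid_ploidy(diploid, tetraploid)   # triploid hybrid
    bridge = colchicine_doubling(f1)          # hexaploid after doubling
    for label, ploidy in [("F1 hybrid", f1), ("doubled hybrid", bridge)]:
        print(f"{label}: {ploidy}x, {somatic_chromosomes(ploidy)} chromosomes, "
              f"even sets: {has_even_sets(ploidy)}")
```

In this simplified picture the diploid x tetraploid cross gives a triploid with an odd number of genome sets, and doubling its chromosomes yields the hexaploid used as a bridge.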

Even though the *Gossypium* species of the D genome are not well known or widely utilized in cotton improvement and breeding, their significance as great reservoirs of important genes is starting to be noticed and documented. In a comparative review of quantitative trait loci (QTL) studies [4], the Dt subgenome carried from 32% [4,51] to 57% [52] of the QTLs, located on different chromosomes and affecting different traits important for cotton improvement. Even though the species of the D genome do not produce spinnable fibers, the Dt subgenome of the tetraploid cotton was found to possess QTLs positively affecting fiber quality and morphological traits [53-55], and therefore to harbor greater allelic diversity among tetraploid forms. Recently, based on the concept that some diploid species are tolerant to stress and may harbor important genes, a large number of genes were obtained from leaf and root tissues of the diploid species *G. aridum*. Plants of this species were subjected to various salt stresses to examine gene expression and to understand the salt tolerance mechanisms in *Gossypium* [56]. Most of the salt-regulated transcripts were found to be homologous to genes known to be associated with salt tolerance, e.g., ethylene-responsive transcription factor, aquaporin PIP1, and protein kinases (CBL-interacting and mitogen-activated) [56]. New transcriptome data from these tissues and species, when eventually compared with available marker-QTL DNA sequence data and/or whole genome sequences, will provide new insights into the evolution and expression of genes affecting important traits of cotton. QTL hotspots have been found affecting multiple fiber traits [32,51,57]. DNA sequences of marker-QTLs were found to be contributed by the D genome, based on changes in expression of functionally diverse cotton genes [29].

Additional studies using next generation sequencing (NGS) technology will provide further information on these unique flowering and fruiting habits (following defoliation in the dry season), and on the salt and drought resistance mechanisms that allow the D genome species to survive extended periods without rain or under stress conditions.

The genome sequence of the best model of the Dt subgenome (*G. raimondii*) was published [32,61]. The *G. raimondii* genome was found to comprise 47% (around 350 Mb) euchromatin, spanning 2,059 centiMorgans (cM), and 53% (around 390 Mb) repeat-rich heterochromatin, spanning 186 cM. Transposable elements accounted for 61%, with 53% being long-terminal-repeat (LTR) retrotransposons. It was reported that the *G. raimondii* genome contains around 37,505 assembled genes with 77,267 protein-coding annotated transcripts [31].

The Diploid D Genome Cottons (*Gossypium* spp.) of the New World

http://dx.doi.org/10.5772/58387

Increasing our knowledge and understanding of how cotton (*Gossypium* spp.) can sustain yield under drought and attack by pathogens is essential for the sustained profitability and long-term survival of the cotton industry. Researchers and breeders are working to develop sources of germplasm that can improve water use efficiency (WUE) and tolerance to drought and extreme heat. An emergent concept from genomic studies is that different regulatory networks related to plant stress may be interconnected. For example, in most situations heat stress and drought stress are thought to be linked, and resistance to pathogens is often compromised by abiotic stress factors [64]. Improving cotton productivity in stress environments calls for an understanding of the many traits involved in different system-level interactions. Integrating these plant responses and interactions with quantitative phenotypes is complex and requires new approaches and technologies. New NGS data and analyses will provide information about the utility of the newly published *G. raimondii* genome sequence for targeting traits of interest in the allopolyploid species. When NGS data from future sequenced genomes and genome saturation are combined with quantitative genetic analyses, cotton breeders will finally have the tools they need to locate the genes (quantitative trait loci or QTLs) conditioning the expression of critical agronomic traits, such as yield, drought tolerance and water use efficiency [65-66]. If *in situ* diversity of the New World cottons is severely eroded, then current and additional accessions in international collections and in the USDA Cotton Germplasm Collection assume a highly significant role in preserving the diversity previously residing in these D genome species.

The key to increasing genetic diversity among cultivated cottons is to continue collecting, evaluating and utilizing diverse cotton germplasm, including diploid species of the *Gossypium* genus. The diploid D genome cottons (*Gossypium* spp.) of the New World are part of a great reservoir of important genes for improving fiber quality, pest and disease resistance, and drought and salt tolerance in the modern cultivated Upland/Acala (*G. hirsutum*) and Pima or Sea Island (*G. barbadense*) cottons.

**Acknowledgements**

In memory of Dr. James McD. Stewart. The author would like to thank Zack Quaintance, and Jazmine and Rebecca Ulloa for their time and efforts in helping to improve this chapter. The research contribution by the author of some of the information in this chapter was partially supported by a specific cooperative agreement between USDA-ARS and the Mexican agency INIFAP (ARIS Log Nos. 5303-21220-001-10S and 5303-2-F159). Mention of trade names or commercial products in this article is solely for the purpose of providing specific information
