**6. Future research of gene discovery and mining of the D genomes**

*G. hirsutum*, known by various common names (e.g., Acala or Upland cotton, short staple cotton, Mocó cotton, and Cambodia cotton), is the most widely cultivated cotton species in the world [4-5,8,12]. Recent advances in genomics have provided considerable information on the discovery and expression of genes controlling important crop traits. The next generation of cotton breeders will be able to draw on the vast amount of genomic information generated by next-generation sequencing (NGS). This information could be used to improve existing tools such as marker-assisted selection (MAS) for molecular breeding, or to develop new tools to locate and characterize germplasm, cultivars, and the DNA sequences of genes encoding proteins that control the expression of important traits, developmentally and temporally.

As NGS technology has lowered sequencing costs, it has become possible to obtain large amounts of DNA sequence for identifying polymorphisms and for directly mapping genes responsible for important traits through whole-genome sequencing [58-59]. Molecular markers are continuously being developed, which will allow cotton geneticists to sample all regions of the cotton genome [4,9,32,60-61]. Plant breeders find molecular markers useful as a selection tool for monitoring alien genome introgression in cotton breeding programs [62-63]. Alien genomes have the potential to increase genetic variability for economically valuable traits in cotton cultivars. Introgression of alien genes or genomes is not easy, but it increases the genetic diversity available for selection, because many useful alleles are likely to be found outside the currently cultivated gene pools. Even when the underlying gene has yet to be located, sequenced, and characterized, molecular breeders can use natural DNA sequence variation, such as single nucleotide polymorphisms (SNPs), to identify differences among germplasm and breeding lines, and can apply traditional genetic analyses to infer genes for MAS. In addition, the efficacy of transgenic technology depends entirely on gene discovery. The functional genes identified with NGS would be important resources for improving the cotton crop through transgenic technology.
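The idea of using SNPs to tell germplasm and breeding lines apart can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to a common coordinate system; the line names, the sequences, and the `find_snps` helper are all hypothetical and are not taken from the studies cited above.

```python
from typing import Dict, List, Tuple


def find_snps(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (position, {line: base}) for each aligned column that is polymorphic.

    Columns containing gaps or ambiguous base calls are skipped.
    Positions are 0-based offsets into the alignment.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    snps = []
    for pos in range(lengths.pop()):
        column = {name: seq[pos].upper() for name, seq in aligned.items()}
        if any(base not in "ACGT" for base in column.values()):
            continue  # gap or ambiguous call: not scored as a SNP
        if len(set(column.values())) > 1:
            snps.append((pos, column))
    return snps


# Hypothetical aligned fragments from three breeding lines.
lines = {
    "line_A": "ATGCCGTAAC",
    "line_B": "ATGCTGTAAC",
    "line_C": "ATGCTGTA-C",
}

snps = find_snps(lines)
for pos, calls in snps:
    print(pos, calls)
```

In practice, SNP calls come from read alignments and variant callers rather than from direct string comparison, so this sketch shows only the underlying comparison, not a production workflow.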

The cotton genome is complex, particularly with respect to the evolution and function of the major cultivated species, e.g., *G. hirsutum*. This complexity arises from the joint presence of two subgenomes (At and Dt) in its nucleus [60]. The sequence of the allotetraploid genomes, including *G. hirsutum*, has not yet been completed. However, the genome sequence of the best model of the Dt subgenome (*G. raimondii*) was recently published [32,61]. The *G. raimondii* genome was found to consist of 47% (around 350 Mb) euchromatin, spanning 2,059 centiMorgan (cM), and 53% (around 390 Mb) heterochromatin, a repeat-rich region spanning 186 cM. Transposable elements accounted for 61% of the genome, with 53% being long-terminal-repeat (LTR) retrotransposons. The *G. raimondii* genome was reported to contain around 37,505 assembled genes with 77,267 protein-coding annotated transcripts [31].
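The contrast between the two chromatin classes becomes clearer when the quoted sizes and map lengths are turned into recombination densities. The short calculation below uses only the approximate figures given in the text; the variable names are illustrative.

```python
# Approximate values quoted in the text for the G. raimondii genome.
regions = {
    "euchromatin": {"size_mb": 350, "map_cm": 2059},
    "heterochromatin": {"size_mb": 390, "map_cm": 186},
}

total_mb = sum(r["size_mb"] for r in regions.values())
total_cm = sum(r["map_cm"] for r in regions.values())

summary = {}
for name, r in regions.items():
    summary[name] = {
        # Share of the physical sequence (Mb) in this chromatin class.
        "physical_share": r["size_mb"] / total_mb,
        # Share of the genetic map length (cM) in this chromatin class.
        "genetic_share": r["map_cm"] / total_cm,
        # Recombination density: map units per megabase.
        "cm_per_mb": r["map_cm"] / r["size_mb"],
    }

for name, s in summary.items():
    print(
        f"{name}: {s['physical_share']:.1%} of sequence, "
        f"{s['genetic_share']:.1%} of map length, "
        f"{s['cm_per_mb']:.2f} cM/Mb"
    )
```

With these figures, euchromatin holds about 47% of the sequence but roughly 92% of the genetic map length, which corresponds to about 5.9 cM/Mb against roughly 0.5 cM/Mb in heterochromatin.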

Increasing our knowledge of how cotton (*Gossypium* spp.) can sustain yield under drought and pathogen attack is essential for the sustained profitability and long-term survival of the cotton industry. Researchers and breeders are working to develop sources of germplasm that can improve water use efficiency (WUE) and tolerance to drought and extreme heat. An emerging concept from genomic studies is that different regulatory networks related to plant stress may be interconnected. For example, in most situations heat stress and drought stress are thought to be linked, and resistance to pathogens is often compromised by abiotic stress factors [64]. Improving cotton productivity in stress environments calls for an understanding of the many traits involved in different system-level interactions. Integrating these plant responses and interactions with quantitative phenotypes is complex and requires new approaches and technologies. New NGS data and analyses will provide information about the utility of the recently published *G. raimondii* genome sequence for targeting traits of interest in the allopolyploid species. When NGS data from future sequenced genomes and genome saturation with markers are combined with quantitative genetic analyses, cotton breeders will finally have the tools they need to locate the genes (quantitative trait loci, or QTLs) conditioning the expression of critical agronomic traits, such as yield, drought tolerance, and water use efficiency [65-66]. If the *in situ* diversity of the New World cottons is severely eroded, then current and additional accessions in international collections and in the USDA Cotton Germplasm Collection will assume a highly significant role in preserving the diversity previously residing in these D genome species.

The key to increasing genetic diversity among cultivated cottons is to continue collecting, evaluating, and utilizing diverse cotton germplasm, including diploid species of the *Gossypium* genus. The diploid D genome cottons (*Gossypium* spp.) of the New World are part of a great reservoir of important genes for improving fiber quality, pest and disease resistance, and drought and salt tolerance in the modern cultivated Upland/Acala (*G. hirsutum*) and Pima or Sea Island (*G. barbadense*) cottons.
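The combination of marker data with quantitative genetic analyses discussed in this section can be sketched in its simplest form as a single-marker regression, in which a trait value is regressed on the genotype at one marker. This is a deliberately simplified illustration, not the method of the cited studies; the genotype coding (0, 1, 2 copies of one allele), the trait values, and the function name `single_marker_regression` are all assumptions made for the example.

```python
from statistics import mean
from typing import Sequence, Tuple


def single_marker_regression(
    genotypes: Sequence[int], phenotypes: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares regression of phenotype on marker genotype.

    Returns (slope, intercept, r_squared). The slope estimates the additive
    effect of one allele copy; r_squared is the share of phenotypic variance
    explained by the marker.
    """
    if len(genotypes) != len(phenotypes) or len(genotypes) < 3:
        raise ValueError("need at least three paired observations")

    g_mean, p_mean = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    syy = sum((p - p_mean) ** 2 for p in phenotypes)

    slope = sxy / sxx
    intercept = p_mean - slope * g_mean
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical data: allele counts at one SNP and fiber length (mm) per line.
marker = [0, 0, 1, 1, 2, 2, 0, 2]
fiber_length = [27.1, 26.8, 28.0, 28.4, 29.3, 29.0, 27.0, 29.5]

slope, intercept, r_squared = single_marker_regression(marker, fiber_length)
print(f"effect per allele: {slope:.2f} mm, R^2 = {r_squared:.2f}")
```

Real QTL studies test many markers across the genome, account for population structure and multiple testing, and typically use interval or mixed-model methods, so a marker flagged by this kind of test is only a starting point.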

eventually compared with available marker-QTL DNA sequence data and/or whole genome sequences will provide new insights into the evolution and expression of genes affecting important traits of cotton. QTL hotspots have been found affecting multiple fiber traits [32,51,57]. DNA sequences of marker-QTLs were found to be contributed by the D genome based on changes in expression of functionally diverse cotton genes [29]. Additional studies using next-generation sequencing (NGS) technology will provide additional information on these unique flowering and fruiting habits (following defoliation in the dry season), and on the salt and drought resistance mechanisms that allow the D genome species to survive extended periods without rain or stress conditions.

#### **Acknowledgements**

222 World Cotton Germplasm Resources


In memory of Dr. James McD. Stewart. The author would like to thank Zack Quaintance, and Jazmine and Rebecca Ulloa, for their time and effort in helping to improve this chapter. The author's research contributing to some of the information in this chapter was partially supported by a specific cooperative agreement between USDA-ARS and the Mexican agency INIFAP (ARIS Log Nos. 5303-21220-001-10S and 5303-2-F159). Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The U.S. Department of Agriculture is an equal opportunity provider and employer.

[8] Ulloa M, Stewart JM, Garcia-C EA, Godoy-A A, Gaytán-M A, Acosta-N S. Cotton genetic resources in the western states of Mexico: in situ conservation status and germplasm collection for ex situ preservation. Genet. Resour. Crop Evol. 2006; 53:653-668.

The Diploid D Genome Cottons (*Gossypium* spp.) of the New World

http://dx.doi.org/10.5772/58387


[9] Ulloa M, Abdurakhmonov IY, Perez-M C, Percy R, Stewart McDJ. Genetic diversity and population structure of cotton (*Gossypium* spp.) of the New World assessed by SSR markers. Botany 2013; 91:251-259.

[10] Alvarez I, Wendel JF. Cryptic interspecific introgression and genetic differentiation within *Gossypium aridum* (Malvaceae) and its relatives. Evolution 2006; 60:505-517.

[11] Feng C, Ulloa M, Perez-M C, Stewart JM. Distribution and molecular diversity of arborescent *Gossypium* species. Botany 2011; 89:615-624.

[12] Fryxell PA. Taxonomy and Germplasm Resources. In: Kohel RJ and Lewis CF (eds) Cotton. Agronomy No. 24, ASA, CSSA & SSSA. 1984; p. 27-58.

[13] Cronn RC, Small RL, Haselkorn T, Wendel JF. Duplicated genes evolve independently after polyploid formation in cotton. 2003; Evolution: 2475-2489.

[14] Ram SG, Thiruvengadam V, Vinod KK. Genetic diversity among cultivars, landraces and wild relatives of rice as revealed by microsatellite markers. J. Appl. Genet. 2007; 48:337-345.

[15] Vigouroux Y, Glaubitz JC, Matsuoka Y, Goodman MM, Sanchez GJ, Doebley J. Population structure and genetic diversity of New World maize races assessed by DNA microsatellites. Am. J. Bot. 2008; 95:1240-1253.

[16] Wendel JF, Schnabel A, Seelanan T. An unusual ribosomal DNA sequence from *Gossypium gossypioides* reveals ancient, cryptic intergenomic introgression. Mol. Phylogen. Evol. 1995; 4:298-313.

[17] Buckler ES, Ippolito A, Holtsford TP. The Evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 1997; 145:821-832.

[18] Pillay M, Myers GO. Genetic diversity in cotton assessed by variation in ribosomal RNA genes and AFLP markers. Crop Sci. 1999; 39:1881-1886.

[19] Small RL, Ryburn JA, Cronn RC, Seelanan T, Wendel JF. The tortoise and the hare: Choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction of a recently diverged plant group. Am. J. Bot. 1998; 85:1301-1315.

[20] Zhao X, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, Wendel JF, Paterson AH. Dispersed repetitive DNA has spread to new genomes since polyploidy formation in cotton. Genome Research 1998; 8:479-492.

[21] Hanson RE, Zhao XP, Islam-Faridi MN, Paterson AH, Zwick MS, Crane CF, McKnight TD, Stelly DM, Price HJ. Evolution of interspersed repetitive elements in *Gossypium* (Malvaceae). Am. J. Bot. 1998; 85:1364-1368.
