#### **1. Introduction**

Cotton (*Gossypium* spp.) is the most important natural fiber crop in the world, generating an average of \$27–29 billion annually worldwide from lint fiber production (Campbell et al., 2010). The worldwide economic impact of the cotton industry is estimated at ~\$500 billion/yr, with an annual utilization of ~115 million bales or ~27 million metric tons (MT) of cotton fiber (Chen et al., 2007). In 2011 and 2012, global cotton production is projected to increase by 8% (to 26.9 million MT), which would be the largest crop since 2004 and 2005 (International Cotton Advisory Committee [ICAC], 2011).

Cotton is also a significant food source for humans and livestock (Sunilkumar et al., 2006). As one of Uzbekistan's main economic resources, cotton fiber production and export bring in an average of ~\$0.9 to 1.2 billion annually (Abdurakhmonov, 2007), which represented 22% of all Uzbek exports in 2001-2003 (Campbell et al., 2010). Income from cotton production accounted for roughly 11% of Uzbekistan's GDP in 2009 (http://www.state.gov/r/pa/ei/bgn/2924.htm, verified on September 15, 2011).

The level of genetic diversity of crop species is an essential element of sustainable crop production in agriculture, and cotton is no exception. The genetic diversity of *Gossypium* species is exceptionally wide, encompassing a broad range of geographic and ecological niches. It is conserved *in situ* at the centers of cotton origin (Ulloa et al., 2006) and preserved *ex situ* within worldwide cotton germplasm collections and the materials of breeding programs. Cotton productivity and the future of cotton breeding efforts depend closely on 1) the level of genetic diversity in cotton gene pools and 2) its effective exploitation in cotton breeding programs. Elucidating the details of genetic diversity is also very important to determine the timeframe of cotton agronomy, develop a strategy for genetic gains in breeding, and conserve existing gene pools of cotton.

<sup>\*</sup> Zabardast T. Buriev<sup>1</sup>, Shukhrat E. Shermatov<sup>1</sup>, Alisher A. Abdullaev<sup>1</sup>, Khurshid Urmonov<sup>1</sup>, Fakhriddin Kushanov<sup>1</sup>, Sharof S. Egamberdiev<sup>1</sup>, Umid Shapulatov<sup>1</sup>, Abdusttor Abdukarimov<sup>1</sup>, Sukumar Saha<sup>2</sup>, Johnnie N. Jenkins<sup>2</sup>, Russell J. Kohel<sup>3</sup>, John Z. Yu<sup>3</sup>, Alan E. Pepper<sup>4</sup>, Siva P. Kumpatla<sup>5</sup> and Mauricio Ulloa<sup>6</sup>

*<sup>1</sup>Center of Genomic Technologies, Institute of Genetics and Plant Experimental Biology, Academy of Sciences of Uzbekistan, Yuqori Yuz, Qibray Region, Tashkent, Uzbekistan*

*<sup>2</sup>United States Department of Agriculture-Agriculture Research Service, Crop Science Research Laboratory, Starkville, Mississippi State, USA*

*<sup>3</sup>United States Department of Agriculture-Agriculture Research Service, Crop Germplasm Research Unit, College Station, Texas, USA*

*<sup>4</sup>Department of Biology, Texas A&M University, College Station, Texas, USA*

*<sup>5</sup>Department of Biotechnology Regulatory Sciences, Dow AgroSciences LLC, Indianapolis, Indiana, USA*

*<sup>6</sup>United States Department of Agriculture-Agriculture Research Service, Western Integrated Cropping Systems, USA*

Over the past decades, advances in molecular marker technology have driven extensive efforts to explore molecular genetic diversity in various cotton gene pools and genomic groups, varietal and breeding collections, and specific germplasm resources. These efforts reinforced a serious concern about the narrow genetic base of cultivated cotton germplasm, which has been associated with a "genetic bottleneck" that occurred during the historic domestication of cotton (Iqbal et al., 2001). The narrow genetic base of cultivated germplasm was one of the major factors behind recent declines in cotton yield and quality (Esbroeck & Bowman, 1998; Paterson et al., 2004). These declines, however, are largely due to the challenges of effectively exploiting the genetic diversity of *Gossypium* species and the lack of innovative tools for doing so. The most effective utilization of this diversity further requires modern genomics technologies that help reveal the molecular basis of genetic variation of agronomic importance. Sequencing the cotton genome(s) (Chen et al., 2007) is a pivotal step that will facilitate fine-scale mapping and better utilization of functionally significant variation in cotton gene pools (Abdurakhmonov, 2007). Once exploited effectively, the wide genetic diversity of the genus, in particular the reservoir of potentially underutilized diversity in exotic wild cotton germplasm, is a 'golden' resource for improving cotton cultivars and solving many fundamental problems associated with fiber quality, resistance to insects and pathogens, and tolerance to abiotic stresses (Abdurakhmonov, 2007). In this chapter, we describe cotton germplasm resources and the amplitude of morphobiological and agronomic diversity of the *Gossypium* genus, review efforts on molecular genetic diversity of cotton gene pools, and highlight examples, challenges and perspectives of exploiting genetic diversity in cotton.

#### **2. Description of cotton gene pools and worldwide germplasm collections**

Although wild cottons (*Gossypium* spp.) are perennial shrubs and trees, the domesticated cottons are tropical and subtropical annual crops that have been cultivated since prehistoric times. The *Gossypium* genus of the *Malvaceae* family contains more than 45 diploid species and 5 allotetraploid species (Fryxell et al., 1992; Percival et al., 1999; Ulloa et al., 2007). These species are grouped into nine genomic types (*x* = *2n* = 26, or *n* = 13) with designations: AD, A, B, C, D, E, F, G, and K (Percival et al., 1999). The species are spread widely across diverse geographic regions of the world. Based on the usage of these *Gossypium* species in cotton breeding and their genetic hybridization properties, they can be grouped into 1) the primary gene pool, which includes the two New World cultivated species, *G. hirsutum* L. and *G. barbadense* L., as well as the remaining three wild tetraploid species, *G. tomentosum* Nuttall *ex* Seemann, *G. mustelinum* Miers *ex* Watt and *G. darwinii* Watt; 2) the secondary gene pool, including the A, B, D and F genome diploid cotton species; and 3) the tertiary gene pool, including the C, E, G and K genome *Gossypium* species (Stelly et al., 2007; Campbell et al., 2010).
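The gene pool grouping described above can be written down as a small lookup table. The following is only an illustrative sketch: the dictionary name `GENE_POOL_BY_GENOME` and the helper `gene_pool` are hypothetical names introduced here, and the assignments simply restate the grouping given in the text (Stelly et al., 2007; Campbell et al., 2010).

```python
# Illustrative sketch: genome designation -> gene pool, restating the text.
# The names below are hypothetical and not part of any published tool.
GENE_POOL_BY_GENOME = {
    "AD": "primary",    # allotetraploids, e.g. G. hirsutum and G. barbadense
    "A": "secondary",
    "B": "secondary",
    "D": "secondary",
    "F": "secondary",
    "C": "tertiary",
    "E": "tertiary",
    "G": "tertiary",
    "K": "tertiary",
}


def gene_pool(genome: str) -> str:
    """Return the gene pool for a genome designation such as 'AD' or 'd'."""
    key = genome.strip().upper()
    if key not in GENE_POOL_BY_GENOME:
        raise ValueError(f"unknown genome designation: {genome!r}")
    return GENE_POOL_BY_GENOME[key]


if __name__ == "__main__":
    for g in ("AD", "A", "D", "K"):
        print(g, "->", gene_pool(g))
```

A table of this kind can be convenient when annotating accession lists by genome group, since the nine designations map onto only three breeding-relevant categories.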


Diploid cottons, referred to as Old World cottons, are classified into eight (A-G and K) cytogenetically defined genome groups of African/Asian, American, and Australian origin (Endrizzi et al., 1985). Two of these Old World cottons of Asian origin, *G. arboreum* L. and *G. herbaceum* L., which bear a spinnable seed fiber, were originally cultivated on the Asian continent. Today, Old World cultivated cottons are grown primarily for non-industrial consumption in India and adjacent Asian countries.

The New World diploid *Gossypium* comprise 14 D-genome species, including one undescribed taxon, US-72 (Ulloa et al., 2006; Alvarez and Wendel, 2006; Feng et al., 2011). Taxonomically, these species are recognized as the subgenus *Houzingenia* (Fryxell, 1979, 1992). Twelve of the 14 species are distributed in Mexico, with the range extending northward into Arizona. Five species are adapted to the desert environments of Baja California [*G. armourianum* Kearney (D2-1), *G. harknessii* Brandegee (D2-2), and *G. davidsonii* Kellogg (D3-d)] and NW mainland Mexico [*G. turneri* Fryxell (D10) and *G. thurberi* Todaro (D1)]. An additional seven species [*G.* sp. US-72, *G. aridum* (Rose & Standley) Skovsted (D4), *G. lobatum* Gentry (D7), *G. laxum* Phillips (D9), *G. schwendimanii* Fryx. & Koch (D11), *G. gossypioides* (Ulbrich) Standley (D6), and *G. trilobum* (Mociño & Sessé ex DC.) Skovsted (D8)] are located in the Pacific coast states of Mexico and, with the exception of the last species, are arborescent in growth habit (Ulloa et al., 2006). The other two species have disjunct distributions: *G. raimondii* Ulbrich (D5) is endemic to Peru, while *G. klotzschianum* Andersson (D3-k) is found in the Galápagos Islands. The D-genome species (subgenus *Houzingenia*) are classified into six sections: Section *Houzingenia* Fryxell (D1 and D8); Section *Integrifolia* Todaro (D3-d and D3-k); Section *Caducibracteolata* Mauer (D2-1, D2-2, and D10); Section *Erioxylum* Rose & Standley (US-72, D4, D7, D9, and D11); Section *Selera* (Ulbrich) Fryxell (D6); and Section *Austroamericana* Fryxell (D5) (Percival et al., 1999).

Until recently, evaluation of the New World D-genome species of *Gossypium*, especially Section *Houzingenia* and Section *Erioxylum*, has been limited by the lack of resource material for *ex situ* evaluation. In recent years, the United States Department of Agriculture and the Mexican Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias (INIFAP) have sponsored joint *Gossypium* germplasm collection trips by U.S. and Mexican cotton scientists (Ulloa et al., 2006; Feng et al., 2011). As a result of these efforts, a significant number of additional *Gossypium* accessions of the subgenus *Houzingenia* from various parts of Mexico are now available for evaluation, including several accessions of each of the arborescent species (Ulloa et al., 2006). Although none of these diploid species produces cotton fibers, the D genome is one of the parental lineages of the modern allotetraploid cultivated cottons, Upland and Pima (Ulloa, 2009). Studying these D-genome species is the first critical step toward meeting the pressing need to document their *in situ* conservation, to assess genetic diversity in *Gossypium* for the preservation of the D-genome species, and to facilitate their use in cotton improvement. *In situ* conservation of some of these species is threatened by population growth and industrialized agriculture. These *Gossypium* species are donors of important genes for cotton improvement (Ulloa et al., 2006).

Hybridization between A-genome (Old World cottons) and D-genome (New World cottons) diploids, followed by polyploidization about 1.5 million years ago, created the five AD allotetraploid lineages of the primary gene pool, which are indigenous to America and Hawaii (Phillips, 1964; Wendel & Albert, 1992; Adams et al., 2004). These New World allotetraploid cottons include the commercially important species *G. hirsutum* and *G. barbadense*, which are extensively cultivated worldwide (Abdurakhmonov, 2007; Campbell et al., 2010).

*G. hirsutum* (also called Acala or Upland, short stapled, Mocó, and Cambodia cotton) is the most widely cultivated (90%) industrial cotton among all *Gossypium* species. It includes the Upland cotton cultivars and other early maturing, annually grown herbaceous bushes. The center of origin for *G. hirsutum* is Mesoamerica (Mexico and Guatemala), from where it spread throughout Central America and the Caribbean. According to archaeobotanical findings, *G. hirsutum* was probably first domesticated at the southern end of the Mesoamerican gene pool (Wendel, 1995; Brubaker et al., 1999). Consequently, two centers of genetic diversity exist within *G. hirsutum*: southern Mexico-Guatemala and the Caribbean (Brubaker et al., 1999); the Mexico-Guatemala gene pool is considered the site of original domestication and the primary center of diversity. Within this range, *G. hirsutum* exhibits diverse morphological forms, from wild and primitive to domesticated accessions. According to Mauer (1954), there are four groups of sub-species of *G. hirsutum*: (1) *G. hirsutum* ssp. mexicanum, (2) *G. hirsutum* ssp. paniculatum, (3) *G. hirsutum* ssp. punctatum, and (4) *G. hirsutum* ssp. euhirsutum (domesticated cultivars). These four groups include a number of wild landraces and primitive predomesticated forms such as yucatanense, richmondi, punctatum, latifolium, palmeri, morilli, purpurascens and their accessions, as well as a number of domesticated variety accessions from 80 different cotton-growing countries worldwide (Sunilkumar et al., 2006; Lacape et al., 2007; Abdurakhmonov, 2007).

*G. barbadense* (also called long staple Pima, Sea Island or Egyptian cotton), accounting for about 9% of world cotton production, was originally cultivated on the coastal islands and lowlands of the USA and became known as Sea Island cotton. Sea Island cottons were then introduced into the Nile Valley of Egypt and widely grown as Egyptian cotton to produce long staple, fine fibers (Abdalla et al., 2001). The wide distribution of *G. barbadense* included mostly South America, southern Mesoamerica and the Caribbean basin (Fryxell, 1979). *G. barbadense* can be divided into two botanical races, *brasilense* (with the kidney-seed trait) and *barbadense* (with nonaggregated seeds), both of which are widely present as semi-domesticated forms in Brazil (de Almeida et al., 2009). The *brasilense* race, thought to have been domesticated in the Amazonian basin (de Almeida et al., 2009), is considered a locally domesticated form of *G. barbadense* cotton (Brubaker et al., 1999; de Almeida et al., 2009).

The other three AD tetraploid species of cotton, *G. mustelinum*, with a restricted distribution in Northeast Brazil (Wendel et al., 1994), *G. darwinii*, endemic to the Galápagos Islands (Wendel & Percy, 1990), and *G. tomentosum* Nuttall ex Seemann, endemic to the Hawaiian Islands (DeJoode and Wendel, 1992; Hawkins et al., 2005), are truly wild species (Westengen et al., 2005).

The main *ex situ* cotton germplasm collections are in the US, France, China, India, Russia, Uzbekistan, Brazil, and Australia. Although a few other cotton germplasm collections exist in other countries, these eight countries hold the majority of the world's cotton germplasm resources. Each country has a germplasm storage and conservation program in place (Campbell et al., 2010). The history of the initial collection of cotton germplasm, through dedicated expeditions of cotton scientists to the centers of *Gossypium* origin, is well described by Ulloa et al. (2006); these expeditions were, perhaps, the basis for the majority of the current cotton germplasm collections worldwide. Consequently, to protect the worldwide economic value of cotton and cotton byproducts, cotton germplasm collections were enriched with numerous cotton germplasm accessions and breeding materials/lines as sources of genetic diversity through the continuous research efforts of specific cotton breeding programs and mutual germplasm exchange over the last 100 years (Abdurakhmonov, 2007; Campbell et al., 2010).

Brief descriptions of some of the worldwide cotton germplasm collections have been given by Abdurakhmonov (2007), Chen et al. (2007), Stelly et al. (2007), Ibragimov et al. (2008), Wallace et al. (2009) and Campbell et al. (2010). In particular, a recent report by cotton researchers published in *Crop Science* (Campbell et al., 2010) describes the current status of global cotton germplasm resources in detail. Campbell et al. (2010) provided information regarding: 1) members of the collection, 2) maintenance and storage procedures, 3) seed request and disbursement, 4) funding apparatus and staffing, 5) characterization methodology, 6) data management, and 7) past and present explorations.

The contents and distribution of cotton germplasm accessions across the eight collections are summarized by Campbell et al. (2010). To avoid redundancy, we do not review the details of each collection here and instead give brief information on the overall content and specificity of these world cotton collections. Ranked by the number of preserved cotton accessions, the eight major world collections are: Uzbekistan (18971 accessions), India (10469 accessions), USA (10318 accessions), China (8837 accessions), Russia (6276 accessions), Brazil (4296 accessions), CIRAD (France; 3070 accessions) and Australia (1711 accessions). These collections consist mainly of accessions of the two cultivated cotton species, *G. hirsutum* and *G. barbadense*. The Uzbekistan (2680 accessions), India (2283 accessions) and USA (1923 accessions) collections are the richest in accessions of the Asian diploid cottons, *G. herbaceum* and *G. arboreum*, which belong to the secondary gene pool. For wild species belonging to the primary, secondary and tertiary gene pools, the Brazil (889 accessions), USA (509 accessions) and CIRAD (295 accessions) collections are the richest in the world.
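As a rough illustration of the ranking above, the accession counts quoted in the text can be tabulated and expressed as shares of the eight-collection total. This is a minimal sketch: the names `ACCESSIONS` and `rank_collections` are hypothetical, and the total is a simple sum of the quoted figures, which assumes nothing about duplicate accessions shared between collections.

```python
# Accession counts as quoted in the text (after Campbell et al., 2010).
# Variable and function names are hypothetical, for illustration only.
ACCESSIONS = {
    "Uzbekistan": 18971,
    "India": 10469,
    "USA": 10318,
    "China": 8837,
    "Russia": 6276,
    "Brazil": 4296,
    "CIRAD (France)": 3070,
    "Australia": 1711,
}


def rank_collections(counts):
    """Return (name, count, share_percent) tuples, largest collection first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, round(100 * n / total, 1)) for name, n in ranked]


if __name__ == "__main__":
    for name, n, share in rank_collections(ACCESSIONS):
        print(f"{name:<15} {n:>6} {share:>5}%")
```

Because the same accession may be held in more than one collection, the shares indicate relative collection size only and should not be read as proportions of unique germplasm.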

#### **3. Spectra of morphological and agronomic diversity in cotton**

The amplitude of genetic diversity of cotton (*Gossypium* spp.), including all its morphological, physiological and agronomic properties, is exceptionally wide (Mauer, 1954). The *Gossypium* genus shows a great deal of genetic diversity in characteristics that are important for the applied breeding of cotton, such as plant architecture, stem pubescence and color, leaf blade shape, flower color, pollen color, boll shape, fiber quality, yield potential, early maturity, photoperiod dependency, and resistance to multiple environmental stresses. A glimpse of the genetic diversity in some morphological traits is shown in Figs. 1 and 2.

Besides morphological diversity in the *Gossypium* genus, representatives of different genomic groups differ in many agronomically useful traits. Considering only *G. hirsutum* accessions, exotic and cultivar germplasm represent a wide range of genetic diversity in yield and fiber quality parameters. For example, in analyses of ~1000 *G. hirsutum* exotic and cultivated accessions in two different environments, Mexico and Uzbekistan, we found a wide range of useful agronomic diversity (Abdurakhmonov et al., 2004, 2006, 2008, 2009). In one or both environments, cotton boll mass ranged from 1 to 9 grams per boll, 1000-seed mass from 50 to 170 grams, lint percentage from 0 to 45%, micronaire from 3 to 7, fiber length from 1 to 1.28 inch, and fiber strength from 26 to 36 g/tex. There was also a wide range of variation in photoperiodic flowering (day neutral, weak to strong photoperiodic dependency) and maturity (Abdurakhmonov, 2007). This wide phenotypic diversity shows the extensive plasticity of cotton plants and their potential for wide use as initial material in breeding programs.
