of this species has been replaced by intense and extensive agricultural production of guava (*Psidium* spp.). We speculate that guava and *G. trilobum* have very similar environmental requirements, particularly as influenced by altitude (around 4,000 feet). In the past, *G. trilobum* was considered a common, widely distributed species. Based on surveys (J.M. Stewart and M. Ulloa, 2004 expedition), however, we are of the opinion that urbanization and agricultural development have severely eroded its habitat and that the species is becoming extinct in the wild (Drs. Stewart and Ulloa, *personal communication*). Within the eight races described in *G. hirsutum*, much of the original diversity existing *in situ* appears to have been lost. According to local sources, eradication of naturally occurring landrace, feral, and dooryard cottons was attempted in areas of southern Mexico in the 1980s in an effort to remove perceived insect reservoirs. Apparently all attempts at commercial cotton production there have since been abandoned. No commercial cotton fields were encountered during expeditions between 2002 and 2004 in the central and southern parts of Mexico [19,26]. Currently, with the exception of the northern cotton production regions of Mexico, the diversity of *G. hirsutum* is limited to feral plants that occur opportunistically in waste areas and to occasional home-garden plants maintained by rural peoples or village residents [26].

Puerto Rico (PR) was a target of opportunity for collecting efforts in 2013 because of its relative proximity to the U.S. and its status as a U.S. territory. Puerto Rico was revisited in the most recent germplasm collecting effort by James Frelichowski, the curator of the U.S. collection, Louis Prom of the USDA-ARS, College Station, TX, and collaborators from the USDA-ARS Tropical Agriculture Research Station (TARS) in Mayaguez, PR. Previous reports on wild cotton in the Caribbean [30,31] and the recent taxonomic identification of a new species in the Dominican Republic [32] justified continued exploration for *Gossypium* in Puerto Rico. The current collection inventory lists 49 accessions originally collected from Puerto Rico and surrounding islands, but GPS locations and habitat details are lacking. The limited passport data suggest that wild cotton tended to occur in areas with drier climates in or near coastal regions. Only four Puerto Rican accessions in the collection were observed to have maturity and productivity ratings similar to cultivar checks in the CWN. The Puerto Rican accessions were more characteristic of landrace cottons, which suggests that cotton in Puerto Rico is likely native or is cotton of tropical origin that has become naturalized.

The most recent collecting efforts in Puerto Rico addressed gaps in knowledge of the island's diversity and diversity structure by tagging all collection sites and resulting accessions with GPS coordinates and by geographically mapping native cottons. Seventy-nine cotton plants or populations were photographed and tagged with GPS coordinates to give a detailed, map-based survey of PR. Numerous other cotton plants or populations were sighted and recorded on a GPS map. These were not collected because of time restrictions or safety or access concerns. Southern Puerto Rico was much more populated with cotton than the north. However, morphological diversity was seen throughout the island, which justified a wide search of the island for wild cotton. Differences in fiber length, quality, seed fuzz, and leaf pubescence were evident during collection, and individual traits sometimes varied among neighboring plants. This confirms the need to collect from individual

174 World Cotton Germplasm Resources

#### **4. Phenotyping and characterization**

Phenotypic characterization of the national collection has historically served two purposes. The first was descriptive. Until the very recent past, when molecular tools became available, phenotypic descriptors were the primary means of describing the diversity in the germplasm collection and rationally classifying that variability. The second role of phenotypic descriptors was to help breeders and others identify germplasm of interest for genetic improvement. To be useful, however, phenotypic descriptors needed to be well defined and universally applied. Throughout the history of the collection, this goal has been pursued but never completely attained. In creating descriptors, the convention has been to follow, as closely as possible, the guidelines set by Bioversity International (formerly IBPGR) in 1980 [33], revised in 1985 and again in 1995 [34]. Before the universal adoption of computers, the internet, and electronic databases, descriptor data files of the collection's holdings were published periodically, the last in 1987 [10]. Descriptor data files for accessions of the collection can now be found in GRIN-GLOBAL at http://distribution.grin-global.org/gringlobal/search.aspx or in the CottonGen database at www.cottongen.org. Descriptors and descriptor sets for phenotyping the collection have not remained static over time but have evolved along with their uses. In 2009, the Cotton Crop Germplasm Committee [14] summarized the need to expand the collection's databases with morphological, agronomic, molecular, and priority trait evaluations to improve the collection's usefulness to users. Of particular interest to the breeding and crop improvement community was the addition of traits of agronomic value, to supplement traits of a primarily botanical or taxonomic nature adopted from Bioversity International.

The U.S. National Cotton Germplasm Collection – Its Contents, Preservation, Characterization, and Evaluation

http://dx.doi.org/10.5772/58386

**Table 3.** Agronomic trait characteristics documented from *Gossypium* species grown in greenhouse environment for phenotypic evaluation at Texas A&M AgriLife Research, Lubbock, TX, USA

| Agronomic trait | Descriptor name | Method of observation | Descriptor state |
|---|---|---|---|
| Architecture | Plant Height (cm) | Plant height in centimeters, grown in greenhouse. Measured from pot soil surface to terminal. | ht. (cm) |
| Architecture | Growth Habit | General architectural arrangement of plant type grown in greenhouse. | compact → intermediate → spreading → bushy |
| Physiological | Photoperiod | Response of the plant's flowering habit over a short-day to long-day growth duration in greenhouse, compared with day-neutral upland varieties grown in Lubbock, Texas. | day neutral → photo dependent (strong → weak) |
| Retention | Square | Retention of squares that mature into bolls and are not shed due to insect or environmental conditions. | good → fair → weak → none |
| Retention | Boll | Retention of bolls grown to maturity in greenhouse and not shed due to insect or environment. | good → fair → weak → none |
| Retention | Fiber | Retention of fiber in burr until harvested. | good → fair → weak → none |
| Morphological | Flower Score | Flower score of early, medium, or late as determined by the greenhouse planting date. Lubbock, Texas. | early → medium → late |
| Morphological | Flower Duration | Length of time a plant will produce floral buds, determined by photoperiodic response of plants growing in greenhouse during the fall and winter months. Lubbock, Texas. | short → medium → long |
| Morphological | 1st Flower Date | Date of first flowers when grown in greenhouse, in number of days after planting. Lubbock, Texas. | number of days after planting |
| Morphological | Nodes to 1st Fruiting | Node of first sympodial branch to set fruit on plant growing in greenhouse. Lubbock, Texas. | node number (cotyledon = 0) |
| Production | Productiveness | Quality and quantity of mature fiber and seed produced from plants grown in greenhouse. Lubbock, Texas. | most productive → good → fair → poor |
| Production | Seed Index | The weight of 100 fuzzy seed in grams. | wt. (g) |

The collection aims to routinely characterize or re-characterize approximately 1,000 accessions (about a tenth of the collection) each year. This work is done in the CWN or in local fields or greenhouses at College Station, TX. The photoperiodism of a large portion of the collection requires that it be renewed in the tropically located CWN. However, collecting descriptors at that site has been challenging because of travel costs and time limitations. Given the great diversity of accessions grown in the CWN and their variable maturation rates, collecting complete descriptor sets for accessions has been a lengthy, multi-year process. Digital imaging has been adopted to aid in describing the collection. An immediate concern has been to develop a pictorial reference of descriptor scores. Such a reference establishes clear standards and reduces the subjectivity that arises when scoring descriptors across such a broad spectrum of diversity. Mackay and Alercia [35] have described the use of digital imaging to document variation in germplasm, and this approach has been used in several collections [36,37]. Digital imaging can be tailored with software to detect minute differences among accessions and to identify duplicates [38]. Digital photography to document *Gossypium* diversity has been used by collaborators [38,39] and, recently, by the NCGC staff during descriptor collection. Within the collection, digital imaging currently concentrates on the plant structures most informative for differentiating and classifying the diversity of the genus. Samples of typical leaves, flowers, and bolls are gathered for each accession, tagged, and quickly assembled under high-resolution digital cameras. Only mature, fully expanded leaves (below the fifth main-stem node) are photographed. Cotton flowers are photographed on the day of anthesis to reveal flower color, petal spots, pollen color, and stigma length. Fully expanded, mature but non-cleft bolls are photographed to reveal shape and locule number. Several cameras are fixed on tripods, some for an overall view of the size of plant tissues and others for close-ups of minute leaf hairs, nectaries, and glands. All images are taken against a measured grid background for a consistent size reference. Portable shade is constructed at the CWN or in the local field to stabilize lighting conditions and reduce wind disturbance. Images are identified by photographed tags. They are then named according to accession inventory number, year/environment of planting, and a code for the plant tissue or structure photographed.
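The naming convention described above (accession inventory number, year/environment of planting, tissue code) can be sketched as a small helper that builds and parses file names. This is a minimal illustration only. The underscore separator, the four-digit year, the environment label "CWN" used as an example, and the tissue codes `LF`/`FL`/`BL` are all hypothetical, since the text does not give the collection's actual format.

```python
from dataclasses import dataclass

# Hypothetical tissue codes; the collection's real codes are not given in the text.
TISSUE_CODES = {"leaf": "LF", "flower": "FL", "boll": "BL"}
CODE_TO_TISSUE = {code: tissue for tissue, code in TISSUE_CODES.items()}


@dataclass(frozen=True)
class ImageName:
    inventory_number: str  # accession inventory number (assumed to contain no "_")
    year: int              # year of planting (assumed to be four digits)
    environment: str       # planting environment label, e.g. "CWN" (hypothetical labels)
    tissue: str            # "leaf", "flower", or "boll"

    def to_filename(self, ext: str = "jpg") -> str:
        """Compose inventory number, year/environment, and tissue code into one name."""
        code = TISSUE_CODES[self.tissue]
        return f"{self.inventory_number}_{self.year}{self.environment}_{code}.{ext}"

    @classmethod
    def from_filename(cls, filename: str) -> "ImageName":
        """Invert to_filename(); raises ValueError for names that do not fit the scheme."""
        stem = filename.rsplit(".", 1)[0]
        inventory, year_env, code = stem.split("_")  # ValueError if not exactly 3 parts
        if len(year_env) < 5 or not year_env[:4].isdigit() or code not in CODE_TO_TISSUE:
            raise ValueError(f"unrecognized image name: {filename!r}")
        return cls(inventory, int(year_env[:4]), year_env[4:], CODE_TO_TISSUE[code])
```

For example, `ImageName("ACC0001", 2012, "CWN", "boll").to_filename()` gives `"ACC0001_2012CWN_BL.jpg"`, and `from_filename` recovers the same record. A machine-parseable scheme of this kind would let images be matched back to descriptor records by inventory number. The source does not say whether the collection's files follow such a scheme.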

To date (2011–2013), standardized descriptors and digital images have been collected on over 4,900 cotton-producing accessions planted at the CWN and in the field at College Station, TX. These images are being uploaded into the CottonGen database (www.cottongen.org) and paired with descriptors for optimal use by the cotton community. High-resolution cameras are used to produce 'virtual' voucher specimens, which together form a digital library or "virtual herbarium." This library makes it easy to examine morphological variation within the genus in a way that was previously unattainable with classic herbarium specimens. It is hoped that the digital image library will promote standardization of descriptor data and image creation by cooperating groups and collections. This should improve the ability to characterize the diversity within *Gossypium*, to address gaps in the U.S. and other collections, and to effectively share and back up germplasm between collections.


The U.S. NCGC tries to set standards and methodologies for characterizing the germplasm collection through its internal efforts. It also recognizes and encourages cooperation from the research community in this task. A unilateral effort to collect descriptor data is not considered desirable, for three reasons: the volume of accessions available for characterization, the finite resources of the collection, and the impact of genotype × environment interaction on many phenotypic traits of interest. Collaborative efforts within the research community offer an attractive way to collect relevant data and make it widely available. One such effort was conducted by Texas A&M AgriLife, Lubbock, TX, from 2005 until 2013. Seed of accessions from the U.S. NCGC at College Station, TX, was obtained and planted in the greenhouse at Lubbock for seed increase. In conjunction with seed increases, phenotypic descriptors of various *Gossypium* species were recorded and documented using digital photography. Traits selected for digital imaging were based on previous descriptive data recorded using the GRIN guidelines and published taxonomic documents [27,10]. Documentation elements were expanded to include descriptor name, state, and method of evaluation in a greenhouse environment (Table 3). Digital images depicting each characteristic and state served as reference standards to help prevent misinterpretation of the descriptor. This allowed more decisive, objective, and consistent data collection. Qualitative data have been documented on a nominal, ordinal, or binary scale, and quantitative data consist of continuous or discrete numerical values. Over 800 accessions have been regenerated, digitally documented, and evaluated (Table 1).
The descriptors used by the Texas A&M AgriLife Research and Extension Center did not encompass all aspects of trait documentation. They were intended to give a general observation of an accession's attributes, measurable traits, or characteristics.
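The distinction drawn above between qualitative (nominal, ordinal, binary) and quantitative (continuous, discrete) descriptors can be illustrated with a small typed record. The sketch below is an illustration under stated assumptions, not the collection's or GRIN's data model. The `Descriptor` class, its fields, and the `encode` rule (ordinal states become ranks, nominal and binary states stay as labels) are hypothetical. Only the example descriptor names, states, and units come from Table 3. The binary "Petal Spot" descriptor is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple, Union

Value = Union[str, int, float]


class Scale(Enum):
    NOMINAL = "nominal"        # unordered categories
    ORDINAL = "ordinal"        # ordered categories, e.g. short -> medium -> long
    BINARY = "binary"          # exactly two states
    CONTINUOUS = "continuous"  # real-valued measurement with a unit
    DISCRETE = "discrete"      # integer count or index


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor record; not the GRIN or CottonGen schema."""
    name: str
    scale: Scale
    states: Tuple[str, ...] = ()  # allowed states for qualitative scales, in order
    unit: Optional[str] = None    # unit for quantitative scales

    def encode(self, value: Value) -> Value:
        """Validate one observation and return an analysis-ready value."""
        if self.scale in (Scale.ORDINAL, Scale.NOMINAL, Scale.BINARY):
            if value not in self.states:
                raise ValueError(f"{value!r} is not a state of {self.name}")
            if self.scale is Scale.ORDINAL:
                return self.states.index(value)  # rank 0 = first listed state
            return value  # unordered: keep the label itself
        if self.scale is Scale.DISCRETE:
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{self.name} expects an integer")
            return value
        return float(value)  # CONTINUOUS


# Names, states, and units follow Table 3, except PETAL_SPOT (invented for illustration).
FLOWER_DURATION = Descriptor("Flower Duration", Scale.ORDINAL, ("short", "medium", "long"))
NODES_TO_FIRST_FRUITING = Descriptor(
    "Nodes to 1st Fruiting", Scale.DISCRETE, unit="node number (cotyledon = 0)"
)
SEED_INDEX = Descriptor("Seed Index", Scale.CONTINUOUS, unit="g per 100 fuzzy seed")
PETAL_SPOT = Descriptor("Petal Spot", Scale.BINARY, ("absent", "present"))
```

Encoding ordinal states as ranks keeps their order available for analysis, while nominal and binary labels are only validated. A real schema might instead store the raw state alongside its rank.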

#### **5. Evaluation efforts**

Screening and evaluating the national collection for traits such as disease, insect, and environmental stress resistance exceeds the capacity and resources of the collection and requires participation by the research community. Nevertheless, in recent years the collection has tried to move from being a passive supplier of germplasm resources to being an active participant in disease and insect resistance screening and evaluation. The collection has a long history of evaluation by the research community for biotic and abiotic stress resistance. In fact, the origins of the collection are intertwined with the entry of the boll weevil into the United States in the 1890s and the search for a source of host plant resistance to that pest [4]. Numerous evaluations have been conducted on the collection for phenotypic characteristics of agronomic importance. A sampling of evaluations includes investigations of seed protein [40], seed oil [41], seed gossypol [42], boll weevil resistance [43], and Cercospora leaf spot and Verticillium wilt resistance [44]. Prior to 2000, over 320 accessions of the collection had been screened for resistance to pink bollworm (*Pectinophora gossypiella* Saunders) [45]. In addition, 471 race stocks of *G. hirsutum* were screened for resistance to root-knot nematode (*Meloidogyne incognita*), and resistance was found in 18 lines [46]. As of 1986, over 200 accessions of the collection had been reported to carry resistance to one or more pests [47]. More recently, in 2009, eleven ongoing evaluations of collection germplasm for disease, insect, and nematode resistance were reported in a status report of U.S. cotton germplasm [14].

In an ongoing project, the Cotton Improvement Program at the Texas A&M AgriLife Research Center, Lubbock, TX, began in 2005 to screen accessions from the working collection for resistance to thrips (Thysanoptera: Thripidae). Thrips were ranked as the third most damaging pest of U.S. cotton in 2012, reducing yield by 0.374 percent and causing a loss of 123,947 bales [48]. In the Panhandle and far West Texas, 860,476 acres were treated for thrips in 2012 at a cost of \$1.28 per acre [48]. By 2013, 516 accessions from the active collection of the U.S. National Cotton Germplasm Collection had been screened (Table 4). Resistance to thrips was identified in *G. barbadense* accession TX110 (PI 163608) in the first year of screening [49]. A series of studies conducted through much of the 20th century identified, confirmed, and characterized thrips resistance in *G. barbadense*. Studies in this progression include:

- the discovery of resistance in glabrous Egyptian cotton cultivars, with the conclusion that resistance was most likely due to a thicker epidermal layer on the lower leaf surface that allows seedlings to tolerate more thrips feeding [50];
- confirmation of this work [51];
- further characterization of thrips feeding, including measurement of the mandible's protrusion past the fused maxillae at 11 microns [52];
- the identification of a resistant group of *G. barbadense* cottons with a lower leaf epidermal layer of 10–13 microns [53].

This series of studies was summarized by Bowman and McCarty [54], who investigated the heritability of the traits conferring resistance. A cultivar development project was begun through interspecific hybridization between the resistant *G. barbadense* accession TX110 and two unreleased elite lines from the Lubbock Texas A&M AgriLife Research Cotton Improvement Program. Selections for thrips resistance and day-neutral flowering habit were made in the segregating F2 plots, and this process continued for five years. Resistance has been carried to the F5 and F6 generations in many individuals, and day-neutral flowering habit and favorable agronomic traits have been improved.

**Table 4.** By species, number of cotton accessions from U.S. National Cotton Germplasm Collection screened for resistance to the Texas High Plains pest thrips complex at Texas A&M AgriLife Research in Lubbock, Texas, 2005-2013.

| Species | Genome | Accessions tested |
|---|---|---|
| *Gossypium hirsutum* | AD1 | 473 |
| *Gossypium barbadense* | AD2 | 25 |
| *Gossypium mustelinum* | AD4 | 3 |
| *Gossypium darwinii* | AD5 | 6 |
| *Gossypium herbaceum* | A1 | 2 |
| *Gossypium arboreum* | A2 | 4 |
| *Gossypium thurberi* | D1 | 1 |
| *Gossypium harknesii* | D2-2 | 1 |
| *Gossypium davidsonii* | D3 | 1 |
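As a quick consistency check, the per-species counts in Table 4 can be summed and compared with the 516 accessions reported as screened by 2013. The sketch below transcribes the table and also groups the counts by ploidy. It assumes the usual reading that AD-genome species are allotetraploid and A- or D-genome species are diploid.

```python
from typing import Dict, Tuple

# Accessions screened for thrips resistance, 2005-2013, transcribed from Table 4.
SCREENED: Dict[Tuple[str, str], int] = {
    ("Gossypium hirsutum", "AD1"): 473,
    ("Gossypium barbadense", "AD2"): 25,
    ("Gossypium mustelinum", "AD4"): 3,
    ("Gossypium darwinii", "AD5"): 6,
    ("Gossypium herbaceum", "A1"): 2,
    ("Gossypium arboreum", "A2"): 4,
    ("Gossypium thurberi", "D1"): 1,
    ("Gossypium harknesii", "D2-2"): 1,
    ("Gossypium davidsonii", "D3"): 1,
}


def totals_by_ploidy(counts: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Split counts into tetraploid (AD genomes) and diploid (A or D genomes) groups."""
    out = {"tetraploid": 0, "diploid": 0}
    for (_species, genome), n in counts.items():
        out["tetraploid" if genome.startswith("AD") else "diploid"] += n
    return out


if __name__ == "__main__":
    print(sum(SCREENED.values()))      # 516, matching the total quoted in the text
    print(totals_by_ploidy(SCREENED))  # {'tetraploid': 507, 'diploid': 9}
```

The split shows that almost all screening effort went to the tetraploid species, with *G. hirsutum* alone accounting for 473 of the 516 accessions.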



In another ongoing project, funded by USAID and USDA, germplasm resources were made available through the U.S. NCGC to identify sources of resistance to Cotton Leaf Curl Virus (CLCuV). Several factors made a rapid, coordinated CLCuV screening program possible. These were the ready availability of accessions from the collection, winter nursery seed increase capabilities, GRIN database information and, especially, the recent addition of standardized descriptor data and digital images. CLCuV is a major threat to cotton production in Pakistan and parts of India. It has also been reported in cotton-producing countries in Africa, as well as in China and Uzbekistan. The project helps countries such as Pakistan, where the virus is already a problem. It also makes resistant germplasm available should CLCuV become a threat to cotton production elsewhere. As part of the project, accessions from the collection were increased at the CWN in Tecoman, Mexico, where the collection curator and staff collected descriptor data and digital images for all the accessions. Part of the seed was sent to Pakistan for CLCuV screening, and the rest was returned to the collection. Because the disease is endemic in Pakistan, replicated screening was possible at three locations. At each location, the screening nurseries included regularly spaced host rows of highly susceptible plants. Any plants identified as resistant were re-tested the following year. All screening data are being made available for inclusion in the collection database. Previous screening tests had identified *G. arboreum* and *G. herbaceum* as potential sources of resistance, so 1,050 *G. arboreum* and 100 *G. herbaceum* accessions were increased at the CWN and sent for screening. Transferring resistance from diploid *G. arboreum* or *G. herbaceum* into tetraploid *G. hirsutum* is extremely difficult, so the project expanded to evaluate *G. hirsutum*. The NPGS GRIN database was used to identify a subset of 920 *G. hirsutum* accessions with a range of morphologies and from diverse geographic regions. CLCuV screening identified 12 resistant *G. hirsutum* accessions originating from northeast Brazil, Central America, and the Caribbean. Travel to Pakistan is difficult, and USDA researchers could evaluate the plots only once during the field season. However, Pakistani partners e-mailed descriptor data and digital images for each of the 12 accessions throughout the season. These could be compared with the descriptor data and digital images recorded by the collection curator at the CWN. This collection information proved essential, as 11 of the 12 accessions were photoperiod sensitive and did not flower during the field season in Pakistan. Additional accessions from these areas, as well as from other geographic regions, were selected using GRIN and are being screened in Pakistan in 2013. This project has served as a model for a germplasm evaluation effort that benefits both the germplasm collection and the research community. Beyond identifying sources of CLCuV resistance for future cotton improvement, the project enabled the seed renewal of numerous accessions of the collection under controlled conditions. It also allowed the characterization and digital imaging of those same accessions.
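The text notes that GRIN was used to pick a subset of *G. hirsutum* accessions from diverse geographic regions. One simple way to draw such a geographically spread subset from exported passport records is round-robin sampling across regions, sketched below. This is a hypothetical illustration. The record format, the region labels, and the round-robin rule are assumptions. The source does not describe how the 920 accessions were actually chosen, and that selection also took morphology into account.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def stratified_subset(records: Iterable[Tuple[str, str]], target: int, seed: int = 0) -> List[str]:
    """Select up to `target` accession IDs spread across origin regions.

    records: (accession_id, region) pairs, e.g. from a hypothetical passport export.
    Each region's accessions are shuffled, then regions are visited round-robin
    (in sorted order), taking one accession per visit, so every region contributes
    once before any region contributes a second time.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    pools: Dict[str, List[str]] = defaultdict(list)
    for accession, region in records:
        pools[region].append(accession)
    for pool in pools.values():
        rng.shuffle(pool)

    chosen: List[str] = []
    regions = sorted(pools)
    while len(chosen) < target and any(pools.values()):
        for region in regions:
            if pools[region] and len(chosen) < target:
                chosen.append(pools[region].pop())
    return chosen
```

For example, with 10 accessions from one region, 2 from a second and 10 from a third, a target of 9 takes 4, 2 and 3 accessions respectively, so the small region is fully represented. A real selection would likely weight regions or add morphological strata rather than treat all regions equally.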

impact of periods of short-term or prolonged water stress on yield or quality. Plants vary in their physiological response to stress and their responses vary with the severity and duration of the limited water availability, extreme heat exposure, and other abiotic stresses [55-57]. Diversity in cotton for root architecture has been reported [58], as well as variation in root resistance expressed as the redistribution of water through the soil layer-profile by the root system (hydraulic lift) [59]. Untapped genetic variability for plant and root morphologyarchitecture types is present in germplasm resources but lacking in modern commercial cultivars [60,61]. However, methods of morphological or phenotypical characterization have progressed slowly in the last 30 years [62]. Currently, accessions from the NCGC are being used by the USDA-ARS Cropping Systems Research Laboratory, Lubbock, TX to initiate this phenotypical characterization. Specifically, drought responses are being examined in acces‐ sions of the Gossypium Diversity Reference Set, created for an ongoing diversity study being conducted by the USDA-ARS at College Station, Texas. Physiological and biochemical plants responses such as photosynthesis and CO2 rates, stomata conductance, and osmotic adjust‐ ment will be examined and monitored under heat and/or low-temperature stress conditions. Also, a rapid bioassay is being used for monitoring plant stress based on the ability of a "source" leaf to provide sufficient energy for plant growth. Water-deficit stress reduces new growth and lessens the demand on the source leaves. [63,64]. In addition, plant mapping (plant height, nodes, and fruiting position), agronomic (seed cotton yield, seed index, number of bolls per plants, lint percentage, etc.), and fiber quality (length, strength, and fineness) data are used to monitor drought tolerance among cotton entries or accessions. 
Accessions are evaluated in field replicated plots under well-irrigated and water-deficit stress conditions, using a random‐ ized complete block or incomplete block design with 3–4 replications using five plants as subsamples per replication, or two replications when seed availability from accessions of

The U.S. National Cotton Germplasm Collection – Its Contents, Preservation, Characterization, and Evaluation


http://dx.doi.org/10.5772/58386


Another current effort involves evaluating germplasm resources to identify lines with physiological and morphological traits that can improve water use efficiency and tolerance to extreme temperature and drought. The decline of water in glaciers, reservoirs, and aquifers in many regions, combined with climate change and the unpredictability of precipitation during the growing season, has stimulated efforts to identify germplasm resources that can minimize the elevated production risks associated with crop water deficits. Cultivars with extensive root systems or physiological adaptations in their aerial parts may offer a means to minimize the impact of short-term or prolonged water stress on yield or quality. Plants vary in their physiological responses to stress, and these responses vary with the severity and duration of limited water availability, extreme heat exposure, and other abiotic stresses [55-57]. Diversity in cotton for root architecture has been reported [58], as has variation in root resistance expressed as the redistribution of water through the soil profile by the root system (hydraulic lift) [59]. Untapped genetic variability in plant and root morphology and architecture is present in germplasm resources but lacking in modern commercial cultivars [60,61]. However, methods of morphological or phenotypic characterization have progressed slowly in the last 30 years [62]. Currently, accessions from the NCGC are being used by the USDA-ARS Cropping Systems Research Laboratory, Lubbock, TX, to initiate this phenotypic characterization. Specifically, drought responses are being examined in accessions of the *Gossypium* Diversity Reference Set, created for an ongoing diversity study being conducted by the USDA-ARS at College Station, Texas.
Physiological and biochemical plant responses such as photosynthesis and CO₂ exchange rates, stomatal conductance, and osmotic adjustment will be examined and monitored under heat and/or low-temperature stress conditions. A rapid bioassay is also being used to monitor plant stress based on the ability of a "source" leaf to provide sufficient energy for plant growth: water-deficit stress reduces new growth and thereby lessens the demand on the source leaves [63,64]. In addition, plant mapping (plant height, nodes, and fruiting positions), agronomic (seed cotton yield, seed index, number of bolls per plant, lint percentage, etc.), and fiber quality (length, strength, and fineness) data are used to monitor drought tolerance among cotton entries or accessions. Accessions are evaluated in replicated field plots under well-irrigated and water-deficit stress conditions, using a randomized complete block or incomplete block design with 3–4 replications and five plants as subsamples per replication, or two replications when seed of selected germplasm accessions is limited [65].
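The replicated field evaluation described above can be sketched as a randomized complete block (RCB) layout. This is a minimal illustration only, assuming that each water regime is laid out as its own RCB trial, with every accession appearing once per block and five plants tagged per plot. The helper `rcbd_layout`, its record fields, and the accession IDs are hypothetical and do not represent the Lubbock group's actual software.

```python
import random


def rcbd_layout(accessions, n_reps, regime, subsamples=5, seed=None):
    """Randomized complete block layout for one water regime (hypothetical helper).

    Every accession appears exactly once in each block (replication); the
    plot order within each block is shuffled independently.  Each plot
    lists the plant subsamples to be measured.
    """
    rng = random.Random(seed)
    plots = []
    for rep in range(1, n_reps + 1):
        order = list(accessions)
        rng.shuffle(order)  # independent randomization within this block
        for plot_no, acc in enumerate(order, start=1):
            plots.append({
                "regime": regime,
                "rep": rep,
                "plot": plot_no,
                "accession": acc,
                "plants": [f"{acc}-R{rep}-P{k}" for k in range(1, subsamples + 1)],
            })
    return plots


accessions = ["acc01", "acc02", "acc03", "acc04"]  # hypothetical entry IDs
layout = []
for i, regime in enumerate(["well-irrigated", "water-deficit"]):
    layout += rcbd_layout(accessions, n_reps=4, regime=regime, seed=i)
print(len(layout))  # 2 regimes x 4 blocks x 4 accessions = 32 plots
```

When seed of selected accessions is limited, calling the same helper with `n_reps=2` yields the two-replication layout mentioned in the text.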

180 World Cotton Germplasm Resources

In another ongoing project funded by USAID and USDA to identify sources of resistance to Cotton Leaf Curl Virus (CLCuV), germplasm resources were made available through the U.S. NCGC. The ready availability of accessions from the collection, combined with winter nursery seed increase capabilities, GRIN database information, and especially the recent addition of standardized descriptor data and digital images, made a rapid, coordinated CLCuV screening program possible. CLCuV is a major threat to cotton production in Pakistan and parts of India, and it has been reported in cotton-producing countries in Africa as well as in China and Uzbekistan. This project helps not only countries such as Pakistan, where the virus is already a problem, but also makes resistant germplasm available should CLCuV become a threat to cotton production in other countries. As part of the project, accessions from the collection were increased at the CWN in Tecoman, Mexico. At the CWN, the collection curator and staff collected descriptor data and digital images for all the accessions. Part of the seed was sent to Pakistan for CLCuV screening, and the rest was returned to the collection. Because the disease is endemic in Pakistan, replicated screening was possible at three locations. At each location, the screening nurseries included regularly spaced host rows of highly susceptible plants. Any plants identified as resistant were re-tested the following year. All the screening data are being made available for inclusion in the collection database. Previous screening tests had identified *G. arboreum* and *G. herbaceum* as potential sources of resistance, so 1,050 *G. arboreum* and 100 *G. herbaceum* accessions were increased at the CWN and sent for screening. Because transferring resistance from diploid *G. arboreum* or *G. herbaceum* into tetraploid *G. hirsutum* is extremely difficult, the project expanded to evaluate *G. hirsutum*. The NPGS GRIN database was used to identify a subset of 920 *G. hirsutum* accessions with a range of morphologies and originating from diverse geographic regions. CLCuV screening identified 12 resistant *G. hirsutum* accessions originating from northeast Brazil, Central America, and the Caribbean. Because travel to Pakistan is difficult, USDA researchers could evaluate the plots only once during the field season. However, the Pakistani partners e-mailed descriptor data and digital images for each of the 12 accessions throughout the season, and these could be compared with the descriptor data and digital images recorded by the collection curator at the CWN. This collection information proved essential, as 11 of the 12 accessions were photoperiod sensitive and did not flower during the field season in Pakistan. Additional accessions from these areas, as well as from other geographic regions, were selected using GRIN and are being screened in Pakistan in 2013. This project has served as a model for a germplasm evaluation effort that serves the germplasm collection as well as the research community. In addition to identifying sources of CLCuV resistance for future cotton improvement efforts, the project made possible the seed renewal of numerous accessions of the collection under controlled conditions and the characterization and digital imaging of these same accessions.
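The nursery design described above, with test entries interleaved with regularly spaced rows of a highly susceptible host and resistant entries carried forward for re-testing, can be sketched as follows. The chapter gives neither the spacing of the host rows nor the disease-scoring scale, so `spreader_every=5`, the 0-based score cut-off, the function names, and the entry IDs are all hypothetical.

```python
def nursery_rows(entries, spreader_every=5, spreader="SUSCEPTIBLE"):
    """Interleave test entries with susceptible host (spreader) rows.

    A spreader row is placed first, after every `spreader_every` test rows,
    and last, so every test row lies within a few rows of a susceptible host row.
    """
    rows = [spreader]
    for i, entry in enumerate(entries, start=1):
        rows.append(entry)
        if i % spreader_every == 0:
            rows.append(spreader)
    if rows[-1] != spreader:
        rows.append(spreader)
    return rows


def retest_next_year(scores, max_score=1):
    """Entries whose disease score is at or below a (hypothetical) resistance cut-off."""
    return sorted(entry for entry, score in scores.items() if score <= max_score)


entries = [f"HIR-{n:03d}" for n in range(1, 13)]  # 12 hypothetical accessions
rows = nursery_rows(entries)
print(rows.count("SUSCEPTIBLE"), len(rows))
print(retest_next_year({"HIR-001": 0, "HIR-002": 4, "HIR-003": 1}))
```

Bracketing both ends of the nursery with a susceptible row keeps the first and last test entries under the same inoculum pressure as those in the middle.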


In another project to screen the germplasm collection for drought tolerance, 400 accessions have been evaluated for variation in growth parameters by the Cotton Breeding Program at the Texas A&M AgriLife Research Center, Lubbock, TX. Measurements were obtained for the 400 accessions under non-stress conditions by growing plants in granular diatomaceous earth contained in 30-inch tubes made from 3-inch PVC pipe. Adequate water and fertilizer were applied, and day and night temperatures were maintained at 75–85 °F and 67–69 °F, respectively, to obtain maximum growth rates. After 20 days, plants were removed from the growth medium and washed, and taproot and shoot lengths were measured. Plants were then separated into roots and shoots and dried at 140 °F for 48 hours. The parameters obtained were taproot length, shoot length, total root and shoot weights, and shoot-to-root ratio. All experiments were conducted using a randomized complete block (RCB) design with five blocks. Recognizing that in many regions declining water levels are associated with increased salinity in irrigation water, the Lubbock AgriLife research group evaluated 290 collection accessions for NaCl tolerance in a hydroponic system. Plants grown in hydroponic medium were evaluated continuously for a variety of salt injury indicators. Seedlings were exposed to NaCl solution whose concentration was raised in increments at a consistent rate from 3,000 to 35,000 ppm, until plant mortality approached 100 percent. Preliminary studies conducted in 2006–2009 showed that accession TX 307 (PI 165390) had significantly better growth when treated with NaCl in the hydroponic screening system. Seed from a plant selection of TX 307 was increased for use as a control in further salinity studies.
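The arithmetic behind these growth and salinity measurements is simple, and a short sketch can make it explicit. It converts the stated temperatures to Celsius (75–85 °F is roughly 24–29 °C, and the 140 °F drying temperature is 60 °C), computes the shoot-to-root ratio from dry weights, and builds a stepwise NaCl schedule from 3,000 to 35,000 ppm. The chapter does not give the size of each salt increment, so `step_ppm=2000` is a hypothetical value, and all function names are illustrative.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def shoot_root_ratio(shoot_dry_g, root_dry_g):
    """Shoot-to-root ratio from oven-dry weights in grams."""
    if root_dry_g <= 0:
        raise ValueError("root dry weight must be positive")
    return shoot_dry_g / root_dry_g


def salt_ramp(start_ppm=3000, stop_ppm=35000, step_ppm=2000):
    """NaCl levels for a stepwise ramp with a constant (hypothetical) step size."""
    levels = list(range(start_ppm, stop_ppm + 1, step_ppm))
    if levels[-1] != stop_ppm:
        levels.append(stop_ppm)  # always finish at the final target concentration
    return levels


print([round(f_to_c(t), 1) for t in (75, 85, 67, 69, 140)])
print(shoot_root_ratio(3.0, 1.5))
print(salt_ramp()[:3], len(salt_ramp()))
```

If the chosen step does not divide the range evenly, the last increment is shorter, so that the schedule still ends at 35,000 ppm.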


Efforts also are underway to develop remote sensing and high-throughput phenotyping methodology to evaluate germplasm variation in response to drought and other environmental stresses. Remote sensing instruments measure and record responses to drought using infrared cameras, sensors, and thermometers for leaf and plant canopy measurements. Soil water content is monitored using neutron moisture gauges, and rainfall with bucket rain gauges. Several research groups are working to develop high-throughput, field-based equipment to phenotype thousands of plots and/or accessions in the minimum amount of time. Equipped with sensors for measuring spectral reflectance, canopy temperature, and plant mapping imaging, a high-throughput vehicle has the potential to assess the responses of hundreds of accessions to stress in a fairly short time [62]. Many of these instruments have been developed or are under development. The knowledge being acquired will provide an acutely needed means of conducting uniform evaluations of germplasm for tolerance to drought, heat, salt, and other environmental stresses.
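As an illustration of how spectral reflectance and canopy temperature readings might be reduced to per-plot values, the sketch below computes two widely used quantities: the normalized difference vegetation index, NDVI = (NIR − red) / (NIR + red), and the canopy-minus-air temperature difference. The chapter does not say which indices these phenotyping systems compute, so the choice of indices, the function names, and the sensor readings are all assumptions.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("nir + red must be non-zero")
    return (nir - red) / total


def canopy_air_delta(canopy_c, air_c):
    """Canopy minus air temperature (deg C).

    Positive values mean the canopy is warmer than the air, which is commonly
    read as reduced transpirational cooling (a possible sign of water stress).
    """
    return canopy_c - air_c


# Hypothetical per-plot readings: (plot, NIR reflectance, red reflectance, canopy T, air T)
readings = [("plot-1", 0.45, 0.05, 30.0, 32.0), ("plot-2", 0.30, 0.10, 35.5, 32.0)]
for plot, nir, red, tc, ta in readings:
    print(plot, round(ndvi(nir, red), 2), canopy_air_delta(tc, ta))
```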

**6. Genotyping**

Molecular tools provide the means to characterize underlying genetic diversity that is not measurable through classical phenotypic descriptors [66]. With the advance of DNA marker technologies, it is now possible to characterize *Gossypium* germplasm not only phenotypically at the level of whole plants but also genotypically at the level of whole genomes [67,68]. In general, cotton lags behind other major crops in the genomic tools available for effective manipulation and exploitation of beneficial genes otherwise buried in *Gossypium* germplasm collections. The first molecular maps for cotton [69,70] were based on restriction fragment length polymorphism (RFLP) technology, which is cumbersome and expensive: it requires large amounts of genomic DNA and radioactive probes, and the probes must be physically disseminated to the research community as plasmid or phage clones. Thus, RFLP markers were poorly suited for high-throughput genotyping in germplasm characterization [67]. New DNA marker technologies were needed to coordinate systematic characterization and allow simultaneous comparison among the various research efforts involved in cotton diversity analysis and genetic resource preservation.

Microsatellite, or simple sequence repeat (SSR), markers, in contrast to RFLP markers, are abundant, co-dominant, and widely distributed throughout the genomes of higher plants. They are amenable to high-throughput assay via multiplex polymerase chain reaction (PCR) bins on automated sequencers [72]. Thus, SSR markers are portable and simple to use for germplasm characterization. Over the last decade, several thousand SSR markers have been developed for the tetraploid genome of cultivated cottons [73-76], and for various applications they have been systematically characterized and genetically mapped in cotton genomes. High-density genetic maps were constructed based on an immortal recombinant inbred line (RIL) population developed from a cross between TM-1 and 3-79, the genetic standards for *Gossypium hirsutum* and *G. barbadense*, respectively [73,75]. Using a balanced diversity panel of 12 cultivated and exotic cotton genotypes representing six *Gossypium* species [77, Table 5], a core set of 105 SSR markers has been developed for coordinated germplasm characterization [72]. This initial core set was carefully selected on the basis of criteria that included reproducible DNA amplification, ability to be multiplexed, reasonable polymorphism information content (PIC), representation of different marker sources, and uniform distribution across the tetraploid cotton genome. The initial core set proposes only two DNA markers for each chromosome arm of the 26 tetraploid cotton chromosomes [67]. The core markers serve as standard descriptors to characterize workable sets of *Gossypium* accessions across different gene pools or germplasm sources.
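Polymorphism information content (PIC) was one of the selection criteria for the core set. A common definition, from Botstein et al. (1980), is PIC = 1 − Σ pᵢ² − Σᵢ<ⱼ 2 pᵢ² pⱼ², where pᵢ are the allele frequencies at a marker. The sketch below assumes that definition, since the chapter does not state which PIC formula was applied. The locus names and allele counts are hypothetical.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content of one marker (Botstein et al., 1980).

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2,
    where p_i are allele frequencies (counts are normalized here).
    """
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - correction


# Hypothetical allele counts for three SSR loci scored across a panel
loci = {"SSR-1": [6, 6], "SSR-2": [3, 3, 3, 3], "SSR-3": [12]}
for name, counts in loci.items():
    print(name, round(pic(counts), 3))
```

A monomorphic locus such as the hypothetical SSR-3 scores 0. A two-allele locus cannot exceed 0.375, which is one reason multi-allelic SSRs tend to rank well on this criterion.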


With the development of molecular markers, genetic diversity studies within *Gossypium* have been pursued using accessions from the U.S. NCGC with varying objectives, including understanding the evolutionary process of interspecific gene flow within *G. aridum* [78], the genetic relationships of geographically diverse *G. arboreum* cultivars [79], and the utility in the diploid *G. davidsonii* of microsatellite primers developed in tetraploid species [80], to name a few examples. Because cultivars possess the most readily accessible genetic variation for use in breeding programs, a number of diversity investigations have used improved cotton cultivars [81-84]. Many of these diversity studies have been limited in the number of accessions investigated or the number of markers used [85,79,82]. There has been little standardization among these studies, as a different group of molecular markers was used for each group of accessions. Attempts to characterize larger genomic groups using a standardized marker set have been made in the germplasm collections of Uzbekistan [86,87] and France [88]. Recognizing the need to characterize the genetic diversity within the NCGC and the utility of markers in characterizing and managing collection diversity, the working collection at College Station, TX, initiated research to characterize a major portion of the collection using the core SSR marker set described above. Objectives of this characterization included: determining the structure of genetic variability within the collection, with the goal of identifying targets for further germplasm collecting and exchange efforts; identifying redundancies, misidentifications, unintended introgression, and gaps within the collection; and developing and validating a core marker set that could be used in comparative studies across collections and over time.
The core marker set developed to accomplish this task was used to genotype and analyze a *Gossypium* Diversity Reference Set (GDRS) of 2,254 accessions (approximately 20% of the NCGC). The GDRS represented the range of diversity of *Gossypium*, including nine genomes (eight diploid and one tetraploid) and 33 species as represented in the collection (Table 6). DNA profiles of these accessions showed the strengths and deficiencies of the core set of 105 SSR markers.
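Identifying redundancies is one of the objectives listed above. The sketch below shows how marker profiles could flag candidate duplicate accessions. It assumes each accession's SSR data are stored as a set of "locus:fragment size" strings and compares them with the Dice (Nei-Li) coefficient. The chapter does not say which similarity measure the College Station group used, and the 0.98 threshold, function names, and accession labels are hypothetical. Missing data are not handled here.

```python
from itertools import combinations


def dice(a, b):
    """Dice (Nei-Li) similarity between two sets of scored SSR alleles."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def flag_redundant(profiles, threshold=0.98):
    """Return accession pairs whose allele profiles are at least `threshold` similar.

    Flagged pairs are candidates for follow-up (possible duplicates or
    mislabelled seed lots), not automatic removals.
    """
    flagged = []
    for x, y in combinations(sorted(profiles), 2):
        s = dice(profiles[x], profiles[y])
        if s >= threshold:
            flagged.append((x, y, round(s, 3)))
    return flagged


# Hypothetical accessions; alleles written as "locus:fragment size (bp)"
profiles = {
    "PI-A": {"M1:180", "M2:210", "M3:150", "M4:201"},
    "PI-B": {"M1:180", "M2:210", "M3:150", "M4:201"},
    "PI-C": {"M1:184", "M2:210", "M3:156", "M4:199"},
}
print(flag_redundant(profiles))
```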

When applied to the GDRS, the core marker set was most successful at revealing DNA profiles in *G. hirsutum* and *G. barbadense*. This was expected, since the SSRs used in the core set were developed from these two species. Many of the remaining species showed incomplete DNA profiles due to lack of PCR amplification, or non-informative profiles due to the amplification of monomorphic DNA fragments. The current 105-marker core set was capable of discriminating tetraploid species, and of discriminating between the diploid A genome and all other diploid genomes. Within *G. hirsutum* and *G. barbadense*, the marker set revealed misclassification, introgression, and accession redundancy and uniqueness, and it will therefore be a significant tool in maintaining collection integrity within the primary species. It was determined that detailed characterization efforts would continue to use those core SSR markers that were informative, but that further characterization would require the addition of genome- or species-specific markers. In this ongoing research effort, diversity reference sets will be identified to represent a minimum of 25% of the accessions available for a given species, genome, or group within the NCGC. Currently, reference sets are being created for the A genome, D genome, and tertiary genomes. These individual reference sets are being assembled to minimize similarity between accessions within the set, with priority given to accessions representing geographic, ecological, and morphological diversity and differing degrees of human manipulation (i.e., wild species, semi-adapted landraces, or improved cultivars) that would be of interest in a diversity structure analysis. This approach will more readily illustrate the genetic diversity of the 10,000+ accessions in the collection and enhance the use of this large collection by geneticists and breeders worldwide. The results from genotyping reference sets will be used to guide the development of smaller core collections for species and genomes. Marker information also will be used to prioritize regeneration efforts, identify redundancy and uniqueness in the collection, and monitor the integrity of accessions through regeneration cycles. Coordinated characterization between collections using markers could enhance exchange efforts and allow mutual protection of holdings.
Efforts are underway to organize a cooperative international effort to genotype cotton collections with a standard set of markers and a uniform distribution of species and ecotypes.

**Table 6.** Accessions in the *Gossypium* Diversity Reference Set (GDRS), by genome and species.

| Genome | Species | Accessions in GDRS (% of NCGC total) | Total accessions in NCGC |
|---|---|---|---|
| A | *G. arboreum* | 145 (8.4%) | 1729 |
| | *G. herbaceum* | 49 (25.3%) | 194 |
| AD | *G. barbadense* | 430 (27.1%) | 1584 |
| | *G. darwinii* | 4 (2.9%) | 138 |
| | *G. hirsutum* | 1541 (24.5%) | 6302 |
| | *G. mustelinum* | 7 (36.8%) | 19 |
| | *G. tomentosum* | 2 (12.5%) | 16 |
| B | *G. anomalum* | 5 (71.4%) | 7 |
| C | *G. nandewarense* | 1 (16.7%) | 6 |
| | *G. sturtianum* | 3 (42.9%) | 7 |
| D | *G. aridum* | 4 (28.6%) | 14 |
| | *G. armourianum* | 2 (20.0%) | 10 |
| | *G. davidsonii* | 9 (29.0%) | 31 |
| | *G. gossypioides* | 7 (100.0%) | 7 |
| | *G. klotzschianum* | 1 (1.7%) | 59 |
| | *G. laxum* | 1 (50.0%) | 2 |
| | *G. lobatum* | 1 (25.0%) | 4 |
| | *G. raimondii* | 4 (7.1%) | 56 |
| | *G. thurberi* | 9 (24.3%) | 37 |
| | *G. trilobum* | 6 (54.5%) | 11 |
| E | *G. areysianum* | 1 (50.0%) | 2 |
| | *G. somalense* | 2 (66.7%) | 3 |
| | *G. stocksii* | 2 (50.0%) | 4 |
| F | *G. longicalyx* | 4 (100.0%) | 4 |
| G | *G. australe* | 3 (27.3%) | 11 |
| | *G. bickii* | 4 (80.0%) | 5 |
| | *G. nelsonii* | 3 (75.0%) | 4 |
| K | *G. costulatum* | 1 (50.0%) | 2 |
| | *G. exiguum* | 1 (100.0%) | 1 |
| | *G. marchantii* | 1 (100.0%) | 1 |
| | *G. nobile* | 1 (100.0%) | 1 |
| | *G. populifolium* | 1 (25.0%) | 4 |
| | *G. pulchellum* | 1 (100.0%) | 1 |
| **Overall** | | **2256 (22.0%)** | **10276** |
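The assembly rule described above, that reference sets cover at least 25% of a group while minimizing similarity between members, resembles a greedy max-min ("farthest-point") selection. The sketch below implements that technique as an illustration only. It uses marker data alone, with a simple mismatch distance between 0/1 allele-presence profiles, whereas the actual selection also weighs geographic, ecological, morphological, and domestication-status information. The function name and accessions are hypothetical.

```python
import math


def pick_reference_set(profiles, fraction=0.25):
    """Greedy max-min (farthest-point) selection of a diversity reference set.

    profiles: dict accession -> tuple of 0/1 allele-presence scores.
    At least `fraction` of the accessions are chosen; each step adds the
    accession whose closest already-chosen accession is most distant, so the
    set avoids near-duplicates.
    """
    names = sorted(profiles)
    k = max(1, math.ceil(fraction * len(names)))

    def dist(a, b):
        # Number of scored alleles at which two profiles differ (Hamming distance)
        return sum(x != y for x, y in zip(profiles[a], profiles[b]))

    chosen = [names[0]]  # arbitrary but deterministic starting accession
    nearest = {n: dist(n, names[0]) for n in names}
    while len(chosen) < k:
        candidates = [n for n in names if n not in chosen]
        nxt = max(candidates, key=lambda n: nearest[n])
        chosen.append(nxt)
        for n in names:
            nearest[n] = min(nearest[n], dist(n, nxt))
    return chosen


profiles = {  # hypothetical accessions scored for four alleles
    "A1": (1, 0, 1, 0),
    "A2": (1, 0, 1, 1),
    "A3": (0, 1, 0, 1),
    "A4": (0, 1, 0, 0),
}
print(pick_reference_set(profiles, fraction=0.5))
```

Because A2 differs from A1 at only one allele, it is passed over in favour of the more distinct A3 when only two accessions are kept.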

**Table 5.** A standardized panel of *Gossypium* germplasm diversity for cotton marker development (CMD) [12]

| CMD # | CMD name | Description |
|---|---|---|
| CMD01 | TM-1 | *G. hirsutum* (AD1) genetic standard |
| CMD02 | 3-79 | *G. barbadense* (AD2) genetic standard |
| CMD03 | Acala Maxxa | California Upland cotton (AD1) |
| CMD04 | DPL 458BR | Upland cotton (AD1) with significant acreage |
| CMD05 | Paymaster 1218BR | Upland cotton (AD1) with significant acreage |
| CMD06 | Fibermax 832 | Upland cotton (AD1) with significant acreage |
| CMD07 | Stoneville 4892BR | Upland cotton (AD1) with significant acreage |
| CMD08 | Pima S-6 | Pima (AD2) germplasm breeding source |
| CMD09 | *G. arboreum* (A2-8) | A genome representative |
| CMD10 | *G. raimondii* (D5-3) | D genome representative |
| CMD11 | *G. tomentosum* (AD3) | Introgression breeding source |
| CMD12 | *G. mustelinum* (AD4) | Introgression breeding source |

The low resolution of the 105-marker core set used in the U.S. collection offers a first glimpse at, or general survey of, the cotton germplasm collection, but it does not provide detailed characterization of specific genomic regions that may harbor important genes of interest. The size and complexity of the cotton genomes require many more widely applicable DNA markers for more effective germplasm characterization and gene discovery [72-74]. While the growing collection of well-characterized SSR markers offers an opportunity to expand the current set with additional core markers to meet the demand, single nucleotide polymorphism (SNP) markers represent a promising DNA marker system [89]. SNP markers are co-dominant and more abundant in a given genome than SSR markers, and they offer a new opportunity for generating large numbers of intraspecific polymorphisms [90]. Current efforts to develop large numbers of cotton SNP markers are based on reduced genome representation [91] or other limited sequence resources [92]. A simple and cost-effective targeted genotyping-by-sequencing (GBS) approach is being explored to simultaneously discover and map SNP markers for traits of interest [93]. With the current efforts to develop whole-genome sequences in cotton [94,95], including several germplasm standards and mapping parents, hundreds of thousands of SNP markers will be identified in the cotton genomes. These SNP loci will be validated and sorted into orthologs, homeologs, and paralogs, as is often required in allotetraploid cotton. Individual sets of core markers specific to each genome, species, or landrace will be developed from evenly distributed genomic and genic regions, and they will be arrayed on public SNP genotyping chips for high-throughput, more detailed characterization and exploitation of the genetic diversity of *Gossypium* germplasm [93,96].
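Drawing core markers "from evenly distributed genomic and genic regions" can be illustrated with a simple spacing rule. The sketch divides each chromosome into equal windows and, for each window midpoint, takes the closest SNP not already chosen. This is one straightforward way to spread markers, offered as an assumption rather than the procedure the cotton community has adopted. The chromosome names, lengths, SNP IDs, and positions are hypothetical.

```python
from collections import defaultdict


def evenly_spaced_core(snps, chrom_lengths, per_chrom):
    """Pick up to `per_chrom` SNPs per chromosome, spread along its length.

    snps: iterable of (snp_id, chrom, position_bp)
    chrom_lengths: dict chrom -> length in bp
    Targets are the midpoints of `per_chrom` equal-width windows; for each
    target the closest not-yet-chosen SNP on that chromosome is taken.
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))

    core = {}
    for chrom, length in chrom_lengths.items():
        available = sorted(by_chrom.get(chrom, []))
        picked = []
        for i in range(per_chrom):
            if not available:
                break  # fewer SNPs than requested on this chromosome
            target = (i + 0.5) * length / per_chrom
            best = min(available, key=lambda p: abs(p[0] - target))
            available.remove(best)
            picked.append(best[1])
        core[chrom] = picked
    return core


snps = [("s1", "A01", 5), ("s2", "A01", 20), ("s3", "A01", 48),
        ("s4", "A01", 52), ("s5", "A01", 90), ("s6", "D01", 40)]
print(evenly_spaced_core(snps, {"A01": 100, "D01": 80}, per_chrom=2))
```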

The U.S. National Cotton Germplasm Collection – Its Contents, Preservation, Characterization, and Evaluation

http://dx.doi.org/10.5772/58386

187

| Chromosome (A sub-genome) | No. loci | Chromosome (D sub-genome) | No. loci |
|---|---|---|---|
| 1 | 17 | 14 | 31 |
| 2 | 18 | 15 | 25 |
| 3 | 27 | 16 | 20 |
| 4 | 16 | 17 | 16 |
| 5 | 39 | 18 | 18 |
| 6 | 17 | 19 | 39 |
| 7 | 22 | 20 | 13 |
| 8 | 21 | 21 | 21 |
| 9 | 28 | 22 | 20 |
| 10 | 14 | 23 | 24 |
| 11 | 21 | 24 | 18 |
| 12 | 26 | 25 | 19 |
| 13 | 17 | 26 | 19 |
| Not mapped | 166 | | |
| Total | 732 | | |

**Table 7.** Chromosome distribution of 732 loci within the cotton genome for 448 markers used to genotype 193 cultivars.
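As a quick consistency check on Table 7, the short Python sketch below sums the per-chromosome counts. It assumes, following the table's layout, that chromosomes 1 to 13 form the A sub-genome and 14 to 26 the D sub-genome; the 166 unmapped loci bring the total to 732.

```python
# Per-chromosome locus counts transcribed from Table 7.
# Assumption from the table layout: chromosomes 1-13 = A sub-genome, 14-26 = D sub-genome.
A_LOCI = {1: 17, 2: 18, 3: 27, 4: 16, 5: 39, 6: 17, 7: 22,
          8: 21, 9: 28, 10: 14, 11: 21, 12: 26, 13: 17}
D_LOCI = {14: 31, 15: 25, 16: 20, 17: 16, 18: 18, 19: 39, 20: 13,
          21: 21, 22: 20, 23: 24, 24: 18, 25: 19, 26: 19}
NOT_MAPPED = 166


def locus_totals():
    """Return (A sub-genome total, D sub-genome total, grand total incl. unmapped)."""
    a_total = sum(A_LOCI.values())
    d_total = sum(D_LOCI.values())
    return a_total, d_total, a_total + d_total + NOT_MAPPED


a_total, d_total, grand_total = locus_totals()
print(a_total, d_total, grand_total)  # 283 283 732
```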

More recently, the diversity of upland cotton cultivars was investigated using a larger set of available SSR markers. Because commercial *G. hirsutum* cultivars are the most accessible sources of genetic variation for breeding and genetic improvement, a cooperative effort involving the collection was undertaken to determine the variability available, and the structure of diversity, within this elite group using a broader set of molecular markers. A set of 193 upland cotton cultivars from 26 countries, covering most cotton-growing regions of the world, was sampled from the national collection and elsewhere. These cultivars represent a wide spectrum of the genetic diversity found within cultivated upland cotton. Because U.S. cultivars contributed strongly to early breeding efforts in other countries, special emphasis was placed on representing U.S. cultivars in this study. When selecting U.S. cultivars, the following factors were taken into consideration: breeding program, pedigree (if known), geographic region, era of release, planting acreage and breeding value (*e.g.* as a common parent in breeding programs). The 130 U.S. cultivars represented more than 100 years of breeding history. Four hundred forty-eight SSR markers were selected based on their mapping positions in the *G. hirsutum* TM-1 × *G. barbadense* 3-79 genetic map [76]. Where possible, the physical positions of the selected SSR markers were also determined from the *G. raimondii* reference genome sequence [95]. Genome-wide linkage disequilibrium (LD) between pairs of SSR marker loci was calculated using the software package JMP Genomics 6.0 (SAS Institute, Cary, NC). Genetic similarities between all pairs of cultivars were calculated with Dice's coefficient [97] using the software package NTSYS-pc [98]. Neighbor-joining (NJ) trees were generated to view the phylogenic relationships among the cultivars.
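Dice's coefficient, used above for the pairwise similarities, is commonly written for presence/absence marker data as S = 2a / (2a + b + c), where a counts alleles present in both cultivars and b and c count alleles present in only one of them. The Python sketch below is a minimal illustration of that formula, not the NTSYS-pc implementation; the marker names, fragment sizes and cultivar names are hypothetical.

```python
from itertools import combinations


def dice_similarity(alleles_x, alleles_y):
    """Dice's coefficient for two cultivars scored as sets of present alleles.

    S = 2a / (2a + b + c): a = alleles present in both cultivars,
    b and c = alleles present in only one. Shared absences do not count.
    """
    x, y = set(alleles_x), set(alleles_y)
    a = len(x & y)
    b = len(x - y)
    c = len(y - x)
    if a + b + c == 0:
        raise ValueError("both profiles are empty; similarity is undefined")
    return 2 * a / (2 * a + b + c)


def pairwise_similarities(profiles):
    """Dice similarity for every unordered pair of cultivars, keyed by sorted name pair."""
    return {
        (n1, n2): dice_similarity(profiles[n1], profiles[n2])
        for n1, n2 in combinations(sorted(profiles), 2)
    }


# Hypothetical allele calls labelled "<marker>_<fragment size>".
profiles = {
    "cv_A": {"M1_150", "M2_200", "M3_98"},
    "cv_B": {"M1_150", "M2_204", "M3_98"},
    "cv_C": {"M1_152", "M2_204", "M3_101"},
}
for pair, s in pairwise_similarities(profiles).items():
    print(pair, round(s, 3))
```

Taking the mean, minimum and maximum of the returned values over a full panel would give summary figures of the kind reported for this set of cultivars.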

The 448 selected SSR markers detected a total of 1,590 alleles at 732 loci distributed across the whole cotton genome (Table 7). Of these 732 loci, 523 were polymorphic and 209 were monomorphic. One hundred thirty-nine unique alleles were observed in 69 cultivars. Of the 130 U.S. cultivars, 94 had known years of release, ranging from 1899 ('Mebane') to 2010 ('UA48') [99,100]. Using this information, an analysis of the number of unique marker alleles carried by the cultivars clearly demonstrated that modern U.S. upland cotton has gradually lost genetic diversity over the past century. Pairwise genetic similarity among all cultivars averaged 0.760, ranging from 0.640 ('Pak 4F CB 4025' and 'Paymaster HS200') to 0.993 ('Rowden' and 'DES 716'). Based on the phylogenic tree, the cultivars were assigned to 15 groups, with sub-groups within groups 4 and 6 (Table 8). A detailed phylogenic tree can be found in Fang et al. 2013 [81]. Molecular marker analysis revealed little relationship between the genetic make-up of cultivars and their countries of origin. Instead, the genetic diversity analysis and the phylogenic grouping based on molecular markers were largely congruent with the breeding history and pedigree information of many cultivars [100,101].
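The locus and allele counts above can be tallied from a genotype table along the lines of the Python sketch below. It is illustrative only: it assumes that a locus is polymorphic when more than one allele is seen across the panel, and it treats an allele as "unique" when exactly one cultivar carries it, which may differ from the study's exact definition. The three-cultivar panel is hypothetical.

```python
from collections import defaultdict


def summarize_loci(genotypes):
    """Classify loci as polymorphic/monomorphic and find cultivar-unique alleles.

    genotypes: {cultivar: {locus: list of allele labels (e.g. fragment sizes)}}
    Returns (polymorphic loci, monomorphic loci, {cultivar: [(locus, allele), ...]}).
    """
    carriers = defaultdict(set)  # (locus, allele) -> set of cultivars carrying it
    for cultivar, loci in genotypes.items():
        for locus, alleles in loci.items():
            for allele in alleles:
                carriers[(locus, allele)].add(cultivar)

    alleles_at = defaultdict(set)  # locus -> distinct alleles seen across the panel
    for locus, allele in carriers:
        alleles_at[locus].add(allele)

    polymorphic = sorted(locus for locus, seen in alleles_at.items() if len(seen) > 1)
    monomorphic = sorted(locus for locus, seen in alleles_at.items() if len(seen) == 1)

    # Assumption: an allele is "unique" when exactly one cultivar in the panel carries it.
    unique = defaultdict(list)
    for (locus, allele), owners in carriers.items():
        if len(owners) == 1:
            (owner,) = owners
            unique[owner].append((locus, allele))
    return polymorphic, monomorphic, dict(unique)


genotypes = {  # hypothetical three-cultivar, three-locus panel
    "cv_A": {"L1": [150], "L2": [200], "L3": [98]},
    "cv_B": {"L1": [150], "L2": [204], "L3": [98]},
    "cv_C": {"L1": [152], "L2": [204], "L3": [98]},
}
poly, mono, unique = summarize_loci(genotypes)
n_unique = sum(len(v) for v in unique.values())
print(len(poly), len(mono), n_unique, len(unique))  # 2 1 2 2
```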


186 World Cotton Germplasm Resources

The information obtained from this research may benefit cotton geneticists and breeders in a variety of ways. First, breeders could use a subset or the whole set of the markers to genotype their own germplasm and then compare it with the cultivars analyzed in this study on the basis of their molecular profiles. Second, depending on their breeding objectives, breeders may choose cultivars from the list as parents for new crosses, or choose a cultivar to cross with their own favorite cultivar(s) based on molecular profiles and phylogenic groups. Third, breeders may use the information to re-evaluate their breeding programs. Fourth, breeders or geneticists may use all or a subset of these cultivars as starting material for association mapping studies.
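As one illustration of the second use, the hypothetical Python sketch below ranks reference cultivars by how dissimilar they are (lowest Dice similarity) to a breeder's own line. It assumes both were genotyped with the same markers and scored the same way; maximizing molecular distance is only one possible criterion, and all profiles and names are invented.

```python
def dice(x, y):
    """Dice similarity between two allele-presence sets: 2|x & y| / (|x| + |y|)."""
    x, y = set(x), set(y)
    if not x and not y:
        raise ValueError("both profiles are empty")
    return 2 * len(x & y) / (len(x) + len(y))


def rank_candidate_parents(own_profile, panel, top_n=3):
    """Return the top_n panel cultivars least similar to own_profile.

    Lower Dice similarity is treated as greater genetic distance; ties are
    broken alphabetically by cultivar name.
    """
    scored = sorted((dice(own_profile, alleles), name) for name, alleles in panel.items())
    return [(name, round(sim, 3)) for sim, name in scored[:top_n]]


# Hypothetical breeder line and reference-panel profiles.
own = {"M1_150", "M2_200", "M3_98", "M4_120"}
panel = {
    "cv_P": {"M1_150", "M2_200", "M3_98", "M4_120"},
    "cv_Q": {"M1_152", "M2_204", "M3_101", "M4_120"},
    "cv_R": {"M1_150", "M2_204", "M3_98", "M4_124"},
}
print(rank_candidate_parents(own, panel, top_n=2))  # [('cv_Q', 0.25), ('cv_R', 0.5)]
```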



| Group | No. cultivars | No. unique alleles | No. countries of origin | Description |
|---|---|---|---|---|
| 1 | 5 | 6 | 5 | Common factor unknown or not clear |
| 2 | 8 | 3 | 3 | Cultivars with Coker 100 in their pedigrees |
| 3 | 10 | 8 | 1 | Pee Dee germplasm |
| 4 | 22 | 34 | 6 | Cultivars from Pakistan, multi-adversity resistance breeding program and Paymaster program in Texas |
| 5 | 15 | 4 | 5 | Cultivars from S. America, and those with disease resistance |
| 6 | 70 | 29 | 4 | Commercial cultivars developed in the U.S., Australia and China |
| 7 | 13 | 4 | 2 | Acala type |
| 8 | 4 | 7 | 3 | Common factor unknown or not clear |
| 9 | 6 | 1 | 2 | Cultivars from Paymaster program in Texas High Plains |
| | 4 | 0 | 3 | Common factor unknown or not clear |
| | 6 | 14 | 2 | Common factor unknown or not clear |
| | 3 | 1 | 3 | Common factor unknown or not clear |
| | 3 | 11 | 2 | Common factor unknown or not clear |
| | 3 | 1 | 2 | Coker cultivars related to Coker 310 |
| 15 | 19 | 14 | 9 | Cultivars from Africa, and Coker cultivars with wilt resistance |

**Table 8.** Summary of phylogenic grouping of 191 cultivars

**7. Germplasm databases**

In this chapter we have briefly described the U.S. National Cotton Germplasm Collection and some of the current efforts to preserve, characterize, and distribute its contents. However, the utility of the germplasm collection and its genetic resources is directly proportional to the knowledge of the genetic diversity within the collection, and to the accessibility of that knowledge. Without the ability to disseminate information about the collection broadly, with free and easy access for the community, the utility of the collection is greatly diminished. Currently, information about the collection can be accessed through two main databases: GRIN-GLOBAL and CottonGen. GRIN-GLOBAL (http://distribution.grin-global.org/gringlobal/search.aspx) has been described elsewhere and will not be described here. The following is a description of CottonGen and of efforts to expand its contents and capabilities.

CottonGen (www.cottongen.org, [102,103]) is a curated and integrated web-based relational database providing centralized access to publicly available genomic, genetic and breeding data for cotton. Initiated in 2011 as the successor to CottonDB [102], CottonGen contains annotated whole genome sequences, genes, transcripts, markers, trait loci, genetic maps, taxonomy, germplasm, publications, data analysis tools, and communication resources for the cotton community. Annotated whole genome sequences of *Gossypium raimondii* are available with aligned genetic markers, transcripts and protein homologs. These whole genome data can be accessed through genome pages, search tools and the genome browser GBrowse. Most of the published cotton genetic maps can be viewed and compared using the comparative map viewer CMap, and are searchable via map search tools. Search tools also exist for markers, QTLs, germplasm, publications, and trait evaluation data.

**Figure 1.** An example of how germplasm names can be used and displayed in CottonGen. A. The details page of germplasm '108F', which shows that this germplasm has been recorded under 13 different names (aliases). B. The list of the 13 names (aliases); some are accession names from various germplasm collections or germplasm groups, such as 'PI 274464', the accession name used in GRIN. Other names are alternative formats of this germplasm name, such as '108 F' and 'No. 108F'. C. Any name in the alias list, such as 'No. 108F', can be used in the germplasm name search; the same result table is displayed whenever the query matches any of the aliases or the standardized name.

A critical function of any community database is to provide data and information that are as accurate, unambiguous and non-redundant as possible. For germplasm pedigree and trait data, this can be particularly challenging, as it involves identifying and rationalizing the many aliases used for accession names, and the different descriptors and measurement scales used for the same trait. This is often compounded by a lack of standardization within and between collections due to data storage system restrictions, nomenclature differences, and unrecognized errors at the time of data entry. Once the database curator resolves these issues, the curated data are integrated with other associated data and made accessible through easy-to-use interfaces to ensure their utility to scientists.

The first step in the germplasm curation effort involved creating a validated list of unique cotton germplasm with unique IDs. A good starting point for this list was the set of cotton collection PI numbers in the USDA-ARS GRIN [104]. To resolve discrepancies among names, each apparently unique line that could be identified from the passport data was given a unique "cdbgm" ID in CottonGen, and the accession numbers or names from different collections or germplasm groups (such as the CMD panel), as well as alternate names, were cross-referenced to these CottonGen unique IDs. The unique IDs, germplasm names and aliases were then expanded to include other germplasm from the GRIN Cotton Crop Science Registration, GRIN Plant Variety Protection (PVP), the pedigrees of 642 Upland and 20 Pima cultivars plus 205 parents representing most of the cotton breeding lines and obsolete cultivars (kindly provided by Dr. Bowman [100]), the cotton germplasm collection of the China Cotton Research Institute, Chinese Academy of Agricultural Sciences, and the cotton germplasm collection of the Uzbekistan Center of Genomics and Bioinformatics, Academy of Sciences of Uzbekistan. As a result of these curation efforts, CottonGen currently contains 14,959 assigned unique IDs connected to 53,645 corresponding aliases obtained from 14 collections and representing 49 *Gossypium* species. Access to and display of germplasm names (standardized names and aliases) in CottonGen are shown in Figure 1.
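The alias cross-referencing described above can be sketched as a lookup table in which every name variant points to one unique ID. The Python below is a hypothetical illustration, not CottonGen's schema or code: the ID "CG_0001" is invented rather than a real "cdbgm" ID, and the normalization rule (dropping a leading "No.", removing whitespace, ignoring case) is an assumption that a curator would need to verify, since aggressive normalization can merge names that are genuinely distinct. The '108F' aliases are taken from Figure 1.

```python
import re


def normalize_name(name):
    """Lookup key for a germplasm name: drop a leading 'No.', remove whitespace, uppercase."""
    key = re.sub(r"^\s*No\.\s*", "", name, flags=re.IGNORECASE)
    return re.sub(r"\s+", "", key).upper()


class GermplasmRegistry:
    """Hypothetical alias table: every alias points to exactly one unique germplasm ID."""

    def __init__(self):
        self._alias_to_id = {}  # normalized alias -> unique ID
        self._aliases = {}      # unique ID -> names exactly as recorded

    def register(self, unique_id, standardized_name, aliases=()):
        names = [standardized_name, *aliases]
        # Check every name first so a conflicting alias leaves the registry unchanged.
        for name in names:
            owner = self._alias_to_id.get(normalize_name(name))
            if owner is not None and owner != unique_id:
                raise ValueError(f"{name!r} is already an alias of {owner}")
        recorded = self._aliases.setdefault(unique_id, [])
        for name in names:
            self._alias_to_id[normalize_name(name)] = unique_id
            if name not in recorded:
                recorded.append(name)

    def resolve(self, name):
        """Return the unique ID behind any known alias, or None if unknown."""
        return self._alias_to_id.get(normalize_name(name))

    def aliases_of(self, unique_id):
        """Return the names recorded for a unique ID, as originally written."""
        return list(self._aliases.get(unique_id, []))


# "CG_0001" is an invented placeholder, not a real CottonGen "cdbgm" ID.
registry = GermplasmRegistry()
registry.register("CG_0001", "108F", aliases=["108 F", "No. 108F", "PI 274464"])
print(registry.resolve("no. 108f"), registry.resolve("PI274464"), registry.resolve("TM-1"))
```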

In addition to germplasm names (and aliases), germplasm data housed in CottonGen include pedigrees, publicly available passport data, stock collection center information, associated maps, libraries and sequences. The trait evaluation database in CottonGen contains over 118,000 trait scores from 9,000 accessions. The *Gossypium* species summary page (http://www.cottongen.org/data/species) provides a list of species along with information such as genome group, haploid chromosome number, geographic origin, and the number of accessions, sequences and DNA libraries per species. Each species name in the table links to an individual species page, which shows more detail, such as common name, images and additional data, including functional analyses of genes derived both from National Center for Biotechnology Information (NCBI) sequences and from whole genome sequences, with Kyoto Encyclopedia of Genes and Genomes (KEGG) [105] and Gene Ontology (GO) [106] analysis reports. The germplasm search page, accessible from http://www.cottongen.org/search/germplasm, provides access to different types of searchable data (Figure 2). The search by collection page provides a list of germplasm along with stock collection center information; the search can be filtered by collection center name, germplasm name and/or accession name in the stock center. The search by pedigree page provides an interface to search germplasm by pedigree, and the search by country page searches by country of origin. From the germplasm search page, researchers can go to the germplasm details page, which shows all the detailed information, such as pedigree, passport, collection center, images and associated genotypic and phenotypic data (Figure 2). Germplasm can also be searched based on trait evaluation data. Both the qualitative and the quantitative trait evaluation search pages allow the values of up to three trait descriptors to be specified to view the germplasm trait data. Data from all search result pages can be downloaded as Excel files.
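The multi-descriptor trait search can be approximated as a filter over (accession, descriptor, value) records, as in the hypothetical Python sketch below. The descriptor names, values and accession labels are invented, and the three-descriptor cap simply mirrors the limit described above; this is not CottonGen's query code.

```python
from collections import defaultdict

MAX_DESCRIPTORS = 3  # mirrors the three-descriptor limit described for the search pages


def trait_search(scores, criteria):
    """Return {accession: {descriptor: value}} for accessions meeting every criterion.

    scores: iterable of (accession, descriptor, value) trait evaluation records.
    criteria: {descriptor: predicate}, 1 to MAX_DESCRIPTORS entries; each predicate
    takes a trait value and returns True if the value satisfies the query.
    """
    if not 1 <= len(criteria) <= MAX_DESCRIPTORS:
        raise ValueError(f"specify between 1 and {MAX_DESCRIPTORS} trait descriptors")

    by_accession = defaultdict(dict)
    for accession, descriptor, value in scores:
        if descriptor in criteria:
            # If a descriptor is scored more than once, the last record wins.
            by_accession[accession][descriptor] = value

    return {
        acc: values
        for acc, values in by_accession.items()
        if all(d in values and criteria[d](values[d]) for d in criteria)
    }


scores = [  # hypothetical evaluation records
    ("acc_1", "fiber_length_mm", 29.5),
    ("acc_1", "flower_color", "cream"),
    ("acc_2", "fiber_length_mm", 31.2),
    ("acc_2", "flower_color", "yellow"),
    ("acc_3", "fiber_length_mm", 30.4),
]
criteria = {
    "fiber_length_mm": lambda v: v >= 30.0,  # quantitative descriptor
    "flower_color": lambda v: v == "yellow",  # qualitative descriptor
}
print(trait_search(scores, criteria))
```

Because the sketch keeps only the last score when an accession has several for the same descriptor (for example from different years or locations), a real search would need an explicit rule for such repeats.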

**Figure 2.** Germplasm search site in CottonGen. A. Multiple germplasm search pages are available, depending on the type of information users are interested in. B. An example search interface where users can view and search for germplasm and their collection center. C. Germplasm details page, with various tabs showing detailed information. D. The Map tab of the germplasm page shows all the maps for which the germplasm has been used. E. From the map page, users can go to CMap to access markers between and within maps, with hyperlinks to the marker details pages.

Among the ongoing efforts of CottonGen is the development of a digital image library to store over 100,000 images provided by the USDA-ARS Research Project "Genotypic and Phenotypic Analysis and Digital Imaging of Accessions in the US National Cotton Germplasm Collection". The associated phenotypic data from the same project will also be stored in CottonGen. Provision of other germplasm data to CottonGen is actively encouraged.

**8. Conclusion**

The National Cotton Germplasm Collection is a complex amalgamation of several previously existing collections, which presents challenges to its continued growth, preservation, characterization, and evaluation. Although habitat loss and international treaties have had a significant impact on germplasm collection and exchange efforts, the NCGC continues to grow through mutually beneficial collecting efforts and germplasm exchanges with cooperating countries. Given finite and sometimes constricting resources, efficiency and effectiveness in preserving, characterizing, and evaluating the collection's contents become imperative. One means of increasing the efficiency and effectiveness of the collection has been to enlist the research community in characterizing and evaluating the collection. Currently there are dynamic cooperative efforts to evaluate the collection for drought, heat, and other environmental stresses associated with global climate change. Efforts to find resistance to biotic stresses within

