The CSIRO Canberra collection is in long-term storage at -20°C. This collection provides a resource, but is not currently funded or maintained. Many of its accessions are also available from major US germplasm collections, as US researchers were often involved in the initial exploration. As the majority of these lines were collected during genetic explorations, an internal database contains details of the collector, location and date. This database is not publicly available. There has been a decline in funding in CSIRO over the last few decades for extending germplasm collections for native *Gossypium* and other crop relatives like *Glycine*, as more and more biodiversity research shifts to consolidation of national collections, electronic archiving of data and making them more broadly accessible on-line.

There are increasing challenges associated with maintenance and regeneration of these collections. Ensuring genetic purity of accessions has always been difficult when regeneration is conducted in the field. However, the GM era has added another level of complexity to the problem. Even though Australia does not have large numbers of insects that will cause cross pollination in cotton (only the European honey bee), measurable out-crossing can occur in some seasons and some locations [49, 50]. Care must be taken to locate regeneration blocks away from known sources of bee activity as well as isolated from commercial GM cotton crops. CSIRO also has substantial activities in areas of biotech research in addition to the traits sourced from third parties, and some of these activities are discussed in a later section. This research is geographically isolated (in Canberra) from the breeding program and only traits that have reached the advanced field evaluation stage are grown at Narrabri. As CSIRO is developing cultivars for commercial release, stewardship of germplasm and GM traits is extremely important and internal quality assurance protocols are rigorously adhered to. These protocols dictate that all material must be tested via protein or DNA analysis prior to handover to commercial partners for seed increase and sale.

**3.7. Germplasm passport data**

The current status of the accession descriptions varies depending on when and by whom they were imported, as well as on the donor. As many characteristics are either invisible or cannot be determined except in specific environments (e.g., disease resistance), the data provided by the donor is often critical. However, in many cases, particularly for older material, this information is simply not available, because many accessions were obtained via an intermediary collection or supplier. When it was actively funded, the ATCF collection generated substantial passport data on all lines including morphological characteristics, boll size, lint percent and fibre quality parameters. This is invaluable information for a breeder searching for specific characteristics. Descriptions in the CSIRO collection are more *ad-hoc*. This is largely a function of the collection not being 'public', and simply a resource for the breeding program that operates it. The data that do exist largely relate to the traits that were initially identified as being of interest, *i.e.*, HPR to insects and disease or fibre quality.

12 World Cotton Germplasm Resources

**3.8. Sharing and exchange**

The ATCF collection is based on a national collection model with small quantities (20-30 seeds) of germplasm freely available to researchers worldwide. The CSIRO collection is not publicly

**4. Other cotton germplasm resources beyond seed collections**

#### **4.1. Mapping populations**

Improving disease resistance against indigenous Australian strains of Fusarium has been a high priority in the Australian breeding program, and a number of new sources of resistance have been identified through the screening of introductions, mostly from the US, China and India, where Fusarium is also endemic. To investigate the genetic basis of the Fusarium wilt resistance in the Indian *G. hirsutum* cultivar MCU-5, a bi-parental cross was made between MCU-5 and Siokra 1-4, a *Fusarium*-susceptible Australian okra leaf *G. hirsutum* cultivar [51]. An F3 population of 244 lines was developed from this cross, and 244 F4 lines were subsequently produced by single seed descent from each F3 line. The F3 and F4 populations were assayed for Fusarium wilt resistance using a glasshouse bioassay [52] and genotyped with 151 markers (95 SSR and 56 AFLP). QTL analysis revealed multiple regions associated with resistance, which provide targets for introgression into elite cultivars to improve Fusarium wilt resistance [51]. Subsequently, it was found that MCU-5, which was derived from a multi-line cross between Indian Cambodia-type cultivars (MCU-1 and MCU-2) and cultivars from East Africa, the West Indies and the US, including some contribution from *G. barbadense*, also possesses significant resistance to non-defoliating strains of *Verticillium dahliae* and has significantly longer fibre than Siokra 1-4.

In an international collaboration with CIRAD (France), CSIRO obtained an inter-specific population of 140 RILs ranging from the F6 to F9 stages of selfing through single seed descent. The two parents, Guazuncho 2 (*G. hirsutum*) and VH8-4602 (*G. barbadense*), were chosen for their agronomic performance (Guazuncho 2) and superior fibre quality parameters (VH8-4602) [53]. This population was grown in several countries, including Australia, in the glasshouse and field. It was analysed for many different traits (including fibre, leaf shape, hairiness, boll size and number, and earliness) and genotyped at 1,745 loci derived from 597 SSR and 763 AFLP markers [54-56]. Not all of the lines performed well under Australian conditions: only about 55 flowered within an acceptable timeframe for harvesting, and thus seed stocks are severely restricted for most of the RILs. The QTLs associated with fibre traits identified in Australia may help generate markers for selection for these traits in breeding populations. The population also represents a resource for linking markers with mite resistance, as the parent lines show differentiation in resistance.
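The generation labels used for these populations (F3 and F4 lines, RILs at F6 to F9) indicate how much residual heterozygosity to expect. A minimal sketch of the textbook expectation is given below. It assumes strict selfing by single seed descent from an F1 between two fully inbred parents, with no selection or out-crossing; the function name `expected_heterozygosity` is illustrative and does not come from the cited studies.

```python
from fractions import Fraction


def expected_heterozygosity(generation: int) -> Fraction:
    """Expected fraction of segregating loci still heterozygous in one F<generation> plant.

    Assumption (not from the source): pure selfing from an F1 that is
    heterozygous at every locus where the two inbred parents differ, so
    heterozygosity halves with each generation of selfing.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1 is the hybrid generation)")
    return Fraction(1, 2) ** (generation - 1)


if __name__ == "__main__":
    # Generations mentioned in the text: F3/F4 (MCU-5 x Siokra 1-4), F6-F9 (RILs).
    for gen in (3, 4, 6, 9):
        het = expected_heterozygosity(gen)
        print(f"F{gen}: heterozygous {float(het):.4f}, fixed {float(1 - het):.4f}")
```

Under these assumptions an F6 plant is expected to be heterozygous at about 3% of segregating loci and an F9 plant at under 0.5%, which is why late-generation RILs can be treated as nearly fixed lines for QTL analysis.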

#### **4.2. Near isogenic lines of host plant resistance traits**

Cotton mutants can be of interest to cotton breeders for their agronomic or host plant resistance possibilities. Near isogenic lines (NILs) are plants that are mostly genetically identical except in DNA regions associated with a specific trait or gene mutation. They are usually created by the repeated backcrossing of a mutant plant to a recurrent parent, and are useful for quantifying the effect of the mutation or phenotype on agronomic performance, by reducing genetic background differences between the mutant and the normal plant type. Although very important for defining the agronomic value of an altered phenotype, NILs require a large investment of time to produce. Thomson [57] developed NILs for glabrousness (T2 arm), frego bract (fg), okra leaf (L2 0), and nectariless (ne1, ne2). These mutations are known to affect resistance to specific insects and diseases and, in the case of glabrousness, also reduce lint trash [57]. All 16 NIL combinations of the four mutants were developed in the Deltapine 61 background. The 16 NILs were derived from an initial cross between Deltapine 61 and a Deltapine-related experimental line homozygous for all four mutant characters. Four backcrosses were then made with Deltapine 61. A final backcross was made, and homozygous genotypes were selected from a large F2 population and maintained. Subsequent hybridizations were done using lines from this original population as parents, and a total of nine cultivars were released up until 1999, including the original Siokra and Sicot cultivars.
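Two pieces of arithmetic sit behind this scheme: four mutant characters, each either present or absent, give 2^4 = 16 combinations, and each backcross is expected to halve the remaining donor genome. The sketch below illustrates both. It is an idealised calculation that ignores linkage drag around the selected loci and the fact that the donor line was already Deltapine-related; the trait labels and function names are illustrative only.

```python
from itertools import product

# Illustrative labels for the four mutant characters described in the text.
MUTANT_TRAITS = ("glabrous", "frego bract", "okra leaf", "nectariless")


def nil_combinations(traits=MUTANT_TRAITS):
    """List every present/absent combination of the mutant traits.

    Each combination is a tuple of the mutant traits that line carries;
    the empty tuple is the fully normal type.
    """
    combos = []
    for flags in product((False, True), repeat=len(traits)):
        combos.append(tuple(t for t, present in zip(traits, flags) if present))
    return combos


def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected recurrent-parent genome share after an F1 plus n backcrosses.

    Idealised expectation 1 - (1/2)**(n + 1) for loci unlinked to the
    selected mutations (an assumption, not a measured value).
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be >= 0")
    return 1 - 0.5 ** (backcrosses + 1)


if __name__ == "__main__":
    print(len(nil_combinations()))  # 16 combinations of four traits
    for n in range(6):
        print(f"BC{n}: {recurrent_parent_fraction(n):.4f}")
```

With the five backcrosses described above (four plus a final one), the idealised expectation is about 98.4% recurrent-parent genome at unlinked loci, which is the sense in which the resulting lines are "near" isogenic.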

Australian Cotton Germplasm Resources http://dx.doi.org/10.5772/58414 15


#### **4.3. Mutant populations**

Mutation and mutation breeding are tools for producing novel variation that potentially cannot be found in existing cotton species, for genetic improvement and for aiding the study of gene function. Mutations are nucleotide base changes within the genome of an organism that are not brought on by normal recombination and segregation, and occur naturally at a low frequency. Many naturally derived mutant cotton plants that produce no fibre have been isolated, which has aided our molecular understanding of fibre formation [58]. The rate of mutation can be greatly increased in plants using chemical mutagens, ionizing radiation or transposable elements. Induced mutants have been generated and used to create valuable traits in many major crops, but have only occasionally been used in improving cotton. Mutagenesis of cotton has resulted in 'naked and tufted' seeds, herbicide resistance and plants with longer fibre [59-62] that have direct application within the cotton industry. Other mutants, such as those possessing inferior fibre traits, provide powerful tools for understanding which genes are associated with fibre formation [58].

CSIRO is investigating the general usefulness of mutagenised cotton populations for conventional plant breeding. Populations of *G. hirsutum*, *G. barbadense* and *G. arboreum* have been mutagenised with the chemical mutagens sodium azide and ethyl methanesulfonate, as well as by heavy-ion mutagenesis using the Riken Ring Cyclotron at the RI Beam facility in Japan, in collaboration with Dr Tomoko Abe. The mutagenised populations are currently being screened for a number of traits, including herbicide resistance, fibre traits, boll size, plant architecture and flowering.

#### **4.4. Transgenics**

A transgenic cotton plant contains one or more genes that have been artificially inserted. The inserted gene (known as a transgene) may come from another cotton plant or from a completely different species, as in *Bt* cotton, which produces the *Bt* toxin from the bacterium *Bacillus thuringiensis*. Transgenic technology enables plant breeders to bring together in one plant traits from potentially any organism, not just from within Upland cotton or sexually compatible species. Potentially this technology could be a quicker and cleaner method than backcrossing for bringing in traits from poor agronomic sources, but regulatory costs currently make this unfeasible. Transgenic technology also provides the means for studying specific genes in cotton and enables the definitive assignment of genes to specific functions through either overexpression or silencing of individual genes or gene families.

It is difficult and time consuming to generate transgenic cotton, as it requires *Agrobacterium tumefaciens*-mediated gene insertion via callus generated in tissue culture. It was found in the 1980s that Coker cultivars were generally superior for gene transfer using tissue culture, although a small number of Australian cultivars could be transformed with relatively low efficiency [63]. Coker 315 is the cultivar used for all transformation events in Australia, as it was thought to possess better agronomic traits under Australian conditions than other transformable Coker cultivars, such as Coker 312, which has been used by international biotech companies. Currently, all transgenic traits present in Australian commercial cultivars were obtained internationally under licence from either Monsanto or Bayer CropScience. The focus of Australian transgenic R&D work has been on developing potential traits of more specific relevance to the Australian industry and on basic research to understand gene function, particularly the functions of genes involved in fibre initiation and formation, plant defence and seed oil formation. Such work has demonstrated the importance of GhMYB25 [64], GhMYB25-like [65] and GhHD-1 [66] for fibre initiation and for controlling fibre initial numbers. As part of CSIRO research, transgenic cotton plants that are altered in the expression of GhMYB109 were imported under a Material Transfer Agreement for research use [67]. By altering the expression of a cotton sucrose synthase [68] or overexpressing a potato sucrose synthase [69], CSIRO transgenic research has demonstrated the essential role of the different sucrose synthases in cotton fibre and seed formation. Expression of the *Talaromyces flavus* glucose oxidase gene in cotton demonstrated increased resistance against Verticillium wilt [70], and plants with suppressed cadinene synthase gene expression revealed its role in bacterial blight infection [71].

Although cotton is principally a crop grown for fibre, its seeds are also a valuable source of oil. Transgenic silencing of key fatty acid desaturase genes has been used to alter cottonseed oil composition and improve its nutritional profile, making it more competitive with other oilseed crops, although commercialisation of this trait is still under discussion [72]. Currently none of these CSIRO transgenic derived traits have been incorporated into commercial cultivars.

Successful expression of transgenes in cotton also requires the ability to express genes precisely and at high levels. Experiments with transgenic plants have demonstrated the value of the soybean lectin gene promoter to drive transgenes in the embryo of cotton seeds [73], and of subterranean clover stunt virus promoters and terminators [74] for general and high level expression in cotton. The CSIRO-derived sub-clover duplicated stunt7 viral promoter drives the *Cry1Ab* insect resistance gene in a commercial cotton event (T304-40) that was deregulated in 2008 and forms part of Bayer CropScience's TwinLink product. A cotton rubisco small subunit promoter has also been shown to be expressed at high levels in green photosynthetic tissues throughout the development of cotton in the field, and is hence a useful promoter for expressing transgenes in leaves [75].

**5.1. SNP discovery in germplasm important for Australia**

To ensure that SNPs are informative for the breeding populations being developed in Australia, it is essential to find SNPs among the cotton cultivars that constitute the major germplasm sources of the elite cultivars we are developing. The most straightforward method of identifying SNPs, in the absence of the Upland cotton genome sequence, is to sequence expressed gene transcripts (RNA-seq) by isolating mRNA and converting it into cDNA for sequencing. This method has been used successfully for many other plants such as maize and wheat [85]. RNA-seq targets SNP discovery to genes that are actively transcribed and therefore more likely to be associated with conferring trait differences. The disadvantage is that cDNA is likely to have lower SNP frequencies than non-expressed regions, as expressed sequences are constrained by the gene's function. CSIRO RNA-seq data were generated on a set of 18 cultivars selected to represent significant genetic variation present within current Australian commercial cultivars; the set contained old Australian and US cultivars, as well as cultivars from China and India. Over 50 million reads (of ~90 bp) were obtained for each Upland cotton sample. We found that the key to identifying varietal SNPs confidently was the presence of a sub-genome-specific SNP in close proximity to the varietal SNP (Figure 2). This enabled representative sequences from both genomes in each cultivar to be identified and compared. From the 18 cultivars, ~38,000 varietal SNPs were identified. A selected subset of >1,500 of these putative SNPs was analysed using a combination of SNP platforms (GoldenGate and Sequenom), and it was found that these SNPs could be validated at a rate >90% [Zhu, Q-H, pers comm.].

Cultivar-1A ……AGCGTAGTCAGATT**A**AGTGGAATCCCTGATG……

Cultivar-1D ……AGC**C**TAGTCAGATTGAGTGGAATCCCTGATG……

Cultivar-2A ……AGCGTAGTCAGATT**G**AGTGGAATCCCTGATG……

Cultivar-2D ……AGC**C**TAGTCAGATTGAGTGGAATCCCTGATG……

**Figure 2.** Stretch of DNA sequence from two cultivars showing the sequences for both the A and D genomes. SNPs with the best validation rates are where a varietal SNP (in red) is in close proximity to a sub-genome-specific SNP (in blue) that allows determination of which genome the specific short read sequences are derived from.

Although RNA-seq data have allowed us to progress towards being able to use SNPs effectively for genotyping in breeding projects, the protein coding regions of genomes have been found to have a significantly lower level of DNA polymorphism than non-coding regions, so polymorphisms within genes between closely related cultivars are going to be less frequent and hence less useful. With the availability of the assembled *G. raimondii* genome, and possibly of the *G. arboreum* genome soon, to serve as a framework for short read sequence alignment, our SNP identification will in future be performed using genomic DNA sources.

**5.2. International cotton SNP consortium**

SNP chips enable high-throughput parallel analysis (millions at a time), whereas older markers such as SSRs tend to be assayed only one or a few at a time. SNP chips therefore represent a significant improvement over older technologies for large-scale genotyping.
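The rule of thumb from Section 5.1, that a varietal SNP is most trustworthy when a sub-genome-specific SNP lies close by, can be illustrated with the four sequences of Figure 2. The sketch below is a toy classification on sequences that have already been assigned to the A or D genome; it is not the CSIRO pipeline, which worked from short RNA-seq reads. The function names and the `max_distance` threshold are assumptions made for illustration, since the text does not quantify "close proximity".

```python
# Sequences transcribed from Figure 2, keyed by (cultivar, sub-genome).
FIGURE2_SEQS = {
    ("Cultivar-1", "A"): "AGCGTAGTCAGATTAAGTGGAATCCCTGATG",
    ("Cultivar-1", "D"): "AGCCTAGTCAGATTGAGTGGAATCCCTGATG",
    ("Cultivar-2", "A"): "AGCGTAGTCAGATTGAGTGGAATCCCTGATG",
    ("Cultivar-2", "D"): "AGCCTAGTCAGATTGAGTGGAATCCCTGATG",
}


def classify_sites(seqs):
    """Split variable positions (0-based) into sub-genome-specific and varietal SNPs.

    Sub-genome-specific: A and D copies differ, but all cultivars agree within
    each sub-genome. Varietal: cultivars differ within one sub-genome.
    """
    cultivars = sorted({cultivar for cultivar, _ in seqs})
    length = len(next(iter(seqs.values())))
    subgenome_sites, varietal_sites = [], []
    for i in range(length):
        a_bases = {seqs[(c, "A")][i] for c in cultivars}
        d_bases = {seqs[(c, "D")][i] for c in cultivars}
        if len(a_bases) == 1 and len(d_bases) == 1 and a_bases != d_bases:
            subgenome_sites.append(i)
        elif len(a_bases) > 1 or len(d_bases) > 1:
            varietal_sites.append((i, "A" if len(a_bases) > 1 else "D"))
    return subgenome_sites, varietal_sites


def anchored_varietal_snps(seqs, max_distance=50):
    """Keep varietal SNPs within max_distance bases of a sub-genome-specific SNP.

    max_distance is a hypothetical threshold chosen so that both sites could
    fall on one ~90 bp read.
    """
    subgenome_sites, varietal_sites = classify_sites(seqs)
    return [
        (pos, sub)
        for pos, sub in varietal_sites
        if any(abs(pos - anchor) <= max_distance for anchor in subgenome_sites)
    ]


if __name__ == "__main__":
    print(classify_sites(FIGURE2_SEQS))
    print(anchored_varietal_snps(FIGURE2_SEQS))
```

On the Figure 2 sequences this reports one sub-genome-specific site (G in the A genome, C in the D genome) and one varietal site in the A genome (A in Cultivar-1, G in Cultivar-2) eleven bases away, so the varietal SNP would be retained as anchored.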

Recently an international cotton consortium was formed to create a public 70,000-SNP Illumina Infinium array for cotton. This array was made available for purchase in late 2013 and contains ~50,000 intra-specific *G. hirsutum* SNPs, ~16,000 inter-specific SNPs predominantly from *G. barbadense* but also from *G. tomentosum* and *G. mustelinum*, and a small number (~4,000) of SNPs from two diploids, *G. longicalyx* and *G. armourianum*. The publicly available SNPs were provided by a number of international groups including CSIRO, Texas A&M, University of California-Davis, Cotton Incorporated, Brigham Young University and United States Department of Agriculture-Agricultural Research Service, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Council of Scientific and Industrial Research-National Botanical Research Institute (CSIR-NBRI), and Dow
