**4. Association mapping of fiber traits using genotyping by sequencing (GBS)**

Molecular markers are favored for linkage map development because they are polymorphic, are transmitted to the next generation in Mendelian ratios and do not show epistasis. Molecular breeding based on highly saturated maps, in which QTLs for economic traits are tagged by informative genetic markers, is a valuable resource for cotton improvement [64]. In many crop species, including cotton, genomic analysis has relied on populations derived from crosses between only two parents, which is a major drawback for the resulting genomic information. Because such populations share genetically similar backgrounds, applying the QTL information gained from them to breeding objectives has been difficult.

Association mapping rests on the hypothesis that, within a panel of markers, some alleles lie close to the loci controlling the trait of interest, co-segregate with them and are therefore in linkage disequilibrium (LD) with them. Germplasm entries are used to detect QTLs of interest through genome-wide association mapping [82]. Several factors, including mating system, frequency of gene flow and population structure, can affect this mapping approach [18]. Association mapping overcomes drawbacks of traditional bi-parental mapping: it uses populations of well-established genotypes, targets the gene of interest and detects high levels of polymorphism [83–85]. The approach relies on linkage disequilibrium rather than on linkage mapping.
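As a minimal sketch of the quantity involved, LD between two biallelic markers is often summarized as r², the squared correlation between the alleles at the two loci. The Python function below computes it from phased haplotypes. The function name `ld_r2`, the 0/1 allele coding and the toy data are illustrative assumptions and are not taken from the studies cited here.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with alleles coded 0 or 1 at locus A and locus B (assumed coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                   # a monomorphic locus carries no LD information
    return d * d / denom


# Toy data: alleles always travel together (complete LD) ...
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# ... versus all four haplotypes equally frequent (no LD).
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```

Values near 1 indicate that a marker allele is a good proxy for the allele at the second locus, which is the property association mapping exploits.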

Marker-assisted breeding combines recent genomic approaches with traditional breeding procedures to improve crop traits. For this purpose, genetic markers must be reproducible. Morphological characters are scored and genotyping is carried out with molecular markers [86]. Molecular markers are very effective for identifying and overcoming problems, such as segregation distortion, that arise when traits are transferred from other species [87]. Genetic markers are also effective for assessing genetic variation in the *Gossypium* gene pool. Ref. [88] classified DNA markers into groups; the non-hybridization-based group includes Amplified Fragment Length Polymorphism (AFLP), Simple Sequence Repeats (SSR), Sequence-Related Amplified Polymorphism (SRAP), Inter-Simple Sequence Repeats (ISSR), Expressed Sequence Tag-derived SSRs (EST-SSR), Single Nucleotide Polymorphisms (SNPs), etc. Numerous linkage maps have been developed in allotetraploid cotton using diverse mapping populations and different DNA marker techniques [76, 89–94]. Numerous SSRs and SNPs have been developed in cotton [95–99]. Saturated genetic maps built from SSR and SNP loci in cotton pave the way for dissecting quantitative traits related to breeding objectives [100–104]. Nonetheless, these maps carry too little information for association analysis and very fine mapping. Highly saturated maps are therefore urgently needed in cotton to overcome sequencing drawbacks and accelerate variety development.

The availability of microsatellites (SSRs) and single-nucleotide polymorphisms (SNPs) has accelerated genome mapping, because these markers are widely applicable across diverse populations derived from distinct genetic backgrounds [93, 95, 99, 105–107]. Thanks to advances in genotyping and SNP-calling tools, broadening of the genetic base is being explored extensively in plants, supported by the availability of valuable loci information [108–114].

Single nucleotide polymorphisms are positions on a chromosome at which two genotypes differ by a single base [64]. Ref. [115] suggested that one SNP occurs every 100–300 bp in any genome and reported that SNPs are more abundant than any other marker type, occurring at a higher frequency than microsatellites. SNPs can be generated rapidly and at low cost owing to the availability of high-throughput genotyping tools [116]. With low-cost, high-throughput genotyping tools, gene expression analysis [117, 118], genome-wide association studies [68, 119] and SNP detection have been carried out in species with different genome sizes, including polyploid species with limited genetic variation such as cotton [10, 120] and wheat [121]. SNPs have been discovered and genotyped in different species by a variety of methods [10, 120–122].

Genotyping-by-sequencing (GBS) is a powerful and simple approach that allows numerous SNPs to be discovered simultaneously across a large number of genotypes [123]. In GBS, methylation-sensitive restriction enzymes are used to target the sequences flanking restriction sites, producing a reduced representation of the genome [121, 122]. Compared with reduced representation libraries (RRL) and restriction site associated DNA (RAD), the GBS method is much simpler: it requires less DNA, avoids DNA fragment size selection and completes library preparation in just two steps on plates followed by PCR amplification of the pooled library [122]. The procedure does not require prior marker discovery and validation, and it can be applied to any polymorphic species or to mapping populations of any size [124]. Using GBS, numerous SNPs have been discovered in many species, including maize [122], wheat and barley [121], sorghum [125], rice [126], soybean [127], oat [128] and cotton [10, 79, 129, 130].
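The idea of a reduced representation can be illustrated with an in-silico digest: the genome is cut at every occurrence of a recognition site, and only fragments in a sequenceable size range are retained. The sketch below is a simplified assumption-laden illustration. The site `GC[AT]GC` with a cut after the first base corresponds to the enzyme ApeKI, chosen here only as an example because the text does not name an enzyme; the function names, the toy sequence and the size limits are hypothetical, and methylation sensitivity is not modeled.

```python
import re


def digest(sequence, site_regex="GC[AT]GC", cut_offset=1):
    """Cut `sequence` at every match of `site_regex`.

    cut_offset is the position of the cut inside the site
    (1 means the cut falls after the first base of the site).
    A lookahead is used so that overlapping sites are all found.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_regex})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy_genome = "AAAGCAGCTTTTTTGCTGCAA"          # hypothetical sequence
fragments = digest(toy_genome)
print(fragments)
print(size_select(fragments, 5, 20))          # toy size window, not a real protocol value
```

Because only the retained fragments are sequenced, the same subset of the genome is sampled in every genotype, which is what makes SNP calls comparable across a large panel.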

Association mapping provides a saturated map for the trait of interest rather than only a pair of genes underlying that trait [131]. Verification of QTLs is therefore essential in mapping. Association mapping is a way of examining genetic variation in target traits: it relates the variation in those traits to the reproducibility of alleles, and markers linked to economic traits are selected on the basis of the extent of linkage disequilibrium (LD) [132]. Moreover, LD reflects ancestral patterns through information on populations and their ecology [133, 134].

LD-based association mapping has been applied with different strategies to determine genetic diversity, its sources and population design [135, 136]. Individuals of a population are grouped according to the genetic distance among entries as established through LD [137–139]. In natural populations, LD arises not only between linked loci but also between loci on non-homologous chromosomes, as a result of selection, population behavior and hybridization. Such relationships must therefore be analyzed with great care. Reproducibility in a sequence controlling a specific character is a property of this type of mapping [140]. Moreover, association studies and linkage mapping differ considerably in the resolution and precision of QTL detection, the amount of information obtained and the evaluation procedures [132].

However, statistical analysis with LD-based tools is not straightforward. Natural populations are partitioned into distinct groups with model-based procedures [141]. Bayesian modeling is widely used to estimate, from allele frequencies, the probability that a genotype belongs to a particular population group. With this technique, genotypes are assigned to particular populations, and these assignments can be incorporated into statistical methods for association mapping that account for population structure. Population structure is analyzed with the STRUCTURE software [135], which has been used for association studies in many plants. In cotton, association mapping has been applied to a range of traits, including seed cotton yield and its components [142–144], salt tolerance [145], plant architecture and earliness [146], protein and oil content [147] and fiber quality [8, 60, 132, 148–150].
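The Bayesian assignment step described above can be sketched in a simplified form. The Python function below assumes that allele frequencies in each candidate population are already known, that loci are independent and that each population is in Hardy-Weinberg equilibrium. These are simplifying assumptions made for illustration: STRUCTURE itself estimates allele frequencies and membership jointly, and the function name, genotype coding and toy frequencies here are hypothetical.

```python
from math import comb


def assignment_posterior(genotype, allele_freqs, prior=None):
    """Posterior probability that one genotype belongs to each population.

    genotype: list with the count (0, 1 or 2) of the reference allele per locus.
    allele_freqs: dict mapping population name -> list of reference-allele
                  frequencies, one per locus (assumed known).
    prior: optional dict of prior probabilities; uniform if omitted.
    """
    pops = list(allele_freqs)
    if prior is None:
        prior = {k: 1.0 / len(pops) for k in pops}
    weights = {}
    for k in pops:
        likelihood = 1.0
        for g, p in zip(genotype, allele_freqs[k]):
            # Binomial genotype probability under Hardy-Weinberg equilibrium
            likelihood *= comb(2, g) * p ** g * (1 - p) ** (2 - g)
        weights[k] = prior[k] * likelihood
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotype has zero likelihood in every population")
    return {k: w / total for k, w in weights.items()}


# Toy example with two loci and two hypothetical populations
freqs = {"pop_A": [0.9, 0.9], "pop_B": [0.1, 0.1]}
print(assignment_posterior([2, 2], freqs))
```

The resulting membership probabilities are the kind of quantity that enters an association model as a population-structure covariate.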

By contrast, genetic maps built conventionally from populations derived by crossing parents are not saturated, and developing such populations is labor-intensive, risky and costly and demands further work after numerous genotypes of the gene pool have been evaluated [84]. Association mapping, however, exploits LD and removes the need for bi-parental populations by using the genetic variation present in available stable populations, such as cultivars and accessions developed over time and maintained as a gene pool. Genome-wide association mapping has been used in Arabidopsis [151] and rice [152] to detect loci linked to economic traits. Association studies allow highly saturated maps to be developed through genome-wide detection of QTLs for economic traits in permanent mapping populations.

Abdurakhmonov et al. [60] used association analysis to examine fiber traits in cotton germplasm entries, with the aim of exploiting genetic variation in marker-based breeding. LD-based association mapping was carried out in germplasm comprising diverse genotypes from all over the world. Ninety-five SSRs were screened across all germplasm entries to detect genome-wide QTLs associated with fiber properties. They found about 11–12% LD among all SSRs and also observed significant population structure among the entries. They applied a general linear model and a mixed linear model that accounted for kinship and population structure and found that, overall, 6 and 13% of the primer pairs were related to fiber quality. They concluded that the markers selected in the study can be used to improve fiber quality by exploiting hidden sources of genetic variability.
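The kinship term used in a mixed linear model is a matrix of pairwise relatedness among entries. One simple marker-based estimate is the proportion of alleles that two entries share identical by state. The sketch below shows this calculation under stated assumptions: genotypes are coded as 0, 1 or 2 copies of one allele, there are no missing calls, and the function name is hypothetical. It is not necessarily the estimator used in [60].

```python
def ibs_kinship(genotypes):
    """Identity-by-state similarity matrix.

    genotypes: list of per-entry lists, each holding allele counts
    (0, 1 or 2) for the same ordered set of markers.
    Returns an n x n list of lists with values between 0 and 1.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    kinship = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each marker contributes 1 when the counts match, 0.5 when they
            # differ by one allele and 0 when they are opposite homozygotes.
            shared = sum(1 - abs(a - b) / 2
                         for a, b in zip(genotypes[i], genotypes[j]))
            kinship[i][j] = kinship[j][i] = shared / m
    return kinship


# Three hypothetical entries scored at three markers
panel = [[0, 2, 1], [0, 2, 1], [2, 0, 1]]
for row in ibs_kinship(panel):
    print(row)
```

In a mixed model this matrix defines the covariance of the random genetic effects, so that closely related entries are not treated as independent observations.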

Genetic variation, population structure and LD-based association analysis for fiber traits were studied in germplasm grown in two different climatic zones [85]. The Upland gene pool, comprising 335 elite entries, was screened with 202 SSRs. At the genome-wide level, mean LD extended to 25 cM across all genotypes at a significance level of 0.01. LD declined to about 5 cM at r² > 0.2, indicating potential for association mapping of yield-contributing characters. The authors used a mixed linear model together with population analysis to detect associations, taking permutation-based significance and population structure into account. Overall, they identified many markers for fiber traits that were common to the genotypes in both locations. They reported that mixed linear model associations ranged from 7 to 43% and showed strong to very strong evidence of a relationship with fiber properties, as confirmed by Bayes factors; these associations are expected to be a very effective resource for association analysis aimed at yield improvement in marker-based breeding.
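A decay distance such as the one reported above can be read from pairwise data by averaging r² within distance bins and finding the first bin whose mean falls below the chosen threshold. The sketch below uses this binning approach as an assumption; published studies often fit a decay curve instead, and the function name, bin width and toy values are hypothetical.

```python
def ld_decay_distance(pairs, bin_width, threshold=0.2):
    """Return the lower edge of the first distance bin whose mean r^2
    is below `threshold`, or None if no bin falls below it.

    pairs: iterable of (distance, r2) for marker pairs, with distance
    in any consistent unit (for example cM).
    """
    bins = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width), []).append(r2)
    for index in sorted(bins):
        mean_r2 = sum(bins[index]) / len(bins[index])
        if mean_r2 < threshold:
            return index * bin_width
    return None


# Toy marker pairs: (distance in cM, r^2)
toy_pairs = [(0.5, 0.8), (1.5, 0.6), (3.0, 0.4),
             (4.5, 0.3), (5.5, 0.15), (7.0, 0.1)]
print(ld_decay_distance(toy_pairs, bin_width=1.0))
```

The shorter the decay distance, the finer the mapping resolution, but the more markers are needed to cover the genome.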

Wang et al. [153] examined associations for yield and fiber characters in Pima cotton germplasm entries using a mixed linear model. They detected 72 loci, of which 46 were linked to fiber characters and the remaining 26 to the other characters studied. They concluded that marker associations for fiber characters are of great value for improving quality.

Fang et al. [154] used a multi-parent population to examine associations for yield and fiber quality traits. They reported that the common and new QTLs detected in the study can be used to overcome problems in improving fiber quality. In a first set of 275 RILs developed from diverse parents, they screened 1582 polymorphic microsatellites to identify QTLs linked to fiber traits. Association analysis with TASSEL detected 131 QTLs for fiber quality characters, and the same QTLs were verified in a second set of 275 RILs with 270 SSRs. The comparison showed that 54 QTLs were new and 77 agreed with previous studies.

#### *Association Mapping for Improving Fiber Quality in Upland Cottons DOI: http://dx.doi.org/10.5772/intechopen.94405*

A genetic map was constructed from a RIL population derived from a cross between *G. hirsutum* TM-1 and NM24016, a line carrying superior fiber quality transferred from *G. barbadense*, and the relationship between yield components and fiber traits was determined. The map was built from 429 SSRs and 412 GBS-based SNPs and spanned about half the length of the Upland cotton genome [10]. The authors reported that the markers were distributed randomly across the genome. Yield components and fiber characters showed extreme phenotypic expression across multiple locations. They found 28 QTLs for agronomic and fiber properties that are useful from a breeding perspective.

Cai et al. [8] used 99 Upland cotton genotypes to study associations for fiber traits, using 97 polymorphic microsatellites. A total of 107 genomic regions were associated with fiber traits, 70 of them detected in two or more zones and 37 in only one. Most of the associations appeared reliable, as they agreed with earlier findings for fiber quality. The authors also observed genomic regions related to two or more characters and assumed that such regions derived from genotypes with a minor allele frequency below 5%, originating from local sources or acclimatized in China. They concluded that fiber traits can be improved by using such loci from diverse resources.

Islam et al. [123] used GBS to detect SNPs that can be used to improve economic traits in the cotton gene pool. RILs and 11 contrasting parents were used, and two separate methods were applied to identify SNPs with a variant allele frequency above 0.1. After quality control, SNPs were called against the available *G. raimondii* Ulbrich genome. In total, 1071 and 1223 SNPs were detected in the At and Dt genomes, respectively, and these SNPs were usually found at higher frequency in coding regions. GBS was then carried out in germplasm consisting of 154 accessions to verify 111 of the SNPs; the SNPs were confirmed in all parents, and no two genotypes were found with the same SNPs. The authors reported that SNPs can be identified with ease in *G. hirsutum* and that genetic improvement can proceed once true SNPs have been obtained.
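A frequency threshold of this kind can be applied as a simple filter over genotype calls. The sketch below interprets the variant allele frequency as the frequency of the alternate allele across diploid samples, which is an assumption about the cited study; the function names, the SNP identifiers and the use of `None` for missing calls are hypothetical.

```python
def variant_allele_frequency(calls):
    """Frequency of the alternate allele among non-missing diploid calls.

    calls: list of alternate-allele counts (0, 1 or 2), None if missing.
    Returns None when every call is missing.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return None
    return sum(observed) / (2 * len(observed))


def filter_snps(snp_calls, min_vaf=0.1):
    """Keep SNPs whose variant allele frequency is strictly above min_vaf."""
    kept = {}
    for snp_id, calls in snp_calls.items():
        vaf = variant_allele_frequency(calls)
        if vaf is not None and vaf > min_vaf:
            kept[snp_id] = vaf
    return kept


# Hypothetical genotype calls for three SNPs
toy_calls = {
    "snp1": [0, 0, 0, 0, 1],     # frequency exactly 0.1, so not kept
    "snp2": [2, 2, 0, None],     # frequency 4/6, kept
    "snp3": [None, None],        # no data, not kept
}
print(filter_snps(toy_calls))
```

Removing rare variants in this way reduces the share of sequencing errors among the retained SNPs, at the cost of discarding genuinely rare alleles.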

Associations for fiber traits were examined in a germplasm collection of Hawaiian cotton consisting of 503 genotypes [132]. The authors used 494 genome-wide microsatellites, of which 179 reproducible SSRs were screened across the genotypes under diverse climatic conditions. Population structure and LD were used to detect associations for various fiber traits with a mixed linear model in the TASSEL program. QTLs were identified from the association values between markers and phenological characters. A total of 426 alleles were detected, and the germplasm was divided into seven subgroups on the basis of hybridization, climate and topographical pattern. A total of 216 polymorphic loci were associated with fiber traits, explaining on average 2.7% of the phenotypic variation, with a range of 0.58–5.12%. LD decayed significantly within 0–5 cM. Thirteen QTLs agreed with earlier findings, three were linked to the same character and seven corresponded to fiber formation. The authors concluded that the novel alleles identified for fiber quality by LD-based association mapping can be used in cultivar breeding to tag genes of interest.

GBS was carried out in a population developed from multiple parents to overcome the inverse relationship between yield and fiber traits [155]. The authors expected GBS to be a valuable resource for building a highly saturated map by generating a large number of SNPs. Association analysis of fiber traits was performed with a mixed linear model in TASSEL across four separate climates, using 5071 SNPs developed from GBS and 223 SSRs in 547 RILs. One QTL cluster for fiber traits, including length, short fiber content, strength and uniformity, was detected and verified on chromosome A07. The authors also examined candidate genes for fiber traits and reported that the marker CFBid0004, derived from a 10 bp deletion in GhRBB1\_A07, is directly associated with fiber traits in the RILs and in 104 approved American varieties. Moreover, GhRBB1\_A07 can be used in MAS to improve fiber traits in germplasm entries.

Sun et al. [150] studied the genetic architecture of major fiber traits in cotton germplasm using association mapping under different climatic zones. Mixed linear model association analysis detected 16, 10 and 7 SNPs for fiber length, strength and uniformity, respectively; two major genomic regions were located on chromosome 7 of *G. raimondii*, and four genes contributing to fiber length were also identified. Population structure analysis showed that populations from low elevations had less genetic variation among accessions than those from high elevations. The frequency of favorable alleles was higher in genotypes from low elevations than in those from high elevations. The authors concluded that the favorable alleles present among the genotypes can be used to improve fiber quality.

Genome-wide associations for plant ideotype, heat tolerance, yield-contributing traits and fiber quality were examined in a germplasm collection grown under different climatic conditions for three consecutive years [156]. Associations in the genetic stock were detected with SNPs. Broad-sense heritability was low to high for fiber characters, ranging from 0.26 to 0.89, compared with 0.14–0.43 for yield components. Phylogenetic analysis showed that the genotypes had been developed from diverse parents carrying multiple characters of breeding value. The authors noted that relatively few informative markers are needed for association mapping, because LD extended up to 5 Mbp and declined to 2 Mbp at r² ≥ 0.2. Using a mixed linear model, 17 significant SNPs were associated with fiber length and 50 with fineness. Associations for most characters were nonetheless non-significant at the genome-wide level, because many SNPs accounted for less than 5% of the phenotypic variation; the authors attributed this to low reproducibility of markers in cotton or to limited coverage of the germplasm by the SNP chip.
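The link between LD extent and marker number can be made concrete with a back-of-the-envelope calculation: if LD persists over a given physical distance, roughly one marker per block of that length is enough to tag the genome. The sketch below is only an order-of-magnitude illustration. The genome size of 2.5 Gbp is an assumed round figure used as a placeholder, and the function name is hypothetical; real marker requirements depend on how LD varies along the chromosomes.

```python
def minimum_markers(genome_size_bp, ld_extent_bp):
    """Smallest number of evenly spaced markers such that every
    LD block of length ld_extent_bp contains at least one marker."""
    if ld_extent_bp <= 0:
        raise ValueError("ld_extent_bp must be positive")
    # Integer ceiling division avoids floating-point rounding
    return (genome_size_bp + ld_extent_bp - 1) // ld_extent_bp


ASSUMED_GENOME_SIZE_BP = 2_500_000_000   # placeholder value, not from the text
print(minimum_markers(ASSUMED_GENOME_SIZE_BP, 2_000_000))   # LD extent of 2 Mbp
print(minimum_markers(ASSUMED_GENOME_SIZE_BP, 5_000_000))   # LD extent of 5 Mbp
```

Under these assumptions the requirement is in the order of a thousand markers, far fewer than in species where LD decays within a few kilobases.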

Sun et al. [150] used association analysis to study fiber quality traits in germplasm with wide variation among genotypes, grown at multiple locations. An Illumina SNP array was used for the genome-wide analysis. They found 10,511 SNPs distributed over all loci, of which 46 were significantly associated with fiber quality. They detected two QTLs, for strength and length, on At07 and Dt11.
