**2. Genomic selection scheme**

Genomic selection begins by assembling a training population (TP) of individuals for which both genotypes and phenotypes are available, and using those data to build a statistical model that relates variation at the genotyped marker loci to variation in the observed phenotypes. A training population spanning multiple generations of parents and progenies is more powerful than one drawn from a single generation, and larger numbers of individuals, generations and markers increase its power further. The statistical model is then applied to a prediction population comprised of individuals for which genotypes are available but phenotypes are not. GS relies on the training population and the breeding population (BP) sharing similar linkage disequilibrium (LD) between marker loci and trait loci. This similarity may exist because the breeding population is selected or descended from the training population, or because marker density is so high that every trait locus is in disequilibrium with at least one marker locus across the entire population of the target species. The training population is genotyped and phenotyped to train the GS prediction model. In genomic selection, the main roles of phenotyping are to estimate marker effects and to support cross-validation. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBVs) for these lines (**Figure 2**).


The Usage of Genomic Selection Strategy in Plant Breeding

http://dx.doi.org/10.5772/intechopen.76247


## **2.1. Need of genomic selection**

Traditional marker-assisted selection (MAS), while useful for simply inherited traits controlled by a few loci, loses effectiveness as the number of loci increases. This is true both for individual quantitative traits and when multiple traits are under selection. Quantitative traits such as grain yield and abiotic stress tolerance have proved hard to improve with marker-assisted selection. The main limitations are (i) small population sizes and traditional statistical methods that lack the power to detect and accurately estimate the effects of small-effect quantitative trait loci (QTL); (ii) gene × gene interactions (epistasis); and (iii) genotype × environment interactions (G × E), which have restricted the transferability of QTL effect estimates across populations and environments. The Beavis effect is a statistical phenomenon in which the effect sizes of QTL are overestimated as a result of small sample sizes in QTL studies.
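The Beavis effect described above can be illustrated with a small simulation. The sketch below is not taken from the chapter: the function names, the true effect of 0.2, the sample size of 50 lines, the 2000 replicate studies and the detection threshold of |t| > 3 are all assumptions chosen for illustration. It repeatedly simulates a single marker with a small true effect, estimates that effect by simple regression, and compares the average estimate over all studies with the average over only those studies in which the QTL was declared significant.

```python
import math
import random

def estimate_effect(n, effect, rng):
    """Simulate one marker in a QTL study of n lines; return (estimate, t-statistic)."""
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]      # two marker classes (assumed coding)
    y = [effect * xi + rng.gauss(0.0, 1.0) for xi in x]  # phenotype = QTL effect + noise
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    # Ordinary least-squares slope and its standard error.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se

def beavis_summary(n, effect, reps, t_threshold, seed):
    """Mean estimate over all replicates versus only the 'significant' ones."""
    rng = random.Random(seed)
    results = [estimate_effect(n, effect, rng) for _ in range(reps)]
    all_mean = sum(b for b, _ in results) / reps
    detected = [abs(b) for b, t in results if abs(t) > t_threshold]
    detected_mean = sum(detected) / len(detected) if detected else float("nan")
    return all_mean, detected_mean, len(detected)

TRUE_EFFECT = 0.2  # hypothetical small-effect QTL, in residual standard deviation units
all_mean, detected_mean, n_detected = beavis_summary(
    n=50, effect=TRUE_EFFECT, reps=2000, t_threshold=3.0, seed=42)
print(f"mean estimate, all studies:       {all_mean:.3f}")
print(f"mean estimate, detected QTL only: {detected_mean:.3f} ({n_detected} of 2000)")
```

Averaged over all simulated studies the estimate stays close to the true effect, whereas the studies that pass the significance threshold report a clearly larger effect, because with few lines only overestimates are large enough to be detected.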

quantitative traits, new statistical approaches that could account for this uncertainty were required to obtain the best possible predictions. Avoiding the problem of locus identification entailed that the effects of all marker loci be estimated simultaneously. Once prediction is based on allele effects, the allele becomes the unit of analysis. Alleles are therefore the units that need to be replicated within and across environments. That replication, however, occurs regardless of which particular lines carry the alleles, so the lines themselves no longer need to be replicated. In the breeding context, removing the requirement for line replication opens the possibility of dramatically increasing the number of lines pushed through the pipeline of a breeding program, and in turn of increasing selection intensity.


Next Generation Plant Breeding


**Figure 2.** Genomic selection scheme. Information on phenotype and genotype for a training population allows estimating parameters for the model. (Modified: Castro et al. 2012) [12].

The availability of low-cost, abundant molecular markers in plants has led breeders to ask how molecular markers might best be used to achieve breeding progress. In addition, advances in high-throughput genotyping have markedly reduced the cost per data point of molecular markers while increasing genome coverage. This reduction was mainly the result of three parallel developments [2]: (i) the discovery of large numbers of single nucleotide polymorphism (SNP) markers in many species; (ii) the development of high-throughput technologies, such as multiplexing and gel-free DNA arrays, for screening SNP polymorphisms; and (iii) automation of the marker-genotyping process, together with efficient procedures for DNA extraction [2]. Phenotyping costs are rising, while genotyping costs are falling and marker densities are increasing rapidly.

Conventional statistical methods are inadequate for improving polygenic traits controlled by many loci of small effect. There are usually more markers (explanatory variables) than lines (observations), which creates statistical problems. With a large number of markers (p) and a small number of phenotyped lines (n), there are too few degrees of freedom to estimate all marker effects by ordinary least squares. A suitable statistical model must therefore estimate many marker effects simultaneously from a limited number of phenotypes. In such "large p, small n" problems, standard multiple linear regression cannot be used without variable selection, which conflicts with the original goal of avoiding marker selection. To overcome these problems, a range of methods, e.g., best linear unbiased prediction, ridge regression, Bayesian regression, kernel regression and machine learning methods, have been proposed for developing prediction models for genomic selection.
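As an illustration of how one of the listed methods copes with more markers than lines, the following is a minimal ridge regression sketch on simulated data. Everything in it is an assumption made for the example rather than a method from the chapter: the function names, the marker coding of -1/0/1, the population sizes, the simulated effects and the penalty value `lam=10.0`. It also takes the intercept as the phenotype mean without centring the markers, which is a simplification. The penalty makes the normal equations solvable even though p exceeds n, so all marker effects are estimated at once and then summed to give GEBVs for lines that have genotypes only.

```python
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ridge_fit(X, y, lam):
    """Fit marker effects on mean-centred phenotypes with an L2 penalty lam."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    # Ridge normal equations: (X'X + lam * I) b = X'y.
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mean_y, solve_linear(xtx, xty)

def predict_gebv(X, intercept, effects):
    """GEBV of each line = intercept + sum of its marker genotypes times effects."""
    return [intercept + sum(x * b for x, b in zip(row, effects)) for row in X]

def simulate_genotypes(n_lines, n_markers, rng):
    """Hypothetical marker matrix coded -1/0/1."""
    return [[rng.choice((-1, 0, 1)) for _ in range(n_markers)] for _ in range(n_lines)]

rng = random.Random(1)
N_TRAIN, N_BREED, P = 30, 10, 60  # more markers (p) than phenotyped lines (n)
true_effects = [rng.gauss(0.0, 0.3) for _ in range(P)]
X_train = simulate_genotypes(N_TRAIN, P, rng)
y_train = [sum(x * b for x, b in zip(row, true_effects)) + rng.gauss(0.0, 1.0)
           for row in X_train]
X_breed = simulate_genotypes(N_BREED, P, rng)  # genotyped, not phenotyped

intercept, effects = ridge_fit(X_train, y_train, lam=10.0)
gebv = predict_gebv(X_breed, intercept, effects)
print("first GEBVs:", [round(g, 2) for g in gebv[:3]])
```

A hand-written solver is used only to keep the sketch free of dependencies. In practice the penalty would be tuned (for example by cross-validation) or derived from variance components, and an established statistical package would be used instead.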

The most economical use of GS is to replace expensive and lengthy phenotyping with a prediction of the genetic value of the trait under selection (or of any multi-trait index). The main expected advantage is therefore shorter selection cycles. However, to benefit from shorter cycles, the genetic gain per selection cycle should be close to that expected from phenotypic or combined MAS + phenotypic selection. Progeny testing schemes have high selection accuracy, but their generation interval is longer: each cycle of selection takes a long time, which reduces the genetic gain per unit time. The univariate breeder's equation was used for the GS-BPs because they include just one stage of selection [3]. Selection accuracy is equal to the correlation between the selection criterion and the breeding value (i.e., the correlation between phenotypes or GEBVs and true breeding values [TBVs]). In cattle, Schaeffer [4] determined that using GS with a GEBV accuracy of 0.75 would increase genetic gain twofold and provide a cost saving of 92% compared with current methods.
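The trade-off between selection accuracy and cycle length can be made concrete with the breeder's equation expressed per unit time, R = i · r · σ_A / L, where i is selection intensity, r is selection accuracy, σ_A is the additive genetic standard deviation and L is the cycle length. The sketch below is illustrative only: apart from the GEBV accuracy of 0.75 quoted above, every number (selection intensity, progeny-test accuracy, cycle lengths) is a hypothetical value chosen for the example, and it does not attempt to reproduce Schaeffer's calculation.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation per unit time: R = i * r * sigma_A / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years

# Hypothetical scheme parameters (assumptions, not values from the chapter).
progeny_test = genetic_gain_per_year(
    intensity=1.75, accuracy=0.90, sigma_a=1.0, cycle_years=6.0)
genomic = genetic_gain_per_year(
    intensity=1.75, accuracy=0.75, sigma_a=1.0, cycle_years=2.0)

ratio = genomic / progeny_test
print(f"gain per year, progeny testing:   {progeny_test:.4f}")
print(f"gain per year, genomic selection: {genomic:.4f}")
print(f"ratio (GS / progeny testing):     {ratio:.2f}")
```

Under these assumed values the lower accuracy of GEBV-based selection is more than offset by the shorter cycle, which is the mechanism the paragraph above describes.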

The ability to calculate highly accurate GEBVs, together with the potential to drastically reduce the frequency of phenotypic evaluation and the selection cycle time, prompted a rapid adoption of genomic selection, which is revolutionizing the cattle breeding industry (**Figure 3**).

**Figure 3.** Genomic selection scheme.

**2.2. Model for genomic selection**

The basic model may be denoted as

$$Y_i = g(x_i) + e_i \tag{1}$$

where $Y_i$ is the observed phenotype of individual $i$ ($i = 1, \ldots, n$), $x_i$ is a $1 \times p$ vector of SNP genotypes on individual $i$, $g(x_i)$ is a function relating genotypes to phenotypes, and $e_i$ is a residual term. The GEBV is generally equal to $g(x_i)$. Further similarities among GS models can be seen by recognizing that they all seek to minimize a certain cost function. In least squares analysis, the well-known cost function is simply the sum of squared residuals.

**2.3. Cross-validation**

GEBV accuracy is evaluated through cross-validation (CV). CV entails splitting the data into a training set and a validation set. The ratio of observations in each set varies, but fivefold CV is often used: the data set is randomly divided into five subsets, four of which are combined to form the training set while the remaining one is designated as the validation set. Each subset is used as the validation set once. Before the prediction model is applied to the breeding population, its accuracy should be tested. For this, most of the training population is used to build a prediction model, which is then used to estimate the genomic estimated breeding values of the remaining individuals in the training population from their genotypic data only. This allows researchers to test and refine the prediction model to ensure that prediction accuracy is high enough for future predictions to be relied upon. Once validated, the model can be applied to a breeding population to calculate GEBVs of lines for which genotypic, but no phenotypic, information is available.

**2.4. Genomic selection prediction accuracies**

The prediction accuracy of the GEBVs is evaluated by the correlation between the GEBVs and empirically estimated breeding values, r(GEBV:EBV), where the EBV can be obtained in a number of ways, most simply as a phenotypic mean. This correlation provides an estimate of selection accuracy and thus directly relates GEBV prediction accuracy to selection response [2]. Other statistics such as mean-square error (MSE) are used occasionally [3]. Genomic selection accuracy is defined as the correlation between the GEBV and the true breeding value (TBV), that is, r(GEBV:TBV). Since only r(GEBV:EBV) can be measured, this measure needs to be converted to an estimate of r(GEBV:TBV). To do so, it is assumed that r(GEBV:EBV) = r(GEBV:TBV) × r(EBV:TBV). This assumption is correct if the only component common to the GEBV and the EBV is the TBV itself. In other words, the assumption holds if GEBV = TBV + e1 and EBV = TBV + e2, where e1 and e2 are uncorrelated residuals. The assumption could be violated if the training and validation data were collected in the same
