**4. Soybean molecular design breeding**

In molecular design breeding, breeders design superior genotypes for specific breeding objectives based on the molecular networks that regulate agronomic traits. The breeding process can be simulated and optimized *in silico* before field testing, which improves breeding accuracy and efficiency. Recent advances in genomics and phenomics have opened up new possibilities for more efficient molecular design breeding [71, 72]. Several soybean databases have been created for genome, transcriptome, proteome, and germplasm analysis: SoyBase, a USDA-ARS database for genetics and genomics [73, 74]; SoyTEdb, a database of transposable elements [75]; SoyNet, a database of cofunctional networks [76]; SoyProDB, a database of seed proteins [77]; and SoyPro [78]. Because they provide soybean information at multiple levels, these databases are very useful for soybean molecular design breeding (**Table 2**).

#### **4.1 Applications of genome selection in soybean design breeding**

Genomics-assisted breeding (GAB) is one of the most important tools for molecular design breeding. It has enabled higher genetic gain for complex traits at lower cost, but it requires a molecular understanding of the trait [86]. MAS and GS are the two basic techniques used in GAB [87]. MAS depends on markers linked to the trait of interest, which can be discovered by linkage mapping or genome-wide association studies (GWASs). Many previous studies have shown that MAS can be used successfully in soybean to introduce major genes and large-effect QTLs for many traits [88, 89]. However, most of the inheritance of complex traits is controlled by minor genes, which have been largely neglected because of the limitations of MAS [90]. Furthermore, environmental effects, epistatic interactions, and genetic background make breeding for complex traits extremely difficult. As a result, plant breeders have concluded that MAS is not an appropriate method for improving complex traits [91].
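To make the MAS workflow concrete, the Python sketch below runs a toy single-marker association scan and then keeps only the lines that carry the favorable allele at the strongest marker. The scan is a deliberately simplified stand-in for linkage mapping or GWAS (real GWAS use mixed models that correct for population structure and relatedness). The genotype matrix, trait values, 0/1/2 dosage coding, and the function names `scan_markers` and `mas_select` are all hypothetical, chosen only for illustration.

```python
"""Toy marker-assisted selection (MAS): single-marker association scan, then
selection on the strongest marker. Hypothetical data; genotypes are coded as
0/1/2 copies of the alternative allele at each marker."""


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def scan_markers(genotypes, phenotypes):
    """Test each marker separately; return (marker, r) pairs, strongest |r| first."""
    results = []
    for j in range(len(genotypes[0])):
        dosages = [row[j] for row in genotypes]
        results.append((j, pearson_r(dosages, phenotypes)))
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)


def mas_select(genotypes, marker, favorable_dosage):
    """Indices of lines whose dosage at `marker` equals `favorable_dosage`."""
    return [i for i, row in enumerate(genotypes) if row[marker] == favorable_dosage]


# Six hypothetical lines x three markers, with a hypothetical trait value per line.
genotypes = [
    [0, 2, 1],
    [2, 2, 0],
    [1, 0, 2],
    [0, 0, 1],
    [2, 2, 2],
    [1, 0, 0],
]
phenotypes = [3.1, 3.3, 2.0, 1.9, 3.4, 2.2]

best_marker, r = scan_markers(genotypes, phenotypes)[0]
favorable = 2 if r > 0 else 0  # homozygous class associated with higher trait values
print(best_marker, mas_select(genotypes, best_marker, favorable))  # 1 [0, 1, 4]
```

Because selection here rests on one marker, it only works well when a single locus explains much of the variation. When a trait is governed by many small-effect loci, no individual marker stands out, which is the limitation of MAS described above.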

#### *Soybean Molecular Design Breeding DOI: http://dx.doi.org/10.5772/intechopen.105422*

**Table 2.** *Resources and databases of soybean.*

GS uses the genome-wide marker profile of breeding lines to predict genomic estimated breeding values (GEBVs) with one of several statistical models, which avoids losing the substantial share of variation governed by small-effect QTLs/genes [90]. However, the effectiveness of GAB depends on precise genotyping and phenotyping, which are required to detect marker-trait associations accurately and to estimate GEBVs. Manual, low-throughput phenotyping and genotyping frequently produce false positives or false negatives [37]. NGS-based high-throughput genotyping, together with high-throughput phenotyping, therefore enables successful MAS and GS and increases the success of molecular design breeding programs [92, 93]. The availability of high-throughput NGS-based genotyping methods has substantially accelerated gene identification and GS, particularly in crops with large, complex genomes such as soybean [87]. Phenomics and genomics are therefore equally important for accurate gene identification and for developing a GS model that estimates the GEBVs of the breeding population (BP). Consequently, by integrating these methods with suitable genetic diversity, soil and meteorological data, analytical tools, and databases, new varieties with improved yield, quality, and stress tolerance could be developed quickly [91].
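As an illustration of how a GEBV is computed from markers, one widely used GS model, ridge-regression BLUP (rrBLUP), can be written as follows. The chapter does not specify which of the "several models" it has in mind, so this is one example only, and the symbols are introduced here:

$$
\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\boldsymbol{\beta} + \mathbf{e},
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{Z}^{\top}\mathbf{Z} + \lambda\mathbf{I}\right)^{-1}\mathbf{Z}^{\top}\left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
\qquad
\mathrm{GEBV}_i = \sum_{j=1}^{p} z_{ij}\,\hat{\beta}_j
$$

Here $\mathbf{y}$ holds the phenotypes of the $n$ training lines, $\mathbf{Z}$ is the $n \times p$ matrix of marker genotypes centred on the training means (so that $\hat{\mu} = \bar{y}$), $\boldsymbol{\beta}$ are the marker effects, $\mathbf{e}$ is the residual, and $\lambda = \sigma_e^2/\sigma_\beta^2$ controls shrinkage. Because every marker receives a small, shrunken effect rather than having to pass a significance threshold, variation from small-effect QTLs is retained in the GEBV.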

MAS has not yielded satisfactory results in soybean for minor genes, each of which explains only a small proportion of the phenotypic variation of a complex trait [48, 87]. Most economically important soybean traits, such as yield, oil and protein content, and stress tolerance, are complex, with small-effect genes controlling most of their phenotypic variance [94]. GS builds a prediction model by combining marker profiles and phenotypic data from a training population; the model is then used to estimate the GEBVs of all individuals in the BP [95, 96]. Before the model is used for selection in the BP, its accuracy is assessed by cross-validation on subsets of the training population [87]. Once validated, the model can be used to select desirable plants from the BP on the basis of GEBVs estimated solely from marker (genotypic) data; that is, only genotypic data are used to predict the phenotypic performance of BP individuals [97]. The main benefit of GAB is that genotypic data collected at an early stage of plant development (for example, at the seedling stage) can be used to predict the phenotypic performance of mature plants. As a result, it can substantially reduce the time, money, and labor required for extensive phenotypic evaluation across many environments and years [98]. GAB also allows more selection cycles per unit time, and therefore greater genetic gain per unit time [87].
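The train, cross-validate, and predict workflow described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: marker effects are estimated by ridge regression (rrBLUP-style shrinkage) with a fixed, hypothetical shrinkage value `lam` instead of one estimated from variance components; prediction accuracy is taken as the correlation between observed and predicted values pooled over k folds; and all data are simulated. The function names `fit_ridge`, `gebv`, and `cross_validate` are hypothetical.

```python
"""Minimal genomic-selection (GS) sketch in pure Python.

Assumptions (illustrative, not from the chapter): genotypes coded 0/1/2,
ridge-regression marker effects with a fixed shrinkage `lam`, simulated data.
"""
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam):
    """Training step: beta = (Z'Z + lam*I)^-1 Z'(y - mean(y)), with Z = column-centred X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    mu = sum(y) / n
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    ztz = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    zty = [sum(z[i] * (yk - mu) for z, yk in zip(Z, y)) for i in range(p)]
    return mu, means, solve(ztz, zty)


def gebv(model, x):
    """GEBV of one individual from its marker profile alone (no phenotype used)."""
    _, means, beta = model
    return sum((xj - mj) * bj for xj, mj, bj in zip(x, means, beta))


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def cross_validate(X, y, lam, k=5):
    """Pooled correlation between held-out phenotypes and predictions over k folds."""
    preds, obs = [], []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            preds.append(model[0] + gebv(model, X[i]))  # mean + GEBV
            obs.append(y[i])
    return pearson_r(preds, obs)


if __name__ == "__main__":
    rng = random.Random(42)
    n_markers = 50
    # Hypothetical trait: many small-effect loci plus environmental noise.
    effects = [rng.gauss(0, 0.3) for _ in range(n_markers)]

    def simulate(n):
        X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n)]
        y = [sum(x * e for x, e in zip(row, effects)) + rng.gauss(0, 1.0) for row in X]
        return X, y

    X_tp, y_tp = simulate(100)  # training population: genotyped and phenotyped
    X_bp, _ = simulate(200)     # breeding population: only genotypes are used
    print("CV accuracy:", round(cross_validate(X_tp, y_tp, lam=10.0), 2))
    model = fit_ridge(X_tp, y_tp, lam=10.0)
    ranked = sorted(range(len(X_bp)), key=lambda i: gebv(model, X_bp[i]), reverse=True)
    print("Top 10 BP lines by GEBV:", ranked[:10])
```

Note that the breeding population enters only through `X_bp`: once the model has been validated, no phenotypes are needed to rank candidates, which is what makes selection at the seedling stage possible. In practice, dedicated mixed-model software would normally be used, and `lam` would be estimated or tuned rather than fixed.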
