**1. Introduction**

Soybean (*Glycine max* [L.] Merr.) accounts for around 60% of the world's oilseed consumption and 68% of world protein meal consumption (http://www.soystats.com), and it has played an important role year after year. In addition, oil processing yields protein-rich soybean meal, which provides around 75% of the protein meal used for animal feed worldwide [1]. Improving soybean quality is therefore important for commercial production worldwide and is a key target of soybean breeding.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1.1. Soybean protein and oil content QTL analysis**

Soybean oil and protein content are quantitative traits affected by multiple genes and environmental factors [2, 3]. More than 312 soybean oil QTLs and 231 soybean protein QTLs have been detected in different populations and environments (SoyBase, http://www.soybase.org); the main mapping methods include analysis of variance (ANOVA; [4]), interval mapping (IM; [5–7]), composite interval mapping (CIM; [8, 9]), multiple interval mapping (MIM; [10]) and inclusive composite interval mapping (ICIM; [11]). Among the published soybean oil content QTLs, some fall in 'hot regions' that have been identified four or more times at the same or similar intervals in different studies, including Gm05: 35.2–40.8 Mb, Gm09: 40.3–46.8 Mb, Gm12: 34.1–40.6 Mb, Gm14: 33.8–49.2 Mb, Gm15: 0.8–13.9 Mb, Gm18: 51.6–59.8 Mb, Gm19: 32.9–48.0 Mb and Gm20: 23.5–34.6 Mb [12]. For soybean protein content, the 'hot regions' include Gm04: 43.6–47.7 Mb, Gm05: 39.7–41.4 Mb, Gm07: 4.2–9.6 Mb, Gm08: 5.8–10.2 Mb, Gm14: 4.8–9.6 Mb, Gm15: 0.0–7.5 Mb, Gm18: 47.9–54.0 Mb, Gm19: 35.5–42.1 Mb and Gm20: 2.1–34.2 Mb [13, 14]. Meta-analysis is a statistical method that combines results from different sources in a single study [15]; it can increase QTL precision and validity by using mathematical models to refine the integration of QTLs [16], and its early applications were in maize [17] and soybean [18]. Meta-analysis has also been applied to soybean oil and protein content separately by Qi et al. [19, 20].
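
The oil and protein hot regions listed above can be compared directly to see where they coincide. The following is a minimal sketch in Python, not a method taken from the cited studies: it assumes that all intervals are expressed on the same genome assembly (the text does not state this), and the names `OIL_HOT`, `PROTEIN_HOT` and `shared_regions` are illustrative only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_Mb, end_Mb)

# Hot regions copied from the text above (positions in Mb).
OIL_HOT: Dict[str, List[Interval]] = {
    "Gm05": [(35.2, 40.8)], "Gm09": [(40.3, 46.8)], "Gm12": [(34.1, 40.6)],
    "Gm14": [(33.8, 49.2)], "Gm15": [(0.8, 13.9)], "Gm18": [(51.6, 59.8)],
    "Gm19": [(32.9, 48.0)], "Gm20": [(23.5, 34.6)],
}
PROTEIN_HOT: Dict[str, List[Interval]] = {
    "Gm04": [(43.6, 47.7)], "Gm05": [(39.7, 41.4)], "Gm07": [(4.2, 9.6)],
    "Gm08": [(5.8, 10.2)], "Gm14": [(4.8, 9.6)], "Gm15": [(0.0, 7.5)],
    "Gm18": [(47.9, 54.0)], "Gm19": [(35.5, 42.1)], "Gm20": [(2.1, 34.2)],
}


def shared_regions(
    a: Dict[str, List[Interval]], b: Dict[str, List[Interval]]
) -> Dict[str, List[Interval]]:
    """Return, per chromosome, the sub-intervals covered by both region sets."""
    shared: Dict[str, List[Interval]] = {}
    for chrom in sorted(set(a) & set(b)):
        hits = []
        for s1, e1 in a[chrom]:
            for s2, e2 in b[chrom]:
                lo, hi = max(s1, s2), min(e1, e2)
                if lo < hi:  # the two intervals overlap
                    hits.append((lo, hi))
        if hits:
            shared[chrom] = hits
    return shared


overlap = shared_regions(OIL_HOT, PROTEIN_HOT)
for chrom, intervals in overlap.items():
    print(chrom, intervals)
```

With the intervals as quoted, overlapping oil and protein hot regions are found on Gm05, Gm15, Gm18, Gm19 and Gm20; Gm14 appears in both lists, but its two intervals do not overlap.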


Soybean Breeding on Seed Composition Trait http://dx.doi.org/10.5772/intechopen.74353 25


However, soybean oil and protein content consistently show an inverse relationship [21, 22]: observations and data from many classical genetic analyses show that high-oil varieties have lower protein content and high-protein varieties have lower oil content [23]. Many classical genetics and breeding books and datasets also note this inverse relationship [2, 24–34]. Although it is very hard to find a locus that increases oil and protein content at the same time [35], the large body of QTL mapping results includes a few regions that contribute in the same direction to both traits within the same genetic population. Orf et al. [36] mapped an additive QTL affecting soybean oil content at 39.5–41.2 Mb of Gm05 in a population derived from a cross between Minsoy and Noir1; the results implied that Minsoy carries the positive alleles for increasing both oil and protein content. Specht et al. [37], however, identified a similar region with the opposite result, in which Noir1 carries the positive alleles. Hyten et al. [38] identified a QTL at 4.8–8.7 Mb of Gm07 for which the parent Williams carries the positive alleles for both traits. Reinprecht et al. [39] also demonstrated that the variety OX948 carries the positive alleles. Mao et al. [40] identified additive QTLs affecting soybean oil and protein content at 51.2–56.3 Mb of Gm01, 1.0–2.3 Mb of Gm09 and 39.4–46.1 Mb of Gm19 in a population derived from a cross between Hefeng47 and Heinong37, which indicated that Heinong37 carries positive alleles in those regions that could increase oil and protein content at the same time. Based on published data, Heinong37 is the only Chinese variety that may carry positive alleles for both traits.

## **1.2. Soybean fatty acid composition biosynthesis and transcriptional regulation**

The accumulation of starch, lipid and protein supplies the raw materials and energy for soybean seed growth and maturation. Lipid is one of these three major raw materials; although the biochemical pathway of lipid synthesis has been studied thoroughly, its regulatory mechanism remains unclear [41–47]. De novo fatty acid synthesis takes place mainly in the plastid. Acetyl-CoA, the precursor of fatty acid synthesis in soybean seed, is an important intermediate of many cellular metabolic pathways and is synthesized in large amounts in plant cells. Acetyl-CoA carboxylase (ACCase) catalyzes the first committed step of fatty acid synthesis, the carboxylation of acetyl-CoA to malonyl-CoA [48]. Malonyl-CoA is then used by the fatty acid synthase (FAS) complex in a series of condensation reactions that extend the acyl chain by two carbons per cycle. The growing acyl chain is bound to acyl carrier protein (ACP), and elongation is terminated by an acyl-ACP thioesterase or an acyltransferase acting on the acyl-ACP. The released fatty acids of different chain lengths are then converted to acyl-CoA by acyl-CoA synthetase and transferred from the plastid to the endoplasmic reticulum or the cytoplasm. Finally, the fatty acids are attached to glycerol by three different acyltransferases to form triacylglycerides (TAGs) [49–52]. To date, seed oil content has been increased by changing the expression levels of individual enzymes involved in oil metabolism [53–59]. The key enzyme responsible for TAG assembly is diacylglycerol acyltransferase 1, encoded by *DGAT1* [59–61], and *DGAT1* expression can be used to draw fatty acids into TAG; overexpression of *DGAT1* increased both seed oil content (by 9–12%) and seed weight (by 40–100%) in *Arabidopsis* [55]. Overexpression of *TmDGAT1a* and *TmDGAT1b* increased soybean seed oil content [62]. *SiDGAT1*, which encodes an acyl-CoA:diacylglycerol acyltransferase, also increased soybean seed oil content [63]. Expression of *VgDGAT1A* (from *Vernonia galamensis*) markedly increased soybean oil content [64]. Furthermore, the rate-limiting component of fatty acid biosynthesis in dicotyledonous plants is biotin carboxylase (BC), a vital subunit of ACCase. Li et al. [65] cloned four genes encoding BC from *Brassica napus* and elucidated the evolution and regulation of ACCase in *Brassica*. The cytosolic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPC) catalyzes a key reaction in glycolysis, and its levels are directly correlated with seed oil accumulation [66].
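
The two-carbons-per-cycle elongation described above fixes how many FAS cycles are needed for a given chain length. A minimal sketch in Python, assuming a saturated, even-numbered chain built from a two-carbon acetyl primer with one malonyl-CoA consumed per cycle; the function name `fas_cycles` is illustrative.

```python
def fas_cycles(chain_length: int) -> int:
    """Number of two-carbon elongation cycles needed to build a saturated
    acyl chain of the given (even) carbon length from an acetyl primer.

    Each cycle consumes one malonyl-CoA, so the return value is also the
    number of malonyl-CoA units required.
    """
    if chain_length < 4 or chain_length % 2 != 0:
        raise ValueError("expected an even chain length of at least 4 carbons")
    # The first two carbons come from the acetyl primer; each cycle adds two.
    return (chain_length - 2) // 2


for name, carbons in [("palmitic (C16:0)", 16), ("stearic (C18:0)", 18)]:
    print(f"{name}: {fas_cycles(carbons)} cycles")
```

Under these assumptions, palmitic acid (C16:0) requires seven cycles and stearic acid (C18:0) requires eight.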


24 Next Generation Plant Breeding


Fatty acid composition is determined mainly by five fatty acids: palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2) and linolenic (C18:3) [67, 68]. Most palmitic acid (16:0) produced by the type II synthase is elongated to stearic acid (18:0) [67, 69]. In recent decades, many QTLs have been reported for each fatty acid component, and several 'hot regions' have emerged. For soybean seed linoleic acid, these include Gm05 39.36–40.87 Mb and Gm18 48.35–50.78 Mb (with the original QTLs from Diers and Shoemaker [70]; Bachlava et al. [71]; Li et al. [65]; Xie et al. [72]). For linolenic acid, they include Gm02 17.07–34.9 Mb, Gm09 34.56–37.74 Mb, Gm14 17.08–39.5 Mb and 45.68–46.78 Mb, Gm15 6.7–7.71 Mb and 13.07–25.6 Mb, and Gm19 35.75–37.38 Mb (with the original QTLs from Li et al. [65]; Bachlava et al. [71]; Diers and Shoemaker [70]; Spencer et al. [73]; Reinprecht et al. [39]; Xie et al. [72]; Shibata et al. [74]; Hyten et al. [38]). For oleic acid, they include Gm05 39.07–40.80 Mb and Gm18 49.24–51.95 Mb (with the original QTLs from Diers and Shoemaker [70]; Reinprecht et al. [39]; Xie et al. [72]). For palmitic acid, they include Gm05 2.84–3.92 Mb, Gm09 7.74–11.83 Mb and 34.59–38.73 Mb, Gm15 9.13–13.16 Mb, Gm17 7.60–9.45 Mb and Gm18 38.38–41.09 Mb (with the original QTLs from Li et al. [75]; Wang et al. [76]; Xie et al. [72]; Hyten et al. [38]; Li et al. [65]; Kim et al. [77]; Reinprecht et al. [39]). In soybean, stearoyl-acyl carrier protein desaturase (SAD) catalyzes the first desaturation step in seed oil biosynthesis, converting stearoyl-ACP to oleoyl-ACP, and plays a key role in determining the ratio of total saturated to unsaturated fatty acids in plants [35, 78, 79]. Microsomal oleate desaturase (FAD2) and linoleoyl desaturase (FAD3) then desaturate oleic acid to linoleic acid and linoleic acid to α-linolenic acid, respectively, mainly at the sn-2 position, and fatty acid elongase extends fatty acids into long-chain fatty acids [80]. The *FAD2* gene family of soybean consists of at least five members in four genomic regions and is responsible for the conversion of oleic acid to linoleic acid [81–84]. The FAD3 enzyme contributes to the synthesis of α-linolenic acid (18:3) in the polyunsaturated fatty acid pathway. Reducing the percentage of α-linolenic acid is a goal for improving soybean oil quality, and many studies have verified that *GmFAD3* mutants have reduced α-linolenic acid content in seed oil [58, 85–87].


However, overexpression of a single fatty acid synthesis gene does not significantly improve fatty acid biosynthesis [88, 89]. Fatty acid synthesis is regulated by several major classical transcription factors that are coupled with seed development, including *WRINKLED1* (*WRI1*), *LEAFY COTYLEDON1* (*LEC1*), *LEC2*, *ABSCISIC ACID INSENSITIVE3* (*ABI3*) and *FUSCA3* (*FUS3*) [90–95]. Of these, *LEC2*, *ABI3* and *FUS3* belong to the plant-specific B3 transcription factor family, *LEC1* is an NFY-B-type or CCAAT-binding factor-type transcription factor [96] and *WRI1* encodes a transcription factor of the APETALA2-ethylene responsive element-binding protein (AP2-EREBP) family [90]. *WRI1* is a potential global regulator of *de novo* fatty acid biosynthesis and, as a direct target of *LEC2*, specifies the regulatory action of *LEC2* toward fatty acid metabolism [97]. Overexpression of *WRI1*, which controls the expression of genes involved in lipid metabolism, including glycolysis and fatty acid biosynthesis, increased seed oil content by 10–20% compared with the wild type [40, 90, 98–101]. *LEC1* function is partially dependent on *ABI3*, *FUS3* and *WRI1* in the regulation of fatty acid biosynthesis; both *LEC1* and *LEC1*-like genes act as key regulators that coordinate the expression of fatty acid biosynthetic genes [92]. *LEC2* can regulate *WRI1* directly, which is necessary for its regulatory action on fatty acid metabolism [97]. Ectopic expression of *FUS3* can trigger the expression of fatty acid biosynthetic genes [41], and the interaction of *FUS3* and *AKIN10* positively regulates auxin biosynthesis and indirectly regulates fatty acid biosynthesis [102]. Furthermore, a few new soybean transcription factors involved in fatty acid biosynthesis have been identified in recent years: *GmbZIP123* regulates lipid accumulation indirectly through sugar translocation [103]; *GmMYB73* functions as a repressor of the negative regulator *GLABRA2* (*GL2*) [104] and relieves *GL2*-inhibited expression of *PLDα1* to accelerate the conversion of phosphatidylcholine to TAG [43]; *GmZF351* improves oil accumulation by directly activating *WRI1*, *BCCP2*, *KASIII*, *TAG1* and *OLEO2* [104]; *GmNFYA* was identified as increasing seed oil content on the basis of RNA-seq and gene coexpression networks [46]; and *GmDOF4* and *GmDOF11* increase lipid content in seeds by directly activating lipid biosynthesis genes [41, 105]. Recently, work on duplicated genes in soybean has updated the understanding of the regulatory mechanisms of seed oil content [106].

In addition, other transcription factors have been found to affect oil content in *Arabidopsis*, including *GL2*, *TT1*, *TT2*, *bZIP67*, *MED*, *MYB* [58, 107, 108] and *BASS2* [43, 107–112].

## **1.3. Soybean seed storage protein (SSP) and transcriptional regulation**

Soybean seed storage proteins (SSPs) are classified into four basic categories: albumins (water-soluble), globulins (salt-soluble), prolamins (alcohol-soluble) and glutelins (weak acid/weak base-soluble) [113, 114]. Globulin is the main component of SSP and can be divided into four groups according to sedimentation coefficient: 2S (including trypsin inhibitors, cytochrome and other components), 7S (β-conglycinin), 11S (glycinin) and 15S (polymer of glycinin) [115]. 7S and 11S are the main components, accounting for 60–80% of total soybean seed storage protein [116–120]. The genetic mechanisms of the 7S and 11S globulin subunits are now generally clear [121–124]. β-Conglycinin accounts for roughly 30–40% of total seed protein and is composed mainly of α (76 kD), α' (72 kD) and β (53 kD) subunits [125–127]. Glycinin accounts for roughly 40–60% of total seed protein and is composed mainly of G1, G2, G3, G4 and G5 subunits (approximately 56, 54, 54, 64 and 58 kD, respectively) [113, 118, 128]. In the past several years, a few QTL mapping studies have been conducted for soybean seed 7S and 11S. The QTL regions for 11S include Gm09 45.6–47.6 Mb and 103.7–105.8 Mb, Gm17 79–81 Mb, Gm19 55.1–57.1 Mb, Gm19 60.3–62.35 Mb and Gm20 81.7–83.7 Mb [129]; the QTL regions for 7S include one QTL for α'-7S located on Gm08 35.7–37.7 Mb and nine QTLs for β-7S located on Gm01 65–104 Mb, Gm03 75.4–77.49 Mb, Gm17 26–81 Mb, Gm19 30–31 Mb, 100.7–115 Mb and Gm20 92–98 Mb [129, 130]. The genes for 11S and 7S have been reported: the 11S subunit genes include *Gy1*, *Gy2*, *Gy3*, *Gy4*, *Gy5* and *Gy7*, and the 7S subunit genes mainly include CG-alpha-1 (*7sα*), CG-alpha'-1 (*7sα'*) and CG-beta-1 (*7sβ*) [131–134]. Three genes encoding 11S globulins, *AtCRU1*, *AtCRU2* and *AtCRU3*, have been verified in *Arabidopsis thaliana* [135]. Wang et al. [136] mapped a QTL, *qBSC-1* (7S), that could regulate SSP. Knockdown of 7S globulin subunits can change the nitrogen content of transgenic soybean seeds [137]. Furthermore, the ratio of 11S to 7S ranges from 0.5 to 1.7 among soybean cultivars and directly affects the nutritional quality and functional properties of soybean seed storage protein [138, 139]. Interestingly, the contents of 7S and 11S are significantly negatively correlated [140]. Yang et al. [141] demonstrated that the lack of 11S4A induced compensatory accumulation of 7S globulins. Adjusting the subunit composition of soybean seed storage protein can efficiently remove allergenic proteins and is also an approach to improving the nutritional quality, production and processing of soy protein [42, 103, 142, 143].


Accumulation of soybean seed storage protein (SSP) is coupled with that of TAGs, and several key transcription factors are involved in the process [144]. B3-type transcription factors can act directly on the expression of SSP genes [145]. The B3 domain, identified as the DNA-binding motif, recognizes the RY motif (CATGCA) as its target sequence [146]. The RY motif is a cis-acting element of seed-specific promoters, and most legume seed storage protein genes contain one or more RY repeat elements [65, 128]. Several studies have shown that the binding of *ABI3* to the RY motif can regulate the accumulation of storage proteins in Arabidopsis seeds [147–150]. The seed-specific B3 domain transcription factors *LEC2*, *FUS3* and *ABI3* have been identified, and mutations in these genes often reduce the accumulation of seed storage proteins [151–154]. In addition to *ABI3*, *ABI4* and *LEC1* have also been shown to interact in regulating SSPs [96, 155]. Previous studies showed that these genes directly affect the induction of storage protein gene expression [156–159]. Furthermore, expression of *OLEOSIN* requires activation by *LEC2* and two RY elements in its promoter [146]. Both *LEC1* and *LEC2* act as positive regulators upstream of *ABI3* and *FUS3*, and functional analyses showed that they influence the expression of SSP genes [44, 153, 158, 160, 161]. *LEC1* and *L1L* can activate the promoter of *CRUCIFERIN C* (*CRC*), and *LEC1* can also regulate *CRC* and other SSP genes together with *FUS3* and *ABI3* [161]. In addition to RY motifs, the presence of G-Box elements is also required for proper activation of the target promoters of *LEC1*, *LEC2*, *ABI3* and *FUS3* [162]. Some studies showed that *LEC2*, *ABI3* and *FUS3* collaborate with *bZIP* TFs that interact with these G-Box elements to activate SSP genes [163, 164]. Furthermore, *GmDOF4* and *GmDOF11* can bind to the promoter of *CRA1* to regulate the expression of SSP [41].
*GmDREBL* can be upregulated by *GmABI3* and *GmABI5* and be regulated by the late stage of SSP genes [44]. *DGAT* can reduce the soluble carbohydrate content of mature seeds and increase the seed protein content at the same time [165]. Therefore, in addition to *WAR1*, *LEC1*, *LEC2*, *ABI3* and *FUS3*, transcription factors of the *MYB*, *bZIP*, *MADS*, *DOF* or *AP2* families are also involved in the accumulation of storage compounds (oil and SSPs) and in the seed development regulatory network, as partners or direct target genes [162].
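As a rough illustration of how candidate B3 and bZIP binding sites might be located in a promoter, the following minimal Python sketch scans a sequence for the RY motif (CATGCA) on both strands and for a G-Box core. The G-Box core sequence used here (CACGTG), the function names and the promoter fragment are assumptions introduced for illustration only; they are not taken from the studies cited above, and a motif hit by itself does not demonstrate binding or regulation.

```python
import re

RY_MOTIF = "CATGCA"  # RY motif as given in the text
G_BOX = "CACGTG"     # assumed canonical G-Box core; not stated in the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) pairs for every motif hit.

    Overlapping hits are reported. The '-' strand is searched by looking
    for the reverse complement of the motif on the given strand.
    """
    seq = seq.upper()
    rc = reverse_complement(motif)
    # A zero-width lookahead lets finditer report overlapping matches.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    if rc != motif:  # a palindromic motif would otherwise be counted twice
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)


# Hypothetical promoter fragment, invented for this example.
promoter = "TTGACATGCATGCAAACACGTGTTTGCATGAA"

ry_hits = find_motif(promoter, RY_MOTIF)
gbox_hits = find_motif(promoter, G_BOX)
print("RY motif hits:", ry_hits)
print("G-Box hits:", gbox_hits)
```

Because the reverse complement of CATGCA is TGCATG, tandem RY repeats such as CATGCATGCA yield overlapping hits on both strands, which is why both strands are searched and overlaps are kept.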

soybean miR15/49 in soybean cotyledons were further demonstrated [170]. Ye et al. identified and analyzed, at the whole-genome level, miRNA endogenous target gene mimics (eTMs) and the phagemid-generated siRNA (PHAS) in soybean, with a focus on lipid metabolism-related genes. Lipid metabolism was found to be regulated by a potentially complex non-coding network in soybean, of which 28 may be miRNA-regulated and nine may be further regulated [171].

With the development of sequencing technology, the genome of the cultivar Williams 82 was released by Schmutz et al. [172], and the assembly quality of the reference genome has been updated year by year. In the present version (*Glycine max Wm82.a2.v1*), 56,044 protein-coding loci and 88,647 transcripts have been predicted, and all related data have been released in Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org\_Gmax). On the basis of the reference genome, around 265 cultivated soybean varieties, 92 wild soybean varieties and 10 semi-wild soybean varieties have been resequenced; this information provides a foundation for functional genomic analyses such as transcriptomic, proteomic, epigenomic and non-coding RNA analyses [173].

**2. Conclusion and perspectives**

Although many genes and regulators of seed oil content and SSP have been identified and their associated regulatory networks have been well studied in Arabidopsis, they remain unclear in soybean, apart from *WAR1*, *LEC1*, *LEC2*, *ABI3* and *FUS3*, because of the 75% duplication of the genome [172]. The combination and application of multiple omics (genomics, functional genomics, transcriptomics, proteomics and epigenomics) and advanced biotechnology (genome editing) are needed to clarify the genes and regulatory networks underlying soybean seed oil content and SSP. Secondary populations, including recombinant heterozygous lines (RHL), chromosome segment substitution lines (CSSL) and/or near isogenic lines (NIL), need to be applied to reduce the number of variables when analyzing the effects of single genes or transcription factors, and used to identify effective alleles and evaluate their effects and contributions. Combinations of general loci could further be used to design selection chip assays, which may lay the foundation for high oil or high seed storage protein breeding.

**Acknowledgements**

This study was supported by the National Key R&D Program of China (2016YFD0100500, 2016YFD0100300, 2016YFD0100201-21), the National Natural Science Foundation of China (31701449, 31471516, 31401465, 31400074, 31501332), the Natural Science Foundation of Heilongjiang (QC2017013), the Young Innovative Talent training plan of undergraduate colleges and universities in Heilongjiang province (UNPYSCT-2016144), special financial aid to post-doctor research fellow in Heilongjiang (To Qi Zhaoming), the Heilongjiang Funds for Distinguished Young Scientists (JC2016004) and the Outstanding Academic Leaders
