**3.1 Positional cloning of the responsible gene for the** *E2* **locus**

The line RIL6-8 was found to be heterozygous at the *FT2* locus and was designated RHL6-8 (Fig. 4). DNA marker analysis showed that RHL6-8 harbored a heterozygous region of approximately 10 cM that included the *FT2* locus. RHL6-8 produced NILs6-8-*FT2* and NILs6-8-*ft2* among its progeny. Using BSA, a polymorphic AFLP marker, E7M19, was detected between the early-flowering and late-flowering bulks derived from the progeny of RHL6-8. This marker was located close to the LOD peak of the QTL designated *FT2* (Fig. 5). We developed additional DNA markers tightly linked to the *FT2* locus using NILs6-8. Among the products amplified from all 4,096 possible primer-pair combinations, only five bands were consistently polymorphic between the contrasting *FT2*/*FT2* and *ft2*/*ft2* genotypes in NILs6-8. These polymorphic bands were excised from the gel, sequenced and converted to SCAR markers. The three SCAR markers derived from the five AFLP bands were used to screen 10 BAC clones from two independent BAC libraries. A contig covering the *FT2* region was constructed from the results of PCR analysis using the BAC end sequences. Five of the 10 BAC clones were then subjected to shotgun sequencing. Each BAC clone was analyzed and assembled separately, and the sequences were then combined through their overlaps. The five clones together covered approximately 430 Kb. Three DNA markers, one AFLP-derived marker (marker 2) and two PCR-based markers developed from BAC sequences (markers 1 and 3), were used in fine mapping to delimit the *FT2* locus precisely (Table 1). The positions of these markers are shown in Fig. 6.

Positional Cloning of the Responsible Genes for Maturity Loci *E1*, *E2* and *E3* in Soybean 57

Fig. 3. QTLs for flowering time identified in the RIL population. PVE: phenotypic variance explained by each QTL. (Linkage map of groups A1 to O with marker names not reproduced. QTL labels in the figure: *FT1*: PVE 60-70%; *FT2*: PVE 10-12%; *FT3*: PVE -5%.)

Fig. 4. Graphical genotype of RHL6-8. Solid bars, open bars and the meshed bar indicate Misuzudaizu homozygous, Moshidou Gong homozygous and heterozygous genotypes, respectively.

Fig. 5. QTL analysis for the *FT2* locus in the RIL population. The LOD scores for the *FT2* locus were calculated by composite interval mapping and are displayed in the left panel. DNA markers closely linked to the *FT2* locus are shown in the right panel. (Labels in the figure: *FT2* (*E2*); position 123.3 cM; LOD score 21.0; additive effect -4.3 days.)

A population of 888 plants, derived from several RHL6-8 plants, was used for fine mapping of the *FT2* locus. Forty-five individuals were omitted from the analysis because of missing phenotype or genotype data. Among the remaining 843 plants, recombination within this region was found in 21 plants. The numbers of *FT2* homozygous late-flowering (n=213), heterozygous (n=420) and *ft2* homozygous early-flowering (n=210) genotypes fitted a 1:2:1 segregation ratio well. The additive and dominance effects of this QTL were estimated to be -5.17 days and 0.57 days, respectively. The genetic variance explained by the *FT2* locus accounted for 87.9% of the total variance, indicating that the variation observed in this population was largely controlled by this single QTL. The genotypes of the three selected markers and the flowering genotypes confirmed by progeny tests are shown in Fig. 6. The genotypes of marker 2 cosegregated with the flowering genotypes, indicating that the QTL was close to this marker. Among the recombinants, line 6-8\_501rec had a recombination point between marker 1 and marker 2. Three other lines, 6-8\_452rec\_A, 6-8\_528rec\_B and 6-8\_120rec, had a recombination point between marker 2 and marker 3. Markers 1 and 3 originated from the end sequences of BAC clone MiB300H01. The recombination points in each line, together with the flowering genotypes, indicated that the *FT2* locus was restricted to the single BAC clone MiB300H01. To identify the gene responsible for this QTL, the nucleotide sequence of this BAC clone was determined.
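The fit of the genotype counts to a 1:2:1 ratio can be checked with a chi-square goodness-of-fit test. The sketch below is only an illustration using the counts reported for the *FT2* fine-mapping population; the source does not state which test the authors applied, and the function name is a hypothetical one chosen for this example.

```python
import math


def chi_square_1_2_1(n_hom_a, n_het, n_hom_b):
    """Return (chi2, p) for a goodness-of-fit test against a 1:2:1 ratio."""
    observed = (n_hom_a, n_het, n_hom_b)
    total = sum(observed)
    expected = (total / 4, total / 2, total / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three classes give 2 degrees of freedom, for which the chi-square
    # survival function has the closed form exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Counts from the text: FT2/FT2 = 213, heterozygous = 420, ft2/ft2 = 210.
chi2_ft2, p_ft2 = chi_square_1_2_1(213, 420, 210)
print(f"FT2: chi2 = {chi2_ft2:.3f}, p = {p_ft2:.3f}")
```

With these counts the statistic is about 0.03 (p of about 0.98), far below the 5% critical value of 5.99 for two degrees of freedom, which agrees with the statement that the segregation fitted the expected ratio well.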



Fig. 6. Fine mapping of the *FT2* locus. The marker genotypes of the recombinants are shown in the left panel, and the segregation of flowering time in their progenies is displayed in the right panel at the bottom of the figure. The interquartile range, median, and range are indicated by a box, a vertical line, and a horizontal line, respectively.

In the 94 Kb sequence of MiB300H01, nine annotated genes were predicted. One of these genes, Glyma10g36600 (as assigned in Phytozome, Glyma1.0; http://www.phytozome.net/), which shows a high level of similarity to the *GIGANTEA* (*GI*) gene, was considered a strong candidate for the *FT2* locus. We isolated the complete predicted coding region using an RNA sample extracted from leaves of NILs6-8-*FT2*. We refer to this gene as *GmGIa*, since another *GI* gene, *GmGIb*, was also obtained from the same RNA sample. The *GmGIa*-Mo allele from Moshidou Gong 503 extended over a 20 Kb genomic region and contained 14 exons (Fig. 7A). Marker 2, which cosegregated with the *FT2* genotypes and originated from the AFLP marker E60M38, was located in the 5th intron (Fig. 7A). Compared with *GmGIa*-Mo, the Misuzudaizu early-flowering allele, *GmGIa*-Mi, showed four single nucleotide polymorphisms (SNPs) in its coding sequence. One of these SNPs, located in the 10th exon, introduced a premature stop codon that led to a truncated GI protein of 521 amino acids in the *GmGIa*-Mi allele (Fig. 7B). This stop codon mutation was considered a candidate functional nucleotide polymorphism in *GmGIa*. A derived cleaved amplified polymorphic sequence (dCAPS) marker was developed to test whether the same stop codon mutation was present in other NILs originating from Harosoy (*e2*/*e2*). The genotypes of all NILs tested agreed with the genotypes of this diagnostic dCAPS marker. This result indicated that the same gene is responsible for the *FT2* and *E2* loci, and that a conserved mutation might have caused the early-flowering phenotype of the recessive alleles. To validate the significance of the mutation in *GmGIa*, we screened for a mutant line in X-ray-irradiated and ethyl methanesulfonate (EMS)-treated libraries by targeting induced local lesions in genomes (TILLING) (McCallum et al., 2000). The sequence of *GmGIa* in the wild-type cultivar Bay was identical to that of the *E2* allele. One mutant line harboring a deletion in the 10th exon that caused a truncated protein (735 amino acids) (Fig. 7B) flowered significantly earlier (8 days) than the wild type under natural day-length conditions. These results indicate that *GmGIa* is the gene responsible for the *E2* locus.

Table 1. List of DNA markers used for fine mapping of the *FT2* locus.

| Marker name | Type of marker | Clone name | Direction | Sequence (5'-3') | Physical position in Glyma1.0 (Gm10) b) |
|---|---|---|---|---|---|
| Marker 1 | BAC end | GMJMiB300H01RV | Fw | CATAGCCGACCTTCTCCAAA | 44,787,669 |
| | | | Rv | AGCCCAATATGGCAGCATAC | 44,787,287 |
| Marker 2 a) | AFLP (SCAR) | E60M38 | Fw | CAGTGTTCGCCAGGCTTAGT | 44,726,500 |
| | | | Rv | GCTTGGGTAAACATCCCAAA | 44,726,011 |
| Marker 3 | BAC end | GMJMiB300H01fw | Fw | GAGAGCAGGGTTATTGGATGA | 44,696,157 |
| | | | Rv | GCCACTGTGCCACATTACAC | 44,696,810 |

a) Digestion with the restriction enzyme *Eco*RI was needed to detect polymorphism.

b) Physical position on Gm10 in Glyma1.0 (http://www.phytozome.net/).
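The primer coordinates listed in Table 1 allow a quick consistency check of the fine-mapping interval. The following sketch computes the distance between the forward and reverse coordinates of each marker and the span between the two flanking markers. It assumes that each coordinate marks one end of the amplified region, which the source does not state explicitly, so the per-marker distances should be read as approximate amplicon sizes; the function names are hypothetical.

```python
# Primer coordinates on Gm10 (Glyma1.0), copied from Table 1.
PRIMERS = {
    "Marker 1": {"Fw": 44_787_669, "Rv": 44_787_287},
    "Marker 2": {"Fw": 44_726_500, "Rv": 44_726_011},
    "Marker 3": {"Fw": 44_696_157, "Rv": 44_696_810},
}


def primer_separation(marker):
    """Distance in bp between the Fw and Rv coordinates of one marker."""
    pos = PRIMERS[marker]
    return abs(pos["Fw"] - pos["Rv"])


def outer_span(marker_a, marker_b):
    """Distance in bp between the outermost coordinates of two markers."""
    coords = [*PRIMERS[marker_a].values(), *PRIMERS[marker_b].values()]
    return max(coords) - min(coords)


for name in PRIMERS:
    print(name, primer_separation(name), "bp")
print("Marker 1 to Marker 3:", outer_span("Marker 1", "Marker 3"), "bp")
```

Under this assumption, markers 1 and 3 lie about 91.5 kb apart, in line with their origin at the two ends of the roughly 94 Kb clone MiB300H01, and marker 2 falls between them.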

Fig. 7. Variation in the gene structure of *GmGIa*. A: Exons, a part of the 3'UTR, and introns of the *GmGIa* gene in the 24-45 Kb region of MiB300H01 are indicated by bold boxes, open boxes and lines, respectively. The location of marker 2, originating from AFLP marker E60M38, is shown in the 5th intron by the gray box. B: The truncation sites in the amino acid sequences of *ft2* (*e2*) and the mutant allele (*E2*-mut) are indicated by the solid triangles.
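The way a single nucleotide change can shorten a protein, as drawn in Fig. 7B, can be illustrated with a small translation routine. This is a minimal sketch using the standard genetic code and hypothetical toy sequences; it does not use the actual *GmGIa* sequence, which is not given in the text.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence up to, but not including, the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequences for illustration only (not GmGIa).
wild_type = "ATGGCTCAAGGTTGGTAA"  # Met-Ala-Gln-Gly-Trp-stop
mutant = "ATGGCTTAAGGTTGGTAA"     # C-to-T change turns CAA (Gln) into TAA (stop)

print(len(translate(wild_type)), len(translate(mutant)))
```

In the toy example one substitution converts a glutamine codon into a stop codon, so translation ends early, analogous to the truncation of the 1170-amino-acid protein to 521 amino acids described for the *GmGIa*-Mi allele.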


Marker analysis showed that the heterozygous region in RHL1-146 extended for about 5 cM and included the *FT3* locus. In contrast, the heterozygous region in RHL6-22 extended for about 40 cM and included the *FT3* and *Dt1* loci. Two groups of NILs, NILs1-146 and NILs6-22, were used to develop AFLP markers tightly linked to the *FT3* locus. Of all 4,096 possible primer pairs, only six fragments were consistently polymorphic between the *FT3*/*FT3* and *ft3*/*ft3* genotypes in NILs1-146 and NILs6-22. These polymorphic bands were excised from the gel, sequenced, and converted to codominant SCAR markers. Several BAC and transformation-competent artificial chromosome (TAC) clones were screened using the SCAR markers. The nucleotide sequences of a BAC clone, GMJMiB242F01, and a TAC clone, GM\_TMiH\_H17D12, were determined. These BAC/TAC sequences were used to develop new PCR-based markers. A total of six DNA markers, three AFLP-derived markers (markers 1, 3, and 6) and three PCR-based markers developed from the BAC/TAC sequences (markers 2, 4, and 5), were used for fine mapping of the *FT3* locus (Table 2).

Table 2. List of DNA markers used for fine mapping of the *FT3* locus.

A population of 897 plants derived from seven RHL1-146 plants was used for precise mapping of the *FT3* locus. No recombination between these markers was found in 883 plants. Among these plants, the numbers of *FT3* homozygous late-flowering (n=208), heterozygous (n=441) and *ft3* homozygous early-flowering (n=234) genotypes fitted a 1:2:1 segregation ratio. These results suggested the presence of a single QTL for flowering time within the small heterozygous region in RHL1-146. The additive and dominance effects of this QTL were estimated to be 3.0 and 0.98 days, respectively. The genetic variance explained by the *FT3* locus accounted for 70.7% of the total variance. The remaining 14 plants showed recombination between these markers (Fig. 9), and the recombination points were determined from the genotypes of markers 2-5. The *FT3* genotype of each recombinant coincided completely with the genotype of marker 3, which originated from E6M22, the AFLP marker closest to the LOD peak position (Fig. 8). Moreover, recombination points occurred on both sides of marker 3 and corresponded to both sides of the TAC clone GM\_TMiH\_H17D12. These results suggested that the gene responsible for the *FT3* locus was restricted to the physical region covered by GM\_TMiH\_H17D12 (Fig. 9).
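The additive and dominance effects reported for the two loci can be compared on a common scale. The sketch below assumes the usual single-locus definitions, which the source does not spell out: the additive effect a is half the difference between the two homozygote means, and the dominance effect d is the deviation of the heterozygote from their midpoint. Under these assumptions, and with 1:2:1 genotype frequencies, the genetic variance contributed by one locus is a²/2 + d²/4. The function names are hypothetical.

```python
def degree_of_dominance(additive, dominance):
    """Ratio |d / a|: 0 means purely additive, 1 means complete dominance."""
    return abs(dominance / additive)


def f2_genetic_variance(additive, dominance):
    """Single-locus genetic variance (days squared) with 1:2:1 genotype frequencies."""
    return additive ** 2 / 2 + dominance ** 2 / 4


# (a, d) in days, as reported in the text for each locus.
estimates = {"FT2": (-5.17, 0.57), "FT3": (3.0, 0.98)}

for locus, (a, d) in estimates.items():
    ratio = degree_of_dominance(a, d)
    variance = f2_genetic_variance(a, d)
    print(f"{locus}: |d/a| = {ratio:.2f}, genetic variance = {variance:.2f}")
```

Both ratios are well below 1 (about 0.11 for *FT2* and 0.33 for *FT3*), so under these assumptions the two loci act mainly additively, with a somewhat larger dominance component at *FT3*.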
