**2. Strategy for fine mapping and positional cloning**

52 Soybean – Genetics and Novel Techniques for Yield Enhancement

functions as a transcription factor (Putterill et al., 1995), and GI is a large protein involved in circadian clock function (Fowler et al., 1999; Park et al., 1999). FT is a small protein with some resemblance to RAF kinase inhibitors (Kardailsky et al., 1999; Kobayashi et al., 1999) that is produced in leaves and moves to the SAM (Corbesier et al., 2007; Jaeger & Wigge, 2007; Mathieu et al., 2007; Tamaki et al., 2007; Notaguchi et al., 2008). The rice orthologs of the *Arabidopsis* *CO* and *FT* genes, *Heading date 1* (*Hd1*) and *Heading date 3a* (*Hd3a*), respectively, have been identified (Yano et al., 2000; Kojima et al., 2002; Hayama et al., 2003). The promotion of flowering in *Arabidopsis* under LD conditions results from activation of *FT* by *CO*, while the delay in flowering in rice under LD conditions results from repression of *Hd3a* by *Hd1* (Izawa et al., 2000; Kojima et al., 2002; Roden et al., 2002; Hayama et al., 2003). A *CO*/*FT* module is likely to be conserved throughout the plant kingdom. CYCLING DOF FACTORS (CDFs) exhibit circadian cycling, bind to the *CO* promoter and repress *CO* expression. The abundance of CDFs is controlled by FLAVIN-BINDING, KELCH REPEAT, F-BOX PROTEIN1 (FKF1), which appears to be involved in the ubiquitin-mediated degradation of CDFs. GI protein physically interacts with FKF1 and stabilizes it, promoting CDF degradation and subsequent *CO* expression (Imaizumi et al., 2005; Sawa et al., 2007; Fornara and Coupland, 2009; Imaizumi, 2009). Despite the conserved functions of *FT* orthologs, their expression may be controlled by different systems in different species. Non-*CO*/*FT* pathways have been proposed for several plants, such as morning glory (*Pharbitis nil*) (Hayama et al., 2007) and tomato (Ben-Naim et al., 2006; Lifschitz et al., 2006). In rice, *Early heading date 1* (*Ehd1*) has been found to promote flowering by inducing *FT*-like gene expression only under SD conditions independently of *Hd1* (Doi et al., 2004).
There is no *Ehd1* ortholog in *Arabidopsis*.

Soybean is a typical SD plant whose photoperiodic sensitivity was discovered by Garner and Allard in 1920. Compared to the model plants, photoperiodic control of flowering in soybean is far less understood. Eight loci, *E1* to *E8*, conditioning flowering have been genetically identified (Bernard, 1971; Buzzell, 1971; Buzzell and Voldeng, 1980; McBlain and Bernard, 1987; Bonato and Vello, 1999; Cober and Voldeng, 2001; Cober et al., 2010). At each of these loci, two alleles have been identified, and except for *E6*, the recessive alleles at the *E* loci condition early flowering under both LD and SD conditions. The partially dominant alleles at the *E* loci delay flowering under LD conditions. Near-isogenic lines (NILs) for the *E* loci have been developed and used in studies to elucidate flowering in soybean (Saindon et al., 1989a,b; Upadhyay et al., 1994a,b; Cober et al., 1996a). Among these *E* loci, *E1*, *E3*, *E4* and *E7* are known to be involved in the response to photoperiod (Buzzell, 1971; Buzzell and Voldeng, 1980; McBlain et al., 1987; Cober et al., 1996b; Cober and Voldeng, 2001; Abe et al., 2003). The *E3* locus was first identified with the use of fluorescent lamps to extend day length. The *e3e3* recessive homozygote can initiate flowering under LD conditions where the day length was extended to 20 hr using fluorescent lamps (FLD) with a high red to far-red (R:FR) ratio (Buzzell, 1971). The *E4* locus was identified by extending the natural day length to 20 hr with incandescent lamps with a low R:FR ratio (Buzzell and Voldeng, 1980). The insensitivity of the *e4e4* genotype to LD conditions with a low R:FR ratio requires the *e3e3* background (Buzzell and Voldeng, 1980; Saindon et al., 1989b; Cober et al., 1996b). The *E1* and *E7* loci are involved in the control of insensitivity to artificially induced LD conditions in the *e3* and *e4* backgrounds (Cober et al., 1996b; Cober and Voldeng, 2001). Of the known *E* loci, the *E1* locus is considered to have the largest effect on time to flowering under field conditions (Stewart et al., 2003).

As flowering time is a quantitative trait, we employed QTL analysis (Tanksley, 1993) to dissect the genetic factors for flowering time into individual components by using recombinant inbred lines (RILs) derived from Misuzudaizu, a Japanese variety, and Moshidou Gong 503, a weedy line from China. To identify the molecular basis underlying each QTL, map-based cloning was performed because molecular or biochemical information on soybean flowering was scarce or entirely unavailable. Although NILs are usually used for fine mapping of each QTL, developing NILs is a time-consuming and laborious process, especially in soybean. As an alternative, we have proposed fine mapping using residual heterozygous lines (RHLs) (Yamanaka et al., 2005). An RHL selected from an RIL population harbors a heterozygous region where the target QTL is located but contains a homozygous background for most other regions of the genome. The progenies of the RHL are expected to show a simple phenotypic segregation based on the effects of the target QTL at the heterozygous region (Fig. 1). A similar term, heterogeneous inbred family (HIF), was used by Tuinstra et al. (1997) to identify the QTL associated with seed weight in sorghum. The RHL strategy has already been used to identify loci underlying pathogen resistance in soybean (Njiti et al., 1998; Meksem et al., 1999; Triwitayakorn et al., 2005). The trait genotypes of recombinants identified in the progenies of an RHL can be determined in the next generation.
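The selection of RHL candidates from RIL marker data can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the procedure used by the authors: the genotype codes (`"A"`, `"B"`, `"H"`), the function name `find_rhl_candidates`, the marker and line names, and the 5% background threshold are all assumptions introduced for the example.

```python
from typing import Dict, List

# Assumed genotype codes: "A" or "B" = homozygous for either parent, "H" = heterozygous.
Genotypes = Dict[str, str]  # marker name -> genotype code


def find_rhl_candidates(
    lines: Dict[str, Genotypes],
    target_markers: List[str],
    max_background_het: float = 0.05,  # hypothetical threshold, not from the source
) -> List[str]:
    """Return RILs that are heterozygous at every target marker and
    (nearly) homozygous at all other markers."""
    target = set(target_markers)
    candidates = []
    for name, calls in lines.items():
        # The region flanking the target QTL must be heterozygous.
        if not all(calls.get(marker) == "H" for marker in target):
            continue
        # The rest of the genome should be mostly homozygous.
        background = [g for marker, g in calls.items() if marker not in target]
        het_fraction = (
            sum(g == "H" for g in background) / len(background) if background else 0.0
        )
        if het_fraction <= max_background_het:
            candidates.append(name)
    return candidates


if __name__ == "__main__":
    # Toy data with made-up marker names m1..m4; m2 and m3 flank the target QTL.
    rils = {
        "RIL-1": {"m1": "A", "m2": "A", "m3": "A", "m4": "B"},
        "RIL-2": {"m1": "A", "m2": "H", "m3": "H", "m4": "B"},
        "RIL-3": {"m1": "H", "m2": "H", "m3": "H", "m4": "H"},
    }
    print(find_rhl_candidates(rils, ["m2", "m3"]))
```

In the toy data, only the line that is heterozygous at the flanking markers and homozygous elsewhere passes the filter; a line that is heterozygous throughout is rejected because its progeny would segregate for more than the target QTL.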

The probability of discovering RHLs for a target QTL depends on the heterozygosity ratio in a population and the size of the population. If $p$ is the ratio of heterozygosity in a population of size $n$, then the probability of detecting $k$ individuals with a heterozygous genotype is $\binom{n}{k} p^{k} (1-p)^{n-k}$, based on a binomial distribution. In the case of an F7 generation of RILs, the ratio of heterozygosity ($p$) is 0.0156, and with a population size ($n$) of 200, the probability of detecting at least one RHL is more than 0.95. We propose that QTL analysis using an F6-F8 RIL population in combination with the RHL strategy is useful for dissecting the genetic factors for an agronomic trait into individual QTLs: in these generations the homozygosity ratio is sufficiently high to evaluate traits with replication, while the heterozygosity ratio is not so low as to prevent the identification of a sufficient number of RHLs.
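The binomial calculation above can be checked with a short Python sketch. It assumes that the heterozygosity ratio at a single locus halves with each generation of selfing, so that $p = (1/2)^{g-1}$ for generation F$g$ (which gives 0.0156 for F7, matching the value quoted in the text); the function names are introduced here for illustration only.

```python
from math import comb


def heterozygosity(generation: int) -> float:
    """Expected heterozygous fraction at one locus in generation F<generation>,
    assuming strict selfing from a fully heterozygous F1 (an assumption)."""
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 0.5 ** (generation - 1)


def prob_k_heterozygotes(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k heterozygous lines among n RILs."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_at_least_one(n: int, p: float) -> float:
    """Probability of finding at least one heterozygous line among n RILs."""
    return 1.0 - prob_k_heterozygotes(n, 0, p)


if __name__ == "__main__":
    for gen in (6, 7, 8):
        p = heterozygosity(gen)
        print(f"F{gen}: p = {p:.4f}, P(at least one RHL in 200 RILs) = "
              f"{prob_at_least_one(200, p):.3f}")
```

Running the loop for F6 to F8 shows how quickly the chance of recovering an RHL falls as inbreeding advances, which is the trade-off behind the recommendation of F6-F8 populations.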

In the progenies of an RHL, we can identify NILs for the target QTL. New DNA markers in the heterozygous region were developed using the NILs, bulked segregant analysis (BSA) in the progenies of the RHL, and sequences of bacterial artificial chromosome (BAC) clones covering the target QTL. We usually developed amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and sequence characterized amplified region (SCAR) markers. Genetic analyses of flowering phenotypes and DNA markers were performed in large populations of RHL progenies. Recombinants between DNA markers were identified in the population, and the flowering-time genotypes of the recombinants were confirmed by progeny tests. The region in which the DNA markers cosegregated with the flowering-time genotype, and a BAC contig covering that region, were identified. The BAC clones covering the target region were sequenced and the sequences annotated. A candidate gene was confirmed by the association between phenotypes and sequence polymorphisms among several alleles, and by gene disruption through induced mutation.

Positional Cloning of the Responsible Genes for Maturity Loci *E1*, *E2* and *E3* in Soybean 55

The population size of RHL progenies required for fine mapping depends on the recombination frequency, that is, on the position of the QTL. We usually used about 1,000 individuals, but more than 10,000 plants are necessary when the target locus is located in a peri-centromeric or centromeric region. For high-throughput genotyping, cotyledon flour was obtained by drilling a hole in the surface of the seed without damaging the embryonic axis (Fig. 2). The initially drilled material was discarded to eliminate any possible contamination from the seed coat. The collected material was transferred into the wells of a 384-well plate. The drill and tube were cleaned by air flow.

**3. Positional cloning of the responsible genes for the** *E1, E2* **and** *E3* **loci**

A population of 156 RILs (F8:10) derived from a cross between Misuzudaizu and Moshidou Gong 503 was used for QTL analysis of flowering. Three QTLs for flowering time, *FT1*, *FT2* and *FT3*, were identified at LG C2 (Chr. 6), LG O (Chr. 10) and LG L (Chr. 19), respectively (Fig. 3). *FT1*, *FT2* and *FT3* were considered to correspond to *E1*, *E2* and *E3*, respectively, based on their map positions (Yamanaka et al., 2001; Watanabe et al., 2004). The late-flowering alleles *FT1*, *FT2* and *FT3* are partially dominant over the early-flowering alleles *ft1*, *ft2* and *ft3*, respectively. Misuzudaizu harbored the late-flowering alleles of the *FT1* and *FT3* loci, whereas Moshidou Gong 503 carried the late-flowering allele of the *FT2* locus.

**3.1 Positional cloning of the responsible gene for the** *E2* **locus**

The line RIL6-8 was found to be heterozygous for the *FT2* locus and was designated RHL6-8 (Fig. 4). DNA marker analysis showed that RHL6-8 harbored a heterozygous region of approximately 10 cM that included the *FT2* locus. RHL6-8 generated NILs6-8-*FT2* and -*ft2* among its progenies. Using BSA, a polymorphic AFLP marker, E7M19, was detected between the early-flowering bulk and the late-flowering bulk derived from the progeny of RHL6-8. This marker was located close to the LOD peak position of the QTL assigned to *FT2* (Fig. 5). We developed additional DNA markers tightly linked to the *FT2* locus using NILs6-8. Among the products amplified from all 4,096 possible primer pair combinations, only five bands showed consistent polymorphism between the contrasting genotypes *FT2*/*FT2* and *ft2*/*ft2* in NILs6-8. These polymorphic bands were excised from the gel, sequenced and converted to SCAR markers. Three SCAR markers, originating from the five AFLP bands, were developed and used to screen 10 BAC clones from two independent BAC libraries. A contig covering the *FT2* region was constructed based on the results of PCR analysis using the BAC end sequences. Five of the 10 BAC clones were then subjected to shotgun sequence analysis. Each BAC clone was analyzed and assembled separately, and the sequence information was then combined using overlapping sequences. The total length covered by the five clones was approximately 430 kb. A total of three DNA markers, including one AFLP-derived marker (marker 2) and two PCR-based markers developed from BAC sequences (markers 1 and 3), were used in the fine mapping to delimit the *FT2* locus precisely (Table 1). The positions of these markers are shown in Fig. 6.
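The dependence of fine-mapping population size on recombination frequency, noted in Section 2, can be made concrete with a rough calculation. The sketch below is a simplification introduced here, not a method from the chapter: it assumes that each progeny plant of a selfed RHL samples two independent meioses, that the recombination fraction in a small interval is approximately its length in cM divided by 100, and that every crossover in the interval is detected. The function names are hypothetical.

```python
import math


def expected_recombinant_gametes(n_plants: int, interval_cm: float) -> float:
    """Expected number of recombinant gametes within a marker interval,
    assuming two meioses per plant and r ~ cM / 100 (small intervals only)."""
    if n_plants < 0 or interval_cm <= 0:
        raise ValueError("n_plants must be >= 0 and interval_cm must be > 0")
    r = interval_cm / 100.0
    return 2 * n_plants * r


def plants_needed(interval_cm: float, confidence: float = 0.95) -> int:
    """Smallest number of plants giving at least `confidence` probability of
    one or more crossovers inside the interval (same assumptions as above)."""
    if not 0 < interval_cm < 50 or not 0 < confidence < 1:
        raise ValueError("interval_cm must be in (0, 50) and confidence in (0, 1)")
    r = interval_cm / 100.0
    gametes = math.log(1.0 - confidence) / math.log(1.0 - r)
    return math.ceil(gametes / 2)


if __name__ == "__main__":
    for cm in (1.0, 0.1, 0.01):
        print(f"{cm} cM interval: about {plants_needed(cm)} plants for 95% "
              f"chance of at least one recombinant")
```

Because the required number of plants scales roughly inversely with the genetic length of the interval, a region with suppressed recombination, such as a peri-centromeric region, needs a population many times larger to resolve the same physical distance.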

Fig. 1. A schematic representation of RHL. An RHL harbors a heterozygous region where the target QTL is located but contains a homozygous background for most other regions of the genome. Meshed circles show heterozygous individuals. [Figure labels: parents P1 × P2; generations F1, F2 and F8; RHL; HIF; QTL location; simple phenotypic segregation based on a single QTL in the progeny of RHL.]

Fig. 2. A procedure for seed genotyping. [Figure labels: seeds; drilling; seed powder collection; transfer; cleaning with air flow; DNA extraction; PCR and genotyping; selection of recombinant plants.]
