**7. Molecular markers offer a new vision of tomato diversity**

Natural genetic diversity is the fuel of evolution. No evolutionary force or adaptation to environmental change can operate without it (Alonso-Blanco, Aarts et al. 2009). Consequently, it is vital for species adaptation in general and for crop breeding in particular. Genetic variation occurs both within cultivated tomato (intraspecific) and between wild species (interspecific). Tomato breeding for adaptation to specific growing areas has been in progress for more than two centuries (Stevens and Rick 1986). Since the early days of quantitative genetics, initiatives have been developed to improve the understanding of trait inheritance. Attempts to construct genetic maps based on interspecific crosses (*S. pimpinellifolium* x *S. lycopersicum*) and to map disease resistance genes date back many decades (Langford 1937). A linkage map showing the distribution of agronomic traits with Mendelian inheritance, based on linkage between two or three mutations, was later proposed (Butler 1952). Nevertheless, the lack of polymorphic, neutral markers was a strong limitation. The development of isozymes allowed a first evaluation of wild germplasm (Rick and Fobes 1975) and the diagnosis of introgressions (Tanksley, Medina-Filho et al. 1981), but the scarcity of isozyme markers and their low polymorphism remained limiting. This limitation has been progressively overcome since the 1980s thanks to the discovery of several types of molecular markers.

#### **7.1 Ecology and evolution of wild tomato relatives**

Molecular studies provide important clues to ecological and evolutionary questions in wild tomato species. During speciation, hybrid sterility is frequently due to dysfunctional interactions between loci that have diverged in separate lineages. A "snowballing effect" is predicted for the loci controlling such reproductive barriers and hybrid sterility: they should accumulate faster than linearly with time. Such a "snowballing" effect has recently been described in distinct populations derived from crosses of *S. lycopersicum* with *S. pennellii*, *S. habrochaites* and *S. lycopersicoides* (Moyle and Nakazato 2010). However, further investigations have been called for to confirm these results (Stadler, Florez-Rueda et al. 2011).

Tellier and colleagues quantified the number of adaptive and deleterious mutations and the distribution of fitness effects of new mutations within housekeeping genes in four species, *S. arcanum*, *S. chilense*, *S. habrochaites* and *S. peruvianum*. They found little evidence for adaptive mutations but detected strong purifying selection in coding regions (Tellier, Fischer et al. 2011). This suggests that closely related species with similar genetic backgrounds but contrasting environments differ in the frequency of deleterious fitness effects.

The western coastal area between the Andes and the ocean, from Ecuador to Chile, is widely recognized as the center of origin of the species of *Solanum* sect. *Lycopersicon*. This area covers a wide range of geographical conditions. The complex geography and ecology of the Andes had a major impact on species divergence and hybridization between *S. pimpinellifolium* and *S. lycopersicum* (Nakazato and Housworth 2011). The two species form distinct lineages, separated by the Andes, and hybridize extensively in northern and central Ecuador. Using molecular markers and geographic information system (GIS) data, Nakazato and colleagues showed that *S. lycopersicum* has likely experienced a severe population bottleneck during the colonization of the eastern Andes, followed by a rapid population expansion. In plants, resistance genes and their homologs (RGA) tend to be highly variable. Caicedo and Schaal (2004) studied the geographic distribution of the RGA family Cf-2 (see Table 2) within and among populations of *S. pimpinellifolium*. They concluded that the geographical distribution of RGA diversity has been shaped primarily by demographic factors and selective pressure (Caicedo and Schaal 2004; Caicedo 2008). The authors also pointed to the reduction of natural habitat. This phenomenon is also observed on the Galapagos Islands, where the population of the endemic species *S. cheesmanii* is declining because of human activity. Differentiation within *S. cheesmanii* was also observed (Nuez, Prohens et al. 2004), as well as hybridization with the two introduced species *S. lycopersicum* and *S. pimpinellifolium* (Darwin, Knapp et al. 2003).

#### **7.2 Diversity analysis among wild and cultivated germplasm**

Allelic richness (the number of different alleles segregating in a population) is used to measure genetic diversity and is considered a key parameter for genetic resources management. It also reflects past fluctuations in population size (Nei, Maruyama et al. 1975). Zuriaga and colleagues highlighted molecular differences among more than 200 Peruvian and Ecuadorian *S. pimpinellifolium* accessions. Climate and genetic data were highly correlated, so the contrasting climates of the two countries appear to be an important factor. The highest diversity was found in northern Peru and the lowest on the Galapagos Islands. The authors stressed that the level of interspecific variation between *S. pimpinellifolium* and *S. lycopersicum* indicated a very close relatedness between the two species (Zuriaga, Blanca et al. 2009).
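
As an illustration of the definition above, allelic richness can be sketched as a simple count of the distinct alleles observed at each locus. The data layout, accession names and allele sizes below are hypothetical and are not taken from the cited studies.

```python
from collections import defaultdict

def allelic_richness(genotypes):
    """Return the number of distinct alleles observed at each locus.

    `genotypes` maps an accession name to {locus: (allele, allele)}.
    Missing calls are encoded as None and ignored.
    """
    alleles = defaultdict(set)
    for calls in genotypes.values():
        for locus, pair in calls.items():
            alleles[locus].update(a for a in pair if a is not None)
    return {locus: len(seen) for locus, seen in alleles.items()}

# Hypothetical SSR allele sizes (bp) for three accessions at two loci.
toy = {
    "acc1": {"ssr_a": (150, 150), "ssr_b": (201, 205)},
    "acc2": {"ssr_a": (150, 154), "ssr_b": (201, 201)},
    "acc3": {"ssr_a": (158, 158), "ssr_b": (None, None)},
}
richness = allelic_richness(toy)
mean_richness = sum(richness.values()) / len(richness)
```

Raw counts of this kind grow with sample size, so comparisons between groups of unequal size are usually made after rarefaction to a common number of sampled genes; that step is omitted here.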

Cherry tomato accessions typically show a large genetic diversity and a fruit size intermediate between *S. pimpinellifolium* and large-fruited cultivated accessions. Botanists postulate that cherry tomato accessions are feral plants (also called revertants) or a possible genetic admixture of wild and cultivated germplasm (Rick and Holle 1990; Peralta et al. 2007a). Recently, a molecular analysis of the structure of a large set of wild *S. pimpinellifolium*, cherry tomato and cultivated accessions showed that domesticated and wild tomatoes have evolved as a species complex with intensive hybridization. This highlighted the admixture position of *S. lycopersicum* var. *cerasiforme* (Ranc, Muños et al. 2008), which is illustrated in Figure 2 using data from Ranc et al. (2010) analyzed with Structure 2.0 (Pritchard et al. 2000). Accessions display clustering patterns (circled) that follow two phenotypic traits, fruit size and stigma insertion, showing the structuring effect of these domestication traits. The emergence of molecular markers has made it possible to quantify accurately the diversity within germplasm. The first molecular diversity studies on cultivated tomato revealed very low polymorphism compared to wild species, whether based on RFLP<sup>7</sup> (Miller and Tanksley 1990), SSR<sup>8</sup> (Bredemeijer, Cooke et al. 2002; He, Poysa et al. 2003), AFLP<sup>9</sup> (Park, West et al. 2004; van Berloo, Zhu et al. 2008), SSAP<sup>10</sup> (Tam, Mhiri et al. 2005) or SNP<sup>11</sup> (Yang, Bai et al. 2004; Labate and Baldo 2005). However, Bredemeijer et al. (2002) characterized 500 cultivated lines of European origin and showed that all of them could be distinguished from each other using a set of 20 SSR markers. When old varieties (or landraces) are compared to modern hybrids, a higher level of molecular diversity is usually observed in landraces (Mazzucato, Papa et al. 2008; van Berloo, Zhu et al. 2008).

Fig. 2. Principal Coordinate Analysis of a tomato core collection of 318 accessions.

<sup>7</sup> Restriction Fragment Length Polymorphism

<sup>8</sup> Simple Sequence Repeats

<sup>9</sup> Amplified Fragment Length Polymorphism

<sup>10</sup> Sequence-Specific Amplification Polymorphism

<sup>11</sup> Single Nucleotide Polymorphism

Although interspecific populations used for genetic analyses and diversity studies have answered many questions, they have left a void in the understanding of genotypic variation within tomato breeding programs, which focus on intraspecific populations (Van Deynze, Stoffel et al. 2007). The recent discovery of SNP markers, first detected in EST (expressed sequence tag) sequences (Van Deynze et al. 2007; Jimenez-Gomez and Maloof 2009) and then in non-coding sequences (Labate et al. 2009), provided access to a higher level of polymorphism. Labate and colleagues estimated diversity parameters among *S. lycopersicum* accessions, first using the SNPs detected in 50 loci resequenced in a diversity panel of 31 accessions. In a second investigation, multilocus estimates of polymorphism were obtained and led to rejection of the neutral equilibrium model of evolution within the studied collection (Labate, Robertson et al. 2009). Public germplasm collections are potential sources for allele mining in crop improvement, as illustrated by the same authors, who sampled 30 accessions from five continents in US seed banks. The study confirmed that a history of crossing with wild tomato species and of distribution among different environments across the world has spread allelic variation (Labate, Sheffer et al. 2011).

Molecular markers have proven efficient for sampling and maximizing allelic richness (Schoen and Brown 1993) through the development of nested core collections (McKhann, Camilleri et al. 2004). Such nested core collections (from 8 to 96 accessions) were constructed in tomato, capturing most of the molecular and phenotypic variation present in a set of 360 wild, feral and cultivated accessions (Ranc, Muños et al. 2008).
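
The idea of sampling accessions to maximize allelic richness can be sketched with a simple greedy heuristic: at each step, add the accession that contributes the most alleles not yet captured. This is only an illustrative assumption, not necessarily the algorithm used in the cited studies, and the accession names and allele labels are hypothetical.

```python
def greedy_core(allele_sets, size):
    """Pick up to `size` accessions, each time adding the one that brings
    the most alleles not yet captured (ties go to the first name in
    alphabetical order).

    `allele_sets` maps an accession name to the set of alleles it carries,
    each allele labelled "locus:allele".
    """
    chosen, captured = [], set()
    candidates = dict(allele_sets)
    while candidates and len(chosen) < size:
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] - captured))
        captured |= candidates.pop(best)
        chosen.append(best)
    return chosen, captured

# Hypothetical accessions and alleles.
toy = {
    "wild1": {"L1:a", "L1:b", "L2:a", "L3:c"},
    "cherry1": {"L1:a", "L2:b", "L3:a"},
    "cultivar1": {"L1:a", "L2:a", "L3:a"},
    "cultivar2": {"L1:a", "L2:a", "L3:b"},
}
core, captured = greedy_core(toy, 3)
smaller_core, _ = greedy_core(toy, 2)
```

Because accessions are added one at a time, the core of size 2 is contained in the core of size 3, which mirrors the nested design described above. A greedy pass does not guarantee the optimal subset.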

#### **7.3 Use of molecular diversity to dissect phenotypes**

Molecular markers allowed the construction of high-density genetic maps of the tomato genome (Tanksley, Ganal et al. 1992). This permitted the dissection of quantitative traits into Mendelian factors, or QTL (Quantitative Trait Loci) (Paterson, Lander et al. 1988; Tanksley et al. 1992). This strategy also opened the way to physical mapping and molecular cloning of the genetic factors underlying quantitative traits (Paterson, Damon et al. 1991). Moreover, *Lycopersicon* varieties and related species are all diploid and chromosomally collinear, which makes genetic dissection straightforward. The first gene isolated by positional cloning was *Pto*, conferring resistance to *Pseudomonas syringae* (Martin, Brommonschenkel et al. 1993). Since then, interspecific crosses have been made with each wild species. Because of the low genetic diversity within the cultivated compartment (Miller and Tanksley 1990), most mapping populations are based on interspecific crosses between a cultivar and a related wild species from the lycopersicon group (as reviewed by Foolad (2007); Labate, Grandillo et al. (2007); Grandillo et al. (2011)), from the lycopersicoides group (Pertuzé, Ji et al. 2002) or from the juglandifolia group (Albrecht, Escobar et al. 2010). However, maps based on intraspecific crosses have proved useful, notably for fruit quality traits (Saliba-Colombani, Causse et al. 2001). Together, these populations have allowed the discovery and/or characterization of a myriad of major genes (Table 2) and QTLs involved in various traits.

Molecular breeding strategies were rapidly set up to "pyramid" genes of interest for agronomic traits, notably using the Advanced Backcross QTL method (AB-QTL) (Tanksley, Grandillo et al. 1996). Using this approach with a *S. lycopersicum* x *S. pimpinellifolium* progeny, in which agronomically favorable QTL alleles were detected, Grandillo and colleagues showed how a wild species could contribute to the improvement of cultivated tomato (Tanksley, Grandillo et al. 1996). Introgression Lines (IL) derived from interspecific crosses make it possible to dissect the effect of chromosome fragments from a donor (usually a wild relative) introgressed into a recurrent elite line. ILs offer the possibility of evaluating the agronomic performance of a specific set of QTL (Paran, Goldman et al. 1995), and they have served as a basis for fine mapping and positional cloning of several genes and QTL of interest. The first IL library was developed between *S. pennellii* and *S. lycopersicum* (Eshed and Zamir 1995; Zamir 2001). QTL mapping power was increased compared to biallelic QTL mapping populations, and was further improved by the construction of sub-IL sets with smaller introgressed fragments. This progeny was successfully used to identify QTLs for fruit traits (Causse, Duffe et al. 2004), antioxidants (Rousseaux, Jones et al. 2005), vitamin C (Stevens, Buret et al. 2007) and volatile aromas (Tadmor, Fridman et al. 2002). The introgression of a QTL identified in these ILs has allowed plant breeders to boost the level of soluble solids (brix) in commercial varieties and has largely increased tomato yield in California (Fridman, Carrari et al. 2004). Such exotic libraries have since been developed with several other species, including *S. pimpinellifolium* (Doganlar, Frary et al. 2002), *S. habrochaites* (Monforte and Tanksley 2000; Finkers, van Heusden et al. 2007) and *S. lycopersicoides* (Canady, Meglic et al. 2005).

Introgression lines were also used to dissect the genetic basis of heterosis (Eshed and Zamir 1995). Heterosis refers to the phenomenon whereby hybrids between distant varieties, or crosses between related species, exhibit greater biomass, speed of development and fertility than both parents (Birchler, Yao et al. 2010). Heterosis involves genome-wide dominance complementation and inheritance models such as locus-specific overdominance (Lippman and Zamir 2007). Heterotic QTL for several traits were identified in tomato ILs (Semel and Nissenbaum 2006). A single QTL was shown to improve, in the heterozygous state, harvest index, earliness and metabolite content (sugars and amino acids) in processing tomatoes (Gur, Osorio et al. 2010; Gur, Semel et al. 2011). Furthermore, a natural mutation in the SFT gene, which is involved in flowering (Shalit, Rozman et al. 2009), was shown to act as a single overdominant gene increasing yield in hybrids of processing tomato (Krieger, Lippman et al. 2010).
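
Overdominance at a single locus can be made explicit with the standard notation of quantitative genetics. The symbols below are introduced here for illustration and do not come from the cited studies: $G_{11}$ and $G_{22}$ are the genotypic values of the two homozygotes and $G_{12}$ that of the heterozygote.

$$
m = \frac{G_{11} + G_{22}}{2}, \qquad a = \frac{G_{11} - G_{22}}{2}, \qquad d = G_{12} - m
$$

Here $a$ is the additive effect and $d$ the dominance deviation, which at a single locus equals mid-parent heterosis. The locus is overdominant when $|d| > |a|$, that is, when the heterozygote falls outside the range of the two homozygotes. The heterozygote exceeds the better parent, $G_{12} > \max(G_{11}, G_{22})$, exactly when $d > |a|$.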

Metabolite profiling is an approach of choice to identify compounds involved in fruit quality traits. Metabolite QTL (mQTL) can now be identified for non-volatile metabolites such as sugars and pigments, as well as for volatile compounds (Bovy, Schijlen et al. 2007). This was done on several interspecific populations, notably *S. lycopersicum* x *S. chmielewskii* (Do, Prudent et al. 2010), and on intraspecific crosses (Saliba-Colombani et al. 2001; Causse, Saliba-Colombani et al. 2002; Zanor, Rambla et al. 2009). Recent technologies allow screening for diversity in a wide range of compounds across whole genomes. This can be done in a targeted way to better characterize known metabolites (Tieman, Taylor et al. 2006) or in an untargeted way to identify new metabolites (Tikunov, Lommen et al. 2005). Beyond identifying and quantifying compounds, metabolomics can be of great help in deciphering biosynthetic pathways (Keurentjes 2009). Metabolome studies can be combined with transcriptomic data to identify key factors (Mounet, Moing et al. 2009; Do, Prudent et al. 2010). Metabolomics has an important role to play in the characterization of natural diversity in tomato (Schauer, Zamir et al. 2004; Fernie et al. 2011). It can also improve the biochemical understanding of fruit composition and enhance breeding for quality (Fernie and Schauer 2009; de Vos, Hall et al. 2011).

#### **7.4 Dissection of the molecular bases of domestication and diversification**

Human domestication and the later diversification of fruit types led to a large morphological diversity in tomato fruit (small to large; round, blocky, elongated or pear-shaped; with color ranging from red to green, white, black, pink, orange or yellow). In contrast, wild tomato species bear small, round, red or green fruits, with low intraspecific phenotypic diversity. This drew scientists' attention early on to the inheritance and development of fruit size and shape in tomato (Yeager 1937). The influence of chromosome 2 on fruit morphology was noticed (Butler 1964). Later, using the available molecular techniques, the genetic control of fruit traits was widely dissected (Grandillo, Ku et al. 1999; Lippman and Tanksley 2001; Barrero and Tanksley 2004). The first QTL to be cloned was fw2.2, which controls fruit weight variation (Frary, Nesbitt et al. 2000). It has been suggested that the diversity of fruit shape in cultivated germplasm can be explained to a large extent by four genes (Rodriguez, Muños et al. 2011). That study established a model for fruit shape evolution in tomato, which includes four major, recently identified mutations: FAS, which increases locule number, fruit fasciation and size (Cong, Barrero et al. 2008); LC, which increases locule number and fruit size (Muños, Ranc et al. 2011); OVATE, which gives an ovoid fruit shape (Liu, van Eck et al. 2002); and SUN, which gives an elongated fruit shape (van der Knaap, Lippman et al. 2002; Xiao, Jiang et al. 2008), or the oxheart shape when associated with LC and FAS. The allelic distribution of the four genes was related to morphological, geographical and historical data in a collection of diverse cultivated accessions. The study established that selection occurred in distinct historical periods: LC arose first, followed by OVATE, both in a *S. l. cerasiforme* background but in distinct populations. FAS arose later in an LC background.
The presence of these three mutations in Latin American germplasm suggests that they are pre-Columbian. Combined with fw2.2, they must have strongly contributed to the increase in fruit size during tomato domestication. In contrast, the SUN mutation is not carried by any of the Latin American material tested, suggesting that it appeared after domestication in European material (probably in Italy). This study also showed that selection for fruit shape is largely responsible for the underlying genetic structure of tomato cultivars. The recent discoveries of the molecular events shaping tomato fruit indicate that cultivated germplasm is frequently more diverse phenotypically than wild related germplasm, without necessarily showing a similar pattern at the molecular level. *"The irony of all this,"* says Steve Tanksley (geneticist at Cornell University, and a pioneer of these studies), *"is all that diversity of heirlooms can be accounted for by a handful of genes. There are probably no more than 10 mutant genes that create the diversity of heirlooms you see"* (Borrell 2009). Tomato selection and worldwide spread have led to the immense diversity of varieties that characterizes many domesticated plant species (Purugganan and Fuller 2009).

#### **8. Association genetics: New valorization of natural diversity**

Recent advances in molecular genetics and computation have allowed the emergence of association mapping (Myles, Peiffer et al. 2009). Association mapping takes advantage of historical recombination events and natural genetic diversity. By using large numbers of lines and molecular markers covering the whole genome, genome-wide association studies (GWAS) reach a much higher resolution than conventional segregating populations. Such an approach requires an accurate estimate of the genetic structure of the sample studied (Price, Zaitlen et al. 2010) and of the extent of linkage disequilibrium (LD) among loci. Yu and colleagues (2005) proposed a unified mixed model, based on single-locus analysis, that takes into account the genetic structure of the sample. This model is being extended to multi-locus analysis (Ayers et al. 2010). In autogamous crops, extensive LD is expected to reduce mapping resolution and risks leading to false-positive associations. Nevertheless, successful results have been obtained in selfing crops (Atwell, Huang et al. 2010; Ramsay, Comadran et al. 2011).
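
For reference, the unified mixed model mentioned above is commonly written in the following form. The notation is the usual one in the association mapping literature and is given here as a reminder, not as a reproduction of a formula from this chapter.

$$
y = X\beta + S\alpha + Qv + Zu + e, \qquad \operatorname{Var}(u) = 2KV_g, \qquad \operatorname{Var}(e) = RV_R
$$

Here $y$ is the vector of phenotypes, $\beta$ contains fixed effects other than the marker and population structure, $\alpha$ is the effect of the tested marker, $v$ contains the subpopulation effects with $Q$ the matrix of membership coefficients (for example from a STRUCTURE analysis), $u$ is the vector of polygenic background effects with $K$ the pairwise kinship matrix, and $e$ is the residual. $X$, $S$ and $Z$ are incidence matrices, and $V_g$ and $V_R$ are the genetic and residual variances.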

In tomato, several studies revealed contrasted results according to the samples studied. First studies of the linkage disequilibrium revealed large LD in cultivated tomatoes (Mazzucato, Papa et al, 2008; van Berloo, Zhu et al, 2008; Robbins, Sim et al, 2010). Van Berloo and colleagues performed association mapping within a collection of 94 accessions containing both old and elite (hybrids) European germplasm and about 300 markers (AFLP). Structure coinciding with fruit size was identified allowing grouping between cherry tomato and round-beef types, extensive LD was observed (15 cM average). Robbins and colleagues investigated the population structure among 70 tomato cultivars (modern and vintage, from fresh and processing market). The STRUCTURE analysis (Pritchard, Stephens et al. 2000) revealed groups predefined by market niche and age into distinct subpopulations. Furthermore, they detected two subpopulations within the processing varieties, corresponding to historical patterns of breeding conducted for specific production environments. They found no subpopulation within fresh-market varieties. High levels of admixture were shown in several varieties representing a transition in the demarcation between processing and fresh-market. Mapping and LD analysis on a genome wide level was performed (Robbins, Sim et al. 2010). Using a panel of 102 accessions including 95 cultivars (heirloom, fresh and processing cultivars) and 9 wild species), effect of selection on genome variation was studied using 340 markers (SNP, SSR, and INDEL12). LD value varied from 6-8 cM (all accessions) up to 3-16 cM (fresh market cultivars). Inter-chromosomal LD appeared to be population dependent, suggesting cautious approach for association mapping. Notably, a genetic divergence between fresh market and processing types was also shown. On the contrary, the use of cherry tomato allowed the construction of core collection with a reduced structure and lower LD (Ranc et al, 2008; 2010). 
In a pilot study on chromosome 2, using markers separated by distances ranging from a few kb to several cM, Ranc (2010) showed that LD varied strongly from one region to another. A few distant markers remained in strong LD, but these could be removed from the analysis.
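
To make the LD estimates discussed above more concrete, the following minimal sketch computes the classical r² statistic between pairs of biallelic markers and pairs it with the distance between them. It is not the procedure used in the cited studies. It assumes fully homozygous (inbred) accessions, so that each accession contributes one haplotype coded 0 or 1, and the function names, toy genotypes and map positions are hypothetical.

```python
import math
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic markers coded 0/1 over the same accessions.

    Assumes homozygous lines, so allele and haplotype frequencies
    can be counted directly from the genotype vectors.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("marker vectors must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n                                   # frequency of allele 1 at marker a
    p_b = sum(b) / n                                   # frequency of allele 1 at marker b
    p_ab = sum(x * y for x, y in zip(a, b)) / n        # frequency of haplotype 1-1
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                                # undefined for a monomorphic marker
    return d * d / denom


def pairwise_ld(markers, positions):
    """Return a list of (distance, r^2) for every pair of markers."""
    results = []
    for i, j in combinations(range(len(markers)), 2):
        distance = abs(positions[j] - positions[i])
        results.append((distance, r_squared(markers[i], markers[j])))
    return results


if __name__ == "__main__":
    # Hypothetical toy data: three markers scored on four accessions,
    # with hypothetical map positions in cM.
    markers = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    positions = [0.0, 0.5, 12.0]
    for distance, r2 in pairwise_ld(markers, positions):
        print(f"distance = {distance:5.1f} cM, r2 = {r2:.2f}")
```

Plotting r² against distance for all marker pairs gives the LD decay curve from which extents such as those quoted above are usually read. Pairs of distant or unlinked markers with high r² are the ones that would be flagged for removal or treated as evidence of population structure.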

The first association study was performed by Nesbitt and Tanksley (2002) to identify the causal SNP in the *fw2.2* gene they had cloned. They failed to find any association between fruit size and the genomic sequence of the *fw2.2* region in a collection of 39 cherry tomato accessions. Ranc and colleagues (2010) identified a significant association in the promoter region, thanks to a larger and more representative sample. From a breeding point of view, admixture mapping between the cultivated tomato and its closest relative is a method of choice for allele mining in wild germplasm. Muños and colleagues (2011) used this approach to identify the causal polymorphism of a QTL controlling locule number on chromosome 2. New SNP arrays are now available thanks to Next Generation Sequencing (NGS) technologies, such as the genotyping array developed under the Solanaceae Coordinated Agriculture Project (SolCAP) initiative, which carries 7,000 effective SNPs (SolCAP 2008). These tools will be very useful for scanning the whole genome for associations.

<sup>12</sup> Insertion-Deletion

## **9. Conclusion: Toward a change in the way to manage and use diversity**

Crossing wild and cultivated species can reveal alleles left behind during the domestication process. Molecular markers have strongly reinforced the use of wild relatives (Zamir 2008). At the interface between genetic resources management and plant breeding, pre-breeding is now recognized as an important adjunct to plant breeding and as a way to introduce new traits from non-adapted populations and wild relatives, notably for abiotic stress (FAO 2010). Nevertheless, the extensive use of the genetic richness contained in seed banks and germplasm collections faces limits. It remains difficult to introgress the targeted allele (with a favorable effect) accurately without also carrying along unfavorable ones through "linkage drag".

With the emergence of bioinformatics and nanotechnologies (the so-called "post-genomics" era), the last decade has opened the era of high-throughput sequencing. Conducting large intraspecific studies is now becoming a reality in tomato, allowing a better characterization of its genetic diversity. With the completion of its genome sequence (Mueller, Lankhorst et al. 2009), its rich annotation and the large number of tools available via the SGN (SOL Genome Network; http://solgenomics.net/organism/solanum\_lycopersicum/genome) platform (Bombarely, Menda et al. 2010), tomato, together with its relatives, is the most advanced vegetable crop. A draft genome sequence of *S. pimpinellifolium* LA1589 has also been released by D. Ware, W. R. McCombie, and Z. B. Lippman at Cold Spring Harbor Laboratory, allowing a detailed comparison of both species. The tomato genome sequences provide clues for understanding the evolutionary history of the Solanum clade and for identifying genes involved in fleshy fruit development.

Progress in sequencing technologies has reached the point where genotyping by sequencing (GBS) is now possible (Davey, Hohenlohe et al. 2011; Elshire, Glaubitz et al. 2011). This opens new perspectives for the management of genetic diversity, notably for the conservation and survey of large populations. In the near future, techniques such as GBS may allow breeders and scientists of the tomato community to determine population characteristics before genome or nucleotide diversity has been concretely established. GBS opens the way to a global and quantitative management of diversity and makes an *a priori* management of genetic resources foreseeable. It also opens perspectives for allele-based breeding, known as genomic selection (Hamblin, Buckler et al. 2011).

While *ex situ* germplasm conservation is well developed and will benefit from these developments, *in situ* conservation of tomato and its wild relatives is becoming critical because of major ecological changes in their area of origin. Efforts on *in situ* conservation and participatory approaches, as proposed by Jarvis, Brown et al. (2008) and Thomas, Dawson et al. (2011), could be very useful to maintain the adaptive potential of tomato genetic resources. Nuez and colleagues proposed using *S. cheesmanii* accessions now stored in germplasm banks to reinstate some extinct populations in the Galapagos Islands (Nuez et al. 2004). This could help avoid the present paradox: the more knowledge we gain on tomato diversity and its evolutionary history, the less available those genetic resources are in the wild.
