**3. Genome sequence variation within cultivars**

Sequencing, de novo assembly, and annotation of the first grapevine genomes [18, 22] provided a new body of knowledge and a new toolbox for the study of genome sequence diversity. Two different strategies were used for the first genome assemblies: a homozygous assembly based on PN40024, a partially inbred line derived from Pinot Noir [18], and an assembly of Pinot Noir (ENTAV 115) that included both consensus contigs of the two genome copies and independent contigs for each of the two haplotypes in the more dissimilar genome regions [22]. Both projects estimated a haploid genome size close to 500 Mb. More recently, long-read sequencing technologies such as PacBio have facilitated the release of haplotype-resolved assemblies, which are already available for the heterozygous grapevine cultivars Cabernet Sauvignon and Chardonnay [23, 24]. At present, the availability of reference genomes combined with the development of next-generation sequencing (NGS) technologies enables genome-wide analysis of grapevine germplasm at affordable cost, which is extremely useful in genetic diversity studies as well as in the search for mutations causing phenotypic variation [15, 24–26]. Although these approaches have still rarely been used to characterize somatic variation in grapevine, an increasing number of publications are shedding light on the magnitude and type of variation that accumulates at the genome level within given cultivars.

Somatic single nucleotide variants (SNV) and small insertion/deletion (INDEL) mutations are often the result of errors in DNA replication during mitotic cell division. While the frequency of INDEL may exceed that of single base substitutions, owing for instance to the reduced accuracy of polymerases at homopolymeric tracts or short repeats, INDEL are for the same reason more difficult to detect using high-throughput sequencing methods. The first attempt to detect somatic polymorphisms at a genome-wide scale in grapevine used 454 GS-FLX sequencing technology to compare three Pinot Noir clones with the genome assemblies of the Pinot-related accessions PN40024 and ENTAV 115 [27]. That study reported mean rates of 1.6 SNV, 5.1 INDEL, and 35.2 mobile element movements per Mb among clones. Short-read sequencing technologies, led by Illumina, provide a framework for accurate SNV detection and are also useful for detecting small INDEL. Using this approach, genome resequencing of three clones corresponding to different morphotypes of the ancient Italian wine cultivar Nebbiolo identified between 16 and 26 clone-specific SNV per Mb of genome [28]. However, these numbers might be overestimated, considering that the validation success was 61% for a quality-trimmed sub-selection of SNV [28]. More recently, the resequencing of 15 Chardonnay clones, compared against a de novo draft genome assembly for this cultivar, identified far fewer SNV using a stringent k-mer-based variant calling strategy [24]. The sum of SNV and INDEL ranged between 2 and 221 polymorphisms per clone (0.004–0.455 per Mb of genome), which corresponds to rates orders of magnitude lower than in the Nebbiolo study, even though the Chardonnay accessions came from diverse geographical origins and displayed diverse phenotypes, including seedlessness and berry color variation [24]. Concerning the putative impact of these polymorphisms, a total of 21 (0.07%) and 55 (3.4%) clone-specific variants were predicted to potentially alter protein function in Nebbiolo and Chardonnay, respectively, including one nonsynonymous substitution in the *VviDXS* gene as the possible origin of the Muscat flavor of one Chardonnay clone [24, 28]. Transcriptome resequencing (RNA-seq) can also be useful to identify polymorphisms in coding sequences. For example, an RNA-seq study comparing the seedless somatic variant Corinto Bianco with its seeded ancestor Pedro Ximenes identified 13 polymorphisms with a 100% validation rate (12 SNV and one dinucleotide), all of them heterozygous variants [29]. This point is important because, rather than resulting from direct base substitution mutations, some of the somatic SNV detected in sequencing studies might correspond to loss of heterozygosity (LOH) in hemizygous regions generated after somatic SV.
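The per-Mb figures quoted above are simple densities, and the following minimal sketch shows how such values can be derived and compared. It assumes a genome size of 500 Mb (the approximate haploid size mentioned earlier, not necessarily the assembly size used in [24] or [28]), and the function and variable names are hypothetical. Scaling the Nebbiolo rates by the 61% validation success is only a rough illustration, since that figure was measured on a quality-trimmed sub-selection of SNV and may not hold for the full call set.

```python
def variants_per_mb(variant_count: int, genome_size_mb: float) -> float:
    """Convert a raw variant count into a density per megabase."""
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    return variant_count / genome_size_mb


def scale_by_validation(rate_per_mb: float, validation_fraction: float) -> float:
    """Scale a called rate by the fraction of calls that were validated."""
    if not 0.0 <= validation_fraction <= 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    return rate_per_mb * validation_fraction


# Assumption: 500 Mb is used as a round genome size for illustration only.
ASSUMED_GENOME_SIZE_MB = 500.0

# Lowest and highest SNV + INDEL counts per Chardonnay clone reported in the text.
chardonnay_rates = [variants_per_mb(n, ASSUMED_GENOME_SIZE_MB) for n in (2, 221)]

# Nebbiolo clone-specific SNV per Mb, scaled by the 61% validation success.
nebbiolo_scaled = [scale_by_validation(rate, 0.61) for rate in (16, 26)]

print("Chardonnay SNV + INDEL per Mb:", [round(r, 3) for r in chardonnay_rates])
print("Nebbiolo SNV per Mb after scaling:", [round(r, 2) for r in nebbiolo_scaled])
```

With the assumed 500 Mb, the upper Chardonnay density comes out slightly below the reported 0.455 per Mb, presumably because the genome size used in the original study differs from this round number.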

SV involves changes in the chromosome landscape. It includes inter- and intra-chromosomal translocations as well as deletions and insertions (the last two types are generally considered SV if >1 kb), including those caused by the movement of transposable elements (TE) [30, 31]. The rapidly growing number of genomic studies in multiple species is unveiling more complex forms of SV, collectively known as chromoanagenesis, which combine several of the previous features [32]. In addition to the activity of TE, SV often arises from mistakes in replicative processes or from DNA breakage during mitosis followed by illegitimate repair [33–35]. Although SV is generally deleterious, it can accumulate during the vegetative multiplication of grapevine cultivars, behaving as recessive variation in the heterozygous state, because of the absence of sexual reproduction [15]. Features such as changes in copy number and breakpoint joins have been used in genomic studies to detect SV between grapevine cultivars and somatic variants [15, 25, 36–38]. By far the most recurrently described case of somatic SV in grapevine involves hemizygous deletions of different sizes around the grape color locus on chromosome 2, which give rise to variants with loss of berry color (see below). Smaller SV, translocations, and inversions have also been described in somatic variants differing in ripening time [25]. Genome-wide SV studies in a larger number of clones would be required to estimate the frequency of the different types of SV independently of specific phenotypes or genome regions resulting from human selection.
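As an illustration of how a change in copy number can point to a hemizygous deletion, the sketch below compares windowed read depth between a somatic variant and its progenitor. It is a simplified, hypothetical example and not the method of any of the cited studies: the function names, the ratio band (0.35 to 0.65) and the minimum run length are assumptions chosen for illustration, and real analyses would also use breakpoint evidence and correct for GC content and mappability.

```python
from statistics import median


def depth_ratio_windows(progenitor_depth, variant_depth):
    """Return the median-normalised variant/progenitor depth ratio per window."""
    p_med = median(progenitor_depth)
    v_med = median(variant_depth)
    if p_med == 0 or v_med == 0:
        raise ValueError("median depth must be non-zero in both samples")
    ratios = []
    for p, v in zip(progenitor_depth, variant_depth):
        if p == 0:
            ratios.append(None)  # window not covered in the progenitor
        else:
            ratios.append((v / v_med) / (p / p_med))
    return ratios


def candidate_hemizygous_deletions(ratios, low=0.35, high=0.65, min_windows=3):
    """Return (start, end) runs of windows whose ratio is near 0.5 (end exclusive)."""
    runs = []
    start = None
    for i, ratio in enumerate(ratios):
        in_band = ratio is not None and low <= ratio <= high
        if in_band and start is None:
            start = i
        elif not in_band and start is not None:
            if i - start >= min_windows:
                runs.append((start, i))
            start = None
    if start is not None and len(ratios) - start >= min_windows:
        runs.append((start, len(ratios)))
    return runs


# Toy data: ten consecutive windows, with depth roughly halved in windows 3 to 6.
progenitor = [30] * 10
variant = [30, 31, 29, 15, 14, 16, 15, 30, 29, 31]

ratios = depth_ratio_windows(progenitor, variant)
print(candidate_hemizygous_deletions(ratios))
```

A depth ratio close to 0.5 is what one copy out of two would produce in a diploid region; requiring several consecutive windows reduces the chance of flagging isolated coverage noise.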

TE are extremely frequent in plant genomes and correspond to sequences that have the ability to replicate and insert in different locations, either indirectly through an RNA intermediate (retrotransposons or class I) or directly by cut-and-paste mechanisms (transposons or class II) [39]. The transposition of these elements generates changes in genome size and can disrupt target loci upon insertion. In addition, TE can lead to SV and genome rearrangements through noncanonical transposition events or through homologous recombination related to their repetitive nature [39]. Altogether, TE have a high potential to impact organismal phenotypes. While all superfamilies of TE are represented in the grapevine genome, those in class I (e.g., non-LTR LINEs, LTR Ty1/copia, LTR Ty3/gypsy, and other LTR) are much more numerous (>100,000 copies in total) than class II superfamilies (hAT, PIF, Mutator, and CACTA), which total about 3000 copies in the grapevine reference genome [18, 40]. Because ca. 50% of the grapevine genome consists of mobile element-like/repetitive sequences [18, 38], it is reasonable to expect that TE could be a major driver of the somatic variation emerging during the extensive vegetative multiplication of grapevine cultivars. In fact, emergent phenotypes in grapevine somatic variants have frequently been associated with the movement of TE altering gene expression [41–43], although, with the exception of color variants, their phenotypes have not been selected for production. While the use of molecular markers suggests that the TE genomic landscape can vary between grapevine clones [27, 44, 45], systematic studies are still required to determine the magnitude of the TE-associated somatic genome variation that accumulates during the propagation of grapevine cultivars.

**4. Nucleotide sequence variation underlying grapevine somatic variation**

The availability of grapevine reference genomes and the advent of NGS technologies have paved the way for the identification of the nucleotide diversity underlying variation for relevant phenotypic traits in grapevine. Somatic variants are excellent tools for this goal, since comparing a somatic variant with the direct ancestor of the same cultivar allows the effect of a mutation to be studied in a common genetic background. This facilitates the identification of the causal genes and gene variants. In fact, in recent years, the molecular and genetic basis of an increasing number of phenotypic traits has been elucidated using somatic variants as experimental systems.

We consider transcriptome RNA-seq comparisons an excellent diagnostic tool for screening candidate genes because this technology has the potential to trace mutations that alter either gene expression or coding sequences. In our hands, the process starts with a careful phenotypic analysis comparing the normal progenitor plant and the somatic variants. Concurrently, we develop self-cross-derived progenies of both genotypes for segregation analyses. The main objective of the phenotypic analysis is to understand the developmental origin of the emerged trait. In this manner, we can identify a target organ, tissue, and developmental stage in which the mutation is initially expressed and sample it from each variant to conduct a transcriptome comparison. Inferring gene biological function from the developmental and phenotypic variation can frequently be misleading since, as mentioned before, many of these mutations have dominant gain-of-function effects.

Under these premises, transcriptome comparison, at both the gene expression and sequence levels, combined with the results of segregation analyses of the mutant phenotype in self-cross populations of each variant, can provide a preliminary identification of putative candidate genes. These candidates then have to be confirmed by directly comparing their sequences in normal and somatic variants of the same cultivar. In both transcriptome and sequence analyses, it is important to consider the possible chimeric state of causal mutations in the somatic variants.

When the described approaches lead to the identification of sequence variation capable of generating the mutant phenotype, it is still necessary to confirm that this sequence variation is the cause of the phenotype. When the responsible mutation is present in the L2 layer and can be transmitted through gametes, co-segregation of the mutant phenotype with the candidate sequence variants supports a causal relationship, although it is not definitive proof. Genetic transformation to restore normal or variant phenotypes can be a difficult and time-consuming alternative in grapevine. Other possibilities, such as allele-specific expression analyses or sequence characterization of a large number of variants or cultivars displaying the same phenotype, have been used in different cases to prove that a candidate gene variant is in fact responsible for a relevant phenotypic effect [26, 42, 43, 46].

In the next section, we review several examples of studies that take advantage of somatic variants to understand the molecular genetics of four relevant grape traits.
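As an illustration of the segregation analyses mentioned in this section, the sketch below tests whether the phenotype counts in a self-cross progeny are compatible with a 3:1 mutant-to-normal ratio. This is a hypothetical example resting on several assumptions that are not stated in the text: the mutation is heterozygous and present in the L2 layer, it is fully dominant and fully penetrant, and there is no segregation distortion. The function names and the progeny counts are invented for illustration.

```python
# Chi-square critical value for 1 degree of freedom at a 0.05 significance level.
CRITICAL_VALUE_DF1_ALPHA_005 = 3.841


def chi_square_3_to_1(mutant: int, normal: int) -> float:
    """Chi-square goodness-of-fit statistic against a 3:1 mutant:normal ratio."""
    total = mutant + normal
    if total == 0:
        raise ValueError("at least one progeny plant is required")
    expected_mutant = 0.75 * total
    expected_normal = 0.25 * total
    return ((mutant - expected_mutant) ** 2 / expected_mutant
            + (normal - expected_normal) ** 2 / expected_normal)


def consistent_with_3_to_1(mutant: int, normal: int) -> bool:
    """True if the 3:1 hypothesis is not rejected at the 0.05 level."""
    return chi_square_3_to_1(mutant, normal) < CRITICAL_VALUE_DF1_ALPHA_005


# Invented counts for a selfed progeny of a somatic variant.
print(round(chi_square_3_to_1(72, 28), 2), consistent_with_3_to_1(72, 28))
print(round(chi_square_3_to_1(50, 50), 2), consistent_with_3_to_1(50, 50))
```

A fit to 3:1 only indicates that the phenotype behaves as a single dominant locus. Co-segregation with the candidate sequence variant still has to be checked plant by plant, and small progenies give the test little power.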

*Somatic Variation and Cultivar Innovation in Grapevine DOI: http://dx.doi.org/10.5772/intechopen.86443*

*Advances in Grape and Wine Biotechnology*

