140 Olive Germplasm – The Olive Cultivation, Table Olive and Olive Oil Industry in Italy

**5. miRNA**

A first inventory of sRNAs in olive has been obtained from juvenile and adult shoots, revealing that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome (Donaire et al., 2011). A total of 18 known miRNA families were identified in the libraries.

**6. Analysis of transcriptome**

Despite the global importance of olive, the genomic sequence resources available for this species are still scarce, though an increasing number of expressed gene functions have been described in the last few years through limited NGS approaches. Recently, many EST sequences from large-scale transcriptomic analyses of different organs, such as fruits and leaves, have been released (Alagna et al., 2009; Galla et al., 2009; Ozgenturk et al., 2009).

While these studies have highlighted the utility of cDNA sequencing for candidate gene discovery and gene function, a comprehensive description of genes expressed in *Olea europaea* remains unavailable.

Over the past several years, NGS technology has emerged as a cutting-edge approach for high-throughput sequence determination, and this has dramatically improved the efficiency and speed of gene discovery (Ansorge, 2009). Furthermore, NGS has significantly accelerated and improved the sensitivity of gene-expression profiling and is expected to boost collaborative and comparative genomics studies (Strickler et al., 2012).

In this study, we generated over one million sequence reads with 454 FLX technology (Roche Diagnostics Corporation, Basel, Switzerland) and identified a number of gene functions potentially involved in the expression of major traits that control productivity and quality of olive and oil production.

The starting materials used to explore the olive transcriptome were flower and fruit samples from five different genotypes. Flower tissues at different developmental stages were sampled from the Leccino, Dolce Agogia and Frantoio varieties. Two 454 sequencing libraries were obtained from reverse-transcribed pooled RNA samples, extracted from flower buds at all stages of development until anthesis of the Leccino and Dolce Agogia genotypes, respectively. Furthermore, pooled flower samples of the Leccino and Frantoio genotypes, collected after anthesis, were used for synthesis of cDNAs and for the subsequent preparation of two additional 454 sequencing libraries.

To gain information on genes expressed in the drupe, with particular regard to those involved in the response to pathogen infections, another set of four 454 sequencing libraries was obtained from fruit samples (at about 17 weeks after flowering) of the Ortice and Ruveia genotypes, collected before and after the infection caused by the olive fruit fly (*Bactrocera oleae*).

The eight 454 cDNA libraries, four from flower and four from fruit tissues, were sequenced in two separate runs using a 454 GS FLX Titanium Sequencer (Roche Diagnostics Corporation, Basel, Switzerland); each library was loaded on a ¼ sector of a picotiter sequencing plate.

We obtained a total of more than 1 million sequence reads with an average length of 356 bp, corresponding to a little less than half a billion bases; about 60% of them are from fruit and 40% from flower samples (Table 3).

**Table 3.** Raw sequencing data
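Summary figures of the kind reported for the raw reads (number of reads, average read length, total bases) can be recomputed directly from the sequence files. The following is a minimal sketch, assuming the reads have been exported to FASTA format; the function names `parse_fasta` and `read_stats` and the toy sequences are illustrative assumptions and are not part of the original analysis pipeline.

```python
from io import StringIO
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier (text before the first whitespace).
            name = line[1:].split(maxsplit=1)[0] if len(line) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_stats(handle: Iterable[str]) -> Dict[str, float]:
    """Return the read count, total bases and mean read length."""
    lengths = [len(seq) for _, seq in parse_fasta(handle)]
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
    }


# Toy input: hypothetical reads, not data from the study.
toy = StringIO(">r1\nACGTAC\n>r2 sample=flower\nACG\nTTA\n>r3\nGG\n")
stats = read_stats(toy)
print(stats)
```

Applied to each library separately, the same function would also give the flower and fruit shares of the total read set.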

Adaptor-trimmed 454 sequence data were assembled using GSAssembler software (Roche Diagnostics Corporation, Basel, Switzerland). To build a compilation of gene structures and functions expressed in *Olea*, we first assembled the raw data from all eight libraries together (Table 4).


**Table 4.** Total assembling

More than 83% of the raw sequences were included in the assembly, with 112,717 remaining as singletons. This produced a set of 25,342 contigs with an average length of 892 bp (Table 4). As expected, when sequences from flower and fruit samples are assembled separately, the number of EST sequences assembled into contigs is significantly lower; however, the average length of contigs and singletons remains similar (Table 5).
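As a rough consistency check, the per-sample values reported in Table 5 can be used to back-calculate the raw read counts and compare them with the totals quoted in the text. This is only a sketch: it assumes that the percentage in Table 5 is the fraction of each sample's raw reads included in that sample's assembly, and the helper name `estimated_raw_reads` is illustrative.

```python
# Values taken from Table 5: reads included in each separate assembly and
# the percentage of that sample's raw reads they are assumed to represent.
table5 = {
    "flower": {"reads_in_assembly": 338_853, "percent": 72.82},
    "fruit": {"reads_in_assembly": 570_878, "percent": 82.01},
}
MEAN_READ_LENGTH_BP = 356  # average read length reported in the text


def estimated_raw_reads(reads_in_assembly: int, percent: float) -> float:
    """Back-calculate the raw read count from the assembled fraction."""
    return reads_in_assembly / (percent / 100.0)


raw = {name: estimated_raw_reads(**row) for name, row in table5.items()}
total_reads = sum(raw.values())
fruit_share = raw["fruit"] / total_reads
total_bases = total_reads * MEAN_READ_LENGTH_BP

print(f"estimated raw reads: {total_reads:,.0f}")
print(f"fruit share: {fruit_share:.1%}")
print(f"estimated bases: {total_bases:,.0f}")
```

Under these assumptions the estimate comes to roughly 1.16 million reads, a fruit share close to 60% and about 0.41 billion bases, in line with the figures given for the raw data.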

Olive Tree Genomic 143



**Table 5.** Flower and Fruit Assembling

| Samples in Assembly | Reads in Assembly (Number) | Reads in Assembly (%) | Contigs (Number) | Contigs (Length Average, bp) | Singletons (Number) | Singletons (Length Average, bp) |
|---|---|---|---|---|---|---|
| Flower | 338,853 | 72.82 | 14,599 | 804 | 91,999 | 345 |
| Fruit | 570,878 | 82.01 | 15,058 | 884 | 72,662 | 333 |

To assess the representativeness and the overall quality of the assembly, three randomly chosen gene sequences, among those already characterized in *Olea*, were used as references to map the contig and singleton sequences produced by the assembly (Figure 3).

**Figure 3.** Overview of assembling procedure

The fact that two out of the three selected genes are 100% covered by the total assembly with a single contig composed of a great number of ESTs indicates that the coverage of the assembly is sufficient to characterize the full coding sequence of transcripts with high to medium expression. Only FAD 6 shows partial coverage, especially in the fruit assembly, where only three matching singleton sequences were found (Figure 3). This is most probably due to the sharp decrease of FAD 6 transcript abundance in fruits sampled at late developmental stages, from 15 to 20 weeks after flowering (Matteucci et al., 2011).
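The coverage check on the reference genes can be illustrated by computing the percentage of a reference sequence spanned by the contigs and singletons that map to it. The sketch below assumes that the alignments are available as 1-based, inclusive (start, end) coordinates on the reference; the function names and the example coordinates are hypothetical and do not reproduce the mapping procedure actually used.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the reference


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def percent_covered(ref_length: int, intervals: Iterable[Interval]) -> float:
    """Percentage of reference positions covered by at least one alignment."""
    # Clip alignments to the reference and drop those falling outside it.
    clipped = [
        (max(1, start), min(ref_length, end))
        for start, end in intervals
        if end >= 1 and start <= ref_length
    ]
    covered = sum(end - start + 1 for start, end in merge_intervals(clipped))
    return 100.0 * covered / ref_length


# Hypothetical examples: one contig spanning a whole 1,000-bp reference,
# and three short singletons covering it only in part.
full = percent_covered(1000, [(1, 1000)])
partial = percent_covered(1000, [(1, 300), (250, 400), (700, 900)])
print(full, partial)
```

A gene covered end to end by a single contig gives 100%, whereas a gene matched only by a few singletons, as described for FAD 6 in the fruit assembly, gives a lower value.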

To predict gene functions, we annotated the unigenes with BlastX (E-value ≤ 1e-5) against the NCBI non-redundant (nr) database (http://www.ncbi.nlm.nih.gov/). About 52% of the unigenes match known functional genes, while the remaining 48% have no function assigned (Figure 4).
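The E-value threshold can be applied when post-processing the search results. The following is a minimal sketch, assuming that the BlastX output was saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`, where columns 11 and 12 hold the E-value and the bit score); the function name `best_hits`, the toy rows and the unigene identifiers are illustrative assumptions rather than the authors' actual procedure.

```python
import csv
from io import StringIO
from typing import Dict, Iterable, Tuple

EVALUE_CUTOFF = 1e-5  # threshold quoted in the text


def best_hits(handle: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Keep, for each query, the highest-scoring hit with E-value <= cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy tabular output (hypothetical hits, not results from the study).
rows = [
    ("contig1", "gi|1|ref|A", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-30", "120"),
    ("contig1", "gi|2|ref|B", "60.0", "90", "36", "0", "1", "270", "5", "94", "1e-8", "60"),
    ("contig2", "gi|3|ref|C", "35.0", "40", "26", "0", "10", "129", "3", "42", "0.5", "25"),
    ("single7", "gi|4|ref|D", "70.0", "80", "24", "0", "1", "240", "1", "80", "2e-6", "55"),
]
toy = StringIO("\n".join("\t".join(fields) for fields in rows))
hits = best_hits(toy)

unigenes = ["contig1", "contig2", "single7", "single9"]
annotated_fraction = sum(u in hits for u in unigenes) / len(unigenes)
print(hits)
print(f"annotated: {annotated_fraction:.0%}")
```

The fraction of unigenes with at least one retained hit corresponds to the annotated share discussed in the text; unigenes without such a hit remain without an assigned function.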

Most of the BlastX-annotated unigenes find their best matches in *Vitis vinifera*, *Populus trichocarpa* and *Ricinus communis* sequences, in decreasing order (Figure 4).

**Figure 4.** Overall profile of unigenes based on homology with GenBank sequences

We also mapped the GI identifiers (http://www.ncbi.nlm.nih.gov/) of the best BlastX hits to the UniProtKB protein database (http://www.uniprot.org/) in order to extract Gene Ontology (GO, http://www.geneontology.org/) terms. Approximately one-fourth of the unigene set was assigned to GO terms. This allowed us to group the unigenes into 14 sub-categories of biological processes, 9 sub-categories of cellular components and 11 sub-categories of molecular functions (Figure 5).
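The grouping step can be sketched as a two-stage lookup: each unigene is linked to the protein of its best hit, and the GO terms of that protein are then counted per vocabulary and sub-category. The example below is only an illustration; the lookup tables, identifiers and the function name `count_by_subcategory` are hypothetical, and the real mapping relies on the GI-to-UniProtKB correspondence described above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

GoTerm = Tuple[str, str]  # (GO vocabulary, sub-category)

# Hypothetical lookup tables standing in for the real mappings.
best_hit: Dict[str, Optional[str]] = {
    "contig1": "P1",
    "contig2": "P2",
    "single7": "P1",
    "single9": None,  # unigene without an annotated best hit
}
go_annotations: Dict[str, List[GoTerm]] = {
    "P1": [("biological_process", "metabolic process"),
           ("molecular_function", "binding")],
    "P2": [("cellular_component", "cell part"),
           ("molecular_function", "binding")],
}


def count_by_subcategory(hits: Dict[str, Optional[str]],
                         annotations: Dict[str, List[GoTerm]]) -> Dict[GoTerm, int]:
    """Count distinct unigenes assigned to each (vocabulary, sub-category)."""
    members = defaultdict(set)
    for unigene, protein in hits.items():
        for term in annotations.get(protein, []):
            members[term].add(unigene)
    return {term: len(unigene_set) for term, unigene_set in members.items()}


counts = count_by_subcategory(best_hit, go_annotations)
with_go = sum(1 for protein in best_hit.values() if go_annotations.get(protein))
go_fraction = with_go / len(best_hit)
print(counts)
print(f"unigenes with GO terms: {go_fraction:.0%}")
```

Counting distinct unigenes rather than annotations avoids inflating a sub-category when one protein carries several related terms.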

**Figure 5.** GO terms distribution in the cellular components, biological processes and molecular functions vocabularies.

The metabolic process sub-category, comprising more than 11,000 genes, is dominant in the biological process vocabulary, while the binding and cell part sub-categories, comprising about 13,000 and 8,000 genes, are dominant in the molecular function and cellular component vocabularies, respectively. We also noticed an appreciable number of genes included in the cellular process, catalytic activity and organelle sub-categories (Figure 5). However, at this level of detail, no dramatic differences are evident between the flower and fruit tissue transcriptomes.

Next-generation RNA sequencing of additional organs, tissues and genotypes of *Olea europaea*, as well as a full comparative analysis between and within the sequenced samples, is currently in progress. These studies are expected to provide valuable information about gene functions that trigger key metabolic pathways for the expression of desired traits.

assessment of the relationships among the different accessions, of the geographical patterns of distribution of genetic variation and of the genetic consequences of olive tree domestication. It will finally form the basis for the development of novel molecular marker assays.

They will also allow the analysis of global gene expression and specific gene expression of olive tissues in diverse developmental stages and conditions. The identification and characterization of the expression of important genes involved in agronomic and productive traits affecting fruit production and quality, biotic and abiotic stress resistance, and important developmental characters (e.g., juvenility, self-incompatibility, ovary abortion, chill response) may offer a significant number of tools and open new opportunities for improvement through molecular breeding and/or genetic engineering.

Much research will focus on gene network activities, using olive microarrays and/or qPCR to address the expression patterns of genes during plant and fruit development, drupe ripening, and fatty acid and phenylpropanoid metabolism. Moreover, a new generation of molecular markers will be developed to help localize genes involved in both monogenic and polygenic agronomic traits and to construct fine genetic maps; these markers will also be used for marker-assisted selection (MAS) to obtain elite genotypes by allowing the analysis of cross progenies at earlier stages.

**Author details**

Rosario Muleo
*University of Tuscia, Dept. DAFNE, Viterbo, Italy*

Michele Morgante
*IGA-Institute of Applied Genomics, Udine, Italy*

Riccardo Velasco
*Edmund Mach Foundation, IASMA, San Michele all'Adige, Italy*

Andrea Cavallini
*Dept. Crop Species Biology, Pisa, Italy*

Gaetano Perrotta
*ENEA, Trisaia, Rotondella (MT), Italy*

Luciana Baldoni
*CNR- Institute of Plant Genetics, Perugia, Italy*

**Acknowledgement**

This research was partially supported by Progetto Strategico MIPAF "OLEA - *Genomica e Miglioramento genetico dell'olivo*", D.M. 27011/7643/10, and by the Province of Trento and Edmund Mach Foundation. We thank Roche Diagnostic Spa, Applied Science, for supporting the OLEA Italian Project.
