138 Olive Germplasm – The Olive Cultivation, Table Olive and Olive Oil Industry in Italy

**3.2. Estimation of copy number of assembled sequences**

Assuming that the Illumina sequence reads in our experiments are sampled without bias for particular sequence types, mapping the Illumina reads onto the whole genomic dataset provides a method of estimating the copy number of any genomic sequence in the dataset (Swaminathan et al., 2007).

Data in the literature and slot blot experiments previously performed in our lab (Giordani, personal communication) allowed estimation of the copy number per haploid genome of 16 sequences. These 16 sequences of known redundancy were inserted into the whole genomic database and used as references for copy number estimation by mapping onto them a pool of around 270 million Illumina 75 nt reads (coverage 14.4×). We adopted a classification commonly used in biochemical experiments (Britten & Kohne, 1968) and defined supercontigs as highly repeated (HR, redundancy > 10,000 copies per genome, 3,619 supercontigs), medium repeated (MR, redundancy ranging between 100 and 10,000 copies per genome, 67,045 supercontigs) and "unique" (U, redundancy < 100 copies per genome, 168,250 supercontigs).

**3.3. Olive genome composition**

The HR and MR supercontig datasets were annotated to produce the OLEAREP 1.0 database. The annotation pipeline is reported in Figure 1.

**Figure 1.** The annotation pipeline for the production of the OLEAREP database.

The distribution of sequence types in the HR and MR datasets and in the whole OLEAREP 1.0 database is reported in Table 2.

The average coverage of each HR and MR sequence was used to estimate the redundancy of the various repeat classes. Around 50% of the whole olive genome appears to be made up of highly repeated sequences. Of these, around two thirds are tandem repeats belonging to five major families and other minor families (Figure 2). Such extreme redundancy of tandem repeats appears to be a peculiar feature of the olive genome, not found in the [...]

**4. Olive chloroplast genome**

The chloroplast genome of the olive has an organisation and gene order that is conserved among numerous angiosperm species and does not contain any of the inversions, gene duplications, insertions, inverted repeat expansions and gene/intron losses that have been found in the chloroplast genomes of the genera *Jasminum* and *Menodora*, from the same family as *Olea* (Mariotti et al., 2010). Forty polymorphisms have been identified in the plastome sequence, but they discriminate poorly among olive cultivars.
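The coverage-based copy number estimate and the HR/MR/U classification described in Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the text does not state how the 16 reference sequences were used for calibration, so the median coverage-per-copy ratio used here is an assumption, as are the function names and the example coverage values. Only the class thresholds (100 and 10,000 copies per genome) come from the text.

```python
from statistics import median


def coverage_per_copy(references):
    """Calibrate coverage per genomic copy from reference sequences.

    `references` is an iterable of (known_copies, mean_coverage) pairs.
    Assumption: coverage scales proportionally with copy number, and the
    median ratio is used as a simple, outlier-tolerant calibration.
    """
    ratios = [cov / copies for copies, cov in references if copies > 0]
    if not ratios:
        raise ValueError("need at least one reference with a positive copy number")
    return median(ratios)


def estimate_copies(mean_coverage, per_copy):
    """Estimate the copy number of a sequence from its mean read coverage."""
    return mean_coverage / per_copy


def classify(copies, low=100, high=10_000):
    """Assign the redundancy class used in the text.

    HR: more than 10,000 copies; MR: 100 to 10,000 copies; U: fewer than 100.
    """
    if copies > high:
        return "HR"
    if copies >= low:
        return "MR"
    return "U"


if __name__ == "__main__":
    # Hypothetical reference sequences: (known copies, observed mean coverage).
    refs = [(1, 14.4), (100, 1440.0), (5000, 72000.0)]
    per_copy = coverage_per_copy(refs)

    # Hypothetical supercontigs with their mean mapped coverage.
    for name, cov in [("sc_a", 20.0), ("sc_b", 30000.0), ("sc_c", 500000.0)]:
        copies = estimate_copies(cov, per_copy)
        print(name, round(copies), classify(copies))
```

Using the median ratio rather than a single reference keeps one mis-measured reference from skewing the calibration; a regression through the origin would be an equally plausible choice under the same proportionality assumption.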
