**3.2. Estimation of copy number of assembled sequences**

Assuming that the Illumina reads in our experiments were sampled without bias toward particular sequence types, mapping Illumina reads onto the whole genomic dataset provides a way to estimate the copy number of any genomic sequence in it, because the number of reads mapping to a sequence is then proportional to its length and to its copy number (Swaminathan et al., 2007).
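The principle can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that per-sequence mapped-read counts have already been obtained with some aligner, that mean read depth scales linearly with copy number, and that a single depth-per-copy factor can be fitted on reference sequences whose copy number is known. The fit is a least-squares line through the origin, which is our choice. The class limits are the HR/MR/U thresholds used in this section. `Sequence`, `depth_per_copy` and all numbers below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sequence:
    """A supercontig or reference sequence with its (hypothetical) mapping result."""
    name: str
    length: int        # sequence length in nt
    mapped_reads: int  # number of reads mapped onto it


def mean_depth(seq: Sequence, read_length: int) -> float:
    """Average per-base depth: total mapped bases divided by sequence length."""
    return seq.mapped_reads * read_length / seq.length


def depth_per_copy(references: list[tuple[Sequence, float]], read_length: int) -> float:
    """Fit k in depth = k * copy_number (least squares through the origin).

    Assumption: the references' copy numbers per haploid genome are known
    independently, e.g. from slot blots or the literature.
    """
    num = sum(mean_depth(s, read_length) * c for s, c in references)
    den = sum(c * c for _, c in references)
    return num / den


def estimate_copy_number(seq: Sequence, k: float, read_length: int) -> float:
    """Copy number = observed mean depth / depth expected for a single copy."""
    return mean_depth(seq, read_length) / k


def redundancy_class(copies: float) -> str:
    """HR: > 10,000 copies; MR: 100 to 10,000 copies; U: < 100 copies."""
    if copies > 10_000:
        return "HR"
    if copies >= 100:
        return "MR"
    return "U"


if __name__ == "__main__":
    read_len = 75  # Illumina read length used in this section
    # Hypothetical references with "known" copy numbers.
    refs = [
        (Sequence("ref_a", length=1_000, mapped_reads=20_000), 100.0),
        (Sequence("ref_b", length=3_000, mapped_reads=1_200_000), 2_000.0),
    ]
    k = depth_per_copy(refs, read_len)
    for contig in [
        Sequence("contig_1", length=2_000, mapped_reads=400_000),
        Sequence("contig_2", length=500, mapped_reads=2_000_000),
        Sequence("contig_3", length=10_000, mapped_reads=1_000),
    ]:
        copies = estimate_copy_number(contig, k, read_len)
        print(contig.name, round(copies, 1), redundancy_class(copies))
    # prints:
    # contig_1 1000.0 MR
    # contig_2 20000.0 HR
    # contig_3 0.5 U
```

With these made-up counts the fitted factor is 15 depth units per copy. Under perfectly uniform sampling that factor would be close to the sequencing coverage. Anchoring it to references of known copy number instead ties the scale to independent measurements rather than to an assumed genome size.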

Olive Tree Genomic 139

Data in the literature and slot blot experiments previously performed in our lab (Giordani, personal communication) allowed us to estimate the copy number per haploid genome of 16 sequences. These 16 sequences of known redundancy were added to the whole genomic database and used as references for copy-number estimation by mapping onto them a pool of around 270 million Illumina 75-nt reads (14.4× coverage). We adopted a classification commonly used in biochemical (DNA reassociation) experiments (Britten & Kohne, 1968) and defined supercontigs as highly repeated (HR; redundancy > 10,000 copies per genome; 3,619 supercontigs), medium repeated (MR; redundancy between 100 and 10,000 copies per genome; 67,045 supercontigs) and "unique" (U; redundancy < 100 copies per genome; 168,250 supercontigs).

[…] other plant species whose genome has been sequenced. By contrast, the medium repeated component is mainly composed of LTR-retrotransposons, while tandem repeats are much less represented in this genome portion.

| Sequence type | HR: Nr. of sequences (%) | MR: Nr. of sequences (%) |
|---|---|---|
| DNA transposons | 31 (0.86) | 2,183 (3.26) |
| Retrotransposons: LTR-*Copia* | 134 (3.70) | 7,569 (11.29) |
| Retrotransposons: LTR-*Gypsy* | 258 (7.13) | 8,066 (12.03) |
| Retrotransposons: Non-LTR | 29 (0.80) | 949 (1.42) |
| Tandem repeats | 1,535 (42.42) | 6,718 (10.02) |
| rDNA | 29 (0.80) | 555 (0.83) |
| Putative genes | 46 (1.27) | 2,729 (4.07) |
| Unknown repeats | 317 (8.76) | 1,795 (2.68) |
| No hits found | 1,240 (34.26) | 36,481 (54.41) |
| Total | 3,619 | 67,045 |

**Table 2.** Functional percentage distribution of the supercontigs in OLEAREP 1.0.

**Figure 2.** HR (left) and MR (right) fraction composition.

**4. Olive chloroplast genome**

The chloroplast genome of the olive has an organisation and gene order that are conserved among numerous angiosperm species, and it does not contain any of the inversions, gene duplications, insertions, inverted repeat expansions and gene/intron losses that have been found in the chloroplast genomes of the genera *Jasminum* and *Menodora*, which belong to the same family as *Olea* (Mariotti et al., 2010). Forty polymorphisms have been identified in the plastome sequence; they are poorly able to differentiate among olive cultivars.
