**3.3. Olive genome composition**

The HR and MR supercontig datasets were annotated to produce the OLEAREP 1.0 database. The annotation pipeline is shown in Figure 1.

**Figure 1.** Annotation pipeline used to produce the OLEAREP 1.0 database.

The distribution of sequence types in the HR and MR datasets and in the whole OLEAREP 1.0 database is reported in Table 2.
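As a minimal sketch of how a percentage distribution like the one in Table 2 could be computed, the Python snippet below tallies annotated supercontigs by sequence type. It assumes that each supercontig carries a single annotation label and that percentages are weighted by sequence length in bp; the label names and toy values are hypothetical, not OLEAREP data, and the published table may instead be based on supercontig counts.

```python
from collections import defaultdict


def percentage_distribution(supercontigs):
    """Return the percentage of total length (bp) assigned to each sequence type.

    supercontigs: iterable of (sequence_type, length_bp) pairs.
    Weighting by length is an assumption; counting supercontigs is an alternative.
    """
    totals = defaultdict(int)
    for seq_type, length_bp in supercontigs:
        totals[seq_type] += length_bp
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Nothing to distribute (empty input or only zero-length sequences).
        return {}
    return {t: 100.0 * bp / grand_total for t, bp in totals.items()}


# Toy input with hypothetical labels and lengths (not real olive data).
toy_hr = [("tandem repeat", 600), ("LTR-retrotransposon", 300), ("unknown", 100)]
print(percentage_distribution(toy_hr))
```

Running the same function separately on the HR set, the MR set and their union would give the three columns of a table shaped like Table 2.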

The average coverage of each HR and MR sequence was used to estimate the redundancy of the different repeat classes. Across the whole olive genome, around 50% appears to consist of highly repeated sequences. Of these, around two thirds are tandem repeats belonging to five major families and several minor ones (Figure 2). Such an extreme redundancy of tandem repeats appears to be a peculiar feature of the olive genome, not found in the other plant species whose genomes have been sequenced. By contrast, the medium-repeated component consists mainly of LTR-retrotransposons, while tandem repeats are much less represented in this portion of the genome.
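The coverage-based redundancy estimate can be illustrated with a short Python sketch. It assumes that the copy number of a supercontig is approximately its mean read coverage divided by the depth expected for single-copy sequence, so that its genomic footprint is its length multiplied by that copy number. The repeat-class labels, the `single_copy_depth` and `genome_size_bp` parameters, and the toy values are hypothetical and do not come from the olive data.

```python
from dataclasses import dataclass


@dataclass
class Supercontig:
    repeat_class: str      # hypothetical label, e.g. "tandem" or "LTR-retrotransposon"
    length_bp: int         # assembled supercontig length
    mean_coverage: float   # average read depth over the supercontig


def genome_fraction_by_class(contigs, single_copy_depth, genome_size_bp):
    """Estimate the fraction of the genome occupied by each repeat class.

    Assumption: copy number ~ mean_coverage / single_copy_depth, hence
    genomic footprint ~ length_bp * copy number.
    """
    if single_copy_depth <= 0 or genome_size_bp <= 0:
        raise ValueError("depth and genome size must be positive")
    fractions = {}
    for c in contigs:
        copies = c.mean_coverage / single_copy_depth
        footprint_bp = c.length_bp * copies
        fractions[c.repeat_class] = fractions.get(c.repeat_class, 0.0) + footprint_bp / genome_size_bp
    return fractions


# Toy values only: a 1 kb tandem unit at 300 copies and a 5 kb LTR element at 40 copies.
toy = [
    Supercontig("tandem", 1000, 600.0),
    Supercontig("LTR-retrotransposon", 5000, 80.0),
]
print(genome_fraction_by_class(toy, single_copy_depth=2.0, genome_size_bp=1_000_000))
```

Applied to the proportions stated in the text, two thirds of a highly repeated fraction of about 50% means that tandem repeats alone would account for roughly one third of the olive genome.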

**Table 2.** Functional percentage distribution of the supercontigs in OLEAREP 1.0.

**Figure 2.** HR (left) and MR (right) fraction composition.
