**4.1.1 How to identify the IS*6110* insertion sites**

Since the late nineties, several methods have been applied to identify and sequence the loci at which the IS integrated into the genome. The methods applied to identify and sequence the flanking regions included cloning of agarose-excised hybridizing bands (Beggs et al., 2000); reverse dot blot assay (Steinlein & Crawford, 2001); whole-genome microarrays (Kivi et al., 2002); ligation-mediated PCR (Otal et al., 2008); and construction of BAC libraries (Alonso et al., 2011), among others. All these procedures are usually cumbersome and have difficulty detecting all the insertions present, particularly in strains carrying a high IS*6110* copy number.

The development of high-throughput whole-genome sequencing has overcome some of those difficulties; however, this procedure is so far not of general applicability. Whole-genome sequences of tens of MTBC strains are currently finished or at various stages of completion; however, that number cannot compete with the thousands of IS*6110*-RFLP patterns already registered in the available databases.

New technologies are currently under development to determine the IS*6110* insertion sites of a large number of *M. tuberculosis* isolates using high-throughput methodologies, such as Massive Insertion Site sequencing (IS-seq) (Sandoval et al., ESM-2010). Such procedures will surely help to unravel the IS flanking-region sequences in a more feasible manner.

than RFLP bands is not a rare event (Beggs et al., 2000; Warren et al., 2000; Alonso et al., 2011). This result is more evident in strains carrying a high copy number of the IS.

The effect of insertions on the complement of active and inactive genes was thought to offer insight into the number of genes required for infection, and thus to be a source of information on which genes, or which gene content, are essential for virulence (McEvoy et al., 2007).

Some studies have compared the insertion loci of virulent strains with those of avirulent strains. The attenuated vaccine strain *M. bovis* BCG differs markedly in IS*6110* content from the virulent strain *M. tuberculosis* H37Rv: one and 16 copies, respectively. However, the IS*6110* copy number per genome does not appear to be related to the attenuation of the bacilli (see part 5). In fact, the avirulent strain H37Ra has one additional copy compared with its parental strain, the virulent H37Rv. Comparison of the H37Rv and H37Ra genomes showed two main differences between them mediated by the insertion of IS*6110*. However, these changes have no clear role in the attenuation of the avirulent strain (Brosh et al., 1999).

Comparison of several BCG strains showed differences among them in relation to IS*6110*. The "ancestral" BCG (for example, BCG Tokyo) carries two copies of the IS, located in the DR region and upstream of the two-component system *pho*P-*pho*R (see part 4.2). The latter copy was lost in the "modern" BCG (for example, BCG Pasteur), which has a single copy inserted in the preferential locus mentioned, namely the DR region (Brosh et al., 2007).

Essential genes could also be identified by detecting those that never carry an inserted IS, on the assumption that such mutations would be deleterious for the bacterium. An *in silico* study, based on previous experimental data, estimated that 35% of the genes in the *M. tuberculosis* genome are essential (Lamichhane et al., 2003). The data on insertion loci identify transposition/recombination events in both coding and non-coding regions and, in absolute terms, more insertion loci have been detected inside coding regions. However, non-coding sequences represent only 10% of the genome available to host the IS. Relative to the sequence available, therefore, the proportion of insertions inside non-coding regions is actually higher than the proportion inside coding regions (Table 3). This could represent a sort of "ORF-preserving" behaviour of the genome variability mediated by IS*6110* transposition. It is consistent with the suggestion of greater selection against intragenic insertion in *M. tuberculosis* during infection *in vivo* than when grown *in vitro* (Yesilkaya et al., 2005).

In a study of 161 clinical isolates of *M. tuberculosis*, the insertion sites of IS*6110* were determined (Yesilkaya et al., 2005). Only 100 ORFs were affected by insertion, which the authors considered to represent a low overall number of non-essential genes. In conclusion, most of the genes in *M. tuberculosis* might play an important role in infection and transmission.

From the data obtained thus far, a high proportion of the coding genes targeted by IS*6110* belong to the functional category containing the PE-PPE group of genes (see references in Table 3). These genes are highly characteristic of MTBC members and are considered to be related to the antigen variability of the bacilli (McEvoy et al., 2009).
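The coding versus non-coding comparison made in this section rests on a normalisation: raw insertion counts are divided by the share of the genome in which they could have landed. The following Python sketch illustrates that arithmetic only. The function name `noncoding_enrichment` and the insertion counts in the usage example are hypothetical and are not taken from Table 3; the 10% non-coding fraction is the figure quoted in the text.

```python
def noncoding_enrichment(n_coding, n_noncoding, noncoding_fraction=0.10):
    """Return how many times denser insertions are in non-coding sequence
    than in coding sequence, after normalising by genome fraction.

    n_coding, n_noncoding: insertion loci counted in each region type
        (hypothetical inputs, to be replaced with real counts).
    noncoding_fraction: share of the genome that is non-coding
        (0.10 is the value quoted in the text).
    """
    if not 0.0 < noncoding_fraction < 1.0:
        raise ValueError("noncoding_fraction must lie strictly between 0 and 1")
    if n_coding <= 0:
        raise ValueError("n_coding must be positive to form a ratio")

    coding_fraction = 1.0 - noncoding_fraction
    # Insertions per unit of genome fraction in each region type.
    coding_density = n_coding / coding_fraction
    noncoding_density = n_noncoding / noncoding_fraction
    return noncoding_density / coding_density


# Hypothetical counts for illustration only: more loci fall in coding
# regions in absolute terms, yet non-coding sequence is hit more densely.
ratio = noncoding_enrichment(n_coding=60, n_noncoding=40)
print(f"{ratio:.1f}")  # prints 6.0
```

With these illustrative counts, coding regions hold more insertion loci in absolute terms (60 against 40), but non-coding sequence is hit about six times more densely once the 90%/10% split of the genome is taken into account, which is the pattern the text describes as "ORF-preserving".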
