**2. Genetic diversity and phylogeny of** *M. tuberculosis*

The availability of the complete *M. tuberculosis* genome sequence (Cole *et al.*, 1998) opened new ways to study and understand the evolution of the closely related MTBC strains. Using Bacterial Artificial Chromosome (BAC) libraries, it was shown that seven loci were deleted in *M. bovis* with respect to *M. tuberculosis*, reinforcing previous studies indicating that these strains probably originated from a common ancestor (Gordon *et al.*, 1999, Sreevatsan *et al.*, 1997). This was more fully appreciated in comparative genomics studies (Brosch *et al.*, 2002) that also divided the *M. tuberculosis* strains into "ancient" and "modern" based on a deletion known as TbD1 in the modern strains.

Several molecular markers have been developed to type strains and infer phylogenetic relationships. Some of these are considered more useful for epidemiological studies, such as those of transmission, reinfection and/or reactivation, while others are considered more robust phylogenetic markers that can help to decipher the evolution of *M. tuberculosis*. The methods used for epidemiology include restriction fragment length polymorphism (RFLP) of IS*6110* sites (van Embden *et al.*, 1993, van Soolingen *et al.*, 1993), spoligotyping to identify unique spacers within the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) or Direct Repeat (DR) region (van Embden *et al.*, 2000, Brudey *et al.*, 2006, Kamerbeek *et al.*, 1997), and the identification of Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR), which are strain-specific repeats of short DNA sequences at different positions of the chromosome (Supply *et al.*, 2003). Molecular markers that provide more robust phylogenetic information and have helped to shape the evolutionary scenario of *M. tuberculosis* include LSPs, SNPs and Multilocus Sequence Analysis (MLSA) (Filliol *et al.*, 2006, Gagneux *et al.*, 2006, Gutacker *et al.*, 2006, Comas *et al.*, 2009) (Figure 1).
Although it has been argued that RFLP, spoligotyping and VNTR markers are highly prone to convergent evolution and thus to homoplasy (i.e., the same spoligotype can be observed in strains belonging to different lineages), recent studies show that, at least for the main lineages, this does not seem to be the case (Kato-Maeda *et al.*, 2011). However, more studies are required to clarify this issue.
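
The notion of homoplasy described above can be made concrete with a small check: if the same spoligotype pattern turns up in strains that a more robust marker (LSPs or SNPs) assigns to different lineages, that pattern is a candidate homoplasy. The following is only a minimal sketch under stated assumptions. The function name, the record layout and the toy presence/absence strings are hypothetical and are not taken from any of the cited studies; real spoligotype patterns are longer than the four-character strings used here.

```python
from collections import defaultdict


def find_homoplasic_patterns(strains):
    """Return spoligotype patterns that occur in more than one lineage.

    `strains` is an iterable of (strain_id, lineage, spoligotype) tuples.
    `lineage` is assumed to come from a robust marker (LSPs or SNPs) and
    `spoligotype` is a presence/absence string ('1' = spacer present).
    """
    lineages_by_pattern = defaultdict(set)
    for _strain_id, lineage, spoligotype in strains:
        lineages_by_pattern[spoligotype].add(lineage)
    # A pattern shared by two or more lineages is a candidate homoplasy.
    return {
        pattern: sorted(lineages)
        for pattern, lineages in lineages_by_pattern.items()
        if len(lineages) > 1
    }


# Toy, invented records: (strain id, lineage from SNP/LSP typing, pattern).
example_strains = [
    ("s1", "Lineage 2", "0001"),
    ("s2", "Lineage 2", "0001"),
    ("s3", "Lineage 4", "1101"),
    ("s4", "Lineage 1", "1101"),
    ("s5", "Lineage 3", "1110"),
]

candidates = find_homoplasic_patterns(example_strains)
for pattern, lineages in candidates.items():
    print(pattern, "shared by", ", ".join(lineages))
```

An empty result for a real data set would be consistent with the observation that, at least for the main lineages, spoligotypes rarely cross lineage boundaries; it would not by itself rule out homoplasy at finer scales.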

Based on our current view of its evolutionary history, *M. tuberculosis* can be divided into six phylogeographical lineages, which have adapted to their local human populations (Figure 1). Different molecular markers, such as spoligotyping, LSPs and SNPs, classify the global population of *M. tuberculosis* into comparable groups. For instance, Lineage 1 (Indo-Oceanic Lineage) corresponds to the East African-Indian (EAI) family; Lineage 2 (East Asian Lineage) corresponds to the Beijing family; Lineage 3 (East African-Indian Lineage) corresponds to the Central Asian (CAS) family; Lineage 4 is the Euro-American Lineage, which includes the Haarlem, LAM, X, T, S and Tuscany families; Lineage 5 (West African Lineage 1) and Lineage 6 (West African Lineage 2) correspond to AFRI 2 and AFRI 1, respectively, by spoligotyping (Sola *et al.*, 2001, Brudey et al., 2006). Based on the evidence accumulated from these studies, it has been suggested that *M. tuberculosis* evolved as a human pathogen in Africa, which is also the continent where all main *M. tuberculosis* lineages have been isolated (Hershberg et al., 2008). Moreover, the "ancient" lineages described by Brosch *et al.* (2002) are present in West Africa, and the spread of the "modern" lineages is associated with human migration out of the African continent (Wirth *et al.*, 2008, Hershberg et al., 2008).

Phylogenetic studies have also shown that clinical strains of *M. tuberculosis* are more genetically variable than originally expected (Hirsh *et al.*, 2004, Gagneux et al., 2006, Hershberg et al., 2008). Moreover, genetic variability can translate into phenotypic differences in transmission capacity, virulence and pathogenicity, which can have epidemiological consequences and affect the outcome of the disease. Although few studies show a clear association between lineage and transmission capacity, it is now clear that Lineage 2 (Beijing family) *M. tuberculosis* has spread globally more than any other lineage (Parwati *et al.*, 2010). The use of spoligotyping to type paraffin-embedded strains obtained from tuberculosis patients in different time periods has also shown an increase of this genotype over time in Africa. Its isolation from children, which is a measure of recent transmission, increased from 13% in 2000 to 33% in 2003 in South Africa (Cowley *et al.*, 2008). The capacity of the Beijing genotype to spread more than other lineages is not completely understood, but it has been suggested that factors contributing to its expansion could involve the selective pressure imposed by BCG vaccination and drug treatment (Parwati et al., 2010).

Fig. 1. Schematic representation of the phylogeography of *M. tuberculosis*. Squares indicate the six main lineages and circles represent the spoligotype families.

The successful transmission of particular strains is not limited to the Beijing genotype. In a recent study in which guinea pigs were exposed to air from an HIV-tuberculosis ward, one non-Beijing strain was shown to be responsible for most of the secondary infections observed (Escombe *et al.*, 2008). In addition to transmission capacity, it is also currently accepted that genetically different *M. tuberculosis* strains produce markedly different immuno-pathological events and affect disease manifestation. For example, in a study conducted in Vietnamese patients, a clear association between the Euro-American Lineage of *M. tuberculosis* and pulmonary rather than meningeal tuberculosis was observed, suggesting that these strains are less capable of extra-pulmonary dissemination than other strains in the study population (Caws *et al.*, 2008). In a study using a cohort of patients and household contacts in The Gambia, *M. africanum* and *M. tuberculosis* were transmitted equally to the household contacts, but contacts exposed to *M. tuberculosis* Beijing strains were the most likely to progress to disease (de Jong *et al.*, 2008). Another source of evidence came from a recent study associating Lineages 1, 5 and 6 with a higher pro-inflammatory cytokine response when compared with the modern Lineages 2, 3 and 4 (Portevin *et al.*, 2011).

**3. SNPs in** *M. tuberculosis*

Genetic diversity within bacterial species is usually generated by mutations and by the exchange of genetic material. The process of HGT is thought to be an important driver of bacterial evolution in both pathogenic and non-pathogenic bacteria (Becq *et al.*, 2007). Horizontally transferred genes can be acquired in clusters known as genomic islands or pathogenicity islands, which can be identified by characteristics that distinguish them from the host genome, such as GC content, flanking nucleotide repeats and insertion elements. In the case of *M. tuberculosis*, there is evidence of ancient gene transfer events that could have taken place in a progenitor pool of tubercle bacilli before the clonal expansion that gave rise to the MTBC (Gutierrez et al., 2005). One of these events involved the Rv0986-8 virulence operon (Rosas-Magallanes *et al.*, 2006), which could have originated from genetic exchange between an environmental bacillus ancestor and other bacterial species (Nicol & Wilkinson, 2008). In the absence of recent HGT events, modern *M. tuberculosis* lineages evolve essentially by mutations that alter their genomes, resulting in SNPs and in LSPs such as deletions and insertions, the latter mainly mediated by transposition of the IS*6110* insertion element. Although allelic variation in MTBC organisms is quite restricted when compared with other pathogenic bacteria (Sreevatsan et al., 1997), there is a growing recognition that there is substantial genetic diversity among isolates. At the level of SNPs, changes can be either synonymous (sSNP) or non-synonymous (nsSNP), and this diversity has been undeniably useful for typing and defining evolutionary relationships among strains. SNPs provide many advantages for the analysis of phylogenetic relationships among microorganisms, especially among closely related clonal organisms such as the MTBC. Initial descriptions of the *M. tuberculosis* population structure involved analysis of SNPs in the *katG* and *gyrA* genes and defined three major genetic groups (Sreevatsan et al., 1997). Later surveys extended this strategy to include more than 100 sSNPs identified in 112 *M. tuberculosis* isolates (Gutacker et al., 2002). In more recent work using 159 sSNPs identified by whole-genome comparison of sequenced strains, it was possible to classify 212 isolates into 56 haplotypes that grouped strains into six *M. tuberculosis* SNP Cluster Groups (SCG) and one SCG that grouped all the *M. bovis* strains (Filliol et al., 2006). A re-evaluation of the SNP phylogeny was obtained by *de novo* sequencing of 89 randomly distributed genes in 108 global strains (Comas et al., 2009). This study suggested that initial classification could be done using a subset of discriminatory SNPs and then, if further molecular characterization were needed, a MIRU-VNTR typing technique could be applied to differentiate individual strains. However, the choice of discriminatory SNPs is not an easy
