**Genomic Variability of**  *Mycobacterium tuberculosis*

María Mercedes Zambrano, Ginna Hernández-Neuta, Iván Hernández-Neuta, Andrea Sandoval, Andrés Cubillos-Ruiz, Alejandro Reyes and Patricia Del Portillo

*Corporación CorpoGen, Bogotá D.C., Colombia*

### **1. Introduction**


Genomic variability provides the basis for adaptation and evolution and is a fascinating aspect of the metabolically and phylogenetically diverse microbial world. Variability in bacteria has been studied extensively, both because it allows evolutionary relationships to be inferred and because it plays an important role in host-pathogen interactions. Microbiologists, who have long struggled with species classification, have more recently come to appreciate the level of genetic diversity in microorganisms, which has led to a new awareness of what may constitute a bacterial "species" (Doolittle & Zhaxybayeva, 2009). In the clinical setting, genomic variability can represent a significant barrier to treatment. Many pathogens can acquire mutations, or foreign genetic material through horizontal gene transfer (HGT), under the selective pressure imposed by the host immune system and by chemotherapy (Hawkey & Jones, 2009, Sampson, 2011), resulting in strains that are difficult to eradicate in hospitals as well as during long-term infection. Understanding the extent of genomic variability and its effects on disease is of particular interest for pathogens that display genetic homogeneity, as is the case for *Mycobacterium tuberculosis*, the causative agent of tuberculosis. The success of *M. tuberculosis* is intimately tied to the infectious process and to its interaction with the human host, which is believed to have resulted from a long process of co-evolution (Donoghue, 2009, Gutierrez *et al.*, 2005). As a result, *M. tuberculosis* is capable of subverting the immune response, persisting in latent form within an individual and, for millennia, within the human population.

Despite the availability of chemotherapy and continued efforts to control the disease, tuberculosis remains one of the top ten causes of morbidity and mortality worldwide, with approximately 9 million cases per year, according to the World Health Organization (Lawn & Zumla, 2011). In spite of growing interest and continued efforts, significant gaps remain in our knowledge of both the pathogen and its interaction with the host, and these gaps hamper control strategies. The appearance and spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains of *M. tuberculosis* represent a growing threat worldwide and underscore the importance of effective diagnosis and treatment. Given the burden to public health and the complexity of the disease, effective control of tuberculosis must involve diverse approaches and will require a better understanding of the host as well as of the environmental and bacterial factors that govern disease outcome.

*M. tuberculosis* belongs to the *Mycobacterium tuberculosis* Complex (MTBC), a group of slow-growing mycobacteria that are closely related at the DNA level and share identical 16S rRNA gene sequences but that differ in phenotype and host preference (Brosch *et al.*, 2001, Sreevatsan *et al.*, 1997). The MTBC includes the human-adapted strains *M. tuberculosis*, *Mycobacterium africanum* and *Mycobacterium canettii*, with *M. canettii* being the most divergent member of the complex (Gutierrez *et al.*, 2005). The MTBC also includes animal-adapted strains. *M. bovis* has a wider host range and is the main cause of tuberculosis in other animal species. *M. microti* is a pathogen of rodents, *M. pinnipedii* causes disease in sea lions and seals, *M. caprae* is a pathogen of goats and, recently, "*M. mungi*" was isolated from mongooses (Alexander *et al.*, 2010, Mostowy & Behr, 2005). The high similarity at the DNA level suggests that this group could have resulted from a bottleneck event that led to the expansion of a successful clone, which then gave rise to different host-adapted ecotypes of the same species (Smith *et al.*, 2006).

Understanding the differences that underlie the biology and evolution of the MTBC has been the focus of considerable work (Smith *et al.*, 2009, Comas & Gagneux, 2009). Members of the MTBC have a highly clonal population structure in which recent HGT events are essentially absent (Supply *et al.*, 2003, Gutierrez *et al.*, 2005). This contrasts with many other microorganisms, in which horizontally acquired genetic material can play important roles in the acquisition of novel virulence determinants and of properties such as antibiotic resistance and the capacity to exploit different environmental niches. Recent surveys using MTBC strains that are more representative of global isolates, together with advances in genome sequence analysis, have indicated, however, that there is more variation than previously anticipated and that this variation can be used both to distinguish isolates and to trace phylogenetic lineages (Hershberg *et al.*, 2008).

A greater knowledge of the diversity present in *M. tuberculosis* and MTBC strains can also lead to a deeper understanding of the biological consequences associated with strain variability. The variation in circulating *M. tuberculosis* isolates has been critical for the identification of strains, outbreaks and changes within the population. In some cases it has also been associated with phenotypic properties that are relevant to the disease, such as transmission potential, immunological response and disease manifestation (Nicol & Wilkinson, 2008). However, the link between genotypic and phenotypic properties is not necessarily evident, given the complexity of the host-pathogen interaction and the effect of environmental factors. In this context, a deeper understanding of the population structure and dynamics of new clonal lineages, with mutations that contribute to a particular lineage's success, can provide great insight into the appearance and spread of strain variants relevant to public health and to the control, treatment and eradication of tuberculosis.

This chapter provides an overview of recent studies on genetic variability in *M. tuberculosis*. It includes a brief description of the importance of variability for the study of the evolution of the MTBC. We also address the mechanisms of genomic variation in a pathogen characterized by genetic homogeneity and negligible HGT by illustrating how genomic variability can emerge as a consequence of mutations that result in Single Nucleotide Polymorphisms (SNPs) and Large Sequence Polymorphisms (LSPs), namely insertions and deletions. We then discuss the importance of variability in disease outcome.

### **2. Genetic diversity and phylogeny of** *M. tuberculosis*

The availability of the complete *M. tuberculosis* genome sequence (Cole *et al.*, 1998) opened new ways to conduct studies and to understand the evolution of the closely related MTBC strains. Using Bacterial Artificial Chromosome (BAC) libraries, it was shown that seven loci were deleted in *M. bovis* with respect to *M. tuberculosis*, reinforcing previous studies indicating that these strains probably originated from a common ancestor (Gordon *et al.*, 1999, Sreevatsan *et al.*, 1997). This was more fully appreciated through comparative genomics studies (Brosch *et al.*, 2002), which also divided the *M. tuberculosis* strains into "ancient" and "modern" based on a deletion, known as TbD1, found in the modern strains.

Several molecular markers have been developed to type strains and infer phylogenetic relationships. Some of these are considered more useful for epidemiological studies of transmission, reinfection and/or reactivation, while others are considered more robust phylogenetic markers that can help to decipher the evolution of *M. tuberculosis*. The methods used for epidemiology include restriction fragment length polymorphism (RFLP) of IS*6110* sites (van Embden *et al.*, 1993, van Soolingen *et al.*, 1993), spoligotyping to identify unique spacers within the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) or Direct Repeat (DR) region (van Embden *et al.*, 2000, Brudey *et al.*, 2006, Kamerbeek *et al.*, 1997), and the identification of Mycobacterial Interspersed Repetitive Units-Variable Number of Tandem Repeats (MIRU-VNTR), which are strain-specific repeats of short DNA sequences at different positions of the chromosome (Supply *et al.*, 2003). Molecular markers that provide more robust phylogenetic information and have helped to shape the evolutionary scenario of *M. tuberculosis* include LSPs, SNPs and Multilocus Sequence Analysis (MLSA) (Filliol *et al.*, 2006, Gagneux *et al.*, 2006, Gutacker *et al.*, 2006, Comas *et al.*, 2009) (Figure 1). Although it has been argued that RFLP, spoligotyping and VNTR markers are highly prone to convergent evolution and thus to homoplasies (i.e., the same spoligotype can be observed in strains belonging to different lineages), recent studies show that, at least for the main lineages, this does not seem to be the case (Kato-Maeda *et al.*, 2011). However, more studies are required to clarify this issue.

Based on our current view of its evolutionary history, *M. tuberculosis* can be divided into six phylogeographical lineages, which have adapted to their local human populations (Figure 1). Different molecular markers, such as spoligotyping, LSPs and SNPs, classify the global population of *M. tuberculosis* into comparable groups. For instance, Lineage 1 (Indo-Oceanic lineage) corresponds to the East African-Indian (EAI) family; Lineage 2 (East Asian lineage) corresponds to the Beijing family; Lineage 3 (East African-Indian lineage) corresponds to the Central Asian (CAS) family; Lineage 4 is the Euro-American lineage, which includes the Haarlem, LAM, X, T, S and Tuscany families; and Lineage 5 (West African lineage 1) and Lineage 6 (West African lineage 2) correspond to AFRI 2 and AFRI 1, respectively, by spoligotyping (Sola *et al.*, 2001, Brudey *et al.*, 2006). Based on the evidence accumulated from these studies, it has been suggested that *M. tuberculosis* evolved as a human pathogen in Africa, which is also the continent where all main *M. tuberculosis*
