**3. Drug-resistance against the background of Mtb genetic clades and current diagnostic approaches**


Clade-Specific Distribution of Antibiotic Resistance Mutations in the Population…

http://dx.doi.org/10.5772/intechopen.75181


TB first appeared roughly 70,000 years ago [17]. Studies show that Mtb arose as an obligate human pathogen, that different strains co-evolved with humans and migrated out of Africa with them, and that bacterial populations expanded together with their human hosts [18]. The migrations of modern humans out of Africa and the increase in population density during the Neolithic period may account for this expansion. This theory is consistent with the bacterium's phylogeny and phylogeography [19].

Genetic analyses of strains collected worldwide have revealed that distinct Mtb lineages emerged in different regions of the world. The considerable genetic diversity between these lineages is linked to ancient human migrations out of Africa and to more recent human movements and population growth [20]. Hershberg *et al*. demonstrated that selection pressure is greatly reduced in Mtb, owing to factors that include its high clonality and serial transmission bottlenecks, both of which reduce the effective population size and thereby increase the effects of genetic drift [20]. As a result, functionally diverse mutations can accumulate without being eliminated, which has implications for the emergence of MDR-TB.
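The link between a small effective population size and weak selection can be made concrete with Kimura's diffusion approximation for the probability that a new mutation becomes fixed. The sketch below is a textbook population-genetics calculation, not an analysis from Hershberg *et al*.; the haploid model, the selection coefficient s = −0.001 and the two population sizes are illustrative assumptions.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for a new mutant in a haploid population.

    s: selection coefficient (negative means deleterious).
    n: effective population size; the mutant starts at frequency 1/n.
    Returns the neutral value 1/n when s == 0. Very large |n * s| would overflow
    math.expm1; a production version would work on the log scale instead.
    """
    if s == 0:
        return 1 / n
    return math.expm1(-2 * s) / math.expm1(-2 * n * s)


for n in (100, 10_000):
    # Compare a mildly deleterious mutation with a neutral one (probability 1/n).
    ratio = fixation_probability(-0.001, n) / (1 / n)
    print(f"N={n}: fixation relative to a neutral mutation = {ratio:.3g}")
```

With these assumed values, the mildly deleterious mutation fixes at roughly 90% of the neutral rate when N = 100 but almost never when N = 10,000, which is the sense in which a reduced effective population size lets mutations persist that selection would otherwise remove.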

During diagnosis, it is helpful to determine the lineage of the infecting Mtb strain(s), because some lineages may have acquired specific virulence and/or resistance features before expanding [21]. Clades differ in growth rate and in patterns of host-pathogen interaction, such as cytokine induction and rate of uptake by macrophages [22]. Lineage 2 (the Beijing clade) is also associated with hypervirulence and with an extended drug resistance pattern [23].

Here we discuss research papers investigating the feasibility of replacing phenotypic drug testing of Mtb with molecular diagnostic techniques. All of them rely on understanding the genetic mechanisms underlying the development and persistence of drug-resistance in Mtb strains, including the context of lineages with varying evolutionary histories.

Köser *et al*. were among the first to publish a method for rapid WGS analysis of an Mtb clinical specimen to shorten the time to XDR-TB diagnosis. They used SNPs to identify lineages, combined with a catalog of well-described DR polymorphisms, and demonstrated that WGS is superior to current genotypic tests but not yet as reliable as phenotypic testing [24].

Rodwell *et al*. of the Global Consortium for Drug-Resistant TB Diagnostics (GCDD) investigated whether a defined set of mutations could serve as markers of drug resistance in a molecular diagnostic test. They studied a collection of MDR and XDR-TB strains from different regions. Their approach was to select eight genes (*katG*, *inhA*, *rpoB*, *gyrA*, *gyrB*, *rrs*, *eis*, and *thyA*) in which mutations are known to be strongly associated with resistance to the antibiotics INH, RIF, FLQ, AMK, KAN, and CAP. In each specimen, the eight genes were amplified and sequenced, and variants were called against the H37Rv reference strain. The sensitivity and specificity of the identified variants for drug resistance were then determined. The authors concluded that about 30 mutations in six genes predicted XDR-TB phenotypes with 90–98% sensitivity and almost 100% specificity [3]. However, using these 30 mutations diagnostically would require purifying mycobacterial DNA from clinical samples and amplifying the genes of interest before detecting the mutations. Such a test would also depend on sufficient sequencing coverage and accurate base calling at the mutations of interest. The study used samples from four geographic regions, but the results do not specify the lineages of the resistant strains. This is problematic, because some DR mutations are lineage specific. Further details on the mechanisms of drug resistance, summarized from the literature, are given in **Table 1**. It should also be noted that mutations in the identified target genes do not explain all cases of drug resistance.
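The sensitivity and specificity figures above come from comparing a marker-based prediction with phenotypic testing. A minimal sketch of that comparison for a single drug, assuming the simple rule "resistant if any panel mutation is present", is shown below; the function name and the toy isolates are hypothetical, and only the rpoB mutation labels are real.

```python
def panel_performance(isolates, panel):
    """Sensitivity and specificity of the rule 'resistant if any panel mutation is present'.

    isolates: iterable of (mutations, resistant) pairs, where mutations is a set of
    mutation labels and resistant is the phenotypic DST result (True/False).
    panel: set of mutation labels used as resistance markers for one drug.
    Assumes at least one resistant and one susceptible isolate.
    """
    tp = fn = tn = fp = 0
    for mutations, resistant in isolates:
        predicted = bool(mutations & panel)  # any panel mutation present?
        if resistant:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data for rifampicin: real rpoB labels, made-up isolates/phenotypes.
rif_panel = {"rpoB S450L", "rpoB H445Y", "rpoB D435V"}
isolates = [
    ({"rpoB S450L"}, True),
    ({"rpoB H445Y", "katG S315T"}, True),
    ({"rpoB L452P"}, True),   # resistant, but its mutation is not in the panel
    (set(), False),
    ({"katG S315T"}, False),  # INH marker only; RIF-susceptible in this toy set
]
sens, spec = panel_performance(isolates, rif_panel)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The third isolate illustrates the caveat in the text: resistance caused by a mutation outside the panel is missed and lowers sensitivity, even when specificity stays perfect.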

Basic Biology and Applications of Actinobacteria

Genome-wide association studies (GWAS) exploit the rapid turnaround and high throughput of NGS to identify variants in natural populations that are statistically associated with phenotypic traits. GWAS have been used less often in bacteria because bacterial population structure reduces the power of association tests or produces false positives [25]. The clonal nature of bacterial reproduction, which is especially pronounced in Mtb, means that variants with no causal role can be strongly associated with particular phenotypes simply because they are inherited together with the variants that do cause them [26]. However, Earle *et al*. successfully used a linear mixed model approach to perform GWAS on four bacterial species, including Mtb, and showed associations between genetic variation and antibiotic-resistant phenotypes. The success of this approach depended on "controlling population structure and boosting power by recovering signals of lineage-level associations" [27]. The method allows the researcher to eliminate signals due only to population structure, while preserving strain-specific signals that contribute to the DR phenotype.
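Population-structure confounding, and one simple way of removing it, can be illustrated in a few lines. The sketch below is a deliberately simpler stand-in for the linear mixed model of Earle *et al*.: it compares a naive regression slope of phenotype on genotype with a within-lineage (lineage fixed-effect) slope, assuming lineage labels are already known. The function names and isolate counts are hypothetical. Unlike a linear mixed model, absorbing lineage as a fixed effect discards lineage-level signal entirely, which is the limitation the quoted approach is designed to avoid.

```python
from collections import defaultdict


def naive_slope(v, y):
    """OLS slope of phenotype y (0/1) on genotype v (0/1), ignoring population structure."""
    n = len(v)
    mv, my = sum(v) / n, sum(y) / n
    cov = sum((a - mv) * (b - my) for a, b in zip(v, y))
    var = sum((a - mv) ** 2 for a in v)
    return cov / var


def within_lineage_slope(v, y, lineage):
    """OLS slope of y on v with one fixed effect per lineage (within estimator)."""
    groups = defaultdict(list)
    for a, b, g in zip(v, y, lineage):
        groups[g].append((a, b))
    cov = var = 0.0
    for pairs in groups.values():
        # Centre genotype and phenotype on their lineage means.
        mv = sum(a for a, _ in pairs) / len(pairs)
        my = sum(b for _, b in pairs) / len(pairs)
        cov += sum((a - mv) * (b - my) for a, b in pairs)
        var += sum((a - mv) ** 2 for a, _ in pairs)
    return cov / var


def toy_isolates():
    # (lineage, carries_variant, n_isolates, n_resistant): invented counts.
    cells = [("A", 1, 4, 1), ("A", 0, 16, 4), ("B", 1, 16, 12), ("B", 0, 4, 3)]
    v, y, lin = [], [], []
    for g, carrier, n, n_res in cells:
        for i in range(n):
            v.append(carrier)
            y.append(1 if i < n_res else 0)
            lin.append(g)
    return v, y, lin


v, y, lin = toy_isolates()
print(round(naive_slope(v, y), 3), round(within_lineage_slope(v, y, lin), 3))
```

In this invented data set the variant is enriched in the more resistant lineage B, so the naive slope is 0.3, while within each lineage carriers and non-carriers are equally often resistant and the adjusted slope is zero up to floating-point rounding.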

Coll *et al*. proposed a minimal set of SNPs that can differentiate all seven Mtb lineages and 55 sublineages [28]. They identified 88 SNPs in DR candidate regions (two promoters and 21 genes). However, this SNP set is designed to identify lineages and is not necessarily informative about drug resistance in the strains.
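Lineage assignment from such a SNP set amounts to looking up marker alleles in a sample's variant calls. The sketch below assumes a hypothetical marker table; its positions and alleles are placeholders, not the published set of Coll *et al*. It also shows why a lineage call is silent about resistance: every non-marker variant is simply ignored.

```python
# Hypothetical marker table: (H37Rv position, alternative base) -> lineage label.
# The positions are placeholders, NOT the published SNP set of Coll et al.
MARKERS = {
    (1_000, "G"): "lineage 1",
    (2_000, "T"): "lineage 2",
    (2_100, "A"): "lineage 2.2",
    (3_000, "C"): "lineage 4",
}


def assign_lineages(sample_snps):
    """Return sorted lineage labels whose marker SNPs are present in the sample.

    sample_snps: set of (position, alt_base) tuples called against H37Rv.
    SNPs that are not markers (including resistance mutations) are ignored,
    so the result carries no information about drug resistance.
    """
    return sorted({MARKERS[snp] for snp in sample_snps if snp in MARKERS})


print(assign_lineages({(2_000, "T"), (2_100, "A"), (5_555, "G")}))
```

Here the sample carries a lineage 2 marker and a sublineage 2.2 marker, so both labels are returned, while the unrelated SNP at the placeholder position 5,555 contributes nothing.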

Feuerriegel *et al*. showed that many polymorphisms in Mtb previously known to be associated with DR are useful for distinguishing clades, which points to lineage specificity of drug resistance [29]. The same team designed the first available web-based drug resistance analysis tool, the Phylo-Resistance Search Engine (PhyResSE) [30]. The tool was evaluated on 92 Mtb strains from Sierra Leone with known drug resistance phenotypes, either mono-resistant (RIF, INH or SM) or poly-resistant (RIF, INH, SM, ethambutol (EMB) or pyrazinamide (PZA)). The major advantage of this tool is that it is a complete analysis pipeline: it takes FASTQ files as input and performs quality control, mapping and base recalibration prior to genotyping, so the end user does not need to carry out complex bioinformatic analysis. However, this requires considerable computational power. PhyResSE uses a variant catalog based on validated resistance SNPs from the literature, as well as the team's own experimental data, for phylogenetic and drug resistance diagnosis, including lineage-specific resistance mutations. The paper does not describe in detail the criteria used to include or exclude specific mutations, or how sensitivity and specificity were calibrated for each mutation; this matters because some mutations may confer only low-level resistance. The program returns a plain-language output that cites the experimental support for each result and states whether there is a high degree of confidence that a particular polymorphism confers drug resistance. The strains used to evaluate the tool in the paper do not include MDR or XDR strains. Excessive contamination and/or poor sequencing coverage would also hinder correct diagnosis.
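The final, catalogue-based step of such a pipeline can be sketched as a lookup plus evidence thresholds. This is not PhyResSE's code or catalogue: the four mutations are well-known resistance markers, but the confidence labels, the thresholds `MIN_DEPTH` and `MIN_FREQ`, and all names are assumptions. The thresholds stand in for the coverage and contamination problems mentioned above.

```python
from dataclasses import dataclass

# Illustrative catalogue: real, well-known resistance mutations, but the
# confidence labels are assumptions, not PhyResSE's actual catalogue.
CATALOGUE = {
    ("rpoB", "S450L"): ("rifampicin", "high"),
    ("katG", "S315T"): ("isoniazid", "high"),
    ("rpsL", "K43R"): ("streptomycin", "high"),
    ("embB", "M306V"): ("ethambutol", "moderate"),
}
MIN_DEPTH = 10    # hypothetical minimum read depth at the variant site
MIN_FREQ = 0.75   # hypothetical minimum fraction of reads carrying the variant


@dataclass
class VariantCall:
    gene: str
    change: str
    depth: int       # reads covering the site
    alt_freq: float  # fraction of those reads supporting the variant


def plain_language_report(calls):
    """Turn variant calls into one plain-language line per catalogued mutation."""
    lines = []
    for call in calls:
        entry = CATALOGUE.get((call.gene, call.change))
        if entry is None:
            continue  # uncatalogued variant: no resistance claim is made
        drug, confidence = entry
        if call.depth < MIN_DEPTH or call.alt_freq < MIN_FREQ:
            # Low coverage or a mixed signal (e.g. contamination) blocks a call.
            lines.append(f"{call.gene} {call.change}: evidence insufficient, no call for {drug}")
        else:
            lines.append(
                f"{call.gene} {call.change}: {drug} resistance predicted ({confidence} confidence)"
            )
    return lines


calls = [
    VariantCall("rpoB", "S450L", depth=52, alt_freq=0.98),
    VariantCall("katG", "S315T", depth=4, alt_freq=1.0),
    VariantCall("gyrA", "A90V", depth=40, alt_freq=0.95),
]
for line in plain_language_report(calls):
    print(line)
```

The catalogue is deliberately tiny, so the *gyrA* call is skipped: a catalogue-based caller can only report mutations it already knows, which is why the inclusion criteria for the catalogue matter.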

One of the few studies to use gene pairs associated with drug resistance was that of Cui *et al*. [31]. Its rationale was that the evolution of transmissible drug-resistant Mtb involves multiple mutations, many of which interact with each other. The study used nearly 300 Mtb genome sequences from public datasets, together with their phenotypic drug-susceptibility testing results. Variants were identified with a standard variant-calling approach and first filtered with PLINK to remove phylogenetically related variants. The remaining mutations were analyzed with the program GBOOST, which performs a chi-square test for association between pairs of variants and the phenotype. The resulting gene pairs were screened for the presence of drug target genes and further filtered to retain non-synonymous mutations. This yielded one gene pair for INH, one for RIF, four for EMB and five for ethionamide (ETH). The authors reported that most of the identified gene pairs containing drug targets included members of the mycobacteria-specific Pro-Pro-Glu (PPE) protein family, and from this they inferred that PPE family proteins play an important role in Mtb drug resistance [31]. The identified mutations were not validated in this study, but the study does show the potential of using pairs of mutations, rather than single mutations, to diagnose drug resistance. It should be noted that PPE family genes make up about 10% of the Mtb genome and are highly polymorphic, so associations with these genes might arise through genetic drift rather than selection pressure [32]. The value of removing population-specific mutations is also unclear, as some Mtb lineages are strongly associated with drug resistance.
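A pairwise test of the kind described can be sketched as a Pearson chi-square on the table of joint genotypes (four combinations of two binary variants) against a binary phenotype. This is a simplified illustration, not the algorithm GBOOST implements. A significant value shows that the joint genotype is associated with the phenotype, but by itself it does not separate a true interaction from the marginal effect of either variant. The function name and data are hypothetical.

```python
from itertools import product


def pair_chi_square(g1, g2, phenotype):
    """Pearson chi-square for joint genotype (g1, g2) versus a binary phenotype.

    g1, g2, phenotype: equal-length lists of 0/1 values, one entry per isolate.
    Returns (statistic, degrees_of_freedom); empty genotype rows are skipped.
    """
    counts = {key: 0 for key in product((0, 1), repeat=3)}
    for a, b, p in zip(g1, g2, phenotype):
        counts[(a, b, p)] += 1
    n = len(phenotype)
    col_totals = [phenotype.count(0), phenotype.count(1)]
    if 0 in col_totals:
        return 0.0, 0  # phenotype does not vary: nothing to test
    stat, rows = 0.0, 0
    for a, b in product((0, 1), repeat=2):
        row_total = counts[(a, b, 0)] + counts[(a, b, 1)]
        if row_total == 0:
            continue
        rows += 1
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            stat += (counts[(a, b, k)] - expected) ** 2 / expected
    return stat, rows - 1  # (rows - 1) * (2 phenotype columns - 1)


# Invented example: resistance appears only when both variants are present.
g1 = [0, 0, 1, 1] * 5
g2 = [0, 1, 0, 1] * 5
resistant = [0, 0, 0, 1] * 5
stat, df = pair_chi_square(g1, g2, resistant)
print(f"chi2={stat:.1f}, df={df}")
```

In this toy data set the joint genotype perfectly predicts the phenotype, so the statistic reaches its maximum for 20 isolates (20.0 on 3 degrees of freedom). Screening every pair of variants this way grows quadratically with the number of variants, which is why pre-filtering steps such as the PLINK step are used.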

performed. Also, this database integrates clinical, epidemiological and microbiological data for all the recorded Mtb isolates. The analysis compared the distribution patterns of 58,025 amino acid substitutions in 1089 Mtb strains from the GMTV database. The polymorphisms were determined relative to the H37Rv reference strain [38]. Frequencies of all polymorphisms were calculated for the entire set of 1089 Mtb genomes and for each Mtb lineage as identified in the GMTV database. The data showed that many DR polymorphisms were strongly associated with specific Mtb lineages. A mosaic plot of the data is shown in **Figure 1**. Genomes of the Beijing, Haarlem and Lineage 4.3 clades contained numerous DR mutations, while only a few were observed in Lineage 4.1, Ural and X-type genomes. Bacteria of the latter clades appeared to be mostly drug-susceptible. The statistically significant prevalence of DR mutations in bacteria of specific Mtb clades was confirmed by Fisher's exact test with Bonferroni adjustment. Of these, 25 DR-polymorphism/lineage pairs showed an odds ratio above 1.

Co-occurrence of alleles at different polymorphic sites was identified by calculating linkage disequilibrium (LD) and χ² statistics. Statistically reliable associations (χ² above 6.63, corresponding to a p-value ≤ 0.01) were identified between 823 polymorphic sites, including 10 DR mutations [10]. Functional associations between DR mutations (denoted as mutations from an initial allele *A* to an allele *a* conferring DR) and other genetic polymorphisms (denoted as *B* for the most frequent allele and *b* for all other alternative variants in the Mtb population) were identified with Levin's attributable risk statistic [39]. Confidence ranges for the attributable risks were calculated by Eq. (1):

$$\left[1 - \exp\left(\ln(1 - R_a) - 1.96 \times \mathrm{StdErr}\right)\right] \text{ to } \left[1 - \exp\left(\ln(1 - R_a) + 1.96 \times \mathrm{StdErr}\right)\right] \tag{1}$$

To estimate the risk of the DR mutation from *A* to *a* in the subpopulation of organisms having allele *b* at the secondary polymorphic site, the parameter $R_a$ was calculated by Eq. (2) and Fleiss' standard error parameter $\mathrm{StdErr}$ by Eq. (3):

$$R_a = \frac{P_{AB}\,P_{ab} - P_{aB}\,P_{Ab}}{(P_{AB} + P_{aB})(P_{aB} + P_{ab})} \tag{2}$$

$$\mathrm{StdErr} = \sqrt{\frac{P_{Ab} + R_a\,(P_{AB} + P_{ab})}{N \times P_{aB}}} \tag{3}$$

Risks of secondary mutations from *B* to *b* in the DR subpopulation with genotype *a* were also calculated by Eq. (1), but in this case the parameters $R_a$ and $\mathrm{StdErr}$ were calculated by Eqs. (4) and (5), respectively:

$$R_a = \frac{P_{AB}\,P_{ab} - P_{aB}\,P_{Ab}}{(P_{AB} + P_{Ab})(P_{Ab} + P_{ab})} \tag{4}$$

$$\mathrm{StdErr} = \sqrt{\frac{P_{aB} + R_a\,(P_{AB} + P_{ab})}{N \times P_{Ab}}} \tag{5}$$
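Eqs. (1) to (5) can be evaluated directly. The sketch below assumes that each $P_{XY}$ is the fraction of the $N$ genomes carrying allele *X* at the DR site and allele *Y* at the secondary site, and uses z = 1.96 as in Eq. (1); the function name and the example fractions are hypothetical. Because Eqs. (4) and (5) are Eqs. (2) and (3) with $P_{aB}$ and $P_{Ab}$ exchanged, one function covers both cases when those two arguments are swapped.

```python
import math


def levin_attributable_risk(p_AB, p_Ab, p_aB, p_ab, n, z=1.96):
    """Ra from Eq. (2), StdErr from Eq. (3) and the confidence range of Eq. (1).

    p_XY: fraction of the n genomes with allele X at the DR site (A or a) and
    allele Y at the secondary site (B or b); the four fractions sum to 1.
    Swapping p_aB and p_Ab turns Eqs. (2)/(3) into Eqs. (4)/(5).
    Returns (Ra, lower bound, upper bound); requires Ra < 1.
    """
    ra = (p_AB * p_ab - p_aB * p_Ab) / ((p_AB + p_aB) * (p_aB + p_ab))  # Eq. (2)
    std_err = math.sqrt((p_Ab + ra * (p_AB + p_ab)) / (n * p_aB))       # Eq. (3)
    centre = math.log(1 - ra)  # StdErr applies on the ln(1 - Ra) scale
    bound_1 = 1 - math.exp(centre - z * std_err)                        # Eq. (1)
    bound_2 = 1 - math.exp(centre + z * std_err)
    return ra, min(bound_1, bound_2), max(bound_1, bound_2)


f = {"AB": 0.50, "Ab": 0.20, "aB": 0.10, "ab": 0.20}  # invented, n = 100 genomes
print(levin_attributable_risk(f["AB"], f["Ab"], f["aB"], f["ab"], n=100))  # Eqs. (2), (3)
print(levin_attributable_risk(f["AB"], f["aB"], f["Ab"], f["ab"], n=100))  # Eqs. (4), (5)
```

The first bound in Eq. (1) is the upper end of the range, because subtracting the error term inside the exponential makes $1 - \exp(\cdot)$ larger; the sketch therefore returns the bounds sorted.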

Mortimer *et al*. proposed a method for distinguishing DR loci under positive selection [33]. The rationale is that methods for identifying advantageous mutations usually depend on recombination to separate target loci from linked neutral variants, which is not feasible for Mtb because of its clonal reproduction. They analyzed over 1000 Mtb genomes from Russia and South Africa, mostly from Lineages 2 (Beijing) and 4, and examined the frequencies of different mutations in these populations. They found that resistant subpopulations are less diverse than susceptible subpopulations, which is consistent with ongoing transmission of resistant Mtb. They classified DR mutations as either "tight targets" or "sloppy targets" according to their diversity. The authors also noted that lipid metabolism genes are enriched among the DR loci under positive selection. This approach has potential for understanding the genetics of resistance in clonal bacteria.
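The comparison of diversity between resistant and susceptible subpopulations can be illustrated with mean pairwise SNP distance. This is a simple stand-in chosen for clarity, not the statistic used by Mortimer *et al*. The sequences are invented concatenated SNP alignments, and the function name is hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(sequences):
    """Average number of differing sites over all pairs of aligned SNP strings.

    Assumes every string covers the same sites in the same order (equal length).
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Invented alignments: one string per isolate, same four SNP sites.
resistant = ["AAAA", "AAAT", "AAAA"]
susceptible = ["AAAA", "CTGA", "ATGT"]
print(mean_pairwise_distance(resistant), mean_pairwise_distance(susceptible))
```

A resistant subpopulation that spreads by transmission from a few ancestors looks like the first list: closely related isolates with a small mean distance (2/3 here versus 8/3 for the susceptible set).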

A variety of bioinformatic approaches have been useful for resolving the evolution of the various Mtb lineages over time, for tracing the emergence of pathogenic and more virulent strains, and for identifying variants in Mtb genomes responsible for the development of antibiotic resistance [28, 34–36]. In tandem with methods for rapid identification of drug resistance, researchers are also investigating ways to exploit our understanding of how drug resistance evolves. Treatment of TB with antibiotics has, overall, selected for drug resistance rather than having the hoped-for effect of selecting DR variants with reduced fitness. Baym *et al*. have reviewed possible mechanisms for inverting selection for drug resistance [37]. These strategies combine antibiotics with other compounds so as to inhibit bacterial growth while minimizing, or even reversing, selection for resistance, much as penicillin is combined with clavulanic acid to block bacterial β-lactamase. This avenue shows promise, particularly in combination with rapid genotyping of clinical samples.
