86 Basic Biology and Applications of Actinobacteria

The variants were identified using a standard variant-calling approach. The resulting variants were first filtered using PLINK to remove phylogenetically related variants. The remaining mutations were analyzed using the program GBOOST, which performs a chi-square test to confirm associations between two variants and a phenotype. The resulting gene pairs were screened for the presence of drug target genes and further filtered for non-synonymous mutations. This yielded one gene pair for INH, one for RIF, four for EMB and five for ethionamide (ETH). The authors reported that most of the identified gene pairs containing drug targets included members of the unique mycobacterial Pro-Pro-Glu (PPE) protein family, and from this they inferred that PPE family proteins play an important role in Mtb drug resistance [30]. The identified mutations were not validated in this study, but the study does show the potential of using pairs of mutations, rather than single mutations, in the diagnosis of drug resistance. It should be noted that the PPE family proteins make up 10% of the Mtb genome and are highly polymorphic, so associations with these genes might occur as a result of genetic drift rather than selection pressure [32]. The value of removing population-specific mutations is unclear, as some lineages of Mtb are strongly associated with drug resistance.

Mortimer *et al*. proposed a method of distinguishing DR loci under positive selection [33]. The rationale is that methods for identifying advantageous mutations usually depend on recombination to differentiate target loci from neutral variants, which is not feasible in the case of Mtb. They analyzed over 1000 Mtb genomes from Russia and South Africa, mostly of Lineages 2 (Beijing) and 4, and examined the frequency of different mutations in the populations. They found that resistant sub-populations are less diverse than susceptible sub-populations, which is consistent with the ongoing transmission of resistant Mtb. They classified the DR mutations as either "tight targets" or "sloppy targets" based on their diversity. The authors also noted that lipid metabolism genes are enriched in the list of DR loci under positive selection. This approach has potential for understanding the genetics of resistance in clonal bacteria.

A variety of bioinformatic approaches have been useful for resolving the evolution of the various lineages of Mtb over time, for tracing the emergence of pathogenic and more virulent strains, and for identifying variants in Mtb genomes responsible for the development of antibiotic resistance [28, 34–36]. In tandem with methods for rapid identification of drug resistance, researchers are also investigating ways of exploiting our understanding of the evolution of drug resistance. Treatment of TB with antibiotics has had the overall effect of selecting for drug resistance, rather than the hoped-for effect of selecting DR variants with reduced fitness. Baym *et al*. have reviewed possible mechanisms for inverting the selection for drug resistance [37]. These rely on combining antibiotics with other compounds to inhibit bacterial growth while minimizing or reversing selection for resistance, similar to the combination of penicillin and clavulanic acid to block bacterial β-lactamase. This avenue shows promise, particularly in combination with quick genotyping of clinical samples.

**4. Non-random associations between polymorphic sites in genomes of** *M. tuberculosis*

Data for this research were sourced from the GMTV database [17], which contains SNPs and indels for a large number of Mtb strains for which whole genome sequencing was performed. The database also integrates clinical, epidemiological and microbiological data for all the recorded Mtb isolates. This study compared the distribution patterns of 58,025 amino acid substitutions in 1089 Mtb strains from the GMTV database. The polymorphisms were determined relative to the H37Rv reference strain [38]. Frequencies of all polymorphisms were calculated for the entire set of 1089 Mtb genomes and for the Mtb lineages as identified in the GMTV database. The analysis showed that many DR polymorphisms were strongly associated with specific Mtb lineages. A mosaic plot of the data is shown in **Figure 1**. Genomes of the Beijing, Haarlem and Lineage 4.3 clades contained numerous DR mutations, while only a few were observed in the Lineage 4.1, Ural and X-type clades. Bacteria of the latter clades appeared to be mostly drug-susceptible. The statistically significant prevalence of DR mutations in bacteria of specific Mtb clades was confirmed by Fisher's exact test with Bonferroni adjustment. Of the significant associations, 25 DR-polymorphism/lineage pairs showed an odds ratio above 1.
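The clade association test described in this section (Fisher's exact test with Bonferroni adjustment, plus an odds ratio) can be illustrated with a minimal sketch. The chapter does not state whether the test was one-sided or two-sided, so the one-sided enrichment version below is an assumption, and the counts and the names `clade_1`, `clade_2` and `mutation_X` are hypothetical rather than taken from GMTV.

```python
from math import comb


def fisher_exact_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for a 2x2 table (assumed variant).

    a: strains in the clade carrying the DR mutation
    b: strains in the clade without it
    c: strains in all other clades carrying it
    d: strains in all other clades without it
    Returns the hypergeometric probability of seeing a or more carriers
    in the clade, i.e. a test for enrichment (odds ratio > 1).
    """
    clade_total = a + b
    carriers = a + c
    n = a + b + c + d
    upper = min(clade_total, carriers)
    tail = sum(
        comb(clade_total, k) * comb(n - clade_total, carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, carriers)


def odds_ratio(a, b, c, d):
    """Sample odds ratio; reported as infinite when b or c is zero."""
    return (a * d) / (b * c) if b * c else float("inf")


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts (a, b, c, d); each table sums to 1089 strains.
tables = {
    ("clade_1", "mutation_X"): (60, 40, 90, 899),
    ("clade_2", "mutation_X"): (5, 95, 145, 844),
}
raw = {pair: fisher_exact_enrichment(*t) for pair, t in tables.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for pair, t in tables.items():
    print(pair, f"OR={odds_ratio(*t):.2f}", f"adjusted p={adjusted[pair]:.3g}")
```

In a real analysis the number of tests passed to `bonferroni` would be the full number of DR-polymorphism/lineage pairs examined, not just the two shown here.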

Co-occurrence of alleles at different polymorphic sites was identified by calculating the linkage disequilibrium (LD) and χ<sup>2</sup> statistics. In total, 288,840 pairs of polymorphisms showing statistically reliable associations (χ<sup>2</sup> above 6.63, which corresponds to a *p*-value ≤ 0.01) were identified between 823 polymorphic sites, including 10 DR mutations [10]. Functional associations between DR mutations (denoted as mutations from an initial *A* allele to allele *a* conferring DR) and other genetic polymorphisms (denoted as *B* for the most frequent allele and *b* for all other alternative variants in the Mtb population) were identified by Levin's attributable risk statistic [39]. Confidence ranges of attributable risks were calculated by Eq. (1).
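As a rough illustration of the pairwise screen, the sketch below computes LD and the χ<sup>2</sup> statistic for one pair of biallelic sites from counts of the four allele combinations. The chapter does not say which LD measure was used; the signed D′ and r² returned here are assumptions, and the counts in the usage lines are hypothetical.

```python
def ld_and_chi2(n_AB, n_Ab, n_aB, n_ab):
    """Return (signed D', r^2, chi-square with 1 d.f.) for two biallelic sites.

    Arguments are the numbers of genomes carrying each allele combination.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB, p_ab = (x / n for x in (n_AB, n_Ab, n_aB, n_ab))
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    if min(p_A, p_a, p_B, p_b) == 0:
        raise ValueError("both sites must be polymorphic")
    # Raw disequilibrium coefficient D = P(AB) - P(A)P(B)
    d = p_AB * p_ab - p_Ab * p_aB
    # Largest |D| attainable with these allele frequencies
    d_max = min(p_A * p_b, p_a * p_B) if d >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = d / d_max
    r2 = d * d / (p_A * p_a * p_B * p_b)
    # Pearson chi-square of the 2x2 table equals N * r^2
    return d_prime, r2, n * r2


# Hypothetical counts for one pair of sites
d_prime, r2, chi2 = ld_and_chi2(40, 10, 10, 40)
print(f"D'={d_prime:.2f} r2={r2:.2f} chi2={chi2:.1f} significant={chi2 > 6.63}")
```

The threshold 6.63 is the one quoted in the text for *p* ≤ 0.01; a negative D′ would indicate that the two minor alleles tend to avoid each other.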

$$\left[1 - \exp\left(\ln(1 - R_a) - 1.96 \times StdErr\right)\right] \text{ to } \left[1 - \exp\left(\ln(1 - R_a) + 1.96 \times StdErr\right)\right] \tag{1}$$

In the case of estimation of the risk of a DR mutation from *A* to *a* in a subpopulation of organisms having the allele *b* at the secondary polymorphic site, the parameter *R<sub>a</sub>* was calculated by Eq. (2) and Fleiss' standard error parameter *StdErr* was calculated by Eq. (3).

$$R_a = \frac{P_{AB}P_{ab} - P_{aB}P_{Ab}}{(P_{AB} + P_{aB})(P_{aB} + P_{ab})} \tag{2}$$

$$StdErr = \sqrt{\frac{P_{Ab} + R_a(P_{AB} + P_{ab})}{N \times P_{aB}}} \tag{3}$$

Risks of secondary mutations *B* to *b* in a DR subpopulation with the genotype *a* were calculated by Eq. (1), but in this case the parameters *R<sub>a</sub>* and *StdErr* were calculated by Eqs. (4) and (5), respectively:

$$R_a = \frac{P_{AB}P_{ab} - P_{aB}P_{Ab}}{(P_{AB} + P_{Ab})(P_{Ab} + P_{ab})} \tag{4}$$

$$StdErr = \sqrt{\frac{P_{aB} + R_a(P_{AB} + P_{ab})}{N \times P_{Ab}}} \tag{5}$$

In Eqs. (2)–(5), the values *P<sub>AB</sub>*, *P<sub>Ab</sub>*, *P<sub>aB</sub>* and *P<sub>ab</sub>* are the frequencies of the corresponding allele combinations, and *N* is the total number of analyzed Mtb strains (*N* = 1089).
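A minimal sketch of how Eqs. (1)–(5) could be evaluated is given below. It assumes that these equations are the standard Levin attributable risk with Fleiss' standard error of ln(1 − R), which is how they are read here. The function names and the frequencies in the usage lines are hypothetical.

```python
from math import exp, log, sqrt


def attributable_risk_interval(p_exp_dis, p_exp_nondis, p_unexp_dis,
                               p_unexp_nondis, n, z=1.96):
    """Levin's attributable risk R with a log-based confidence range (Eq. 1).

    'dis' is the outcome allele and 'exp' the conditioning allele.
    Degenerate tables (zero cells, R >= 1) are not handled and raise errors.
    """
    r = (p_exp_dis * p_unexp_nondis - p_exp_nondis * p_unexp_dis) / (
        (p_exp_dis + p_unexp_dis) * (p_unexp_dis + p_unexp_nondis)
    )
    # Standard error of ln(1 - R), as in Eqs. (3) and (5)
    se = sqrt((p_exp_nondis + r * (p_exp_dis + p_unexp_nondis))
              / (n * p_unexp_dis))
    lower = 1 - exp(log(1 - r) + z * se)
    upper = 1 - exp(log(1 - r) - z * se)
    return r, lower, upper


def risk_dr_given_secondary(p_AB, p_Ab, p_aB, p_ab, n):
    """A -> a | b: outcome is the DR allele a, condition is allele b (Eqs. 2, 3)."""
    return attributable_risk_interval(p_ab, p_Ab, p_aB, p_AB, n)


def risk_secondary_given_dr(p_AB, p_Ab, p_aB, p_ab, n):
    """B -> b | a: outcome is allele b, condition is the DR allele a (Eqs. 4, 5)."""
    return attributable_risk_interval(p_ab, p_aB, p_Ab, p_AB, n)


# Hypothetical allele-combination frequencies (they sum to 1); N as in the text
freqs = dict(p_AB=0.50, p_Ab=0.10, p_aB=0.05, p_ab=0.35)
for label, fn in (("A->a|b", risk_dr_given_secondary),
                  ("B->b|a", risk_secondary_given_dr)):
    r, lo, hi = fn(n=1089, **freqs)
    print(f"{label}: R={r:.3f}, range {lo:.3f} to {hi:.3f}")
```

Both directions share the same numerator and differ only in which marginal frequencies appear in the denominator and in the standard error, which is what makes the two ranges comparable for a given pair of sites.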

The reasoning behind the further analysis is illustrated in **Figure 2**, which shows two contingency tables of co-distribution of an arginine to leucine replacement at position 463 in the protein KatG, rendering INH resistance [40], with two other secondary mutations. Both pairs of mutations are characterized by strong linkage disequilibrium above 0.9. First, the co-distribution of the DR mutation KatG R463L and a polymorphism D69Y in the drug efflux protein Stp (Rv2333c) is considered (**Figure 2-1**). The replacement of the aspartate residue by tyrosine at position 69 of the protein Stp is rather common in the Mtb population and has not been associated with any DR phenotype. However, this study showed that the attributable risk of the DR mutation KatG R463L given the Stp D69Y substitution is 91–99%, i.e. the DR mutation depends strongly on the presence of the secondary substitution. In contrast, the likelihood of a D → Y replacement in the protein Stp does not depend significantly on the state of the KatG R463L polymorphism: the estimated attributable risk in this direction is in the range of 21–27%. The confidence ranges of these two attributable risks are denoted in **Figure 2** as *A* → *a|b* and *B* → *b|a*, respectively.
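The comparison of the two confidence ranges can be written as a simple rule. This is a sketch of the criterion this section applies when selecting secondary polymorphisms: the ranges must not overlap, and *A* → *a|b* must be the larger one. The function name is hypothetical; the first pair of ranges is the KatG R463L / Stp D69Y example from the text, and the second pair is made up to represent a symmetric association.

```python
def secondary_predisposes_dr(dr_given_secondary, secondary_given_dr):
    """True if the range for A -> a|b lies entirely above that for B -> b|a.

    Each argument is a (lower, upper) confidence range in percent.
    """
    lo_dr, hi_dr = dr_given_secondary
    lo_sec, hi_sec = secondary_given_dr
    overlap = lo_dr <= hi_sec and lo_sec <= hi_dr
    return (not overlap) and lo_dr > hi_sec


# KatG R463L given Stp D69Y (91-99%) versus the reverse direction (21-27%)
print(secondary_predisposes_dr((91, 99), (21, 27)))   # True

# Made-up symmetric case: both ranges are high and overlap
print(secondary_predisposes_dr((90, 97), (91, 98)))   # False
```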

Let us consider another co-distribution of the same DR-related polymorphism KatG R463L and a leucine to serine substitution at position 896 in the PPE35 protein (Rv1918c), which is shown in **Figure 2-2**. These two mutations are strongly associated with each other, but this dependence is highly symmetric: in more than 90% of cases both mutations co-occur in the same genomes. This may indicate a genetic drift event, in which the DR phenotype is characteristic of a sub-lineage of isolates sharing common ancestry and the neutral mutation in the hypermutable PPE35 protein is a genetic marker of that sub-lineage.

**Figure 1.** Mosaic plot representing the contingency table for the presence (black) or absence (gray) of each DR polymorphism for each of the 10 loci and the clade for the specimens in the GMTV dataset. The size of the rectangle represents the number of sequences in the category. A dotted line indicates that there were no specimens in that category. Clades with 10 or fewer representatives were omitted.

**Figure 2.** Contingency tables of co-distribution of a DR mutation KatG R463L rendering resistance to INH and two secondary polymorphisms in the (1) drug efflux protein Stp and (2) PPE35 protein. Attributable risks of mutation acquisition were calculated and denoted as *A* → *a|b* and *B* → *b|a* for the likelihood of DR mutation acquisition when the secondary site is mutated and the likelihood of secondary mutation in a DR sub-population, respectively.

For further analysis, only those secondary polymorphisms which influenced the DR mutations significantly, but were themselves independent of them, were selected; i.e. cases were selected in which the confidence ranges of attributable risks *A* → *a|b* and *B* → *b|a* do not overlap and *A* → *a|b* > *B* → *b|a*, as in **Figure 2-1**. In total, 554 secondary polymorphisms were found that increase the likelihood of acquisition of 9 out of the 10 studied DR mutations. The mutation KasA G269S, rendering resistance to INH [41], was strongly associated only with the GidB L16R mutation rendering SM resistance [42], which indicates that the former polymorphism is most likely a secondary mutation in multidrug resistant Mtb.

A selection of secondary mutations predetermining acquisition of nine of the most widely distributed DR mutations rendering resistance to FLQ, INH, EMB, SM and para-aminosalicylic acid (PAS) in multidrug resistant Mtb is shown in **Table 2**. Values *Xmin to Xmax* in **Table 2** represent confidence ranges estimated for *p*-value ≤ 0.05 (Eq. (1)). It was found that the acquisition of DR mutations requires allelic alterations in many other proteins, including several transmembrane transporter and efflux proteins, osmoprotectants, transcriptional regulators and some other proteins. Strong cross-associations between DR polymorphisms characteristic of different lineages (**Figure 1**) favor the hypothesis of strong functional associations between these mutations over neutral genetic drift. The identified proteins predefining the acquisition of the DR phenotype may be molecular targets for the development of new drugs for antibiotic resistance reversion.

| Secondary mutations | GyrA S95T (FLQ) | KatG S315T,N (INH) | KatG R463L (INH) | AccD6 D229G (INH) | EmbC V981L (EMB) | RpsL K43R (SM) | GidB E92D (SM) | GidB L16R (SM) | ThyA T202A (PAS) | Annotation |
|---|---|---|---|---|---|---|---|---|---|---|
| Rv0193c K417\*,E | 86.4 to 94.6 | 87.5 to 97.7 | 77.4 to 93.1 | 71.9 to 90.9 | | 78.1 to 98.5 | 79.6 to 99.5 | 81.1 to 96.0 | 80.7 to 99.6 | Hypothetical protein |
| Rv1186c P207A,T | 84.0 to 93.9 | 82.8 to 96.4 | 79.8 to 95.7 | 76.5 to 95.0 | | 68.7 to 96.6 | 71.8 to 98.1 | 74.4 to 93.9 | 64.6 to 94.7 | Hypothetical protein |
| Rv1321 S144R | 81.8 to 91.1 | 84.7 to 96.0 | 80.6 to 94.5 | 72.5 to 90.6 | | 76.7 to 97.4 | 75.6 to 97.3 | 76.0 to 92.7 | 67.1 to 92.9 | Hypothetical protein |
| Rv2017 A262E | 76.5 to 86.5 | 83.9 to 95.1 | 73.5 to 89.4 | 70.6 to 88.6 | | 75.7 to 96.4 | 77.8 to 97.5 | 76.6 to 92.4 | 72.9 to 95.0 | Transcriptional regulator |
| GalU Q235R | 76.6 to 86.6 | 81.8 to 93.8 | 74.9 to 90.3 | 69.4 to 87.8 | | 72.9 to 95.0 | 78.0 to 97.6 | 75.3 to 91.6 | 70.2 to 93.6 | UTP-glucose-1-phosphate uridylyltransferase |
| Rv3204 T34A | 74.7 to 84.8 | 82.5 to 94.1 | 72.2 to 88.3 | 69.1 to 87.3 | | 79.6 to 97.8 | 75.7 to 96.4 | 74.8 to 91.0 | 68.5 to 92.4 | DNA-methyltransferase |
| CorA K139\*,E | 76.2 to 86.1 | 85.4 to 95.9 | 70.7 to 87.3 | 67.3 to 86.1 | | 73.5 to 95.1 | 75.4 to 96.3 | 75.9 to 91.8 | 68.0 to 92.2 | Magnesium and cobalt transporter |
| VapC47 S46L | 74.1 to 84.4 | 84.5 to 95.3 | 71.0 to 87.5 | 71.8 to 89.1 | | 73.8 to 95.2 | 75.7 to 96.4 | 69.3 to 87.4 | 71.2 to 93.8 | VapC47 toxin |

Polymorphic sites are denoted by names of genes and pairs of amino acid substitutions from the most common allele to one or several alternative allelic states. Deletions are marked by asterisks (\*). Values *Xmin to Xmax* in cells represent confidence ranges estimated for *p*-value ≤ 0.05 (Eq. (1)).

**Table 2.** Attributable risk of acquisition of DR mutations in sub-populations of Mtb with secondary mutations.

Clade-Specific Distribution of Antibiotic Resistance Mutations in the Population…

http://dx.doi.org/10.5772/intechopen.75181

**5. The concept of the drug resistance reversion and implementation thereof**

The concept of drug resistance reversion was applied in recent studies [7, 41]. Drug resistance mutations are often incompatible with one another, as shown by negative linkage disequilibrium values. This suggests that the cumulative fitness cost of mutations is often too high for the resulting strain to be viable. FS-1 is a new drug which seems to exploit this tendency. Active units of FS-1 are aggregated micelles containing complexes of tri-iodide molecules coordinated by metal ions and integrated into a dextrin-polypeptide moiety. The basic formula of the micelle is:

$$\left[\left\{\left(L_n(\mathrm{MeJ_3})^{+}\right)_y\left[\mathrm{Me}(L_m)\mathrm{J}\right]^{+}_{x}\right\}(\mathrm{Cl}^{-})_{y+x+k}\right]$$

where *L* is the dextrin-polypeptide ligand; *Me* denotes Li/Mg ions; *n*, *m*, *x*, *y* and *k* are variable integers ≥1; and the molecular mass of the micelles is in the range of 30–300 kD. This molecular complex was designed to prolong the residence time of moderately oxidative iodine molecules in an organism and to facilitate their transportation to inner tissues.

Studies of XDR-TB infection in animal models showed the reversion of Mtb pathogens to a more drug sensitive phenotype after treatment with FS-1, despite the DR related mutations remaining in their genomes [7]. Drug resistance reversion was also confirmed in an *in vitro* model with an XDR-TB clinical isolate, SCAID 187.0, cultivated for 60 days in six passages on a medium with a sub-lethal dose of ¼ MIC of FS-1. Reduction of the antibiotic resistance of XDR-TB isolates obtained during the clinical trial of FS-1 was consistent with the results of the above-mentioned laboratory experiments. It was concluded that the DR phenotype requires multiple genes to be in specific activity states controlled either by transcription regulation or resulting from specific mutations. A combination of genetic variants creates a genomic context of drug resistance.
