**3. Tuberculosis databases and platforms**

Since the emergence of Bioinformatics and Computational Biology in the 1960s, numerous databases and computational tools have been created to provide the scientific community with the means to access and interpret a range of biological data.

The contribution of these disciplines became particularly evident in the 1990s and 2000s, when the development of supercomputers, powerful personal computers, and global computer networks, together with high-throughput technologies collectively referred to as *omics* (*e.g.*, genomics, transcriptomics, and proteomics), revolutionized the field of Biology.

Nowadays, a number of publicly available web resources aim to organize, integrate, and provide efficient access to the ever-increasing amount of biological information produced over decades of research, particularly in recent years, as numerous projects worldwide apply the aforementioned high-throughput technologies. Accordingly, diverse options to visualize, search, retrieve, and analyze this wealth of data are offered, providing the opportunity to acquire more detailed knowledge about genomes and their respective organisms, among many other opportunities.

However, creating and maintaining such web resources is a challenge in itself, not only because they usually have to deal with large amounts of data, but mostly because they require the design of schemes and frameworks that accurately represent the complexity of biological systems, which is frequently hard to accomplish. Another difficulty is the development of efficient data retrieval systems, implemented in user-friendly interfaces and intended for complex and massive database searching. It is worth noting that, in many circumstances, the authors and curators of such resources receive little or no remuneration for their efforts, and obtaining financial support for the creation and maintenance of biological databases remains difficult.

In this section we present the main web resources fully or partially dedicated to mycobacterial species that are relevant to readers interested in TB. Each database or platform, categorized according to its purpose and functionality, is briefly reviewed, and references to the original paper describing it, as well as its web address, are provided, serving as a guide for researchers or students working on TB. Notably, the computational resources presented here are all publicly available as online services and can potentially be applied to the identification of new drug targets, vaccine antigens, or diagnostics for TB, among many other applications.

### **3.1. Generic and multifunctional**

**Pan American Health Organization (PAHO)**. Serving as the regional office of the WHO, PAHO has been working for more than a century to improve health and living standards in the countries of the Americas, and is recognized as part of the United Nations system. URL: <http://new.paho.org/hq/>

**StopTB Partnership**. The StopTB Partnership operates through a secretariat hosted by the World Health Organization (WHO) in Geneva, Switzerland, and seven working groups whose role is to accelerate progress on access to TB diagnosis and treatment, research and development for new TB diagnostics, drugs and vaccines, and tackling drug-resistant and HIV-associated TB. URL: <http://www.stoptb.org/>

**TB Alliance**. Established in 2000, its main objective is to discover and develop better, faster-acting, and affordable drugs to fight tuberculosis. Today, the organization and its partners manage a portfolio of new anti-TB drugs. URL: <http://www.tballiance.org>

**World Health Organization (WHO)**. Created in 1948, WHO is the directing and coordinating authority in international health within the United Nations system, composed of 193 countries and two associate members. It supports and promotes health research in several areas, TB being one of them. URL: <http://who.int/topics/tuberculosis/en/>

362 Tuberculosis - Current Issues in Diagnosis and Management

**MyBASE**. The Mycobacterial Database [1] is an integrated platform for functional and evolutionary genomic study of the genus *Mycobacterium*, comprising extensive literature review and data annotation on mycobacterial genome polymorphism, virulence factors, and essential genes. URL: <http://mybase.psych.ac.cn/>

**TBDB**. The TB Database [2] provides a comprehensive genomic data repository for *M. tuberculosis* and related bacteria, combining (*in silico*) genome sequence and annotation data and (experimental) gene-expression data. It also provides an analysis platform with suitable computational tools to assist (comparative) genomic and gene-expression studies of these microorganisms. Annotated features of genes and genomes, predicted orthologous groups, operons and synteny blocks, as well as predicted and curated immunological epitopes and gene-expression patterns are available. URL: <http://www.tbdb.org/>

**The MycoBrowser portal**. The Mycobacterial Browser portal [3] is an extensive genomic and proteomic data repository for four related mycobacteria: *M. tuberculosis* H37Rv, *M. leprae* TN, *M. marinum* M, and *M. smegmatis* MC2 155. The system provides *in silico* generated and manually reviewed information on the complete genome sequence of these organisms. As part of this portal, the **TubercuList** database [4] integrates a range of information on the *M. tuberculosis* genome, such as genomic and protein annotations and features, drug and transcriptome data, mutant and operon annotation, and comparative genomics. It represents a complete redesign of the database with the same name provided by the GenoList genome browser (also described in this chapter). URL: <http://mycobrowser.epfl.ch/>

### **3.2. Genomic mapping and data mining**

**TubercuList**, **BoviList**, **BCGList**. GenoList [5] is a collection of databases dedicated to microbial genome analysis, providing a complete data set of protein and nucleotide sequences for selected species, as well as annotation and functional classification of these sequences. The TubercuList, BoviList, and BCGList databases are devoted to collecting and integrating various aspects of the genomic information of *M. tuberculosis* H37Rv, *M. bovis* AF2122/97, and *M. bovis* BCG Pasteur 1173P2, respectively. URL: <http://genolist.pasteur.fr/>

**TBrowse**. TBrowse [6] is a genomic data resource based on the Generic Model Organism Database (a collection of open source computational tools for creating and managing genome-scale biological databases). The browser provides the scientific community with an integrative genomic map of *M. tuberculosis*, with millions of data points representing different genomic features and computational predictions systematically collected from online resources and publications, including gene/operon predictions, orthologs, gene expression data, non-coding RNA, pathways/networks, regulatory elements, variation and repeats, and subcellular localization, among others. URL: <http://tbrowse.osdd.net>


Web Resources on TB: Information, Research, and Data Analysis

http://dx.doi.org/10.5772/53949


### **3.3. Comparative genomics**

**GenoMycDB**. The GenoMycDB [7] is a relational database for large-scale comparative analysis of completely sequenced mycobacterial genomes based on their predicted protein content. Currently, the database comprises six mycobacteria – *M. tuberculosis* (strains H37Rv and CDC1551), *M. bovis* AF2122/97, *M. avium subsp. paratuberculosis* K10, *M. leprae* TN, and *M. smegmatis* MC2 155 – providing for each of their encoded protein sequences the predicted subcellular localization, the assigned cluster of orthologous groups (COGs), features of the corresponding gene, and links to several important databases; in addition, pairs or groups of homologs between selected species/strains can be dynamically inferred based on user-defined criteria. URL: <http://www.dbbm.fiocruz.br/GenoMycDB.html>
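The idea of inferring homolog pairs from user-defined criteria can be illustrated with a minimal sketch. It assumes that pairwise protein comparisons are already available as records carrying percent identity, alignment coverage, and E-value; the record layout, the threshold defaults, and the protein names are hypothetical and are not taken from GenoMycDB itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise protein comparison (hypothetical record layout)."""
    query: str
    subject: str
    identity: float   # percent identity, 0-100
    coverage: float   # fraction of the query covered by the alignment, 0-1
    evalue: float     # alignment E-value


def select_homologs(hits, min_identity=30.0, min_coverage=0.7, max_evalue=1e-5):
    """Keep only the hits that satisfy all three user-defined thresholds."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


# Made-up comparisons between proteins of two hypothetical genomes.
hits = [
    Hit("protA_genome1", "protA_genome2", 98.0, 1.00, 1e-150),
    Hit("protB_genome1", "protB_genome2", 45.0, 0.85, 1e-40),
    Hit("protC_genome1", "protX_genome2", 25.0, 0.40, 1e-3),
]

homologs = select_homologs(hits)
print([(h.query, h.subject) for h in homologs])
```

Tightening or relaxing the thresholds changes which pairs are reported, which is the sense in which such an inference is "dynamic"; the actual criteria offered by the database may differ.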

**MycoDB**. xBASE [8] is another collection of databases, this one dedicated to comparative genome analysis of bacteria. It provides precomputed comparative genome analyses among selected bacterial genera, as well as inferred orthologous groups and functional annotations. It also provides precomputed analyses of codon usage, base composition, codon adaptation index (CAI), hydropathy, and aromaticity of the protein coding sequences in these bacteria. As part of this multi-microbial system, MycoDB currently comprises comparative data from 61 completely sequenced or unfinished mycobacterial genomes, including 40 strains of *M. tuberculosis*, *M. bovis* AF2122/97, and *M. bovis* BCG strains Pasteur 1173P2 and Tokyo 172, among other mycobacteria. URL: <http://www.xbase.ac.uk/>
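Two of the sequence statistics mentioned above, base composition and codon usage, are simple enough to sketch directly. The snippet below is only an illustration of what these quantities are; the function names and the toy coding sequence are assumptions and do not reproduce the xBASE implementation.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(cds: str) -> Counter:
    """Count the in-frame codons of a coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


# Hypothetical five-codon coding sequence, used only for illustration.
toy_cds = "ATGGCCGCGGGCTAA"

print(round(gc_content(toy_cds), 3))
print(codon_usage(toy_cds).most_common())
```

Mycobacterial genomes are GC-rich, so statistics of this kind are a natural starting point when comparing coding sequences across strains.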

**Mycobacterium tuberculosis Comparative Database**. This Broad Institute database comprises precomputed comparative genome analysis data for eight *M. tuberculosis* patient isolates with relevant clinical phenotypes and disease epidemiology (varying degrees of spread, drug resistance, and clinical severity): *M. tuberculosis* F11, *M. tuberculosis* Haarlem, *M. tuberculosis* KZN 4207 (DS), *M. tuberculosis* KZN 1435 (MDR), *M. tuberculosis* KZN 605 (XDR), *M. tuberculosis* C, *M. tuberculosis* 98-R604 INH-RIF-EM, and *M. tuberculosis* W-148. The comparative data provided by this TB resource include inferred families of orthologous genes, genomic two-dimensional dot plot matrices, comparative genome mapping and browsing, and several comparative gene annotations and features. URL: <http://www.broadinstitute.org/annotation/genome/mycobacterium\_tuberculosis\_spp/MultiHome.html>

### **3.4. Genetic diversity and epidemiology**


**MGDD**. The Mycobacterial Genome Divergence Database [9] is a data repository of genetic variations among different organisms belonging to the *M. tuberculosis* complex. The MGDD system provides quick searches for precomputed single nucleotide polymorphisms (SNPs), insertions/deletions, repeat expansions, and divergent sequences (inversions, duplications, and changes in synteny) in genomic regions of fully sequenced genomes of *M. tuberculosis* complex species and strains. URL: <http://mirna.jnu.ac.in/mgdd/>

**MIRU-VNTRplus**. The Mycobacterial Interspersed Repetitive Unit – Variable Number Tandem Repeat (MIRU-VNTR) database [10,11] comprises a collection of 186 well-characterized strains representing the major *M. tuberculosis* complex lineages. For each strain, species, lineage, and epidemiologic information are provided together with 24 MIRU loci, Spoligotype patterns, Regions of Difference (RD) profiles, Single Nucleotide Polymorphisms (SNPs), susceptibility data, and IS6110 Restriction Fragment Length Polymorphism (RFLP) fingerprint images. The system enables users to analyze genotyping data of their own strains alone or in comparison with the reference strains in the database; analyses and comparisons of genotypes can be based on Multiple Locus VNTR Analysis (MLVA), Spoligotypes, Large Sequence Polymorphism (LSP) and SNP data, or on a weighted combination of these markers. Tools for data analysis include searching for similar strains, creating phylogenetic and minimum spanning trees, and mapping geographic information. URL: <http://www.miru-vntrplus.org>
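To illustrate what a weighted combination of markers can look like, the sketch below compares two strains by the fraction of differing MIRU-VNTR loci and the fraction of differing spoligotype spacers, then averages the two with user-chosen weights. This is an assumed, simplified scheme: the distance measure, the weights, and the shortened toy profiles are hypothetical and are not the algorithm used by MIRU-VNTRplus.

```python
def categorical_distance(profile_a, profile_b):
    """Fraction of positions at which two equal-length profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    if len(profile_a) == 0:
        raise ValueError("profiles must not be empty")
    differing = sum(1 for x, y in zip(profile_a, profile_b) if x != y)
    return differing / len(profile_a)


def combined_distance(miru_a, miru_b, spol_a, spol_b, w_miru=0.5, w_spol=0.5):
    """Weighted average of the MIRU-VNTR and spoligotype distances."""
    total_weight = w_miru + w_spol
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    d_miru = categorical_distance(miru_a, miru_b)
    d_spol = categorical_distance(spol_a, spol_b)
    return (w_miru * d_miru + w_spol * d_spol) / total_weight


# Toy, shortened profiles (real data would use 24 loci and 43 spacers).
strain_a = {"miru": [2, 3, 4, 2], "spol": "1111"}
strain_b = {"miru": [2, 3, 5, 2], "spol": "1001"}

distance = combined_distance(
    strain_a["miru"], strain_b["miru"], strain_a["spol"], strain_b["spol"]
)
print(distance)
```

A matrix of such pairwise distances is the usual input for similarity searches and for building minimum spanning trees.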

**MTCID**. The *M. tuberculosis* Clinical Isolate Genetic Polymorphism Database [12] is a repository of genetic polymorphisms, providing Single Nucleotide Polymorphism (SNP) and Spoligotype profiles of clinical isolates of *M. tuberculosis*, based on published literature and manual curation. URL: <http://ccbb.jnu.ac.in/Tb/>

**SITVITWEB**. SITVITWEB [13] is a multi-marker database comprising three major types of molecular markers: Spoligotypes, Mycobacterial Interspersed Repetitive Units (MIRUs) and Variable Number Tandem Repeats (VNTRs). This web server is dedicated to the investigation of *M. tuberculosis* genetic diversity and molecular epidemiology. Currently, this international resource provides genotyping information on 62,582 *M. tuberculosis* complex clinical isolates from 153 countries of patient origin. URL: <http://www.pasteur-guadeloupe.fr:8081/SITVIT\_ONLINE/>

Additionally, a few relevant computational tools are currently available as web services dedicated to analyzing the genetic diversity of *M. tuberculosis* complex strains and characterizing TB dynamics using molecular epidemiological data:

The **spolTools** [14] comprise a collection of browser programs designed to manipulate and analyze Spoligotype data of the *M. tuberculosis* complex. They consist of an online repository of Spoligotype isolates collected from various published data sets (currently, 1179 Spoligotypes and 6278 isolates across 30 datasets) and online tools for manipulating and analyzing these data (computation of basic population genetic quantities; visualization of clusters of Spoligotype patterns based on an estimated evolutionary history; and a procedure to predict emerging strains/genotypes associated with elevated transmission). URL: <http://www.emi.unsw.edu.au/spolTools/>
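As a small example of the kind of manipulation applied to Spoligotype data, the sketch below converts a 43-spacer binary pattern into the conventional 15-digit octal designation (fourteen spacer triplets read as octal digits, followed by the last spacer as 0 or 1). The function name and the toy pattern are assumptions made for illustration; this is not code from spolTools.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character binary spoligotype (1 = spacer present) to octal.

    The first 42 spacers are read as 14 triplets, each giving one octal digit;
    the 43rd spacer is appended unchanged as a final 0 or 1.
    """
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected a string of exactly 43 characters, each 0 or 1")
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    return "".join(str(int(t, 2)) for t in triplets) + binary[42]


# Toy pattern in which spacers 33 to 36 are absent and all others are present.
toy_pattern = "1" * 32 + "0" * 4 + "1" * 7

print(spoligotype_to_octal(toy_pattern))  # prints 777777777760771
```

The octal form is the compact notation in which Spoligotype patterns are usually exchanged between databases and publications.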

The **TB-Insight** [15] is a collection of computational methods (based on different models and datasets) for lineage classification of *M. tuberculosis* complex strains and for visualization of genetic diversity in the *M. tuberculosis* complex population and its distribution by lineage, as well as visual representation of associations between patient and strain groups, providing insight into differences in phenotypic characteristics and phylogeographic associations of *M. tuberculosis* complex strains with host populations. URL: <http://tbinsight.cs.rpi.edu/>

### **3.5. Gene expression and regulation**

**MTBRegList**. MTBRegList [16] is dedicated to the analysis of gene expression and regulation data in *M. tuberculosis*, containing predicted and characterized regulatory motifs cross-referenced with their respective transcription factor(s), experimentally identified transcription start sites, and DNA binding sites. URL: <http://www.usherbrooke.ca/vers/MtbRegList>

**MycoperonDB**. MycoperonDB [17] is a repository of known and computationally predicted operons and transcriptional units of (currently) five different mycobacteria – *M. tuberculosis* (strains H37Rv and CDC1551), *M. bovis* AF2122/97, *M. avium* subsp. *paratuberculosis* K10, and *M. leprae* TN – whose genomes have been completely sequenced. Presently, it comprises 18,053 genes organized as 8,256 predicted operons and transcriptional units, providing literature links for experimentally characterized operons, and access to known promoters and related information. URL: <http://cdfd.org.in/mycoperondb/home.html>

**MTBreg**. MTBreg is part of the online services provided by the UCLA-DOE Institute for Genomics and Proteomics (http://www.doe-mbi.ucla.edu/), and consists of a repository of conditionally regulated proteins in *M. tuberculosis* grown under several different conditions mimicking infection. The database provides information on proteins that are regulated by selected transcription factors or other regulatory proteins, as well as on the experimental condition, the experimental dataset and a literature reference. URL: <http://www.doe-mbi.ucla.edu/Services/MTBreg/>

**MycoRegDB**. The Mycobacterial Promoter and Regulatory Elements Database [18] is part of a user-friendly web interface (**RegAnalyst**) that integrates a motif prediction program (MoPP), a pattern detection tool (MyPatternFinder), and a database of promoter and regulatory elements from various mycobacterial species (MycoRegDB). Currently, MycoRegDB comprises the following species: *M. tuberculosis* (strains H37Rv and CDC1551), *M. bovis* BCG, *M. leprae*, *M. smegmatis*, *M. avium* subsp. *paratuberculosis*, *M. marinum*, *M. ulcerans*, *M. gilvum*, and *M. vanbaalenii*. For each database entry, a variety of useful information is provided, such as gene annotation, CDS positions, promoter/regulatory sequence (with the Transcription Start Point (TSP) or binding site explicitly marked), and TSP-CDS/Motif-CDS distance, among others. According to the authors, the first release of MycoRegDB contained 290 annotated DNA motifs (174 promoters and 116 transcription factor binding sites) described in 81 research papers. URL: <http://www.nii.ac.in/~deepak/RegAnalyst/MycoRegDB>

### **3.6. Structural biology**

**MtbSD**. The *M. tuberculosis* Structural Database [19] is a resource dedicated to 3D protein structures of *M. tuberculosis*, providing relevant information on description, reaction catalyzed, domains, active sites, structural homologs and similarities between bound and cognate ligands. Currently, the database comprises 876 structures for 332 mycobacterial genes. URL: <http://bmi.icmr.org.in/mtbsd/MtbSD.php>

### **3.7. Drug targets and resistance**

**TDR Targets database**. The Tropical Disease Research (TDR/WHO) Targets database [20] comprises extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, including *M. tuberculosis*, as well as computationally predicted druggability for potential targets and compound desirability information. The goal is to exploit the availability of diverse datasets to facilitate the identification and prioritization of drugs and drug targets in neglected disease pathogens, such as the tubercle bacillus. URL: <http://tdrtargets.org/>

**TB Drug Resistance Mutation Database**. The Tuberculosis Drug Resistance Mutation Database [21] is a comprehensive database that catalogs mutations associated with TB drug resistance and the frequency of the most common mutations associated with resistance to specific drugs, providing a resource for the development of molecular diagnostics for TB, as well as structural mapping of mutations to investigate mechanisms of resistance for drug discovery purposes. URL: <http://www.tbdreamdb.com/>

**4. Conclusion**

As outlined in this chapter, Informatics has acquired great importance not only in the biological sciences, but in all areas of knowledge. The Internet has become one of the most important tools for most people, from a dedicated researcher interested in the latest advances in his/her particular field of work to the teenager trying to contact friends. Companies, industries and research institutes have developed sites where they present their work to laypeople.

The large number of publicly available databases and computational tools that have been developed to organize, integrate, and provide efficient access to the ever-increasing amount of biological information produced over decades of research has benefited researchers all over the world, especially those from low-income countries.

One important drawback that still has to be overcome is that the wealth of biological information available is presently fragmented, dispersed across numerous computational resources, and in many circumstances redundant, clearly requiring unification in order to provide a global and integral picture of the biological systems they are dedicated to.

Ideally, the upcoming databases and computational tools should offer: data integration, providing multi-perspective analyses; combine *in silico* generated and manually curated data, improving the quality of our research; present efficient data structure, storage and processing,
