**2.2.5 Selection of candidate drug targets from the network analysis**

We constructed a protein-protein interaction (PPI) map by combining the results generated by PEIMAP, iPfam and PSIMAP (Fig. 2). The number of interactions detected with each database is given in Table 1. Merging the data from the different approaches helped to avoid bias toward a specific class of interactions. The predicted network also contained isolated sub-networks that were difficult to analyze. These sub-networks arose either because domains could not be assigned to their proteins or because those proteins lacked homology to known interacting pairs. Further experimental validation of the network would be needed to investigate them. In total, the network contained 33,861 high-confidence predicted interactions among 1,366 nodes.
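The merging step can be illustrated with a minimal Python sketch. It assumes that each predictor's output has already been reduced to a list of undirected protein pairs. The protein IDs are made up, and the `min_support` filter is a hypothetical stand-in, since this section does not describe the criterion used to call an interaction "high confidence".

```python
from collections import defaultdict

Edge = tuple[str, str]

def canonical(a: str, b: str) -> Edge:
    """PPIs are undirected, so store each pair in a fixed (sorted) order."""
    return (a, b) if a <= b else (b, a)

def merge_sources(sources: dict[str, list[Edge]]) -> dict[Edge, set[str]]:
    """Union of the predicted edges, recording which source(s) support each one."""
    support: dict[Edge, set[str]] = defaultdict(set)
    for source, edges in sources.items():
        for a, b in edges:
            support[canonical(a, b)].add(source)
    return dict(support)

def filter_by_support(merged: dict[Edge, set[str]], min_support: int) -> dict[Edge, set[str]]:
    """Hypothetical confidence filter: keep edges predicted by >= min_support sources."""
    return {e: s for e, s in merged.items() if len(s) >= min_support}

def network_size(edges) -> tuple[int, int]:
    """Return (number of nodes, number of edges) for a collection of pairs."""
    nodes = {p for pair in edges for p in pair}
    return len(nodes), len(edges)

# Toy input with made-up protein IDs (not real PEIMAP/PSIMAP/iPfam predictions).
sources = {
    "PEIMAP": [("P1", "P2"), ("P2", "P3")],
    "PSIMAP": [("P2", "P1"), ("P3", "P4")],
    "iPfam": [("P1", "P2")],
}
merged = merge_sources(sources)
print(network_size(merged))                        # (4, 3)
print(network_size(filter_by_support(merged, 2)))  # (2, 1)
```

Keeping the set of supporting sources for each edge, rather than only the union, leaves room to apply whatever confidence rule is appropriate after merging.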


Table 1. Number of nodes and predicted interactions for each database.

| Number of proteins | PEIMAP nodes | PEIMAP edges | PSIMAP nodes | PSIMAP edges | iPfam nodes | iPfam edges |
|---:|---:|---:|---:|---:|---:|---:|
| 8,335 | 718 | 14,839 | 3,184 | 158,984 | 2,336 | 50,398 |

Using the topological metrics of connectivity and betweenness centrality, we identified 384 potential targets. Targets with homology to human proteins were then eliminated. This substantially reduced the number of candidates, but is expected to increase the specificity of drug effects. As explained earlier, toxicity is a major concern when designing or searching for a drug, since many clinical trials have failed because of severe side effects. After this filter, 142 targets remained. Further filters can be applied to this list to select the targets most attractive for drug design (Table 2).

Of these targets, 91 were kinases predicted to be essential in the network and without homology to the human kinome. Kinases are key regulators of cell signaling, and in *Leishmania* they are crucial for the metabolic changes the parasite needs in order to adapt to a human host. With intensive pharmacological investigation, drugs that are very successful in treating cancer (e.g., Gleevec) might also be used against *Leishmania* parasites. One example from the group of predicted kinases is the protein LMPK (*LmjF36.6470*). This protein has been shown to be essential in *Leishmania mexicana* (Wiese 1998), and it has conserved orthologs in other species such as *L. amazonensis*, *L. major*, *L. tropica*, *L. aethiopica*, *L. donovani*, *L. infantum* and *L. braziliensis* (Wiese & Gorcke 2001). This kinase is therefore an interesting candidate for experimental validation, and its upstream and downstream interacting partners could possibly also be inhibited by a combination of drugs. In addition, one of the challenges in this disease is to find a broad-spectrum drug with therapeutic effects on the several *Leishmania* species that cause different forms of leishmaniasis. Further analysis of this target may help to identify drugs, or combinations of drugs, that are active against amastigotes, the stage responsible for the disease in mammals.

Three *Leishmania*-specific ABC transporters (*LmjF34.0670*, *LmjF27.0470* and *LmjF32.2060*) were also predicted to be essential. They confer resistance to antimonials and pentamidine by extruding the drug from the cell (Perez-Victoria et al. 2002). Based on our analysis, these proteins could also be interesting drug targets because of their role in maintaining the homeostasis of the intracellular parasite environment.

… al. 2008). For proteins of unknown function in the GeneDB database, we predicted possible biological roles by evaluating Gene Ontology (GO) terms for biological processes using the BinGO plug-in available in Cytoscape.

Fig. 2. Visualization of the predicted *Leishmania major* interactome


Table 2. Top 10 list of predicted targets from the *L. major* interactome.
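The topological filter described above (connectivity and betweenness centrality, followed by removal of candidates with human homologs) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that produced Table 2: connectivity is taken as node degree, betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, and the toy graph and the thresholds `min_degree` and `min_betweenness` are hypothetical.

```python
from collections import deque

def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Unnormalized betweenness centrality (Brandes) for an unweighted, undirected graph.

    `adj` must be symmetric and list every node as a key.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                                 # nodes in BFS order from s
        preds = {v: [] for v in adj}               # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)              # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):                  # back-propagate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}     # each pair was counted from both ends

def select_targets(adj, human_homologs, min_degree, min_betweenness):
    """Keep hub/bottleneck nodes, drop those with human homologs, rank by betweenness."""
    bc = betweenness(adj)
    hits = [v for v in adj
            if len(adj[v]) >= min_degree and bc[v] >= min_betweenness
            and v not in human_homologs]
    return sorted(hits, key=lambda v: bc[v], reverse=True)

# Hypothetical toy interactome: A is a hub, D bridges A and E.
adj = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A", "E"}, "E": {"D"}}
print(betweenness(adj))                      # A: 5.0, D: 3.0, all others 0.0
print(select_targets(adj, {"D"}, 2, 1.0))    # ['A']
```

Both A and D pass the topological thresholds, but D is dropped because it is marked as having a human homolog, which is the same kind of reduction described for the 384 initial candidates.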

Current Advances in Computational Strategies for Drug Discovery in Leishmaniasis 263


Modular organization has been shown to be a prevalent feature of biological networks, and the modular organization of pathways can be used to infer protein function (Rives & Galitski 2003). We detected 63 clusters, or modules, in the network and assigned potential biological processes to 263 proteins with no prior functional description. Among the predicted targets, 64% were predicted to participate in protein phosphorylation (GO:0006468), 8% in nucleosome assembly (GO:0006334), 4% in nucleic acid metabolic process (GO:0006139), 4% in electron transport (GO:0006118), 4% in transport (GO:0006810) and 2% in protein amino acid alkylation (GO:0008213). The remaining 14% of the targets were distributed across processes represented by a single protein each. This result highlights protein kinases as the main protein class to characterize and explore as drug targets in *Leishmania* parasites.
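Annotating modules with GO biological processes, as done here with BinGO, relies on testing whether a term is over-represented in a module relative to a reference set. BinGO offers hypergeometric (and binomial) tests with Bonferroni or Benjamini-Hochberg correction; the Python sketch below reimplements the hypergeometric test and the Benjamini-Hochberg adjustment from first principles. It is not BinGO's code, and the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: annotated proteins in the reference set; n: of those, annotated with the GO term;
    N: proteins in the module; k: of those, annotated with the term.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Example: all 5 proteins of a module carry a term that 5 of 10 reference proteins have.
print(hypergeom_pvalue(5, 10, 5, 5))      # 1/252, about 0.00397
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In practice one p-value is computed per GO term tested in a module, and the adjustment is applied across all of those terms before choosing an FDR cut-off.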

### **3. Selection of drug targets by metabolic flux balance analysis and** *in silico* **deletions**

Proteins involved in metabolism are another important source of drug targets. The energetic balance of the cell is controlled by enzymes that regulate the transformation of substrates in a coordinated and efficient manner. These enzymes are needed to produce energy or building blocks for other molecules, and are therefore essential for the viability of the organism. However, modeling metabolism requires a different approach, because the interactions between enzymes depend on the turnover rate of molecules, or *fluxes*, rather than on the physical interactions described for the interactome.

The reconstruction of metabolic networks is more established than interactome generation. Since glycolysis was elucidated in the 1930s, metabolic pathways have been characterized in many organisms. Metabolic network reconstructions based on these data started with *E. coli* (Reed & Palsson 2003) and were later followed by reconstructions of eukaryotic organisms such as *Saccharomyces cerevisiae* (Duarte et al. 2004) and *Aspergillus niger* (David et al. 2003). More recently, the metabolic networks of *Plasmodium falciparum* (Plata et al. 2010) and *Leishmania major* (Chavali et al. 2008) were reconstructed with the aim of detecting drug targets; some details of the network generation and analysis are discussed below.

To build a metabolic network, it is necessary to list all the substances, with their concentrations, and the reactions between them. In living systems these reactions are catalyzed by enzymes, and transport processes are carried out by transporters or channels. Each reaction is characterized by its *stoichiometric coefficients*, which denote the proportions of substrate and product molecules involved. For example, the reaction:

$$\mathrm{S}_1 + \mathrm{S}_2 \to 2\,\mathrm{P}$$

describes the generation of product P from S1 and S2, so its stoichiometric coefficients are -1, -1 and 2, respectively (negative for consumed substrates, positive for products). For a metabolic network consisting of *m* substances and *r* reactions, the system dynamics are described by the *systems equations* (2), also called balance equations because they account for the balance between production and degradation of each substance:

$$\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j, \qquad i = 1, \dots, m \tag{2}$$


The terms $n_{ij}$ are the stoichiometric coefficients of metabolite $i$ in reaction $j$. Diffusion is not considered here. These equations can also be applied to compartmentalized systems, in which the flux between compartments must be treated as a separate reaction. The stoichiometric coefficients $n_{ij}$ can be combined into the so-called *stoichiometric matrix* $\mathbf{N}$, in which each column corresponds to a reaction and each row to a substance (Klipp et al. 2005).
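As a worked example, consider the single reaction $\mathrm{S}_1 + \mathrm{S}_2 \to 2\,\mathrm{P}$ with rate $v$ ($r = 1$, coefficients $-1$, $-1$ and $2$). With rows ordered $\mathrm{S}_1, \mathrm{S}_2, \mathrm{P}$, the stoichiometric matrix is a single column, and Eq. (2) yields one balance equation per substance:

$$
\mathbf{N} = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}, \qquad
\frac{dS_1}{dt} = -v, \qquad \frac{dS_2}{dt} = -v, \qquad \frac{dP}{dt} = 2v
$$

Each molecule of P formed consumes half a molecule of each substrate, so, for example, $P + 2S_1$ stays constant over time.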

Accordingly, a mathematical model of a metabolic network can be described by a vector $S = (S_1, S_2, \dots, S_m)$ of concentrations of the different species, a vector $v = (v_1, v_2, \dots, v_r)$ of reaction rates, and the stoichiometric matrix $\mathbf{N}$. With these definitions, the balance equations (2) can be rewritten in matrix form as follows:

$$\frac{dS}{dt} = \mathbf{N}v \tag{3}$$
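The same bookkeeping can be done numerically. The Python sketch below builds $\mathbf{N}$ from reactions written as `{species: coefficient}` dictionaries and evaluates the right-hand side of Eq. (3). The second reaction (an outflow of P) and the rate values are hypothetical, added only so that the example has more than one column.

```python
def stoichiometric_matrix(species, reactions):
    """Build N (rows = species, columns = reactions) from {species: coefficient} dicts.

    Substrates carry negative coefficients and products positive ones.
    """
    row = {s: i for i, s in enumerate(species)}
    N = [[0] * len(reactions) for _ in species]
    for j, reaction in enumerate(reactions):
        for s, coeff in reaction.items():
            N[row[s]][j] += coeff
    return N

def dS_dt(N, v):
    """Right-hand side of dS/dt = N v: one entry per species."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(N_row, v)) for N_row in N]

species = ["S1", "S2", "P"]
reactions = [
    {"S1": -1, "S2": -1, "P": 2},  # R1: S1 + S2 -> 2P
    {"P": -1},                     # R2: P -> (outflow), hypothetical
]
N = stoichiometric_matrix(species, reactions)
print(N)                     # [[-1, 0], [-1, 0], [2, -1]]
print(dS_dt(N, [1.0, 2.0]))  # [-1.0, -1.0, 0.0]
```

With $v = (1, 2)$, P is produced and removed at the same rate, so its concentration is at steady state while S1 and S2 are consumed.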

#### **3.1 Predicted drug targets in the** *Leishmania major* **metabolic network**

The reconstructed *L. major* network (Chavali et al. 2008) included 560 genes, 1,112 reactions and 1,101 metabolites. The reconstruction drew on different data sources, in particular the literature and biological databases; reaction stoichiometry and subcellular localization were also extracted from the existing literature. Some reactions were designated as non-gene-associated to account for spontaneously generated metabolites. The gene-associated reactions were further adjusted according to the specified constraints.

*Flux balance analysis* (FBA) is a method that has been used extensively to analyze metabolic networks. Its main advantage is that it does not require detailed knowledge of enzyme kinetics. The method searches for the flux distribution that maximizes an objective, typically growth (biomass production), while satisfying a set of physicochemical, thermodynamic, topological and environmental constraints. For the *Leishmania* model, the constraints included reaction reversibility rules, promastigote/amastigote protein expression data, various medium conditions and the transport reactions across cellular compartments. The model was simulated under *steady-state* conditions, meaning that the net change of every metabolite in the network over time is zero, i.e., $\mathbf{N}v = 0$ in Eq. (3).
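A minimal FBA sketch in Python, assuming NumPy and SciPy (1.6 or later, for the HiGHS solver) are available. The three-metabolite network, its flux bounds and the biomass reaction are hypothetical and far simpler than the *Leishmania* model; the point is only to show FBA as a linear program: maximize the biomass flux subject to the steady-state condition $\mathbf{N}v = 0$ and lower/upper bounds on each flux.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not the L. major model): rows = metabolites A, B, C.
#              uptake  r1   r2   r3  biomass
N = np.array([[1,     -1,  -1,  -1,   0],   # A
              [0,      1,   1,   0,  -1],   # B
              [0,      0,   0,   1,  -1]],  # C
             dtype=float)
# Flux bounds (lower, upper); None means unbounded. r2 is a low-capacity route to B.
bounds = [(0, 10), (0, None), (0, 3), (0, None), (0, None)]
BIOMASS = 4

def flux_balance(N, bounds, objective):
    """Maximize v[objective] subject to N v = 0 (steady state) and the flux bounds."""
    c = np.zeros(N.shape[1])
    c[objective] = -1.0                  # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=N, b_eq=np.zeros(N.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x

growth, fluxes = flux_balance(N, bounds, BIOMASS)
print(round(growth, 6))   # 5.0
```

Because uptake is capped at 10 and each unit of biomass needs one B and one C, the optimum splits A evenly between the two branches, giving a maximal biomass flux of 5. No kinetic parameters are needed, only stoichiometry and bounds.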

Virtual knockouts were carried out on the network to detect potential drug targets. Each knocked-out gene was classified as lethal (essential for survival), growth reducing, or without effect. Only 12% of single knockouts were predicted to be lethal and 10% to be growth reducing. Approximately 83% of all lethal genes belonged to three metabolic processes (lipid, carbohydrate and amino-acid metabolism), highlighting how critical these processes are for the parasite.
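The knockout analysis can be sketched by rerunning FBA with the fluxes of a deleted gene's reactions fixed at zero, once per gene and once per gene pair; this also illustrates the trivial/non-trivial distinction used for the double deletions in this section. The sketch assumes NumPy and SciPy, and the toy network, the one-gene-per-reaction associations in `GENES` and the growth-ratio thresholds in `classify` are hypothetical, since the cut-offs used in the original study are not stated here.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A; A -> B via r1 or the low-capacity r2;
# A -> C via r3; the biomass reaction drains B + C.
N = np.array([[1, -1, -1, -1, 0],
              [0, 1, 1, 0, -1],
              [0, 0, 0, 1, -1]], dtype=float)
BOUNDS = [(0, 10), (0, None), (0, 3), (0, None), (0, None)]
BIOMASS = 4
# Hypothetical gene-reaction associations (one gene per reaction here).
GENES = {"g_up": [0], "g1": [1], "g2": [2], "g3": [3]}

def max_growth(knocked_out=frozenset()):
    """FBA optimum with the knocked-out reactions forced to zero flux."""
    bounds = [(0, 0) if j in knocked_out else b for j, b in enumerate(BOUNDS)]
    c = np.zeros(N.shape[1])
    c[BIOMASS] = -1.0
    res = linprog(c, A_eq=N, b_eq=np.zeros(N.shape[0]), bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else 0.0

def classify(growth, wild_type, lethal_below=0.05, reduced_below=0.95):
    """Illustrative thresholds on the growth ratio; not the values used in the study."""
    ratio = growth / wild_type
    if ratio < lethal_below:
        return "lethal"
    if ratio < reduced_below:
        return "growth reducing"
    return "no effect"

wild_type = max_growth()
single = {g: classify(max_growth(set(rxns)), wild_type) for g, rxns in GENES.items()}

trivial, non_trivial = [], []
for g, h in itertools.combinations(GENES, 2):
    if classify(max_growth(set(GENES[g]) | set(GENES[h])), wild_type) == "lethal":
        # A lethal pair is "trivial" if either gene is already lethal on its own.
        is_trivial = "lethal" in (single[g], single[h])
        (trivial if is_trivial else non_trivial).append((g, h))

print(single)
print(non_trivial)   # [('g1', 'g2')]
```

In this toy model, deleting `g1` only reduces growth (the low-capacity `r2` still supplies B) and deleting `g2` has no effect, but deleting both blocks B production entirely: a non-trivial lethal pair, the kind of combination that network redundancy hides from single-deletion screens.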

From the group of lethal genes, the authors selected as potential candidates those exclusive to the parasite, i.e., without human orthologs (the same strategy employed for the interactome analysis). The gene *LmjF05.0350*, which encodes trypanothione reductase, was lethal *in silico*. This enzyme keeps trypanothione, a molecule found only in kinetoplastids, in its reduced form, which the parasite uses to neutralize oxidative agents. Trypanothione reductase has been studied extensively as a drug target, which supports the predictions of the mathematical model (Eberle et al. 2011). *LmjF31.2940* and *LmjF21.0845*, encoding squalene synthase and hypoxanthine-guanine phosphoribosyltransferase respectively, were also predicted to be lethal. Squalene synthase inhibitors affect the sterol biosynthesis pathway, taking advantage of the trypanosomatid requirement for specific endogenous sterols (e.g., ergosterol). Interestingly, proteins belonging to this process in *Leishmania* were also detected by the homology and interactome analyses, showing consistency between methods.

Double knockouts were also simulated to identify lethal combinations of genes. Of 152,520 double deletions, 19,341 were lethal. Of these, 19,285 were trivially lethal, meaning that at least one of the two genes was already lethal as a single deletion. The remaining 56 non-trivial lethal double deletions could be interesting to test experimentally. Most of the genes involved in these double knockouts (57.6%) belonged to lipid and carbohydrate metabolism. One explanation for the large number of double deletions that were not lethal is the high degree of redundancy in the network. These results show the utility of mathematical modeling as a complementary methodology for detecting essential genes in metabolism.

… against a recently developed tool, also freely available, called AutoDock Vina (Trott & Olson 2010). The experiments consisted of screening ligands with known activity against the HIV protease together with *decoys* (non-binders). AutoDock Vina performed very well: it was ~10 times faster than AutoDock and more accurate in ranking larger molecules. We are currently using AutoDock Vina in a virtual screening project called "Drug Search for Leishmaniasis", in association with IBM's World Community Grid (http://www.worldcommunitygrid.org/research/dsfl/overview.do), to speed up the process of finding new active compounds.

As an example of the application of this strategy in *Leishmania*, a recent study demonstrated the utility of virtual screening for identifying potential MAPK inhibitors. The target, *Leishmania* MAPK, was first modeled by *homology modeling*, a technique that predicts the 3D structure of a protein from its sequence similarity to a template protein whose structure has been determined experimentally. The model was refined by molecular dynamics, and structural features such as the ATP-binding pocket, the phosphorylation lip and the common docking site were identified. Virtual screening was then carried out against this model with several compounds from the class of ATP-competitive inhibitors. Interestingly, the docking analysis suggested that the indirubin class of molecules could act as putative inhibitors of *Leishmania* MAPK. When testing this result experimentally, the authors found a reasonably good correlation between *in vitro* activity and the binding energies calculated for the indirubin inhibitors in the virtual screening study. These molecules form strong hydrogen bonds with the residues Lys43, Arg57, Asp155, Glu94 and Ile96 of the *Leishmania* MAPK model. These residues belong to the catalytic domain, and inhibition of this domain impairs kinase activity (Awale et al. 2010). This is a clear example of the synergy between computational and experimental methods to accelerate drug discovery.

### **5. Selection of new drugs using machine learning techniques**

The *Leishmania* proteome is estimated to contain ~8,150 proteins, based on the annotated genomes of the sequenced species (Peacock et al. 2007). However, fewer than 150 of these proteins have 3D structures in the PDB, which limits the use of docking-based strategies to search for anti-*Leishmania* compounds. An alternative strategy for associating active compounds with *Leishmania* targets is to use machine learning techniques. This approach aims to find patterns in protein targets, such as domains or post-translational modifications, that can be linked to a specific class of compounds. The system "learns" these patterns and, when challenged with proteins from the organism of interest, predicts their potential association with a particular compound. Two studies have applied this strategy to particular sets of protein targets (Bulashevska et al. 2009; Thangudu et al. 2010), employing techniques such as support vector machines (SVM) and Bayesian classifiers (BC). These methods could also be applied to drug search in *Leishmania*; however, the definition of protein patterns would be critical for establishing robust drug-target relationships.

### **6. Experimental approaches for drug testing in** *Leishmania*

Several *in vitro* assays for testing the susceptibility of *Leishmania* to new potential inhibitors have been developed for the two stages of the parasite, promastigotes and amastigotes. These two stages differ morphologically and biochemically, and these differences are likely responsible for their different susceptibility to proven anti-leishmanial compounds. Assays developed with intracellular amastigotes have the advantage of being more "disease
