258 Current Topics in Tropical Medicine

in other organisms such as fungi and bacteria (He et al. 2008; Kim et al. 2008). The domain interaction analysis generated more diversity in the detection of possible interactions because modular exchange of protein domains allowed the network to be rewired even if the isolated sequence of the domain was conserved. However, despite the high accuracy of this method, the prediction of protein interactions was limited because few crystallized protein complexes were available. The PEIMAP database was also used; it included sequences of protein interaction pairs detected by several methods, including co-immunoprecipitation (co-IP) and yeast two-hybrid.

To construct the *Leishmania major* network, protein sequences were extracted from the GeneDB database. This database included genomic and proteomic information on pathogens, including protozoan parasites. The protein sequences were aligned to the interacting domain pairs using PSI-BLAST against the SCOP 1.71 database with an E-value cutoff of 0.0001, as described previously (Kim et al. 2008). PSI-BLAST was used for the alignments because it can detect small conserved sequences, such as small domains, that would otherwise be missed by standard BLASTP. The same strategy was applied for the alignments concerning the iPfam database. In this case, the domain assignment for the *Leishmania* proteins was carried out using the Pfam database (release 18.0), with the hmmpfam tool employed for the alignments. The final set of predicted interactions was obtained by homology search over the PEIMAP database using BLASTP, with a minimal cutoff of 40% sequence identity and 70% length coverage. The PEIMAP database included protein-protein interaction (PPI) information from six source databases: DIP (Xenarios et al. 2000), BIND (Bader et al. 2001), IntAct (Hermjakob et al. 2004), MINT (Zanzoni et al. 2002), HPRD (Peri et al. 2004), and BioGrid (Stark et al. 2006).

**2.2.2 Filtering interactions by using a combined confidence score**

As discussed earlier, the reliability of this analysis and its bias toward certain types of protein interactions depended on the experimental method employed. Therefore, it was necessary to combine results from different databases to increase the coverage and the confidence of the predicted interactions. In the *Leishmania major* interactome, we used a simple scoring system to identify high-confidence interactions. A previous study classified the experimental methods according to their reliability (Chua et al. 2006), and we used these data, in addition to the significance of the sequence alignments, to calculate the confidence of the interactions. This scoring system was called the 'combined score' method, and it was the method applied for the confidence calculations in the STRING database (von Mering et al. 2005). This database is useful for searching predicted protein interactions detected by other methods, although the definitions are beyond the scope of this chapter.

The score was calculated according to formula (1):

$$
\mathit{score} = 1 - \prod_{i \in E} (1 - R_i)^{n} \qquad (1)
$$

where *score* was the confidence value, ranging from 0 to 1, with 1 corresponding to 100% accuracy; *E* was the set of methods under analysis (PEIMAP, PSIMAP, iPfam); *R<sub>i</sub>* was the reliability of method *i*; and *n* was the number of interactions predicted by method *i*. The results of these calculations represented pairs of interactions with their respective confidence. With this information, it was possible to select those interactions that fulfilled a particular confidence threshold. In this case, a confidence score of 0.7 was chosen to select the core *Leishmania major* network. The threshold

**2.2.3 Topological analysis of the network**
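Because the analyses that follow rely on confidence-scored interactions, formula (1) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the reliability values, the protein names, and the function names are hypothetical placeholders, and treating the 0.7 threshold as inclusive is an assumption not stated in the text.

```python
from math import prod

# Hypothetical reliabilities R_i, for illustration only. The chapter derives
# its values from Chua et al. (2006) and the alignment significance.
RELIABILITY = {"PEIMAP": 0.5, "PSIMAP": 0.4, "iPfam": 0.6}


def combined_score(support, reliability=RELIABILITY):
    """Formula (1): score = 1 - product over methods i of (1 - R_i) ** n.

    `support` maps a method name to n, the number of predictions of this
    interaction made by that method. Methods that are absent contribute
    a factor of 1, so an unsupported pair scores 0.
    """
    return 1.0 - prod((1.0 - reliability[m]) ** n for m, n in support.items())


def core_network(predictions, threshold=0.7):
    """Keep the protein pairs whose combined score reaches the threshold."""
    scored = {pair: combined_score(s) for pair, s in predictions.items()}
    return {pair: sc for pair, sc in scored.items() if sc >= threshold}


# Toy input (hypothetical): protein pair -> {method: number of predictions}.
predictions = {
    ("protA", "protB"): {"PEIMAP": 1, "iPfam": 1},  # 1 - 0.5 * 0.4
    ("protA", "protC"): {"PSIMAP": 1},              # 1 - 0.6
}
core = core_network(predictions)
```

With these placeholder reliabilities, a pair supported by two independent methods passes the 0.7 cutoff, whereas a pair supported once by the least reliable method does not. This is the behaviour the combined score is meant to capture.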

Topological metrics such as the clustering coefficient and the mean shortest path help to describe global characteristics of the network. They measure the density of the connections within the network. Densely connected networks are characterized by modular components, which also maintain the robustness of the network against failures. Biological networks tend to have a modular structure (Jeong et al. 2001), and one additional way to test the reliability of the predicted network is to compare its clustering coefficient and mean shortest path with those of randomly generated networks that have the same number of nodes and edges. These metrics should be statistically different between predicted and random networks. In the case of the *Leishmania* network, 1,000 random networks were generated, and their metrics were calculated and compared to those of the original network.
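The comparison against random networks can be sketched in plain Python. This is a minimal illustration, not the procedure used in the chapter. The toy network (four triangles joined in a ring), the uniform random-graph null model, the z-score summary, and all function names are assumptions introduced here.

```python
import random
from collections import deque
from statistics import mean, pstdev


def build_adj(n, edges):
    """Adjacency sets for an undirected graph on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Average local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_shortest_path(adj):
    """Mean BFS distance over all connected ordered pairs of distinct nodes."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else float("inf")


def random_graph(n, m, rng):
    """Uniform random simple graph with n nodes and m edges."""
    all_pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    return build_adj(n, rng.sample(all_pairs, m))


def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in null SDs."""
    sd = pstdev(null_values)
    return (observed - mean(null_values)) / sd if sd else float("inf")


# Toy modular network: four triangles connected in a ring (12 nodes, 16 edges).
n, edges = 12, []
for i in range(4):
    a, b, c = 3 * i, 3 * i + 1, 3 * i + 2
    edges += [(a, b), (b, c), (a, c), (c, (a + 3) % n)]
observed = build_adj(n, edges)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
null_cc, null_sp = [], []
for _ in range(1000):  # 1,000 random networks, as in the text
    g = random_graph(n, len(edges), rng)
    null_cc.append(clustering_coefficient(g))
    null_sp.append(mean_shortest_path(g))

cc_z = z_score(clustering_coefficient(observed), null_cc)
sp_z = z_score(mean_shortest_path(observed), null_sp)
```

A large positive `cc_z` indicates more clustering than expected by chance. A random model that also preserves the degree of each node would be a stricter alternative to the one assumed here.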

The power law fitting for the definition of scale-free structure can be calculated using the plug-in Network Analyzer v.2.6.1 (Assenov et al. 2008), available in the platform Cytoscape (Shannon et al. 2003). This platform provides an environment for network visualization and analysis. Network topology metrics, such as betweenness centrality and connectivity, were calculated using the Hubba server (http://hub.iis.sinica.edu.tw/Hubba; Lin et al. 2008). A plug-in version of this tool for Cytoscape was recently made available. For the calculation of the metrics, the confidence scores of the interactions were used so that the detection could be focused on the nodes most likely to be essential within the group of highly supported interactions. From this analysis, a list of potential targets was selected. However, it was possible that some of the proteins detected were also conserved in sequence and function among several organisms, including humans. This becomes a problem if drugs targeting some of these proteins interfere with important biological processes in humans, generating unwanted toxic effects. To avoid this, an additional filter was applied to the list of predicted targets: the *Leishmania* proteins were aligned to the human proteins, and proteins that were conserved between the two species were excluded.
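The selection logic described above (rank nodes using the confidence scores, then drop proteins conserved in humans) can be sketched as follows. This is a simplified illustration. A confidence-weighted degree stands in for the centrality metrics computed by Hubba, the 40% identity cutoff for calling a protein conserved in humans is a hypothetical value that the text does not specify, and all protein names and function names are placeholders.

```python
from collections import defaultdict


def weighted_degree(scored_edges):
    """Sum the confidence scores of the edges touching each protein."""
    strength = defaultdict(float)
    for (a, b), score in scored_edges.items():
        strength[a] += score
        strength[b] += score
    return dict(strength)


def select_targets(scored_edges, human_identity, max_identity=40.0, top=10):
    """Rank proteins by weighted degree, excluding those conserved in humans.

    `human_identity` maps a protein to the percent identity of its best
    alignment against the human proteome (absent means no hit was found).
    `max_identity` is a hypothetical cutoff chosen for this sketch.
    """
    strength = weighted_degree(scored_edges)
    kept = [p for p in strength if human_identity.get(p, 0.0) < max_identity]
    kept.sort(key=lambda p: (-strength[p], p))  # strongest first, ties by name
    return kept[:top]


# Toy confidence-scored network (hypothetical proteins and scores).
scored_edges = {
    ("k1", "p2"): 0.90, ("k1", "p3"): 0.80, ("k1", "p4"): 0.75,
    ("h1", "p2"): 0.95, ("h1", "p3"): 0.90, ("h1", "p4"): 0.90,
}
human_identity = {"h1": 85.0}  # h1 is highly conserved in humans

targets = select_targets(scored_edges, human_identity, top=2)
```

In this toy case the best-connected protein, `h1`, is removed by the human filter, so the next hub, `k1`, heads the list.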

### **2.2.4 Prediction of protein function from network clusters**

An important feature of network analysis was the prediction of protein function. The normal procedure for inferring function involved a homology search of the unknown protein against a curated protein database such as UniProt (http://www.uniprot.org/). On some occasions, the detection of protein function was not feasible because no significant similarity could be found. When this approach failed, protein interaction network analysis helped to uncover potential functions. The prediction of protein function based on network analysis relied on the assumption, suggested by experimental data, that interacting proteins tend to have related functions. This implied that it was possible to predict the function of neighboring nodes by clustering the network into modules and knowing the function of some of the nodes inside each module. This analysis was carried out on the *Leishmania* network using the Markov Clustering (MCL) algorithm (Enright et al. 2002), which has been demonstrated to be a robust and fast algorithm for detecting clusters or modules in protein networks (Brohee & van Helden 2006). The algorithm was implemented in the NeAT tool (Brohee et al. 2008). For proteins of unknown function in the GeneDB database, we predicted their possible biological roles by evaluating the Gene Ontology terms for biological processes using the BiNGO plug-in available in Cytoscape.

Current Advances in Computational Strategies for Drug Discovery in Leishmaniasis 261

*LmjF34.0670*, *LmjF27.0470*, *LmjF32.2060* - were also predicted as essential. They confer resistance to antimonials and pentamidine by extruding the drug from the cell (Perez-Victoria et al. 2002). Based upon our analysis, these proteins could also be interesting drug targets due to their role in the homeostasis of the intracellular parasite environment.

Fig. 2. Visualization of the predicted *Leishmania major* interactome

| GeneDB ID | UniProt ID | Description |
|---|---|---|
| LmjF15.0770 | Q4QFA8 | Protein kinase |
| LmjF07.0250 | Q4QIR9 | Protein kinase |
| LmjF25.1990 | Q4Q9P0 | Protein kinase |
| LmjF27.1800 | Q4FYE1 | Protein kinase-like |
| LmjF35.1000 | Q4FX16 | Casein kinase I |
| LmjF11.0330 | Q4QH47 | PIF1 helicase-like protein |
| LmjF35.2450 | Q4FWM4 | Hypothetical protein conserved |
| LmjF21.0853 | Q4QCC1 | Hypothetical protein conserved |
| LmjF26.0660 | Q4Q9C8 | Protein disulfide isomerase |
| LmjF25.2050 | Q4Q9N4 | Helicase-like protein |

Table 2. Top 10 list of predicted targets from the *L. major* interactome.
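As a rough illustration of the Markov Clustering step described in section 2.2.4, the sketch below implements the basic expansion and inflation loop in plain Python. It is not the NeAT or MCL implementation used in the chapter. The inflation value of 2.0, the toy graph, the unweighted edges, and the rule that assigns each node to a single attractor are assumptions made to keep the example short.

```python
def normalize_columns(m):
    """Scale each column in place so that it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= total
    return m


def mcl(n, edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering on an undirected, unweighted graph."""
    # Adjacency matrix with self-loops, as is customary for MCL.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1.0
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (flow along two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize each column.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Assign each node to the attractor that holds most of its column mass.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Toy network (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
modules = mcl(6, toy_edges)
```

Once modules are obtained in this way, the known annotations of some members can be transferred as candidate functions to the unannotated members of the same module, which is the idea behind the function prediction described in that section.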
