**2. Materials and methods**

Our *in silico* analyses were carried out using public databases and web-based programs (Table 1). The programs were employed to identify and annotate the parasite proteins involved in the nuclear transport mechanism. The identified parasite proteins were then compared with their human counterparts.



The analyses were performed on a personal computer equipped with an AMD Turion 64 X2 dual-core processor, 32 gigabytes of memory and an NVIDIA graphics card. Our *in silico* workflow is summarized in Figure 4.

Nuclear transport refers to the entry and exit of large molecules into and out of the cell nucleus. To identify the *T. brucei* proteins involved in nuclear transport, protein sequences from various other eukaryotic organisms were retrieved in FASTA format from the National Centre for Biotechnology Information (NCBI) server and the Universal Protein Knowledgebase/SwissProt (UniProtKB/SwissProt) database, based on biological process and protein name searches. The number of hits obtained for each query was recorded after manual inspection. To reduce redundancy, the retrieved protein sequences were clustered into groups sharing more than 30% similarity using BLASTClust [10]. The resulting non-redundant data set was subjected to BLASTp [11] analyses against TriTrypDB, an integrated genomic and functional genomic database for eukaryotic pathogens of the family Trypanosomatidae. Hits were accepted at a cutoff of E-value less than 1e-06 and score more than 100. Hits that pointed to the same or overlapping locations were removed manually. The identified protein sequences were then retrieved from TriTrypDB.
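The cutoff and de-duplication steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually followed (the searches were run with web-based programs and overlapping hits were removed manually): it assumes the hits are available in BLAST tabular format (`-outfmt 6`), that "score" means the bit score, and that overlapping hits are resolved by keeping the highest-scoring one. The function names and the sample rows are hypothetical.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-06  # hits must have an E-value below this
SCORE_CUTOFF = 100.0    # hits must have a score above this (assumed: bit score)


class Hit(NamedTuple):
    query: str
    subject: str
    s_start: int
    s_end: int
    evalue: float
    score: float


def parse_tabular(lines):
    """Parse BLAST tabular (-outfmt 6) rows into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        start, end = sorted((int(f[8]), int(f[9])))
        hits.append(Hit(f[0], f[1], start, end, float(f[10]), float(f[11])))
    return hits


def passes_cutoffs(hit):
    """Apply the E-value and score thresholds quoted in the text."""
    return hit.evalue < E_VALUE_CUTOFF and hit.score > SCORE_CUTOFF


def remove_overlaps(hits):
    """Among hits overlapping on the same subject, keep the best-scoring one."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        overlaps = any(
            hit.subject == k.subject
            and hit.s_start <= k.s_end
            and k.s_start <= hit.s_end
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    return kept


def filter_hits(lines):
    """Threshold the hits, then drop overlapping ones."""
    return remove_overlaps([h for h in parse_tabular(lines) if passes_cutoffs(h)])


# Hypothetical rows; the identifiers are placeholders, not real results.
example_rows = [
    "queryA\tsubject1\t45.0\t200\t110\t0\t1\t200\t10\t209\t1e-40\t180.0",
    "queryB\tsubject1\t40.0\t150\t90\t0\t1\t150\t50\t199\t1e-20\t120.0",
    "queryC\tsubject2\t30.0\t100\t70\t0\t1\t100\t1\t100\t1e-03\t150.0",
    "queryD\tsubject3\t35.0\t120\t78\t0\t1\t120\t5\t124\t1e-10\t90.0",
]

if __name__ == "__main__":
    for hit in filter_hits(example_rows):
        print(hit.query, hit.subject, hit.evalue, hit.score)
```

In the sample rows, `queryC` fails the E-value cutoff, `queryD` fails the score cutoff, and `queryB` is dropped because it overlaps a higher-scoring hit on the same subject sequence.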

**Figure 4.** *In silico* analysis workflow.

| Analysis | Programme name | URL and reference where available |
|---|---|---|
| Protein sequence retrieval | National Centre for Biotechnology Information (NCBI) | www.ncbi.nlm.nih.gov/ |
| | Universal Protein Knowledgebase/SwissProt (UniProtKB/SwissProt) | http://www.uniprot.org/ |
| Clustering of protein sequences | BLASTClust | www.vardb.org/vardb/analysis/blastclust.html |
| Sequence similarity search | BLASTp (NCBI) | http://blast.ncbi.nlm.nih.gov/ |
| | TriTrypDB | http://tritrypdb.org/tritrypdb/ |
| Identification of protein domains | Conserved Domain Database (CDD) | http://www.ncbi.nlm.nih.gov/cdd/ |
| | Simple Modular Architecture Research Tool (SMART) | http://smart.embl-heidelberg.de/ |
| | InterPro | http://www.ebi.ac.uk/interpro/ |
| Identification of post translational modification sites | PROSITE | http://prosite.expasy.org/ |

**Table 1.** Databases and web-based programs used in the analysis of nuclear transport of *T. brucei*.

A protein domain is a portion of a protein that can evolve, function and exist independently. It is a compact, stable three-dimensional structure whose folding is driven in part by the distribution of polar and non-polar side chains. To determine the functional protein domains, all identified *T. brucei* protein sequences from TriTrypDB were subjected to functional annotation using the Conserved Domain Database (CDD) [12], Simple Modular Architecture Research Tool (SMART) [13] and InterPro [14] programs. The protein sequences were submitted in FASTA format as queries.


| Protein sequences | Total |
|---|---|
| Raw protein sequences retrieved from NCBI and UniProtKB | 1546 |
| Raw protein sequences subjected to BLASTClust programme | 1548 |
| Non redundant protein sequences resulting from BLASTClust analysis | 248 |
| Query sequences for BLASTp analysis against TriTrypDB database | 248 |

**Table 2.** Summary of protein sequences retrieved in the *in silico* analysis.

The BLASTp analyses against TriTrypDB, using a cutoff of E-value less than 1e-06 and score more than 100 for all 248 query protein sequences, resulted in 34 hits of parasite proteins. However, our approach failed to identify a Ran GTPase-activating protein (RanGAP) in this parasite. Reference [18] also reported that sequence similarity searches have been unable to identify a RanGAP protein in any protozoan. Keyword searches among annotated proteins in the *T. gondii* genome database identified one candidate, which was shown by sequence analysis to have strong similarity to Ran-binding protein 1 (RanBP1). Perhaps the RanGAP function in apicomplexans is performed by a single protein with multiple cellular responsibilities (i.e., a fusion of RanBP1 and RanGAP). It is also possible that a completely unique parasite protein possesses the RanGAP function.

Table 3 shows the identified and characterized parasite proteins involved in the nuclear transport machinery. The functional annotation based on protein domains showed that, of the 34 hits, only 22 parasite protein sequences were predicted with high confidence to be involved in the nuclear transport mechanism, as indicated by the presence of relevant protein domains. These include the guanosine triphosphate (GTP)-binding domain, the Nucleoporin (NUP) C-terminal domain, the Armadillo repeat, the Importin B N-terminal domain, the regulator of chromosome condensation 1 (RCC1) repeat and the Exportin domain (Table 4). All of these protein domains have been experimentally verified to regulate the nuclear transport mechanism in eukaryotes. Seven *T. brucei* proteins exhibited functional features of the Importin receptor. This finding is consistent with the number of Importin receptors in another eukaryotic pathogen, *Toxoplasma gondii* [8]. In addition, our results for other nuclear transport constituents in *T. brucei*, such as RCC1, Ran, nuclear transport factor 2 (NTF2), cell apoptosis susceptibility (CAS), Exportin and Ran binding proteins, were also in agreement with reference [18].

The nuclear and cytoplasmic compartments are separated by the nuclear envelope in eukaryotes. By using this compartmentalization and controlling the movement of molecules between the nucleus and the cytosol, cells are able to regulate numerous cellular mechanisms such as transcription and translation. Proteins with a molecular size below 40 kDa are able to diffuse passively through the nuclear pore complex (NPC), whereas larger proteins require active transport with the assistance of Karyopherins, specific transport receptors that shuttle between the nucleus and cytosol. Karyopherins, which are able to distinguish among the diverse proteome to target specific cargo molecules for transport, can be subdivided into those that transport molecules into the nucleus (Importins) and those that transport molecules out of the nucleus (Exportins). It has been reported that

Posttranslational modification (PTM) is the chemical modification of a protein after its translation. It is one of the later steps in protein biosynthesis, and thus in gene expression, for many proteins. In this part of the study, in relation to the regulatory aspects of the nuclear transport mechanism, we focused on potential glycosylation and phosphorylation sites. To analyze the posttranslational modification sites, all *T. brucei* protein sequences from TriTrypDB were submitted to the PROSITE [15] programme. The protein sequences were submitted in FASTA format as queries.
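To make the idea of pattern-based site prediction concrete, the sketch below translates a PROSITE-style pattern into a regular expression and scans a sequence for matches. It is a simplified stand-in for the PROSITE web programme, not a reimplementation of it. The three patterns are widely documented PROSITE entries for N-glycosylation and phosphorylation sites, but the text does not state which entries the analysis relied on, so their selection is an assumption. The function names and the test sequence are hypothetical, and only the basic pattern syntax is handled.

```python
import re

# Widely documented PROSITE patterns; which entries the original analysis
# relied on is not stated in the text, so this selection is an assumption.
PATTERNS = {
    "N-glycosylation site (PS00001)": "N-{P}-[ST]-{P}",
    "Protein kinase C phosphorylation site (PS00005)": "[ST]-x-[RK]",
    "Casein kinase II phosphorylation site (PS00006)": "[ST]-x(2)-[DE]",
}


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":               # any residue
            residue = "."
        elif core.startswith("{"):    # any residue except those listed
            residue = "[^" + core[1:-1] + "]"
        else:                         # single residue or [allowed residues]
            residue = core
        if low and high:
            residue += "{%s,%s}" % (low, high)
        elif low:
            residue += "{%s}" % low
        parts.append(prefix + residue + suffix)
    return "".join(parts)


def find_sites(sequence, pattern):
    """Return (1-based position, matched residues) for every match.

    A lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical sequence used only to exercise the functions.
example_sequence = "MNGTASVRLSPEDNPSA"

if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, find_sites(example_sequence, pattern))
```

Short patterns of this kind match frequently by chance, so a match marks a potential site only and does not show that the residue is modified *in vivo*.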

Protein-protein interactions occur when two or more proteins bind together, often to carry out their biological function. Proteins may interact for a long time to form part of a protein complex, one protein may carry another, or a protein may interact briefly with another protein just to modify it. To analyze the participation of parasite proteins in protein-protein interactions, all *T. brucei* protein sequences from TriTrypDB were used to mine the STRING 8.2 database [16]. The STRING 8.2 database integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections. The protein sequences were submitted in FASTA format as queries. All information on protein-protein interactions was recorded and evaluated accordingly.

The degree of similarity between the amino acids occupying a particular position in a protein sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is. To compare the parasite proteins with their human homologues, all *T. brucei* protein sequences from TriTrypDB were subjected to BLASTp analysis against *Homo sapiens* proteins. The protein sequences were submitted in FASTA format as queries. The same cutoff was used: E-value less than 1e-06 and score more than 100.
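As a rough illustration of position-wise comparison, the sketch below computes percent identity between two sequences that have already been aligned. It is not the measure reported by BLASTp in this study. It counts identical residues only, whereas similarity scoring would also credit conservative substitutions through a substitution matrix, and it assumes that columns containing a gap are excluded from the denominator. The function name and the two aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments ('-' marks a gap).
parasite_fragment = "MKT-LLVA"
human_fragment = "MRTALL-A"

if __name__ == "__main__":
    print(round(percent_identity(parasite_fragment, human_fragment), 1))
```

Different tools use different denominators (alignment length, shorter sequence length, or gap-free columns), so identity values are only comparable when computed in the same way.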
