**1. Introduction**

30 Bioinformatics


#### **1.1. Trypanosomiasis**

Trypanosomiases are a group of animal and human diseases caused by parasitic protozoan trypanosomes. The final decade of the 20th century witnessed an alarming resurgence of sleeping sickness (human African trypanosomiasis) in sub-Saharan Africa. Meanwhile, Chagas' disease (American trypanosomiasis) remains one of the most widespread infectious diseases in South and Central America. Arthropod vectors are responsible for the spread of both African and American trypanosomiases, and disease control through insect control programs is an attainable target. However, the existing drugs for both illnesses are far from ideal. The trypanosomes are among the earliest diverging members of the Eukaryotae and share several biochemical peculiarities that have inspired research into new drug targets. Nevertheless, differences in the way each trypanosome species interacts with its host have frustrated efforts to design drugs effective against both species. Heightened awareness of these neglected diseases might lead to progress towards control through increased financial support for drug development and vector eradication [1].

Trypanosomes are a group of unicellular parasitic flagellate protozoa that mostly infect vertebrates. A number of trypanosome species cause important veterinary diseases, but only two cause significant human diseases. In sub-Saharan Africa, *Trypanosoma brucei* causes sleeping sickness, or human African trypanosomiasis, whilst in the Americas, *Trypanosoma cruzi* causes Chagas' disease (Figure 1) [2]. The life cycle of these parasitic protozoa involves insect vectors and mammalian hosts (Figure 2) [1]. All trypanosomes require more than one obligatory host to complete their life cycle and are transmitted via vectors. Most species are transmitted by blood-feeding invertebrates; however, there are distinct mechanisms among the varying species. In the invertebrate hosts they are generally found in the intestines, as opposed to the bloodstream or an intracellular environment in the mammalian host. As trypanosomes develop through their life cycle, they undergo a series of morphological changes [3], as is typical of trypanosomatids.

**Figure 1.** Geographic distribution of *Trypanosoma brucei* and *Trypanosoma cruzi*, showing endemic countries harboring these diseases [2].

**Figure 2.** Life cycles of (A) *Trypanosoma cruzi* and (B) *Trypanosoma brucei*. Upper cycles represent different stages that take place in the insect vectors. Lower cycles represent different stages in man and other mammalian hosts [1].

© 2012 Yahya et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Investigation on Nuclear Transport of *Trypanosoma brucei*: An *in silico* Approach

The life cycle often consists of the trypomastigote form in the vertebrate host and the trypomastigote or promastigote form in the gut of the invertebrate host. Intracellular life-cycle stages are normally found in the amastigote form. The trypomastigote morphology is unique to species in the genus *Trypanosoma*.

The genome of *T. brucei* is split into nuclear and mitochondrial genomes. The nuclear genome comprises three classes of chromosomes, distinguished by their size on pulsed-field gel electrophoresis: large chromosomes (1 to 6 megabase pairs), intermediate chromosomes (200 to 500 kilobase pairs) and minichromosomes (50 to 100 kilobase pairs) [4]. The large chromosomes contain most genes, while the small chromosomes tend to carry genes involved in antigenic variation, including the variant surface glycoprotein (VSG) genes. The mitochondrial genome of *Trypanosoma*, as of other kinetoplastids, is known as the kinetoplast; it is characterized by a highly complex series of catenated circles and minicircles and requires a cohort of proteins for its organisation during cell division. The genome of *T. brucei* has been completely sequenced and is now available online [5].

#### **1.2. Nuclear transport**


Nuclear transport of proteins and ribonucleic acids (RNAs) between the nucleus and cytoplasm is a key mechanism in eukaryotic cells [6]. It primarily involves three classes of macromolecules: substrates, adaptors, and receptors. A transport complex is formed when a substrate binds to an import or export receptor; some substrates require one or more adaptors to mediate formation of the complex. Once assembled, transport complexes are transferred in one direction across the nuclear envelope via aqueous channels that are part of the nuclear pore complexes (NPCs). Following dissociation of the transport complex, both adaptors and receptors are recycled through the NPC to allow another round of transport. Directionality of import or export therefore depends on formation of the receptor-substrate complex on one side of the nuclear envelope and its dissociation on the other. The Ran GTPase is vital in producing this asymmetry. Modulation of nuclear transport generally involves specific inhibition of the formation of a transport complex; however, more global forms of regulation also occur [7]. The general concept of the import and export processes is shown in Figure 3 [8].

#### **1.3.** *In silico* **approach**

An *in silico* study is an analysis performed on a computer or via computer simulation. It involves managing, mining, integrating, and interpreting information from biological data at the genomic, metabolomic, proteomic, phylogenetic, cellular, or whole-organism level. Bioinformatics tools and skills have become crucial for *in silico* research as genome sequencing projects have resulted in exponential growth of protein and nucleic acid sequence databases. Interactions among genes that give rise to multiprotein functionality generate further data and complexity. In medicine, the *in silico* approach can not only reduce the need for expensive laboratory work and clinical trials but also speed up drug discovery. In 2010, for example, researchers found potential inhibitors of an enzyme associated with cancer activity *in silico* using the protein docking algorithm EADock [9]. About 50% of the molecules were later shown to be active inhibitors *in vitro* [9]. A unique advantage of the *in silico* approach is its worldwide accessibility: in some cases, a computer with internet access, or even just a computer, is sufficient, whereas *in vivo* and *in vitro* laboratory experiments both require more materials. In protein sequence analysis, the *in silico* approach gives highly reproducible, and in many cases identical, results because it relies only on comparison of the query sequence with a database of previously annotated sequences. However, in more sophisticated analyses such as prediction of the 3-D structure of a protein from its primary sequence, discrepancies in results are to be expected because of the manual optimization involved in crucial steps such as template selection, target-template alignment, model construction and model evaluation.

**Figure 3.** For import of molecules, cytoplasmic cargo is identified by Importin α, which then binds to Importin β (1). This ternary complex translocates through the nuclear membrane and into the nucleus. Once there, RanGTP binds to Importin β and causes dissociation of the complex, which releases the cargo into the nucleus (2). Import receptors are then recycled back to the cytoplasm (3) through binding of RanGTP and export to the cytosol. RanGTP is then hydrolyzed to the GDP-bound state, which causes the release of the import receptors (4), and the cycle starts over again. Export of cargo follows a similar mechanism. Exported molecules bind to the export receptor together with RanGTP and exit the nucleus (5). Next, RanGTP is hydrolyzed to cause release of the cargo into the cytoplasm (6). NTF2 specifically identifies RanGDP and returns it to the nucleus (7) for RCC1 to then exchange it to RanGTP (8) [8]. Key: GTP, guanosine triphosphate; GDP, guanosine diphosphate; NTF2, nuclear transport factor 2; RCC1, regulator of chromosome condensation 1.

#### **1.4. Problem statements**

Given the importance of nuclear shuttling in many cellular processes, the proteins responsible for nuclear transport are vital for parasite survival. The presence of nuclear transport machinery has been highlighted in eukaryotic parasites such as *Plasmodium falciparum*, *Toxoplasma gondii* and *Cryptosporidium parvum*. However, nuclear transport in *T. brucei* has not been established. Nuclear shuttling is one of the overlooked aspects of drug design and delivery, and exploiting the movement of macromolecules across the nuclear envelope promises to be an exciting area of drug development. Furthermore, divergence between host and parasite systems is commonly exploited as a strategy in drug development. Therefore, exploiting the peculiarities of the *T. brucei* nuclear transport machinery relative to that of its host might be a promising strategy for the control of trypanosomiasis, and this remains to be further investigated.

#### **1.5. Objectives**

This study investigates the nuclear transport constituents of *T. brucei* by determining the functional characteristics of the parasite proteins, namely functional protein domains, post-translational modification sites and protein-protein interactions. Parasite proteins found to exhibit the relevant functional protein domains, post-translational modification sites and protein-protein interactions are predicted to be true components of the nuclear transport mechanism. This study also aims to evaluate how the proteins of the nuclear transport machinery differ between the parasite and human by determining the degree of protein sequence similarity. Information on sequence-level divergence between *T. brucei* proteins and their human counterparts may provide insight into drug target discovery.
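As an illustration of what a "degree of protein sequence similarity" can look like in practice, the sketch below computes percent identity between two sequences that have already been aligned. It is a minimal example under stated assumptions, not the procedure used in this study: the function name `percent_identity`, the toy sequences, and the choice to divide by the number of aligned columns are all hypothetical, and percent identity is only one of several possible similarity measures.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Both inputs must come from the same pairwise alignment, with '-' marking gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    # A column counts as a match only when the residues agree and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not real protein sequences).
parasite_fragment = "MKT-LLVA"
human_fragment = "MKSALLIA"
print(percent_identity(parasite_fragment, human_fragment))  # 5 of 8 columns match
```

A low value for a parasite protein against its closest human counterpart would indicate the kind of sequence-level divergence the objectives refer to.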

BLASTClust [10] to reduce non-redundant protein sequences. The non-redundant data set was subjected to BLASTp [11] analyses against TriTrypDB, an integrated genomic and functional genomic database for eukaryotic pathogens of the family Trypanosomatidae. The cutoffs used were an E-value of less than 1e-06 and a score of more than 100. Hits that pointed to the same or overlapping locations were removed manually.
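The cutoff step described above can be sketched as a small filter over BLAST tabular output. This is a hedged illustration only: it assumes the hits were exported in the default 12-column tabular format (where column 11 is the E-value and column 12 the bit score), it assumes the "score" in the text refers to the bit score, and the names `Hit`, `parse_tabular`, `filter_hits` and the example identifiers are hypothetical. Removal of overlapping hits, which was done manually in the study, is not modelled.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse default 12-column BLAST tabular lines, skipping blanks and comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def filter_hits(hits: Iterable[Hit],
                max_evalue: float = 1e-6,
                min_score: float = 100.0) -> List[Hit]:
    """Keep hits with E-value below max_evalue and score above min_score."""
    return [h for h in hits if h.evalue < max_evalue and h.bitscore > min_score]


# Hypothetical example rows (identifiers and values are made up).
example = [
    "q1\tsubjA\t45.0\t200\t110\t0\t1\t200\t5\t204\t1e-30\t250",
    "q1\tsubjB\t30.0\t80\t56\t0\t1\t80\t10\t89\t1e-04\t60",
    "q2\tsubjC\t38.0\t150\t93\t0\t1\t150\t1\t150\t1e-08\t95",
]
kept = filter_hits(parse_tabular(example))
print([h.subject for h in kept])  # only subjA passes both cutoffs
```

Here `subjB` fails the E-value cutoff and `subjC` fails the score cutoff, so both thresholds must be satisfied for a hit to be retained.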

The identified protein sequences were then retrieved from TriTrypDB.

**Figure 4.** *In silico* analysis workflow. Steps shown: keyword search; retrieval of raw protein sequences from two public databases; removal of unreviewed and partial raw protein sequences; clustering of reviewed raw protein sequences; sequence similarity search against *T. brucei* database; retrieval of identified parasite protein sequences; functional annotation of identified parasite proteins; identification of protein domains; identification of post-translational modification sites; database mining of functional protein-protein interactions; sequence similarity search against *Homo sapiens*.

A protein domain is a portion of a protein that can evolve, function, and exist independently. It is a compact, stable three-dimensional structure, and the distribution of polar and non-polar side chains contributes to its folding. To determine the functional protein domains, all identified protein sequences of *T. brucei* from TriTrypDB were subjected to
