**1. Introduction**

Bioinformatics is the application of techniques derived from disciplines such as applied mathematics, computer science, and statistics to analyze and interpret biological data. In this chapter, you will learn how to use bioinformatic techniques to identify peptide sequence similarities between pathogen virulence factors (VFs) and human nerve tissue proteins, and then how to identify target peptides that could form the basis for engineering recombinant antibodies. Wet-lab experiments can then be conducted on the identified overlapping sequences to single out target antibodies for testing in tissue culture studies [1, 2]. The ideal target peptide sequences for antibody engineering are physiologically relevant, easily accessible, highly specific to a step in pathogenesis, and short in amino acid length.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. © 2018 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1.1. Bioinformatics and its role in peptide discovery**

Access to extensive genomic and proteomic databases, together with the availability of tools to compare and evaluate the information they hold, has given rise to a new interdisciplinary field that combines biology and computer science [3]. Bioinformatics conceptualizes physical and chemical biology in terms of macromolecules and then applies "informatics" techniques (derived from disciplines such as applied mathematics, computer science, and statistics) to assimilate and organize the information associated with these molecules on a large scale [4]. It is an exploratory method for peptide discovery in antibody engineering and in the development of antimicrobial therapies and vaccination strategies [5].

Bioinformatics as a Tool to Identify Infectious Disease Pathogen Peptide Sequences as Targets…

http://dx.doi.org/10.5772/intechopen.71011

There is growing evidence that a number of neurodegenerative diseases result from the association of host cell proteins with viral and bacterial infectious agents [6]. When pathogenic microorganisms such as bacteria, viruses, parasites, or fungi cause an infectious disease, many molecular interactions take place between pathogen proteins and host peptides [7] through all stages of the disease: incubation, prodromal illness, decline, and convalescence. Much experimental evidence identifies the virulence factors (VFs) of pathogens and the host components they engage, such as receptors and tissue-specific proteins [8, 9]. Although the pathogenic pathway of the infectious agent in various host tissues is unknown, many of these processes are suspected to be attributable to the as yet undiscovered role of molecular mimics shared by pathogenic microorganisms and their corresponding host tissue proteins. Sequence and structural similarities between a pathogen VF protein and nerve peptides could affect the pathogenesis of the infectious disease either directly or indirectly [10–12]. They could contribute to molecular mimicry, steric hindrance, receptor binding, cell signaling, and autoantibody production events (involved in neurodegeneration) in the host.

Leprosy patients with peripheral nerve damage develop autoimmunity to myelin P0, a nerve protein. This conclusion draws on three lines of evidence: (1) labeling and binding studies found that *Mycobacterium leprae* (the bacterium that causes leprosy) binds to myelin P0 [13]; (2) clinical studies confirmed the production of autoantibodies in response to the interaction of the bacterium with myelin P0 [14, 15]; and (3) bioinformatics searches identified sequence and structural similarities between *M. leprae* and the immunoglobulin regions of myelin P0 [16].

Identifying molecular mimics in pathogen-host peptide sequences is one approach to finding target peptides for antibody engineering. About 180 extensive biological databases are available for retrieving information on the sequence and functional aspects of biological molecules. An updated list is published in Nucleic Acids Research [17].

#### **1.2. The use of bioinformatics in identifying sequence similarities**

This section teaches you how to search for proteins present in a target host, how to obtain their amino acid sequences from existing databases, how to compare the sequences of the host proteins with those of the pathogen proteins, and finally how to interpret the results in light of existing evidence. In our case study, we identify peptide sequence similarities between the virulence factors of a few selected infectious agents and human nerve tissue proteins, in order to select peptides for engineering antipeptide antibodies that recognize the corresponding host/viral proteins.

#### *1.2.1. Selection of nerve proteins*

278 Antibody Engineering

Sixty-three proteins that are enriched or enhanced in nervous tissue, as observed by immunohistochemistry, were extracted from the Human Protein Atlas database (**Figure 1**).

To search for human proteins in nervous tissue, access the website (www.proteinatlas.org), enter the tissue of study (e.g. nervous tissue) into the search box provided, and click on search.

Proteins were then selected manually on the basis of their tissue expression (enriched or enhanced) and of immunohistochemistry evidence (**Figure 2**).

**Figure 1.** Conducting a search on the Human Protein Atlas Database.

**Figure 2.** Conducting an advanced search on the Human Protein Atlas Database.

The selected proteins are as follows: agrin (AGRIN\_HUMAN, O00468), calbindin (CALB1\_HUMAN, P05937), n-chimaerin (CHIN\_HUMAN, P15882), secretogranin-2 (SCG2\_HUMAN, P13521), neuromodulin (NEUM\_HUMAN, P17677), kinesin (KIFC1\_HUMAN, Q9BW19), tau (TAU\_HUMAN, P10636), 2′,3′-cyclic-nucleotide 3′-phosphodiesterase (CN37\_HUMAN, P09543), myelin-associated glycoprotein (MAG\_HUMAN, P20916), myelin P0 (MYP0\_HUMAN, P25189), myelin P2 (MYP2\_HUMAN, P02689), oligodendrocyte-myelin glycoprotein (OMGP\_HUMAN, P23515), brain-derived neurotrophic factor (BDNF\_HUMAN, P23560), ciliary neurotrophic factor (CNTF\_HUMAN, P26441), neurotrophin-3 (NTF3\_HUMAN, P20783), beta-nerve growth factor (NGF\_HUMAN, P01138), nestin (NEST\_HUMAN, P48681), neurofilament heavy polypeptide (NFH\_HUMAN, P12036), neurogranin (NEUG\_HUMAN, Q92686), voltage-dependent T-type calcium channel subunit alpha-1G (CAC1G\_HUMAN, O43497), hippocalcin (HPCL1\_HUMAN, P37235), neurocalcin-delta (NCALD\_HUMAN, P61601), recoverin (RECO\_HUMAN, P35243), bombesin receptor subtype-3 (BRS3\_HUMAN, P32247), kininogen-1/bradykinin (KNG1\_HUMAN, P01042), calcitonin (CALC\_HUMAN, P01258), cholecystokinin (CCKN\_HUMAN, P06307), galanin peptides (GALA\_HUMAN, P22466), pro-neuropeptide Y (NPY\_HUMAN, P01303), neurotensin/neuromedin N (NEUT\_HUMAN, P30990), protein S100-B (S100B\_HUMAN, P04271), synapsin-1 (SYN1\_HUMAN, P17600), probable tubulin polyglutamylase (TTLL1\_HUMAN, O95922), myelin basic protein (MBP\_HUMAN, P02686), protein phosphatase 1 regulatory subunit 1B (PPR1B\_HUMAN, Q9UD71), Arf-GAP with GTPase, ANK repeat and PH domain-containing protein 2 (AGAP2\_HUMAN, Q99490), cathepsin L2 (CATL2\_HUMAN, O60911), D(1A) dopamine receptor (DRD1\_HUMAN, P21728), BDNF/NT-3 growth factors receptor (NTRK2\_HUMAN, Q16620), melanoma-associated antigen E1 (MAGE1\_HUMAN, Q9HCI5), microtubule-associated protein 6 (MAP6\_HUMAN, Q96JE9), protocadherin alpha-12 (PCDAC\_HUMAN, Q9UN75), carboxypeptidase E (CBPE\_HUMAN, P16870), Down syndrome cell adhesion molecule (DSCAM\_HUMAN, O60469), dyslexia-associated protein KIAA0319 (K0319\_HUMAN, Q5VV43), uncharacterized protein KIAA1211-like (K121L\_HUMAN, Q6NV74), microtubule-associated protein 1B (MAP1B\_HUMAN, P46821), neuronal calcium sensor 1 (NCS1\_HUMAN, P62166), neurofilament light polypeptide (NFL\_HUMAN, P07196), receptor expression-enhancing protein 2 (REEP2\_HUMAN, Q9BRK0), secretogranin-3 (SCG3\_HUMAN, Q8WXD2), ubiquitin carboxyl-terminal hydrolase isozyme L1 (UCHL\_HUMAN, P09936), galactosylgalactosylxylosylprotein 3-beta-glucuronosyltransferase 1 (B3GA1\_HUMAN, Q9P2W7), beta-1,4 N-acetylgalactosaminyltransferase 1 (B4GN1\_HUMAN, Q00973), caprin-2 (CAPR2\_HUMAN, Q6IMN6), dopamine beta-hydroxylase (DOPO\_HUMAN, P09172), FAM81A (FA81A\_HUMAN, Q8TBF8), mitogen-activated protein kinase 10 (MK10\_HUMAN, P53779), N-terminal EF-hand calcium-binding protein 1 (NECA1\_HUMAN, Q8N987), neuroligin-3 (NLGN3\_HUMAN, Q9NZ94), protein kinase C and casein kinase substrate in neurons protein 1 (PACN1\_HUMAN, Q9BY11), sodium channel protein type 7 subunit alpha (SCN7A\_HUMAN, Q01118), and clathrin coat assembly AP180 (AP180\_HUMAN, O60641). The biological aspects of the proteins have been derived from the information presented in the UniProt database for each protein [18–20].

#### *1.2.2. Retrieving FASTA formats*

The FASTA-format sequence of each of the above proteins was retrieved from the NCBI PubMed website. FASTA is a text-based format that represents either nucleotide sequences or peptide sequences (**Figure 3**).

Upon accessing the website, select the database in which the search is to be conducted (e.g. Protein). Type the name of the protein, followed by its species in brackets, into the search text box provided (e.g. Agrin (*Homo sapiens*)) and click on the search button.

From the list of results, choose the entry with the highest number of amino acids. Click on the hyperlinked protein name to open its GenBank record, and then click on the hyperlinked *FASTA* (**Figures 4**, **5** and **6**).

Obtain the FASTA record by copying all of the information, starting from the > symbol.
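A copied record can also be handled programmatically. The following minimal Python sketch, which is not part of the original workflow, splits FASTA text into header and sequence pairs. The function name `parse_fasta` and the toy record are illustrative assumptions, and the residues shown are placeholders rather than a real protein.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record: the header and residues are placeholders, not a database entry.
example = ">toy_protein_1 placeholder description\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example)
```

Everything after the `>` symbol on the first line is kept as the header, and the wrapped sequence lines are joined into one string.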

#### *1.2.3. Arranging the FASTA formats*

All of the FASTA records of the human proteins are saved one after another in a single Microsoft Notepad file (**Figure 7**).
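As an alternative to pasting records by hand, the same multi-record file could be assembled with a short script. This is only a sketch under stated assumptions: `to_multi_fasta` is a hypothetical helper, the 60-character line width is a common convention rather than a requirement of the method, and the toy headers and sequences are placeholders.

```python
def to_multi_fasta(records, width=60):
    """Join (header, sequence) pairs into one multi-record FASTA string."""
    blocks = []
    for header, sequence in records:
        # Wrap each sequence at a fixed width so the file stays readable.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        blocks.append(">" + header + "\n" + "\n".join(lines))
    return "\n".join(blocks) + "\n"


# Placeholder records; a real run would hold the 63 nerve protein sequences.
toy_records = [("toy_a", "ACDEFGHIKL"), ("toy_b", "MNPQ")]
combined = to_multi_fasta(toy_records, width=4)
```

The resulting string can be written to a plain text file and pasted into the BLAST query box in one step.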

#### *1.2.4. Running the BLAST*

To identify pathogen-protein mimics, the nerve protein sequences were compared by BLAST (Basic Local Alignment Search Tool; version 2.7.1; e-value ≤ 0.01) [21] against each pathogen genome (**Figure 8**).

Access the BLAST website at https://blast.ncbi.nlm.nih.gov/Blast.cgi and click on Protein BLAST (protein to protein). Copy the FASTA records of the 63 nerve proteins from the Notepad file and paste them into the text box provided. Then enter the organism against which the sequence comparison is to be carried out, specifying its Tax ID (**Figure 8**).
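The e-value threshold quoted above can be applied to downloaded results with a few lines of code. The sketch below assumes that the hits have been exported as tab-separated text in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The export format, the function name `filter_hits`, and the toy rows are assumptions for illustration, not values from this study.

```python
E_VALUE_CUTOFF = 0.01  # threshold quoted in the text


def filter_hits(table_text, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose e-value is at or below the cutoff."""
    kept = []
    for line in table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of the assumed layout
        if evalue <= cutoff:
            kept.append({
                "query": fields[0],
                "subject": fields[1],
                "q_start": int(fields[6]),
                "q_end": int(fields[7]),
                "evalue": evalue,
            })
    return kept


# Toy rows with placeholder identifiers and made-up scores.
toy_rows = [
    ["toy_query", "toy_subject_1", "45.0", "40", "22", "0",
     "10", "49", "5", "44", "0.002", "35.0"],
    ["toy_query", "toy_subject_2", "30.0", "20", "14", "0",
     "60", "79", "1", "20", "0.5", "18.0"],
]
toy_table = "\n".join("\t".join(row) for row in toy_rows)
significant = filter_hits(toy_table)
```

Only the first toy row passes the filter, because its e-value of 0.002 is below the 0.01 cutoff.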


**Figure 3.** Conducting a search on the PubMed database.

**Figure 4.** List of available sequenced protein information.

**Figure 5.** Gene information of agrin.

**Figure 6.** FASTA format of agrin.

**Figure 7.** FASTA formats of the 63 proteins in sequence.

**Figure 8.** BLAST home page.

The pathogen genome sequences compared with the human nerve proteins are as follows: HIV (Tax ID: 11676), poliovirus (Tax ID: 138950), Japanese encephalitis virus (Tax ID: 11072), *M. leprae* (Tax ID: 1769), human herpesvirus 1 (Tax ID: 10298), human herpesvirus 2 (Tax ID: 10310), rabies virus (Tax ID: 11292), Zika virus (Tax ID: 64320), coronavirus (Tax ID: 11118), and varicella zoster virus (Tax ID: 10335).
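Tax IDs are plain integers, so any thousands separators introduced by typesetting should be dropped before a value is entered. The sketch below normalizes a Tax ID and builds an Entrez-style organism filter of the form `txid…[Organism]`. The helper name `organism_filter` is hypothetical, only three of the pathogens are shown, and the exact filter syntax accepted by a given BLAST form should be checked against the NCBI documentation.

```python
def organism_filter(tax_id):
    """Build an Entrez-style organism filter from a Tax ID."""
    digits = str(tax_id).replace(",", "").strip()
    if not digits.isdigit():
        raise ValueError("Tax ID must contain digits only: %r" % (tax_id,))
    return "txid" + digits + "[Organism]"


# Three of the pathogens from the list above; the dictionary keys are
# labels chosen for this sketch.
pathogen_tax_ids = {
    "HIV": "11,676",
    "M. leprae": 1769,
    "Varicella zoster virus": "10,335",
}
filters = {name: organism_filter(tid) for name, tid in pathogen_tax_ids.items()}
```

Keeping the pathogen labels and Tax IDs in one mapping makes it easier to repeat the same search for each organism in turn.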

Select PSI-BLAST as the BLAST algorithm for a more position-sensitive search; it looks deeper into the database to find the best matches to your query. Click on the BLAST button and wait for the results. Take screenshots of your results and also download them in the Excel format provided (**Figure 9**).

The BLAST output identifies the significant peptide sequence similarities between each human protein and its pathogen counterpart (**Figure 10**). These similarities are reported by amino acid position, with the amino acids shown in single-letter code. BLAST gives the number of sequence similarities between the pathogen genomic sequence and the host proteins. It also identifies the pathogen counterpart peptides and the region of similarity on the host proteins.
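Counting the similarities reported for each host protein, and collecting the host regions they cover, can be sketched as follows. The input is assumed to be a list of (query name, query start, query end) tuples taken from the BLAST results. The function name `summarize_by_query` and the toy hits are illustrative placeholders, not data from this study.

```python
from collections import defaultdict


def summarize_by_query(hits):
    """Count hits per host protein and list the host regions involved."""
    counts = defaultdict(int)
    regions = defaultdict(list)
    for query, q_start, q_end in hits:
        counts[query] += 1
        regions[query].append((q_start, q_end))
    return {
        query: {"count": counts[query], "regions": sorted(regions[query])}
        for query in counts
    }


# Placeholder hits: (host protein, start position, end position).
toy_hits = [
    ("toy_host_1", 120, 150),
    ("toy_host_1", 10, 42),
    ("toy_host_2", 5, 30),
]
summary = summarize_by_query(toy_hits)
```

Sorting the regions by start position makes overlapping alignments on the same host protein easier to spot.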

Further interpretation of the results can be made by referring to the UniProt database to obtain the biological and functional aspects of the host and pathogen proteins (**Figures 11** and **12**).

**Figure 9.** BLAST search.

**Figure 10.** BLAST results of nerve proteins showing similarity to pathogen proteins.

**Figure 11.** UniProtKB screenshot showing the biological and functional data of the human protein.

**Figure 12.** UniProt screenshot showing the biological and functional data of the viral protein.

#### *1.2.5. Ascribing a biological role and application*

The results show a number of sequence similarities between host proteins and various pathogen proteins. The largest numbers of peptide sequence similarities were as follows: caprin-2 had 495 similarities with polio; neurogranin had 230 similarities with HHV2; secretogranin-3 had 221 similarities with Japanese encephalitis; agrin had 212 similarities with varicella; caprin-2 had 198 similarities with rabies virus; galanin peptides had 87 similarities with Zika virus; kinesin had 54 similarities with HIV; neurofilament heavy polypeptide had 46 similarities with corona virus; neurogranin had 39 similarities with HHV1; and 2′,3′-cyclic-nucleotide 3′-phosphodiesterase had 21 similarities with *M. leprae*.

This method identifies significant virulence factors that have sequence similarities to human nerve tissue proteins. The nerve proteins that exhibited sequence similarities with four or more pathogen virulence factors are displayed in **Table 1**. All 63 proteins were found to have sequence similarities with *M. leprae* proteins.
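The selection rule used for Table 1 (similarities with four or more pathogens) can be expressed in a few lines. The sketch below assumes that the counts are held in a nested dictionary keyed by protein and then by pathogen. The function name `proteins_with_broad_similarity` and all names and numbers in the toy data are placeholders, not results from this study.

```python
def proteins_with_broad_similarity(counts, min_pathogens=4):
    """Return proteins with at least one similarity to enough pathogens."""
    selected = []
    for protein, per_pathogen in counts.items():
        # A pathogen counts as matched when it has one or more similarities.
        matched = [name for name, n in per_pathogen.items() if n > 0]
        if len(matched) >= min_pathogens:
            selected.append(protein)
    return sorted(selected)


# Placeholder counts; a real table would hold one entry per nerve protein.
toy_counts = {
    "toy_protein_A": {"pathogen_1": 3, "pathogen_2": 1, "pathogen_3": 0,
                      "pathogen_4": 7, "pathogen_5": 2},
    "toy_protein_B": {"pathogen_1": 0, "pathogen_2": 5, "pathogen_3": 0,
                      "pathogen_4": 0, "pathogen_5": 1},
}
selected = proteins_with_broad_similarity(toy_counts)
```

Changing `min_pathogens` makes it straightforward to see how sensitive the shortlist is to the chosen cutoff.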

Agrin is a heparan sulfate basal lamina glycoprotein with a molecular mass of 217,232 Da. It plays a central role in the formation and maintenance of the neuromuscular junction and is known to direct events in postsynaptic differentiation. Agrin also induces the phosphorylation and activation of muscle-specific kinase (MUSK) and the clustering of acetylcholine receptors (AChR) in the postsynaptic membrane, regulates calcium ion homeostasis in neurons, and is involved in the regulation of neurite outgrowth [22, 23].

#### *1.2.6. HHV3 peptide similarity to human protein agrin*

**Figure 10.** BLAST results of nerve proteins showing similarity to pathogen proteins.

286 Antibody Engineering

**Figure 11.** UniProtKB screenshot showing the biological and functional data of the human protein.

Agrin UniProtKB-O00468 (AGRIN\_HUMAN) (AA position 1269–1326) (**Figure 13**) has a similarity to membrane glycoprotein C (Sequence ID: AEW88711.1 AA Position 43–122) of the varicella zoster virus UniProtKB-Q9J3M8 (GE\_VZVO) which by its similarity has the


**Table 1.** Sequence similarities of human nerve tissue proteins with human virulent factors. Multiple alignments obtained in a single BLAST search could result in identities of the amino acids or substitutions of the amino acids in the same peptide region.

potential to bind to the tissue cell receptor. Experimental evidence in epithelial cells shows that the hetero demonization of viral receptors could spread the virus by sorting nascent virion to nerve tissue cell junctions. The virus particles can spread to adjacent cells through interactions with cellular receptors at these cell junctions. The virus at cell junctions spreads extremely rapidly into the tissues [24, 25]. Sequence mimics of agrin to the varicella mem

Bioinformatics as a Tool to Identify Infectious Disease Pathogen Peptide Sequences as Targets…

http://dx.doi.org/10.5772/intechopen.71011

**Figure 13.** BLAST output of membrane glycoprotein of HHV3 showing similarity to human protein agrin.

brane glycoprotein could have an effect on either viral entry into host cell, evasion or on tolerance of host immune response to the virus and virion attachment to the host cell. These similarities in peptide regions warrant further exploration to understand pathogenesis and to

Caprin-2UniProtKB-Q6IMN6 (CAPR2\_HUMAN) is a protein of molecular mass 68,429 Da. The structure of caprin-2 was found to be similar to the polio and rabies viruses. Caprin-2 (AA position: 136–176) has a similarity to the polyprotein of polio virus UniProtKB– E0WCG5 (E0WCG5\_9ENTO) (polyprotein sequence ID: ACZ05040.1 AA position: 1994–2070) (**Figures 14** and **15**). Caprin-2 (AA position: 13–54) also has a similarity to the phosphoprotein of rabies virus UniProtKB-Q80JL8 (Q80JL8\_9RHAB) (phosphoprotein sequence ID: AAO60615.1 AA position 76–110) (**Figure 15**). Caprin-2 has a significant role in influencing phosphorylation of the Wnt-signaling pathways (PubMed:18,762,581) [27]. Caprin-2 also facilitates LRP6 phosphorylation by CDK14/CCNY during G2/M stage of the cell cycle, which may potentiate cells for transport or translation of mRNAs, modulate the expression of neu

ronal proteins involved in synaptic plasticity [28], while simultaneously influencing cell cycle

identify target peptides for antibody engineering [26].

*1.2.7. Poliovirus and rabies virus peptide similarities to human protein caprin-2*

signaling and regulation of viral transcription and replication [29, 30].


289


**Table 1.** Sequence similarities of human nerve tissue proteins with pathogen virulence factors. Multiple alignments obtained in a single BLAST search could result in identities of the amino acids or substitutions of the amino acids in the same peptide region. The table covers 15 query proteins (S. No 1–15) against ten pathogen columns: HIV, Polio, JE, HHV 1, HHV 2, *M. leprae*, Corona, Zika, Rabies, and Varicella.

| Query No. | Proteins |
| --- | --- |
| Q00973 | Beta-1,4 N-acetylgalactosaminyltransferase 1 |
| Q8WXD2 | Secretogranin-3 |
| P07196 | Neurofilament light polypeptide |
| Q5VV43 | Dyslexia-associated protein KIAA0319 |
| Q16620 | BDNF/NT-3 growth factors receptor |
| P02686 | Myelin basic protein |
| P17600 | Synapsin-1 |
| P04271 | Protein S100-B |
| P48681 | Nestin |
| P23515 | Oligodendrocyte-myelin glycoprotein |
| P25189 | Myelin protein P0 |
| P10636 | Tau protein |
| Q9BW19 | Kinesin |
| P17677 | Neuromodulin |
| O00468 | Agrin |
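The Table 1 caption distinguishes identities from substitutions within an aligned peptide region. As a minimal sketch of that distinction, the function below counts both for two already-aligned strings of equal length. The function name and the toy fragments are hypothetical and are not taken from the chapter's BLAST output; BLAST additionally scores conservative substitutions ("positives"), which this sketch does not attempt.

```python
def classify_alignment(query: str, subject: str) -> dict:
    """Count identities, substitutions, and gaps in two aligned peptide strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identities": 0, "substitutions": 0, "gaps": 0}
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            counts["gaps"] += 1           # gap character in either sequence
        elif q == s:
            counts["identities"] += 1     # same residue at this column
        else:
            counts["substitutions"] += 1  # different residues at this column
    return counts


# Hypothetical aligned fragments, for illustration only.
example = classify_alignment("PEPTIDE-SEQ", "PEPSIDEASEQ")
print(example)
```

Dividing the identity count by the alignment length gives a percent identity comparable in spirit to the figure BLAST reports.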


**Figure 13.** BLAST output of membrane glycoprotein of HHV3 showing similarity to human protein agrin.

potential to bind to the tissue cell receptor. Experimental evidence in epithelial cells shows that heterodimerization of viral receptors could spread the virus by sorting nascent virions to nerve tissue cell junctions. The virus particles can then spread to adjacent cells through interactions with cellular receptors at these junctions, from which the virus spreads extremely rapidly into the tissues [24, 25]. Sequence mimicry between agrin and the varicella membrane glycoprotein could affect viral entry into the host cell, virion attachment to the host cell, and evasion of, or tolerance by, the host immune response to the virus. These similarities in peptide regions warrant further exploration to understand pathogenesis and to identify target peptides for antibody engineering [26].
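One simple way to shortlist candidate peptide regions before a full BLAST comparison is to look for short exact matches shared by a host and a pathogen sequence. The sketch below does this with fixed-length k-mers. It is a crude stand-in for BLAST, which scores inexact alignments, and the function name and both toy sequences are hypothetical.

```python
def shared_kmers(host: str, pathogen: str, k: int = 5) -> dict:
    """Map each length-k peptide present in both sequences to its 1-based start positions."""
    host_index = {}
    for i in range(len(host) - k + 1):
        host_index.setdefault(host[i:i + k], []).append(i + 1)

    hits = {}
    for j in range(len(pathogen) - k + 1):
        kmer = pathogen[j:j + k]
        if kmer in host_index:
            entry = hits.setdefault(kmer, {"host": host_index[kmer], "pathogen": []})
            entry["pathogen"].append(j + 1)
    return hits


# Toy sequences, for illustration only (not real protein sequences).
host_toy = "MKTAYIAKQRQISFVK"
pathogen_toy = "GGAYIAKQLLSFVKTT"
hits = shared_kmers(host_toy, pathogen_toy, k=5)
print(hits)
```

Overlapping hits, such as two adjacent 5-mers, indicate a longer shared stretch that can then be examined with a proper alignment.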

#### *1.2.7. Poliovirus and rabies virus peptide similarities to human protein caprin-2*

Caprin-2 UniProtKB-Q6IMN6 (CAPR2\_HUMAN) is a protein of molecular mass 68,429 Da. Regions of caprin-2 were found to be similar to proteins of the polio and rabies viruses. Caprin-2 (AA position: 136–176) has a similarity to the polyprotein of polio virus UniProtKB-E0WCG5 (E0WCG5\_9ENTO) (polyprotein sequence ID: ACZ05040.1, AA position: 1994–2070) (**Figure 14**). Caprin-2 (AA position: 13–54) also has a similarity to the phosphoprotein of rabies virus UniProtKB-Q80JL8 (Q80JL8\_9RHAB) (phosphoprotein sequence ID: AAO60615.1, AA position: 76–110) (**Figure 15**). Caprin-2 has a significant role in influencing phosphorylation in the Wnt-signaling pathway (PubMed: 18762581) [27]. Caprin-2 also facilitates LRP6 phosphorylation by CDK14/CCNY during the G2/M stage of the cell cycle, which may potentiate cells for transport or translation of mRNAs and modulate the expression of neuronal proteins involved in synaptic plasticity [28], while simultaneously influencing cell cycle signaling and regulation of viral transcription and replication [29, 30].
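Query and subject coordinates such as those quoted above (for example, caprin-2 136–176 against the poliovirus polyprotein 1994–2070) can be pulled out programmatically when BLAST results are saved in tabular form. The sketch below assumes the default twelve-column layout of BLAST+ `-outfmt 6`; the chapter itself does not state which output format was used. The `Hit` class, the `parse_hit` helper, and the sample line are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    query: str
    subject: str
    percent_identity: float
    query_range: tuple
    subject_range: tuple
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line of tabular BLAST output into a Hit."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = dict(zip(FIELDS, values))
    return Hit(
        query=row["qseqid"],
        subject=row["sseqid"],
        percent_identity=float(row["pident"]),
        query_range=(int(row["qstart"]), int(row["qend"])),
        subject_range=(int(row["sstart"]), int(row["send"])),
        evalue=float(row["evalue"]),
    )


# Made-up example line; identifiers and numbers are illustrative only.
line = "host_protein\tpathogen_protein\t30.0\t41\t27\t1\t20\t60\t105\t144\t0.003\t33.5"
hit = parse_hit(line)
print(hit.query_range, hit.subject_range)
```

Keeping the ranges as tuples makes it straightforward to compare hits from different pathogens against the same host protein.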


**Figure 14.** BLAST output of polyprotein of poliovirus showing similarity to human protein caprin-2.

**Figure 15.** BLAST output of phosphoprotein of rabies virus showing similarity to human protein caprin-2.

#### *1.2.8. Mycobacterium leprae peptide similarity to 2′, 3′-cyclic-nucleotide 3′-phosphodiesterase*

2′, 3′-cyclic-nucleotide 3′-phosphodiesterase UniProtKB-P09543 (CN37\_HUMAN) is a protein of molecular mass 47,579 Da. 2′, 3′-cyclic-nucleotide 3′-phosphodiesterase (sequence ID: WP\_010908292.1 AA position 191–261) has a similarity to thiamin pyrophosphokinase of *M. leprae* UniProtKB A0A197SEI9 (A0A197SEI9\_MYCLR) (AA position: 170–2166) (**Figure 16**). 2′, 3′-cyclic-nucleotide 3′-phosphodiesterase (CN37) is involved in RNA metabolism in the myelinating cell and is one of the most abundant myelin proteins in the nervous system. The sequence similarities identified could impact cell signaling and also regulate energy metabolism [31].

**Figure 16.** BLAST output of thiamin pyrophosphokinase of *Mycobacterium leprae* showing similarity to human protein 2′, 3′-cyclic-nucleotide 3′-phosphodiesterase.

#### *1.2.9. Zika virus peptide similarity to human protein galanin*

Galanin peptide UniProtKB-P22466 (GALA\_HUMAN) is a protein of molecular mass 13,302 Da. Galanin (AA position: 53–99) has a similarity to the polyprotein envelope protein E of Zika virus UniProtKB-Q73880 (Q73880\_9HIV1) (sequence ID: ARB07952.1, AA position: 729–765) (**Figure 17**). Galanin is involved in smooth muscle contraction of the gastrointestinal and genitourinary tracts, regulation of growth hormone release, and modulation of insulin release, and might also be involved in the control of adrenal secretion [32]. The envelope protein E of the Zika virus is responsible for binding to host cell surface receptors and mediating fusion between viral and cellular membranes. It is synthesized in the endoplasmic reticulum and forms a heterodimer with protein prM. Galanin's similarity with the Zika polypeptide could subsequently affect neural regulation of muscle function and play a role in immune evasion, pathogenesis, and viral replication [33].

**Figure 17.** BLAST output of polyprotein of Zika virus showing similarity to human galanin peptide.

#### *1.2.10. HIV 1 peptide similarity to human kinesin-like protein*

Kinesin-like protein KIFC1 UniProtKB-Q9BW19 (KIFC1\_HUMAN) is a protein of molecular mass 73,748 Da. Kinesin-like protein (AA position: 411–470) has a similarity to the HIV 1 envelope glycoprotein UniProtKB-D6QPK9 (D6QPK9\_9HIV1) (sequence ID: ADG63850.1, AA position: 270–387) (**Figure 18**). KIFC1, along with microtubules, contributes to the movement of endocytic vesicles. These similarities could affect viral attachment to the host cell, membrane fusion, and entry into the cell and the nucleus [34, 35].

**Figure 18.** BLAST output of envelope glycoprotein of HIV 1 showing similarity to human kinesin-like protein.

#### *1.2.11. Corona virus peptide similarity to human neurofilament heavy polypeptide*

Neurofilament heavy polypeptide UniProtKB-P12036 (NFH\_HUMAN) is a protein of molecular mass 112,479 Da. Neurofilament heavy polypeptide (AA position: 819–872) has a similarity to ORF1a UniProtKB-A0A0F6SKM6 (A0A0F6SKM6\_9GAMC) of Corona virus (sequence ID: AKF17723.1, AA position: 890–1031) (**Figure 19**). Neurofilaments of the nerve tissue usually contain three intermediate filament proteins, L, M, and H (NFH\_HUMAN), which are involved in the maintenance of neuronal caliber. NF-H has an important function in axon maturation. These similarities could affect viral replication and protein processing and could induce autoantibody production [36, 37].

**Figure 19.** BLAST output of ORF1 of corona virus showing similarity to human neurofilament heavy polypeptide.

#### *1.2.12. HHV 1 and HHV 2 peptide similarity to human protein neurogranin*

Neurogranin UniProtKB-Q92686 (NEUG\_HUMAN) is a protein of molecular mass 7,618 Da. Nearly identical regions of neurogranin have a similarity to envelope glycoprotein M of HHV1 and envelope glycoprotein M of HHV2 at partially overlapping positions. Neurogranin (AA position: 38–63) has a similarity to the envelope glycoprotein M of HHV1 UniProtKB-A0A181ZHE7 (A0A181ZHE7\_HHV11) (sequence ID: SBO07578.1, AA position: 347–376) (**Figure 20**). Neurogranin (AA position: 38–64) also has a similarity to the envelope glycoprotein M of HHV2 UniProtKB-A0A0Y0R357 (A0A0Y0R357\_HHV2) (sequence ID: AMB66044.1, AA position: 389–416) (**Figure 21**). Neurogranin functions as a signaling messenger and a substrate for protein kinase C, and it has affinity for calmodulin in the absence of calcium. These similarities of HHV1 and HHV2 with neurogranin could influence viral transport into the host cell Golgi network and subsequently to the host nucleus [38].

**Figure 20.** BLAST output of envelope glycoprotein of HHV 1 showing similarity to human protein neurogranin.
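The neurogranin regions matched by HHV1 (AA 38–63) and HHV2 (AA 38–64) are described as partially overlapping. The sketch below shows one way to compute the shared stretch of two such inclusive residue ranges; the function and variable names are hypothetical.

```python
def overlap(a: tuple, b: tuple):
    """Return the inclusive overlap of two (start, end) residue ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Neurogranin positions quoted in the text for the HHV1 and HHV2 matches.
hhv1_region = (38, 63)
hhv2_region = (38, 64)

shared = overlap(hhv1_region, hhv2_region)
shared_length = shared[1] - shared[0] + 1  # inclusive residue count
print(shared, shared_length)
```

A shared stretch like this is a natural first candidate when looking for a single target peptide relevant to both viruses.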

**Figure 21.** BLAST output of envelope glycoprotein of human alpha herpes virus 2 showing similarity to human protein neurogranin.

#### *1.2.13. JE 2 peptide similarity to human protein secretogranin-3*

Secretogranin-3 UniProtKB-Q8WXD2 (SCG3\_HUMAN) is a protein of molecular mass 53,005 Da. Secretogranin-3 (AA position: 139–190) has a similarity to the polyprotein of Japanese encephalitis virus (UniProtKB-G3LHD8 (G3LHD8\_9FLAV) (sequence ID: SBO07578.1 AA position: 2744 to (**Figure 22**). Secretogranin-3 is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins, which carry out a number of significant cellular functions. In an experimental mouse model, autoimmunity to secretogranin was associated with encephalitis [39]. These host-pathogen similarities could affect neuroendocrine secretory protein release and autoimmunity.

**2. Creating a schematic model**

The sequence similarities of agrin, caprin-2, 2′, 3′-cyclic-nucleotide 3′-phosphodiesterase, galanin peptide, kinesin-like protein, neurofilament heavy polypeptide, neurogranin, and secretogranin-3 with their corresponding pathogenic peptides could have a number of cellular-level implications, including alterations in receptor binding, signaling and synaptic transmission, metabolism, and inflammation, resulting in autoimmunity and consequently neuropathy (**Figure 23**) [11, 40].

**Figure 23.** A model for the modes of host-pathogen interaction and possible intracellular regulation of metabolic activities.

**3. Conclusion**

In conclusion, it is important to conduct bioinformatic searches and design wet experiments with the objective of identifying a vast number of functionally significant peptides for further comparison and study. Bioinformatic search tools and the various available databases should be explored extensively to rapidly identify possible neuroprotective or pathogenic peptide sequences. These peptides can be further explored as targets to generate recombinant antibodies. This exercise can also be used to develop vaccines against pathogens that are efficacious, safe, and free of autoimmune cross-reactions. It can also contribute to the design of peptide or drug molecules that neutralize the effects of neurotoxins. Bioinformatics is the key to opening the door to understanding medical and biological processes in the future.
