**2. Structure-Based Virtual Screening (SBVS)**

SBVS evaluates compound databases by simulating the interactions between ligands (small molecules) and a receptor (the target protein). The steps of the process are outlined in Figure 2. Once the structures of the receptor and the ligands have been obtained, the next step is molecular docking, in which each ligand is fitted into the receptor. At this stage, various conformations and orientations are generated and ranked according to the scoring function. The structure of the target protein can be obtained from a database or by modelling.

**Figure 2.** Stages of SBVS. The receptor (the target protein) can be obtained from a database or by modelling. Molecular docking completes the structure-based virtual screening.

### **2.1. Obtaining the Structure of the Protein Target**

An Integrated View of the Molecular Recognition and Toxinology - From Analytical Procedures to Biomedical Applications

Knowledge of the target protein structure is essential for structure-based drug design. The 3-dimensional structure of a protein can be determined experimentally by X-ray diffraction or by nuclear magnetic resonance (NMR). If the structure of the target protein has already been solved, it can usually be found in public databases such as the PDB [37], which contains more than 80,000 experimentally solved structures.

However, sometimes the structure of the target is not known, and this poses a problem in the drug design process. This situation can be resolved by making use of computational methods for predicting protein structure.

Such methods are divided into 2 groups: those based on templates and those that are template-free. The first group includes comparative or homology modelling and threading. The second group includes methods that do not depend on templates to build the model, such as *ab initio* modelling (Figure 3).

Computer-Based Methods of Inhibitor Prediction

http://dx.doi.org/10.5772/52334

**Figure 3.** Modelling methods can be classified into template-based methods (homology/comparative modelling) and template-free methods (*ab initio*).

#### *2.1.1. Template-Based Modelling*

Homology modelling relies on proteins that share an ancestral relationship with the target protein; that is, they are evolutionarily related and therefore tend to have similar structures. The method thus requires knowledge of the primary sequence of the target protein and a database search for homologous proteins whose structures have been solved. These proteins are used as templates.

Threading is based on the principle that proteins may have similar structures without sharing an ancestral relationship, because structure tends to be more conserved than primary sequence. These methods therefore evaluate the primary sequence of the target protein against proteins whose structures have been solved.

#### *2.1.1.1. Comparative/Homology Modelling*

Comparative or homology modelling constructs a model structure of the target protein using its primary sequence and the information obtained from homologous proteins that have solved structures. The method therefore depends on the availability of proteins that have structures similar to that of the target and can be used as templates. The whole process requires not only the construction of the model, but also its refinement and evaluation. The process can be divided into the following stages: selection of the templates, which involves identifying homologous sequences in a database of proteins with solved structures; sequence alignment between the target and the templates; refinement of the alignment; construction of the model, including the addition of loops and side chains; and evaluation of the model (Figure 4).

**Figure 4.** Steps in the comparative modelling process.

The construction of the model depends on the availability of templates. Templates are usually identified by aligning the target sequence with candidate template sequences, an approach that is widely used and efficient. Sequence alignments are typically generated by searching for the result that presents the largest region of identity and similarity. Generally, a sequence identity of at least 25% is considered significant.
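The identity criterion above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and that identity is counted over columns where neither sequence has a gap; other conventions for the denominator exist and give different percentages. The function names, the example sequences and the use of 25% as a default cut-off are illustrative assumptions, not part of any particular alignment tool.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns. Other
    definitions (alignment length, shorter sequence length) are also in use.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # columns containing a gap are not compared
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def passes_identity_cutoff(identity: float, cutoff: float = 25.0) -> bool:
    """True if the identity reaches the cut-off (25% in the text above)."""
    return identity >= cutoff


# Hypothetical aligned pair, for illustration only.
target = "MKT-AYIAKQR"
template = "MRTLAYLA-QR"
identity = percent_identity(target, template)
print(f"identity = {identity:.1f}%, usable = {passes_identity_cutoff(identity)}")
```

In practice the identity value is only one of several criteria; alignment coverage and the E-value are normally inspected as well.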

Several tools are available for sequence alignment. They differ in the method used, which can be exhaustive or heuristic, and in the number of sequences aligned (pairwise or multiple). Among these tools, BLAST/PSI-BLAST [1, 2] performs local alignments, based on profiles in the case of PSI-BLAST, between the target sequence and each sequence in a known database.

The results of the alignment can be evaluated using the E-value. The E-value is inversely related to the identity/similarity between the sequences: the lower the E-value, the more significant the match. Because BLAST is a heuristic method, the alignments it reports are generally suboptimal.
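The inverse relationship described above can be made concrete with the standard relation between a normalised (bit) score S' and the expected number of chance hits, E = m · n · 2^(−S'), where m is the query length and n is the database length. The sketch below simply evaluates that formula; the lengths and scores are hypothetical, and actual BLAST output will differ because the program applies effective-length corrections that are not reproduced here.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E-value from a bit score: E = m * n * 2**(-S').

    Simplifying assumption: raw (not effective) lengths are used.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against 10**8 database residues.
for score in (30.0, 50.0, 100.0):
    e_value = expected_hits(score, 300, 10**8)
    print(f"bit score {score:5.1f} -> E-value {e_value:.3g}")
```

Each additional bit of score halves the E-value, which is why strongly similar sequences yield E-values many orders of magnitude below 1.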

If more than one template with a similar score is found, the one with the highest resolution can be selected.
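A minimal sketch of this selection rule follows. It assumes that each candidate carries an alignment score (higher is better) and a crystallographic resolution in ångströms (a lower value means a higher resolution). The `TemplateHit` class, the identifiers and the `score_tolerance` that defines "similar scores" are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TemplateHit:
    template_id: str    # hypothetical identifier, not a real PDB entry
    score: float        # alignment score, higher is better
    resolution: float   # in angstroms, lower value = higher resolution


def pick_template(hits: Sequence[TemplateHit], score_tolerance: float = 5.0) -> TemplateHit:
    """Among hits scoring within `score_tolerance` of the best, pick the best-resolved one."""
    if not hits:
        raise ValueError("no template candidates supplied")
    best_score = max(hit.score for hit in hits)
    shortlist = [hit for hit in hits if best_score - hit.score <= score_tolerance]
    return min(shortlist, key=lambda hit: hit.resolution)


candidates = [
    TemplateHit("tplA", score=210.0, resolution=2.8),
    TemplateHit("tplB", score=208.0, resolution=1.6),
    TemplateHit("tplC", score=150.0, resolution=1.1),  # well resolved but poor score
]
print(pick_template(candidates).template_id)
```

Filtering on score first keeps a well-resolved but poorly aligned structure from being chosen over a closer homologue.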

Other methods, such as HHpred [34] and Phyre [18], use profile hidden Markov models (HMMs) combined with structural features.

When more than one template is selected, and taking into account that the results are usually suboptimal, the target protein needs to be aligned with all of the selected templates. In this case, multiple alignments are indicated. Several tools perform multiple alignments, such as ClustalW [21].

After the alignments between the target and the templates have been obtained, construction of the model of the target protein begins. Several software tools are available, which differ with respect to the method applied. Prominent among these are MODELLER [9, 33] and SWISS-MODEL [3]. Of these, MODELLER has shown the best performance. The program models the backbone using a homology-derived restraint method, which uses the multiple alignment between the target and the templates to differentiate between highly conserved and less conserved residues. The model is optimised by energy minimisation and molecular dynamics methods (Figure 5).

**Figure 5.** The template 3D structures are aligned with the target sequence to be modelled. Spatial features are transferred from the templates to the target and a number of spatial restraints on its structure are obtained. The 3D model is obtained by satisfying all the restraints as thoroughly as possible [33].

The regions of the target that are not aligned with the protein template generally represent loop regions. These usually arise from insertions and deletions, which produce gaps in the alignment. Closing these gaps requires modelling of the loops. The loops and the side chains are built during the refinement of the model. For this, methods that do not rely on templates can be applied. These include the use of physics parameters and knowledge-based data.

The loops are usually modelled using a database of fragments or by *ab initio* modelling. The database approach involves finding segments of known protein structures that fit onto the two regions (stems) of the target protein that precede and follow the loop to be modelled. The conformation of the best matching fragment is used to model the loop.

*Ab initio* methods generate many random loops and look for one that presents a low-energy state and has conformational angles within the allowed regions of the Ramachandran plot [31]. The software CODA [7] can be used for loop modelling.

The side chains can be modelled by programs that make use of rotamer libraries, such as the software SCWRL4 [20]. The use of rotamer libraries reduces computational time because it restricts the search to favourable torsion angles, reducing the number of conformations that must be examined.
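The saving in computational time can be illustrated with simple counting. The sketch below compares the number of side-chain conformations on a uniform torsion grid with the number obtained when each torsion angle is restricted to a few rotameric states. The 10° grid step and the three states per angle are illustrative assumptions; real rotamer libraries are derived from solved structures and may depend on the backbone conformation.

```python
from typing import Iterable


def grid_conformations(n_torsions: int, step_degrees: int) -> int:
    """Conformations when every torsion is sampled on a uniform grid."""
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must be a positive divisor of 360")
    return (360 // step_degrees) ** n_torsions


def rotamer_conformations(states_per_torsion: Iterable[int]) -> int:
    """Conformations when each torsion is limited to a few rotameric states."""
    total = 1
    for n_states in states_per_torsion:
        total *= n_states
    return total


# A side chain with four torsion angles (as in lysine), illustrative numbers.
exhaustive = grid_conformations(n_torsions=4, step_degrees=10)
restricted = rotamer_conformations([3, 3, 3, 3])
print(f"grid search: {exhaustive} conformations; rotamer search: {restricted}")
```

Because the counts multiply across every residue in the protein, the reduction per side chain compounds rapidly for a whole model.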

After obtaining the model, its quality must be evaluated. This should be done to make sure that the model has structural features consistent with physical and chemical rules. Several errors in modelling can occur due to a poor choice of template, bad alignment between the target and template, and incorrect determination of loops and side chains.

In the evaluation stage, the structural characteristics and the stereochemical accuracy of the model must be examined.

There are tools available for analysing stereochemical properties, such as PROCHECK [23]. PROCHECK checks general physicochemical parameters such as phi-psi angles (Ramachandran plot) and chirality. The parameters of the model are compared with previously compiled reference values.
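The phi and psi angles mentioned above are dihedral (torsion) angles defined by four consecutive backbone atoms: C(i−1), N(i), CA(i), C(i) for phi and N(i), CA(i), C(i), N(i+1) for psi. The sketch below shows one common way of computing a dihedral angle from Cartesian coordinates using only the standard library. It illustrates the geometry only and is not the PROCHECK implementation; the function name and test coordinates are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b0 = _sub(p0, p1)          # from p1 back to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)          # from p2 on to p3
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    axis = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Idealised geometries: eclipsed (cis) and a quarter turn.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying such a function to every residue of a model yields the (phi, psi) pairs that are plotted on a Ramachandran plot and compared with the allowed regions.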

To validate the model for chemical correctness, it is possible to use the software WHAT IF [39]. WHAT IF is a server that checks planarity and bond angles, among other parameters. It also displays the Ramachandran plot.

Verify3D [4, 26] can be used for the analysis of the pseudo-energy profile of the model. It has a database containing environmental profiles based on secondary structures, and the solvent exposure of solved structures at high resolution. It should be noted that the results may be different when different programs are used for verification.

To distinguish correct from incorrect regions, the ERRAT program [6] can be used; this is based on analysis of the characteristics of atomic interactions compared to the highly refined structures.

PROtein Volume Evaluation (PROVE; [30]) calculates the volume of the atoms in the macromolecules using an algorithm that treats the atoms as spheres, analysing the model in relation to the highly resolved and refined structures stored in the PDB.

These software tools are available on servers such as ModFold [27], ProQ (see Section 6 - Table 2), and SAVES (see Section 6 - Table 2).

#### *2.1.1.2. Threading*

Threading is generally used when the template and target sequences share less than 30% identity. Thus, structures that do not share an evolutionary relationship with the target protein can be used as templates. However, the target protein has to adopt a fold similar to that of a protein whose structure has been solved. The method can be classified as a pairwise energy-based method.

Using the sequence of the target protein as input, a database of structures is searched for the best structural match, using calculated energy as the criterion. In other words, the process looks for the solved structures that are most appropriate for the target protein. The comparison emphasises secondary structures because they are evolutionarily conserved.

A model is constructed by placing the residues of the target onto the aligned positions of the template structure. In the next step, the energy of this model is calculated. This is repeated for various structures in the database. Finally, the models obtained are ranked by energy. The model with the lowest energy constitutes the most compatible folding model (Figure 6).

**Figure 6.** Steps in the threading modelling process.
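The ranking step of threading can be sketched as follows. This is a toy illustration, not the scoring scheme of THREADER or RAPTOR: each template fold is reduced to a list of residue-position pairs that are in contact, the target sequence is placed on the template without gaps, and the energy is the sum of a made-up pairwise term that favours hydrophobic-hydrophobic contacts. The residue classes, energy values, fold names and contact lists are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical hydrophobic residue set used by the toy potential.
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a: str, b: str) -> float:
    """Toy contact energy: reward hydrophobic pairs, penalise mixed pairs."""
    a_h, b_h = a in HYDROPHOBIC, b in HYDROPHOBIC
    if a_h and b_h:
        return -1.0
    if a_h != b_h:
        return 0.5
    return 0.0


def threading_energy(sequence: str, contacts: List[Tuple[int, int]]) -> float:
    """Sum of pairwise energies over the template's contact list."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def rank_templates(sequence: str,
                   templates: Dict[str, List[Tuple[int, int]]]) -> List[Tuple[float, str]]:
    """Return (energy, template name) pairs sorted from lowest to highest energy."""
    return sorted((threading_energy(sequence, contacts), name)
                  for name, contacts in templates.items())


target = "LKVEALKS"  # hypothetical 8-residue target
folds = {
    "foldA": [(0, 2), (2, 5), (0, 4), (1, 3)],
    "foldB": [(0, 1), (2, 3), (4, 6), (5, 7)],
    "foldC": [(0, 5), (1, 6), (2, 7)],
}
for energy, name in rank_templates(target, folds):
    print(f"{name}: {energy:+.1f}")
```

The fold listed first has the lowest energy and would be taken as the most compatible template; real threading programs additionally optimise the sequence-to-structure alignment, including gaps.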

Many programs, such as THREADER [15, 28] and RAPTOR [41, 42], can be used to carry out this process.

#### *2.1.2. Template-Free Modelling*

One of the biggest problems in comparative modelling is the lack of templates. Template-free methods generate models based on the physicochemical and thermodynamic properties of the primary chain of the target protein. The processes are iterative: the conformation of the structure is altered until a configuration of lower potential energy is found.

Some methods use knowledge-based force fields as a scoring function. These methods are not strictly template-free, since they employ the structures of small protein fragments; an example is ASTRO-FOLD [19, 35]. Others use energy functions based on first principles of the energy and movement of atoms. Generally, these methods involve calculating the energies of the structures, which has a high computational cost. They are therefore limited to small proteins (approximately 100 residues), as in the case of the software ROSETTA [32].

ROSETTA first breaks the sequence of the target protein into several short fragments and predicts the secondary structures of the fragments using HMMs. These fragments are then assembled into a tertiary arrangement. Random combinations of these fragments generate a large number of models, whose energies are calculated. The conformation that presents the lowest global energy value is chosen as the best model (Figure 7).

**Figure 7.** Steps in the ROSETTA process.
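The idea of combining fragments at random and keeping the lowest-energy model can be sketched with a deliberately simplified toy. Here each fragment is a string of secondary-structure states (`H`, `E`, `C`), a model is the concatenation of one candidate fragment per sequence window, and the energy is a made-up function that penalises disagreement with a predicted state string and frequent state changes. None of this reproduces the ROSETTA energy function or its sampling protocol; the fragment library, the weights and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def toy_energy(model: str, predicted: str) -> float:
    """Made-up energy: 1 per state change plus 2 per mismatch with the prediction."""
    breaks = sum(1 for a, b in zip(model, model[1:]) if a != b)
    mismatches = sum(1 for s, p in zip(model, predicted) if s != p)
    return breaks + 2.0 * mismatches


def assemble(fragment_library: Sequence[Sequence[str]],
             predicted: str,
             n_models: int = 200,
             seed: int = 0) -> Tuple[str, float]:
    """Build `n_models` random fragment combinations and keep the lowest-energy one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_model, best_energy = "", float("inf")
    for _ in range(n_models):
        model = "".join(rng.choice(candidates) for candidates in fragment_library)
        energy = toy_energy(model, predicted)
        if energy < best_energy:
            best_model, best_energy = model, energy
    return best_model, best_energy


# Hypothetical library: candidate 3-residue fragments for three windows.
library: List[List[str]] = [
    ["HHH", "EEE", "CCC"],
    ["HHH", "CCC"],
    ["CCC", "EEE"],
]
model, energy = assemble(library, predicted="HHHHHHCCC")
print(model, energy)
```

With only twelve possible combinations, random sampling finds the global minimum almost immediately; for real proteins the number of combinations is astronomically larger, which is why the approach is computationally expensive and restricted to small targets.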
