**3. The solvent and how to model it**

Solvation plays an important role in ligand-protein association and has a strong impact on comparisons of binding energies for dissimilar molecules. The binding affinity of a ligand for a receptor (ΔGbind) depends on the interaction free energy of the two molecules relative to their free energy in solution:

190 Bioinformatics

**2.3. Popular docking programs** 

between the ligand and the grid.

ligand minimisation in the field of the receptor.

**3. The solvent and how to model it** 

One of the most important and useful areas of application of molecular modelling is the approach of docking a protein onto a second molecule, typically a small ligand. This is of interest because it models the possible interactions between the protein and the ligand in the formation of a biologically important protein-ligand complex. To perform a computational docking, experimental or model 3D structures of both the protein and ligand molecules are

There are several software programs that are available for carrying out docking calculations, only some of them will be considered here. The DOCK program suite [Kuntz, 1992] is one of the best known. First of all, a set of overlapping spheres are used in the program to construct a negative image of a specified site on the protein or another macromolecule, and the negative image is then matched against structures of potential ligands. Matches can be scored in this program by the quality of the geometric fit, as well as by the molecular mechanics interaction energy [Meng et al., 1992] and can lead to protein-binding ligands that have micromolecular levels of binding affinity [Kuntz et al., 1994]. It has also been used

The program GRID [Goodford, 1985] identifies likely protein binding sites for ligands [Reynolds et al., 1989; Cruciani & Goodford, 1994] using a 3D grid around the protein.

The program AutoDock developed by Morris et al.; [http://www.scripps.edu/pub/olsonweb/doc/autodock/] uses a grid-based scheme for energies of individual atoms, allowing a quick computation of the interaction energy of the protein-ligand complex as the interaction

GLIDE software [Friesner et al., 2004, 2006; Halgren et al., 2004] also uses a grid-based scheme to represent the shape and properties of the receptor and then uses a systematic search algorithm to produce a set of initial conformations, using a OPLS-AA force field for

SURFLEX [Jain, 2003, 2007] is a fully automatic flexible molecular docking algorithm that presents results evaluated for reliability and accuracy in comparison with crystallographic

In a recent study, comparison of seven popular docking programs [Plewczynski et al., 2011] clearly showed that the ligand binding conformation could be identified in most cases by using the existing software. Yet, there is still the lack of universal scoring function for all types of molecules and protein families. One can always hope that incremental improvements in current techniques will gradually lead to major advances in this field.

Solvation plays an important role in ligand-protein association and has a strong impact on comparisons of binding energies for dissimilar molecules. The binding affinity of a ligand

experimental results on 81 protein/ligand pairs of substantial structural diversity.

required together with the charge distribution for each molecule.

for modelling protein-protein docking [Shoichet & Kuntz, 1996].

$$
\Delta \mathbf{G}\_{\text{bind}} = \Delta \mathbf{G}\_{\text{interact}} - \Delta \mathbf{G}\_{\text{solv},\text{L}} - \Delta \mathbf{G}\_{\text{solv},\text{R}} \tag{1}
$$

where ΔGinteract is the interaction free energy of the complex, ΔGsolv,L is the free energy of desolvating the ligand, and ΔGsolv,R is the free energy of occluding the receptor site from the solvent. Various methods have been proposed to evaluate or estimate these terms. The problem is difficult because the energy of each component on the right hand side of Equation 1 is large while the difference between them is small.

An accurate way to calculate relative binding energies is with free-energy perturbation techniques, although they are usually restricted to calculating the differential binding of similar compounds and require extensive computation, making it impractical as an initial screen, but quite useful sometimes [Buch et al., 2011; Reddy & Erion, 2007]. Several authors have described force fields that consider the bound and solvated states [see for example Chen et al., 2008; Moon & Howe, 1991], successfully predicting new ligands and also the structures of ligand-receptor complexes [Wilson et al., 1991].

When calculating interactions in congeneric series, the cost in electrostatic free energy of desolvating both the enzyme binding site and the burial part of the ligand (ΔGdesolv) is roughly constant within the series. This is particularly true when the calculation is done partitioning the electrostatic free energy contributions into a van der Waals term from the molecular mechanics force field, and an electrostatic contribution computed using a continuum method [Checa et al., 1997]. For that reason, it has been proposed to neglect ΔGdesolv in earlier studies.

The binding energy between ligand and receptor is approximated to the interaction enthalpy calculated by means of empirical energy functions that represent van der Waals repulsion, dispersion interactions by a Lennard-Jones term, and electrostatic interactions in the form of a Coulomb term that uses atom-centred point charges [Ajay & Murcko, 1995]. In most cases, these calculations of molecular mechanics are performed on a structure that is taken to represent the ensemble average of each complex. Entropy contributions are usually ignored although solvation terms are sometimes added to the scoring function by calculating changes in buried nonpolar surface area [Viswanadhan et al., 1999] or differences in the ease of desolvation of both the ligand and the binding site upon complex formation [Checa et al., 1997]. Molecular mechanics-based QSAR studies on ligand-receptor complexes can benefit greatly from proper incorporation of solvation effects into a COMBINE framework based on residue-based interaction energy decomposition [Pérez et al., 1998].

The relevance of solvation in modulating the biological activity of drugs is well known [Orozco & Luque, 2000]. In the last years, theoretical methods have been developed to calculate fragment contributions to the solvation free energy, particularly in the framework of quantum mechanical (QM) continuum solvation methods [Klamt et al., 2009]. Thus, fractional methods based on GB/SA methods have been developed [Cramer & Truhlar,

2008], as well as those based on the MST(Miertus-Scrocco-Tomasi) solvation method model [Soteras et al., 2004].

Using Molecular Modelling to Study Interactions Between Molecules with Biological Activity 193

It is sometimes possible to get quite accurate results with very simple models, such as the case of the molecular modellisation of phenethylamine carriers conducted in our lab. Calculations were carried out using chloride anion to mimic the picrate anion used in experimental measurements and with no explicit solvent molecules. The chloroform environment was simulated by a constant dielectric factor, as this solvent has a low dielectric constant and thus, interactions should not end quickly with the distance [Campayo et al., 2005]. When complexation takes place in water as the solvent, the environment is simulated by a distance-dependent dielectric factor, as it takes into account the fact that the intermolecular electrostatic interactions should vanish with distance faster than in the gas phase. This assumption proves to work as it gives theoretical results in good agreement with experimental transportation values [Miranda et al., 2004; Reviriego et al.,

2008]. Results for theoretical interactions have been supported by NMR experiments.

and protein association simulations.

molecular dynamics [Shivakumer et al., 2010].

**design** 

When applied to complex biomolecular systems, this loss of detail may become problematic in locations where water does not behave as a continuum medium, for example, the individual water molecules occurring in concave pockets at the surfaces of proteins [Li & Lazaridis, 2007]. The ELSCA (Energy by Linear Superposition of Corrections Approximation) method [Cerutti et al., 2005] has also been proposed for the rapid estimation of solvation energies. This procedure calculates the electrostatic and apolar solvation energy of bringing two proteins into close proximity or into contact compatible with the AMBER ff99 parameter set. The method is most useful in macromolecular docking

Solvent treatment is also of considerable interest in MD simulations as the solvent molecules (usually water, sometimes co-solvent and counterions/buffer or salt for electrolyte solutions) enter pockets and inner cavities of the proteins through their conformational changes. This is a very slow process and nearly as difficult to model as protein solving. One solution to this problem is using an efficient coupling of molecular dynamics simulation with the 3D molecular theory of solvation (3D-RISM-KH), contracting the solvent degrees of freedom [Luchko et al., 2010] or using free energy perturbation and OPLS force field together with

**4. Molecule-molecule or ion-molecule interactions in active molecule** 

being energetically comparable or stronger than a typical hydrogen bond.

Non-covalent interactions are central to biological structure and function. In considering potential interactions of molecules and/or ions and their receptor, the focus has been on hydrophobic interactions, hydrogen bonding and ion pairing. Although hydrogen bonds are by far the most important interactions in biological recognition processes, the cation-π interaction is a general, strong, non-covalent binding force that occurs throughout nature,

Cooperativity in multiple weak bonds (hydrogen bond and ion-π interactions among others) has been considered and studied at the MP2/6-311++ G(d,p) computational level

An explicit solvent model includes individual solvent molecules and calculates the free energy of solvation by simulating solute-solvent interactions. It requires an empirical interaction potential between the solvent and the solute, and between the solvent molecules, usually involving Monte Carlo (MC) calculations and/or molecular dynamics. MC calculations can be used to compute free energy differences and radial distribution functions, among others, and cannot be used to compute time-dependent properties such as diffusion coefficients or viscosity. MD simulations, on the other hand, can be used to compute free energies and time-dependent properties, transport properties, correlation functions, and others.

An implicit solvent model treats solvent as a polarisable continuum with a dielectric constant, , instead of explicit solvent molecules. The charge distribution of the solute polarises the solvent, producing a reaction potential that alters the solute. This interaction is represented by a solvent reaction potential introduced into the Hamiltonian. As interactions should be self consistently computed, they are also known as self-consistent reaction field (SCRF) methods [Onsager, 1936]. These models are significantly easier than explicit solvent models, but cannot model specific interactions such as hydrogen bonds.

Changes in hydration free energy during complex formation are a crucial element of binding free energies [Gilson & Zhou, 2007]. With the use of methods to predict binding free energies becoming common-place in the field of drug design, there is still a need for solvation methods that are both quick and accurate [Mancera, 2007], although much research has been carried out on the improvement of existing methods and development of new solvation models at many levels of theory [Chambers et al., 1996; Gallicchio et al., 2002; Palmer et al., 2011].

Explicit solvation models such as free energy perturbation (FEP), thermodynamic integration (TI) [Gilson & Zhou, 2007; Khavertskii & Wallquist, 2010] and the faster linear interaction energy (LIE) [Aqvist et al., 1994; Carlson & Jorgensen, 1995] offer detail on the distinct nature of water around the solute and are transferable across a wide range of data sets, although there is a lack of throughput in the field of drug design.

Implicit solvation models offer a faster alternative to explicit models by replacing the individual water molecules with a continuous medium [Baker, 2005; Chen et al., 2008], combining the hydration free energy density and group contribution [Jäger & Kast, 2001], or, more recently, calculating solvation free energy directly from the molecular structure [Delgado & Jaña, 2009]. For small organic molecules, the loss of molecular detail of the solvent results in relatively small differences between hydration free energy prediction accuracies calculated with explicit solvent models relative to the explicit treatment [Mobley et al., 2009; Nicholls et al., 2009]. To cope with some of these pitfalls, the variational implicit solvent model (VISM) has been proposed for calculating the solute/water interface where established models fail [Dzubiella et al., 2006a, 2006b].

It is sometimes possible to get quite accurate results with very simple models, such as the case of the molecular modellisation of phenethylamine carriers conducted in our lab. Calculations were carried out using chloride anion to mimic the picrate anion used in experimental measurements and with no explicit solvent molecules. The chloroform environment was simulated by a constant dielectric factor, as this solvent has a low dielectric constant and thus, interactions should not end quickly with the distance [Campayo et al., 2005]. When complexation takes place in water as the solvent, the environment is simulated by a distance-dependent dielectric factor, as it takes into account the fact that the intermolecular electrostatic interactions should vanish with distance faster than in the gas phase. This assumption proves to work as it gives theoretical results in good agreement with experimental transportation values [Miranda et al., 2004; Reviriego et al., 2008]. Results for theoretical interactions have been supported by NMR experiments.

192 Bioinformatics

[Soteras et al., 2004].

functions, and others.

Palmer et al., 2011].

2008], as well as those based on the MST(Miertus-Scrocco-Tomasi) solvation method model

An explicit solvent model includes individual solvent molecules and calculates the free energy of solvation by simulating solute-solvent interactions. It requires an empirical interaction potential between the solvent and the solute, and between the solvent molecules, usually involving Monte Carlo (MC) calculations and/or molecular dynamics. MC calculations can be used to compute free energy differences and radial distribution functions, among others, and cannot be used to compute time-dependent properties such as diffusion coefficients or viscosity. MD simulations, on the other hand, can be used to compute free energies and time-dependent properties, transport properties, correlation

An implicit solvent model treats solvent as a polarisable continuum with a dielectric constant, , instead of explicit solvent molecules. The charge distribution of the solute polarises the solvent, producing a reaction potential that alters the solute. This interaction is represented by a solvent reaction potential introduced into the Hamiltonian. As interactions should be self consistently computed, they are also known as self-consistent reaction field (SCRF) methods [Onsager, 1936]. These models are significantly easier than explicit solvent

Changes in hydration free energy during complex formation are a crucial element of binding free energies [Gilson & Zhou, 2007]. With the use of methods to predict binding free energies becoming common-place in the field of drug design, there is still a need for solvation methods that are both quick and accurate [Mancera, 2007], although much research has been carried out on the improvement of existing methods and development of new solvation models at many levels of theory [Chambers et al., 1996; Gallicchio et al., 2002;

Explicit solvation models such as free energy perturbation (FEP), thermodynamic integration (TI) [Gilson & Zhou, 2007; Khavertskii & Wallquist, 2010] and the faster linear interaction energy (LIE) [Aqvist et al., 1994; Carlson & Jorgensen, 1995] offer detail on the distinct nature of water around the solute and are transferable across a wide range of data

Implicit solvation models offer a faster alternative to explicit models by replacing the individual water molecules with a continuous medium [Baker, 2005; Chen et al., 2008], combining the hydration free energy density and group contribution [Jäger & Kast, 2001], or, more recently, calculating solvation free energy directly from the molecular structure [Delgado & Jaña, 2009]. For small organic molecules, the loss of molecular detail of the solvent results in relatively small differences between hydration free energy prediction accuracies calculated with explicit solvent models relative to the explicit treatment [Mobley et al., 2009; Nicholls et al., 2009]. To cope with some of these pitfalls, the variational implicit solvent model (VISM) has been proposed for calculating the solute/water interface where

models, but cannot model specific interactions such as hydrogen bonds.

sets, although there is a lack of throughput in the field of drug design.

established models fail [Dzubiella et al., 2006a, 2006b].

When applied to complex biomolecular systems, this loss of detail may become problematic in locations where water does not behave as a continuum medium, for example, the individual water molecules occurring in concave pockets at the surfaces of proteins [Li & Lazaridis, 2007]. The ELSCA (Energy by Linear Superposition of Corrections Approximation) method [Cerutti et al., 2005] has also been proposed for the rapid estimation of solvation energies. This procedure calculates the electrostatic and apolar solvation energy of bringing two proteins into close proximity or into contact compatible with the AMBER ff99 parameter set. The method is most useful in macromolecular docking and protein association simulations.

Solvent treatment is also of considerable interest in MD simulations as the solvent molecules (usually water, sometimes co-solvent and counterions/buffer or salt for electrolyte solutions) enter pockets and inner cavities of the proteins through their conformational changes. This is a very slow process and nearly as difficult to model as protein solving. One solution to this problem is using an efficient coupling of molecular dynamics simulation with the 3D molecular theory of solvation (3D-RISM-KH), contracting the solvent degrees of freedom [Luchko et al., 2010] or using free energy perturbation and OPLS force field together with molecular dynamics [Shivakumer et al., 2010].
