**3.2 Computational approaches**

20 Thermodynamics – Interaction Studies – Solids, Liquids and Gases

There were attempts to relate order parameters to structural characteristics of proteins and ligand-protein complexes. It was observed that amino acids with smaller side chains tend to show – intuitively - greater backbone flexibility than those with bulkier side chains (Goodman *et al.*, 2000). However, the variation of backbone amide <sup>2</sup> *S* parameters is larger than the differences between the averages for different amino acid types. Backbone amide order parameters are also only weakly affected by secondary structure elements, with loops having only slightly smaller average <sup>2</sup> *S* N-H values than helices or beta-turns (Kay et al., 1989). Backbone <sup>2</sup> *S* N-H values can be predicted from structures using a simple model that takes account of local contacts to the N-H and C=O atoms of each peptide group (Zhang and

A more sophisticated model for predicting dynamics from structure has recently been reported (tCONCOORD) (Seeliger *et al.*, 2007). tCONCOORD allows for a fast and efficient sampling of protein's conformational degrees of freedom based on geometrical restraints. Weak correlation between side chain order parameters and contact distance between the methyl carbon and neighboring atoms, with solvent exposure (Ming and Brüschweiler, 2004), and amino acid sequence conservation patterns (Mittermaier *et al.*, 2003) have been reported in literature. These results demonstrate that protein dynamics are strongly affected by the unique architecture of the protein as well as the environment. Thus, it cannot be readily predicted by the bioinformatic techniques, based on the primary/secondary sequence analysis. Developing a fast and reliable method of assessment of protein dynamics is, nevertheless, crucial for predictions of ligand-protein interactions - as it will be shown in the course of this chapter, dynamics affects all stages of molecular recognition events.

Order parameters can be related to entropy through the relationship developed by Yang and Kay (1996). This formalism quantifies the conformational entropy associated with observable protein motions by means of a specific motion model. For a wide range of motion models, the functional dependence of entropy on the <sup>2</sup> *S* parameter was demonstrated to be similar (Yang and Kay, 1996). This suggests that changes in <sup>2</sup> *S* can be related to the conformational entropy change in a model-independent manner. This approach has many advantages: it is straightforward, relatively free of assumptions (the requirement is that the internal motions are uncorrelated with the global tumbling of the macromolecule), and applicable to both NMR experiments and theoretical approaches (MD simulations). Moreover, since <sup>2</sup> *S* parameters are measured per bond vector, this approach enables site-specific reporting of any loses, gains, and redistributions of conformational

However, the model-free formalism can give only a qualitative view of micro-to-millisecond time scale motions. Failure to correctly account for anisotropic molecular tumbling and the assumption that all motions are un-correlated seriously compromises the usefulness of this approach for studying dynamics associated with large conformational changes or concerted motions. Because of the time scales, alternative approaches must be implemented to study

As described, ITC obtains free energy as the global parameter, thus, effects like ligandinduced conformational changes, domain-swapping, or protein oligomerisation, which

*G* , will not be resolved. In order to assess the role those factors

entropy through different dynamic states of the ligand-protein complex.

motions occurring at a millisecond time scale (e.g. R2 relaxation dispersion).

**3.1.3 Combination of ITC and NMR** 

contribute to the overall

Brüschweiler, 2002).

Computational approaches to ligand-protein interaction studies have great potential and the development of various methods, briefly described in this chapter, have been truly outstanding. However, every method – computational, experimental alike - has its limitations and computational methods should not be used in a 'black box' manner; one should beware of the 'Garbage In Garbage Out' phenomenon. Yet it is evident that theoretical approaches have finally come to the stage that makes rational molecular design truly rational.

During a binding event, the ligand may bind in multiple orientations. The conformation of either of the interacting partners can change significantly upon association. The network of intramolecular interactions (e.g. hydrogen bonds, salt bridges) can dramatically change (breaking and/or creating new contacts), and new intermolecular interactions occur. Water molecules and ions can be expelled upon binding, or – on the contrary – bind more tightly. Finally, conformational or solvation entropic contributions may play significant role, affecting the free energy in a way which is difficult to predict.

Growing amount of calorimetric data available allows the investigation of the thermodynamic profiles for many ligand-protein complexes in detail. When structural data (crystal, NMR) are available as well – and often it is the case - it is very appealing to speculate about the link between the structure of the complex and the thermodynamics of the binding event. However, such speculations are challenging. It is important to bear in mind that both enthalpic and entropic contributions to the free energy terms obtained from ITC experiments are global parameters, containing a mixture of different contributions, which can have either equal or opposing signs and different magnitudes. This may lead to various thermodynamic signatures of a binding event. Moreover, 'structural' interpretation of intrinsic entropic contributions is notoriously difficult. Hence, the experimental thermodynamic data cannot be easily interpreted on the basis of structural information alone. Last but not least, the contribution from the solvation effects is difficult to get insight into, and although direct experimental estimations of solvation free energy have been attempted, these always require additional assumptions (Homans, 2007, Shimokhina *et al.*, 2006).

No doubt, a great advantage of theoretical approaches lies with gaining an insight about each of those contributions and their de-convolution. Binding events (ligand-protein

(Klepeis *et al.*, 2009 and references therein). MD methods rely on quality of the force field (parameters, inclusion of non-additive effects, etc), description of solvent effects, adequate sampling, and quality of initial structures used for the simulations. The quality of the results relies also on the duration of the simulation. There are limits on the time scales at which the system of interest can be considered. Simulation runs are fairly short: typically nanoseconds to microseconds, rarely extending to miliseconds, if super-fast computers are employed. Since biological processes (ligand-protein binding, large conformational changes, etc) typically occur ar micro-to-milisecond scales, one needs to assess whether or not a simulation has reached equilibrium before the averages calculated can be trusted. Furthermore, the averages obtained need to be subjected to a statistical analysis, to make an

MD methods have been widely employed to study ligand-protein binding phenomena, conformational changes, solvent effects, and to assess individual contributions to the binding free energy. These methods are particularly useful in assessing the conformational entropic contribution to the free energy. Information about ps-to-ns time scale molecular motions can be readily obtained from the MD simulation trajectory and analysed either through diagonalisation of the covariance matrix of displacements of atomic Cartesian coordinates - quasi-harmonic analysis, Schlitter's approach (7,8), analysed through principal component analysis (PCA), or quantified NMR-like via generalised order parameters. Entropy changes can be estimated from the MD trajectory through Yang and Kay's relationship (9). The order parameter analysis has the advantage of being able to calculate order parameters by-vector, thus providing site-specific information on flexibility and hence intrinsic entropic contribution. Computed parameters can be also directly compared to the experimental results of NMR relaxation analysis (Best and Vendruscolo, 2004). In last few years several studies proved the success of this methodology in estimating of entropic

> <sup>1</sup> ' lndet 2

*B k Te SS k*

chemical/molecular mechanical (QM/MM) (Senn and Thiel, 2009) methods.

2 22 2 2 22 3 1 <sup>222</sup> <sup>222</sup>

MD-based approaches used for binding thermodynamics calculations include free energy perturbation (FEP) (Foloppe and Hubbard, 2006), thermodynamic integration (TI) (Straatsma and Berendsen, 1988), lambda-dynamics simulations (Knight and Brooks, 2009), Molecular Mechanics-Poisson-Boltzmann Surface Area (MM-PBSA) (Gilson and Zhou, 2007), Linear Interaction Energy (LIE) (Gilson and Zhou, 2007), and hybrid quantum

Free energy perturbation (FEP) is used to calculate free energy differences between two states from MD simulations. These two states can represent, for instance, unbound (apo) protein and a ligand-protein complex (holo), or two ligand-protein complexes with different ligands. In the framework of FEP, the difference in the free energy difference for two states

2 2

2 2 *S x y z xy yz yz* (9)

*ij i i j j xxx x* (8)

(7)

*B*

**1 M**

estimate of the errors.

contributions to the binding thermodynamics.

is obtained from the Zwanzig equation (10).

binding pose and the strength of their interactions) can be predicted by molecular docking, albeit intrinsic entropic contributions and solvation effects are usually ignored. Dynamic behaviour of proteins and ligands can be studied using extensive molecular dynamics (MD) simulation, which, combined with experimental NMR and ITC data, provide extremely valuable information on configurational entropy changes upon binding event, and hence about the intrinsic entropic contribution to the free energy. The global free energy changes can be studied by free energy perturbation (FEP) calculations, or related methods, such as thermodynamic integration (TI). Molecular docking methods allow for a quick assessment of enthalpic contributions, while solvent effects can be studied either by quantum chemical (QM) calculations (e.g. COSMO model), hybrid QM/MM schemes, or FEP-related methods.

Theoretical approaches allow also for investigations of transient phenomena, e.g. shortliving alternative conformers from an ensemble that contribute to the binding event but which cannot be readily observed. In a situation – which is not uncommon – when an experimental structure of the protein target or a part of it is missing (such as in cases of most G-protein-coupled receptors), computational approaches allow the generation of such structures (e.g. by homology modelling, threading, or *ab initio* predictions) and its use for predictions which can be validated experimentally, despite of the absence of protein structural data. Therefore, usage of theoretical methods is indispensable – not only for the interpretation of the existing experimental data, but also to direct and design new experiments.

Because of space limitations, only two theoretical methods, which are the most relevant for thermodynamics of molecular binding events, will be briefly discussed: MD-related methods (which includes MD simulations, FEP-like approaches, methods which use MD algorithms with enhanced sampling, and hybrid QM/MM schemes), and quantum chemical (QM) calculations. This division is not strict and many of these methods overlap, e.g. QM/MM methods use both MD simulations and QM calculations, and FEP-like methods have many flavours, including hybrid QM/MM-FEP.

#### **3.2.1 Molecular Dynamics (MD) simulations**

Molecular dynamics (MD) simulation consists of the numerical solution of the Newton's equations of motion of a system (e.g. protein, or a ligand-protein complex in water environment). The potential energy of the particle system is described by a function called force field (6). The potential energy of the system (U) is described as a sum of energy terms for covalent bonds, angles, dihedral angles, a van der Waals non-bonded term, and a nonbonded electrostatic term (Cornell *et al.*, 1995). Since the kinetic energy is also taken into account, the system is able to move across the energy barriers on the potential energy surface, which implies substantial changes (e.g. conformational) during the simulation.

$$\begin{split} \mathcal{U} &= \sum\_{\text{bonds}} k\_r \left( r - r\_0 \right)^2 + \sum\_{\text{angles}} k\_\theta \left( \theta - \theta\_0 \right)^2 + \sum\_{\text{dihedrals}} k \phi \left[ 1 + \cos \left( n \phi + \phi\_0 \right) \right] \\ &+ \sum\_{\text{atom } i} \sum\_{j \neq i} 4 \varepsilon\_{i,j} \left[ \left( \frac{\sigma\_{i,j}}{r\_{i,j}} \right)^{12} - \left( \frac{\sigma\_{i,j}}{r\_{i,j}} \right)^6 \right] + \sum\_i \sum\_{j \neq i} \frac{q\_i q\_j}{\varepsilon\_0 r\_{i,j}} \end{split} \tag{6}$$

The principles of MD simulations, algorithms used, and different types of force fields applied (all-atom, united atom, coarse-grain, etc) have been described in many publications

binding pose and the strength of their interactions) can be predicted by molecular docking, albeit intrinsic entropic contributions and solvation effects are usually ignored. Dynamic behaviour of proteins and ligands can be studied using extensive molecular dynamics (MD) simulation, which, combined with experimental NMR and ITC data, provide extremely valuable information on configurational entropy changes upon binding event, and hence about the intrinsic entropic contribution to the free energy. The global free energy changes can be studied by free energy perturbation (FEP) calculations, or related methods, such as thermodynamic integration (TI). Molecular docking methods allow for a quick assessment of enthalpic contributions, while solvent effects can be studied either by quantum chemical (QM) calculations (e.g. COSMO model), hybrid QM/MM schemes, or FEP-related methods. Theoretical approaches allow also for investigations of transient phenomena, e.g. shortliving alternative conformers from an ensemble that contribute to the binding event but which cannot be readily observed. In a situation – which is not uncommon – when an experimental structure of the protein target or a part of it is missing (such as in cases of most G-protein-coupled receptors), computational approaches allow the generation of such structures (e.g. by homology modelling, threading, or *ab initio* predictions) and its use for predictions which can be validated experimentally, despite of the absence of protein structural data. Therefore, usage of theoretical methods is indispensable – not only for the interpretation of the existing experimental data, but also to direct and design new

Because of space limitations, only two theoretical methods, which are the most relevant for thermodynamics of molecular binding events, will be briefly discussed: MD-related methods (which includes MD simulations, FEP-like approaches, methods which use MD algorithms with enhanced sampling, and hybrid QM/MM schemes), and quantum chemical (QM) calculations. This division is not strict and many of these methods overlap, e.g. QM/MM methods use both MD simulations and QM calculations, and FEP-like methods

Molecular dynamics (MD) simulation consists of the numerical solution of the Newton's equations of motion of a system (e.g. protein, or a ligand-protein complex in water environment). The potential energy of the particle system is described by a function called force field (6). The potential energy of the system (U) is described as a sum of energy terms for covalent bonds, angles, dihedral angles, a van der Waals non-bonded term, and a nonbonded electrostatic term (Cornell *et al.*, 1995). Since the kinetic energy is also taken into account, the system is able to move across the energy barriers on the potential energy surface, which implies substantial changes (e.g. conformational) during the simulation.

> 2 2 0 0 0

The principles of MD simulations, algorithms used, and different types of force fields applied (all-atom, united atom, coarse-grain, etc) have been described in many publications

*q q*

  1 cos

 

(6)

bonds angles dihedrals 12 6 , ,

*ij i ij ij i ji i j*

 

*ij ij i j*

*rr r*

*U krr k k n*

 

atom ,, 0 ,

have many flavours, including hybrid QM/MM-FEP.

**3.2.1 Molecular Dynamics (MD) simulations** 

,

*i j*

4

*r*

experiments.

(Klepeis *et al.*, 2009 and references therein). MD methods rely on quality of the force field (parameters, inclusion of non-additive effects, etc), description of solvent effects, adequate sampling, and quality of initial structures used for the simulations. The quality of the results relies also on the duration of the simulation. There are limits on the time scales at which the system of interest can be considered. Simulation runs are fairly short: typically nanoseconds to microseconds, rarely extending to miliseconds, if super-fast computers are employed. Since biological processes (ligand-protein binding, large conformational changes, etc) typically occur ar micro-to-milisecond scales, one needs to assess whether or not a simulation has reached equilibrium before the averages calculated can be trusted. Furthermore, the averages obtained need to be subjected to a statistical analysis, to make an estimate of the errors.

MD methods have been widely employed to study ligand-protein binding phenomena, conformational changes, solvent effects, and to assess individual contributions to the binding free energy. These methods are particularly useful in assessing the conformational entropic contribution to the free energy. Information about ps-to-ns time scale molecular motions can be readily obtained from the MD simulation trajectory and analysed either through diagonalisation of the covariance matrix of displacements of atomic Cartesian coordinates - quasi-harmonic analysis, Schlitter's approach (7,8), analysed through principal component analysis (PCA), or quantified NMR-like via generalised order parameters. Entropy changes can be estimated from the MD trajectory through Yang and Kay's relationship (9). The order parameter analysis has the advantage of being able to calculate order parameters by-vector, thus providing site-specific information on flexibility and hence intrinsic entropic contribution. Computed parameters can be also directly compared to the experimental results of NMR relaxation analysis (Best and Vendruscolo, 2004). In last few years several studies proved the success of this methodology in estimating of entropic contributions to the binding thermodynamics.

$$S < S' = \frac{1}{2} k\_B \ln \det \left[ \mathbf{1} + \frac{k\_B T e^2}{\hbar^2} \mathbf{M} \sigma \right] \tag{7}$$

$$
\sigma\_{ij} = \left\langle \left( \mathbf{x}\_i - \left\langle \mathbf{x}\_i \right\rangle \right) \left( \mathbf{x}\_j - \left\langle \mathbf{x}\_j \right\rangle \right) \right\rangle \tag{8}
$$

$$S^2 = \frac{3}{2} \left[ \left< \mathbf{x}^2 \right>^2 + \left< y^2 \right>^2 + \left< z^2 \right>^2 + 2\left< xy \right>^2 + 2\left< yz \right>^2 + 2\left< yz \right>^2 \right] - \frac{1}{2} \tag{9}$$

MD-based approaches used for binding thermodynamics calculations include free energy perturbation (FEP) (Foloppe and Hubbard, 2006), thermodynamic integration (TI) (Straatsma and Berendsen, 1988), lambda-dynamics simulations (Knight and Brooks, 2009), Molecular Mechanics-Poisson-Boltzmann Surface Area (MM-PBSA) (Gilson and Zhou, 2007), Linear Interaction Energy (LIE) (Gilson and Zhou, 2007), and hybrid quantum chemical/molecular mechanical (QM/MM) (Senn and Thiel, 2009) methods.

Free energy perturbation (FEP) is used to calculate free energy differences between two states from MD simulations. These two states can represent, for instance, unbound (apo) protein and a ligand-protein complex (holo), or two ligand-protein complexes with different ligands. In the framework of FEP, the difference in the free energy difference for two states is obtained from the Zwanzig equation (10).

Ligand-protein interactions can be driven by quantum effects. These include charge transfer, halogen bonds, or polarisation. Stabilisation energy related to charge transfer can be several

'Conventional' QM calculations, using HF, DFT, or semi-empirical methods provide a way to obtain the ground state energy of a ligand-protein system or a part of it. Most programs based on these are capable of studying molecular properties such as atomic charges, multipole moments, vibrational frequencies, and spectroscopic constants. In addition, there are methods allowing the study of excited-state processes, such as time-dependent DFT or

The application area of QM methods is vast. QM calculations are used for charge derivation for molecular dynamics simulations, for description of direct interactions (hydrogen bonds, halogen bonds, aromatic stacking), for calculations of pKa, protonation, redox states, and for

Derivation of accurate charges for a system being studied is an important step in preparation for MD simulation. Failure in charge representation will inevitably lead to incorrect results. Derivation of charges is done using QM calculations, usually in several steps, involving optimisation, electrostatic potential generation, and fitting charges into atoms. RESP methodology, based on charges derived from *ab initio* HF/6-31G\* level of theory has been for many years a standard in deriving charges for MD simulations (Bayly *et* 

Charge distribution is also required for the calculation of the solvation properties using conductor-like screening model (COSMO) (Klamt and Schüürmann, 1993). COSMO, just like any other continuum solvent approach, approximates the solvent by a dielectric continuum, surrounding the solute molecules outside of a molecular cavity. In COSMO, the polarisation charges of the continuum, caused by the polarity of the solute molecule, is derived from a scaled-conductor approximation (hence the name). In this way, the charge distribution of the molecule, which can be obtained from the QM calculations, and the energy of the

QM calculations are also used for force field development, such as adding new parameters and incorporating non-additive effects. Several studies indicate that non-additive effects (.e.g. electronic polarisation) significantly affect binding affinities of many ligands (Ji and Zhang, 2008, 2009 and references therein). Electrostatic interactions are critically dependent on charge distribution around both interacting species, and this distribution is heavily dependent on the conformation (geometry) of the complex. Description of hydrogen bonding is also affected by electronic polarisation – some hydrogen bonds, which are found broken during MD simulation using 'conventional' force fields are found to be stable, when non-additive force field is used (Ji and Zhang, 2009). Corrections for polarisation can be added to MD force fields in order to derive protein charges more accurately and provide a better description of electrostatic interactions. Protein polarisation is important for stabilisation of the native structures of proteins. MD simulations indicate that inclusion of polarisation effects not only improves the description of protein native structures, but also distinguishes native from decoy dynamically: the former are more stable than the latter under the polarised force fields. These observations provide strong evidence that inclusion of polarisation effects in calculations of ligand-protein interactions is likely to greatly

kcal/mol and force field-based schemes cannot describe this stabilisation correctly.

**3.2.2 Quantum mechanical (QM) calculations** 

*al.*, 1993, Cieplak *et al.*, 1995, Cornell *et al*., 1993).

improve accuracy of such calculations.

restricted open-shell Kohn-Sham (ROKS) (Li and Liu, 2010).

studying solvation effects, such as computing free solvation energies.

interaction between the solvent and the solute molecule can be determined.

$$
\Delta G \left( A \to B \right) = G\_B - G\_A = -k\_B T \ln \left\langle \exp \left( -\frac{E\_B - E\_A}{k\_B T} \right) \right\rangle\_A \tag{10}
$$

where A and B represent two states (e.g. apo and holo protein), G is the difference between free energies of both states, *Bk* is the Boltzmann's constant, T is the temperature, and the triangular brackets denote an average over a simulation run for state A.

FEP calculations converge properly only when the difference between these two states is small enough; therefore it is usually necessary to divide a perturbation into a series of smaller 'steps', which are calculated independently.

Thermodynamic integration (TI) is a related method to calculate free energy differences. Since the free energy can be expressed by its relation to the canonical partition function, the free energy difference in two different states can be used to calculate the difference of potential energy. TI calculations are usually executed by designing a thermodynamic cycle (Figure 1), and integrating along the relevant path. The path can be either a real chemical process or an artificial change (e.g. substitution of a methyl group by hydrogen atom).

The MD methods, despite their numerous successes, suffer of two major bottlenecks. One is the results are critically dependent on the force field used, therefore requires caution when use of appropriate force field and parameters. Many modern force fields are parametrised on experimental NMR data, some are able to include – to some extent – non-additive effects (electronic polarisation). Application of QM/MM schemes allow the inclusion of quantum effects to some extent. Another bottleneck is related to the adequacy of sampling. It is known that – due to relatively short time scales investigated – only some subsets of potential conformational changes can be observed, and often the system gets 'stuck' in a minimum, which does not have to be the global one. This makes the results heavily biased towards the starting structure and is very likely to underestimate the degree of molecular motions observed in the system. Prolonging the simulation time helps to solve the sampling problem only to some extent, and significantly increases the computational cost of MD simulations. Thus, in order to overcome the sampling issue, various enhanced sampling techniques have been employed. One of such methods, frequently used, is replica exchange MD (REMD), which attempts to overcome the problem of multiple-minima by exchanging temperatures of several replicas of the system. These replicas are non-interacting with each other and they run at different temperatures. REMD is also called "parallel tempering" (Earl and Deem, 2005). Another approach used to improve sampling is to construct the bias potential and add it to the potential energy function of the system (force field). This group of methods, referred to as umbrella sampling methods, consist of metadynamics (Laio and Gervasio, 2008), conformational flooding, and accelerated dynamics (Lange et al., 2006). The core feature of metadynamics is the construction of so-called reference potential, which is one that is the most similar to the actual potential. That is, repulsive markers are placed in a coarse time line in a space that is spanned by a small number of relevant collective variables. These markers are then placed on top of the underlying free energy landscape in order to push the system to rapidly accumulate in the initial basin by discouraging it from revisiting points in configurational space. In this way, the system is allowed to escape the lowest transition state as soon as the growing biasing potential and the underlying free energy well exactly counterbalance each other, effectively allowing the simulation to escape free energy minima.

### **3.2.2 Quantum mechanical (QM) calculations**

24 Thermodynamics – Interaction Studies – Solids, Liquids and Gases

where A and B represent two states (e.g. apo and holo protein), G is the difference between free energies of both states, *Bk* is the Boltzmann's constant, T is the temperature, and the

FEP calculations converge properly only when the difference between these two states is small enough; therefore it is usually necessary to divide a perturbation into a series of

Thermodynamic integration (TI) is a related method to calculate free energy differences. Since the free energy can be expressed by its relation to the canonical partition function, the free energy difference in two different states can be used to calculate the difference of potential energy. TI calculations are usually executed by designing a thermodynamic cycle (Figure 1), and integrating along the relevant path. The path can be either a real chemical process or an artificial change (e.g. substitution of a methyl group by hydrogen atom). The MD methods, despite their numerous successes, suffer of two major bottlenecks. One is the results are critically dependent on the force field used, therefore requires caution when use of appropriate force field and parameters. Many modern force fields are parametrised on experimental NMR data, some are able to include – to some extent – non-additive effects (electronic polarisation). Application of QM/MM schemes allow the inclusion of quantum effects to some extent. Another bottleneck is related to the adequacy of sampling. It is known that – due to relatively short time scales investigated – only some subsets of potential conformational changes can be observed, and often the system gets 'stuck' in a minimum, which does not have to be the global one. This makes the results heavily biased towards the starting structure and is very likely to underestimate the degree of molecular motions observed in the system. Prolonging the simulation time helps to solve the sampling problem only to some extent, and significantly increases the computational cost of MD simulations. Thus, in order to overcome the sampling issue, various enhanced sampling techniques have been employed. One of such methods, frequently used, is replica exchange MD (REMD), which attempts to overcome the problem of multiple-minima by exchanging temperatures of several replicas of the system. These replicas are non-interacting with each other and they run at different temperatures. REMD is also called "parallel tempering" (Earl and Deem, 2005). Another approach used to improve sampling is to construct the bias potential and add it to the potential energy function of the system (force field). This group of methods, referred to as umbrella sampling methods, consist of metadynamics (Laio and Gervasio, 2008), conformational flooding, and accelerated dynamics (Lange et al., 2006). The core feature of metadynamics is the construction of so-called reference potential, which is one that is the most similar to the actual potential. That is, repulsive markers are placed in a coarse time line in a space that is spanned by a small number of relevant collective variables. These markers are then placed on top of the underlying free energy landscape in order to push the system to rapidly accumulate in the initial basin by discouraging it from revisiting points in configurational space. In this way, the system is allowed to escape the lowest transition state as soon as the growing biasing potential and the underlying free energy well exactly counterbalance each other, effectively allowing the simulation to escape free energy

*kBT*

 

*A* (10)

 

*G A <sup>B</sup> GB GA kBT* ln exp *EB EA*

triangular brackets denote an average over a simulation run for state A.

smaller 'steps', which are calculated independently.

minima.

Ligand-protein interactions can be driven by quantum effects. These include charge transfer, halogen bonds, or polarisation. Stabilisation energy related to charge transfer can be several kcal/mol and force field-based schemes cannot describe this stabilisation correctly.

'Conventional' QM calculations, using HF, DFT, or semi-empirical methods provide a way to obtain the ground state energy of a ligand-protein system or a part of it. Most programs based on these are capable of studying molecular properties such as atomic charges, multipole moments, vibrational frequencies, and spectroscopic constants. In addition, there are methods allowing the study of excited-state processes, such as time-dependent DFT or restricted open-shell Kohn-Sham (ROKS) (Li and Liu, 2010).

The application area of QM methods is vast. QM calculations are used for charge derivation for molecular dynamics simulations, for description of direct interactions (hydrogen bonds, halogen bonds, aromatic stacking), for calculations of pKa, protonation, redox states, and for studying solvation effects, such as computing free solvation energies.

Derivation of accurate charges for a system being studied is an important step in preparation for MD simulation. Failure in charge representation will inevitably lead to incorrect results. Derivation of charges is done using QM calculations, usually in several steps, involving optimisation, electrostatic potential generation, and fitting charges into atoms. RESP methodology, based on charges derived from *ab initio* HF/6-31G\* level of theory has been for many years a standard in deriving charges for MD simulations (Bayly *et al.*, 1993, Cieplak *et al.*, 1995, Cornell *et al*., 1993).

Charge distribution is also required for the calculation of the solvation properties using conductor-like screening model (COSMO) (Klamt and Schüürmann, 1993). COSMO, just like any other continuum solvent approach, approximates the solvent by a dielectric continuum, surrounding the solute molecules outside of a molecular cavity. In COSMO, the polarisation charges of the continuum, caused by the polarity of the solute molecule, is derived from a scaled-conductor approximation (hence the name). In this way, the charge distribution of the molecule, which can be obtained from the QM calculations, and the energy of the interaction between the solvent and the solute molecule can be determined.

QM calculations are also used for force field development, such as adding new parameters and incorporating non-additive effects. Several studies indicate that non-additive effects (.e.g. electronic polarisation) significantly affect binding affinities of many ligands (Ji and Zhang, 2008, 2009 and references therein). Electrostatic interactions are critically dependent on charge distribution around both interacting species, and this distribution is heavily dependent on the conformation (geometry) of the complex. Description of hydrogen bonding is also affected by electronic polarisation – some hydrogen bonds, which are found broken during MD simulation using 'conventional' force fields are found to be stable, when non-additive force field is used (Ji and Zhang, 2009). Corrections for polarisation can be added to MD force fields in order to derive protein charges more accurately and provide a better description of electrostatic interactions. Protein polarisation is important for stabilisation of the native structures of proteins. MD simulations indicate that inclusion of polarisation effects not only improves the description of protein native structures, but also distinguishes native from decoy dynamically: the former are more stable than the latter under the polarised force fields. These observations provide strong evidence that inclusion of polarisation effects in calculations of ligand-protein interactions is likely to greatly improve accuracy of such calculations.

binding pocket of HBP contains a number of polar and charged residues, hence it is an example of a 'hydrophilic' binder. In contrast to HBP, the binding pocket of MUP is very 'hydrophobic'. Surprisingly, both HBP and MUP are characterised by similar overall entropy

In our recent study (Syme *et al*., 2010) we compared the driving forces for binding between these two proteins in terms of entropic contributions from ligand, protein, and solvent. We performed an extensive study combining x-ray crystallography, NMR spectroscopy, ITC,

The structure of MUP was solved both by x-ray crystallography and solution NMR (Barratt *et al.*, 2005, Kuser *et al.*, 2001, Timm *et al.*, 2001). Several ligand-MUP complexes were studied, including long-chain alcohols, pyrazine derivatives, and pheromones as ligands. Regardless of the chemical nature of ligand, the protein structures are very similar to each other: the desolvated ligand, which occupies the central, hydrophobic binding pocket,

The crystal structure of HBP complexed with histamine revealed two binding sites for the ligand: one of them possessing considerably higher affinity than the other (Figure 4, right panel) (Syme *et al*., 2010). Therefore, in order to simplify the thermodynamic analysis of ligand binding, a mutant of HBP was designed. In this mutant, denoted as HBP-D24R, negatively charged aspartic acid D24 inside the "low" affinity site was replaced by larger and positively charged arginine. This abolished binding of ligand to the "low" affinity site.

Fig. 4. Crystal structures of MUP (left panel) and HBP (right panel). Both MUP and HBP are displayed with their ligands bound: octanediol (purple), and histamine (pink). Ligands are represented as VDW spheres. For MUP, superimposed structures of apo (blue) and holo (dark cyan) protein are showed, in order to display a lack of major conformational changes associated with ligand binding. For HBP, the second (low affinity) binding site is showed

Given that the binding pocket of MUP is very hydrophobic, an entropy-driven binding signature might have been expected for ligand-MUP interactions. Surprisingly, global

of ligand binding.

and coloured yellow.

**4.1.2 Calorimetric studies of MUP** 

MD simulations, and QM calculations.

causes very few conformational changes (Figure 4, left panel).

**4.1.1 Structures of HBP and MUP** 

QM methods are also used in molecular docking. The application of QM methods to molecular docking was pioneered by Raha and Merz (2004), who developed a semiempirical QM-based scoring function and studied ion-mediated ligand binding processes. Their conclusion was that quantum chemical description is required for metal-containing systems, mainly because of poorly-defined atom types of metal atoms in most of the force field parameters, which cannot describe the interactions between a small molecule ligand and a metal ion in the active site of the protein.

Applicability of QM methods to study ligand-protein system has been discussed in literature (Raha *et al.*, 2007, Stewart, 2007). As well as these successes, many examples of the failure of QM approaches have been demonstrated. However, it should be kept in mind that most of these studies were based on either DFT, or semi-empirical Hamiltonians, which do not describe van der Waals interactions and hydrogen bonding terms of ligand-protein interactions correctly. This is, indeed, a serious limitation of "fast", hence more popular QM methods. A straightforward way to solve this problem is to add additional correction terms to the QM energy. It has been demonstrated that the addition of the dispersion energy and corrections for hydrogen bonds improved the performance of semi-empirical QM methods dramatically (Rezac *et al.,* 2009). The recently developed PM6-DH2 method (Fanfrlik *et al.,* 2010) yields, to date, the most accurate results for non-covalent interactions among the semiempirical QM methods. For small non-covalent complexes, the results obtained were comparable to the high-level wave-function theory-based calculations within chemical accuracy (1 kcal/mol) (Rezac *et al.*, 2009).

Another major bottleneck of QM methods applied to studying ligand-protein interactions thermodynamics is the size of the system. DFT can handle up to 150 atoms, highly-accurate methods such as coupled-clusters can handle a few tens of atoms and require very fast computers and long computing times. This limitation of the size of the systems that can be studied seriously compromises its usage in the study of ligand-protein thermodynamics. For instance, the usage of the linear scaling algorithm MOZYME (Stewart, 2009) based on the localised orbitals allows size increases to systems as large as 18 000 atoms and above, which allows calculation of very large ligand-protein complexes. Due to these developments QM methods have become, therefore, very useful for fast and highly accurate predictions of ligand-protein interactions energetics.

An alternative approach is to use so-called divide-and-conquer (DIVCON) algorithm (Dixon and Merz, 1996, 1997). The principle is to divide a large system into many smaller subsystems, separately determine the electron density of each of these subsystems, and then to add the corresponding contributions from each subsystem in order to obtain the total electron density and the energy.
