**2. Molecular modelling methods and their usefulness**

Molecular recognition is a central phenomenon in biology, for example, with enzymes and their substrates, receptors and their signal inducing ligands, antibodies and antigens, among others. Given two molecules with 3D conformations in atomic detail, it is important to know if the molecules bind to each other and, if it is so, what does the formed complex look like ("docking") and how strong is the binding affinity (that can be related to the "scoring"functions).

Molecules are not rigid. The motional energy at room temperature is large enough to let all atoms in a molecule move permanently. That means that the absolute positions of atoms in a molecule, and of a molecule as a whole, are by no means fixed, and that the relative location of substituents on a single bond may vary with time. Therefore, any compound containing one or several single bonds exists at every moment in many different conformers, but generally only low energy conformers are found to a large extent [Kund, 1997, as cited in Tóth et al., 2005].

186 Bioinformatics

recognition process.

"scoring"functions).

Computational methods have become increasingly important in a number of areas such as comparative or homology modelling, functional site location, characterisation of ligandbinding sites in proteins, docking of small molecules into protein binding sites, proteinprotein docking, and molecular dynamic simulations [see for example Choe & Chang, 2002]. Current results yield information that is sometimes beyond experimental possibilities and

To apply computational methods in drug design, it is always necessary to remember that to be effective, a designed drug must discriminate successfully between the macromolecular target and alternative structures present in the organism. The last few years have witnessed the emergence of different computational tools aimed at understanding and modelling this process at the molecular level. Although still rudimentary, these methods are shaping a coherent approach to help in the design of molecules with high affinity and specificity, both in lead discovery and in lead optimisation. Moreover, current information on the 3D structure of proteins and their functions provide a possibility to understand the relevant molecular interactions between a ligand and a target macromolecule. As a consequence, a comprehensive study of drug structure–activity relationships can help identify a 3D pharmacophore model as an aid for rational drug design, as a pharmacophore model can be defined as 'an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response', and a pharmacophore model can be established either in a ligandbased manner, by superposing a set of active molecules and extracting common chemical features that are essential for their bioactivity, or in a structure-based manner, by probing

can be used to guide and improve a vast array of experiments.

possible interaction points between the macromolecular target and ligands.

**2. Molecular modelling methods and their usefulness** 

Molecular recognition (MR) is a general term designating non-covalent interactions between two or more compounds belonging to host-guest, enzyme-inhibitor and/or drug-receptor complexes. A rigorous approach to an MR study should involve the adoption of a computational method independent from the chemical intuition of the researcher. Drug design purposes prompt another challenging feature of such an ideal computational method, the ability to make sufficiently accurate thermodynamic predictions about the

Molecular recognition is a central phenomenon in biology, for example, with enzymes and their substrates, receptors and their signal inducing ligands, antibodies and antigens, among others. Given two molecules with 3D conformations in atomic detail, it is important to know if the molecules bind to each other and, if it is so, what does the formed complex look like ("docking") and how strong is the binding affinity (that can be related to the

Molecules are not rigid. The motional energy at room temperature is large enough to let all atoms in a molecule move permanently. That means that the absolute positions of atoms in a molecule, and of a molecule as a whole, are by no means fixed, and that the relative location The biological activity of a drug molecule is supposed to depend on one single unique conformation amongst all the low energy conformations, the search for this so-called bioactive conformation for compound sets being one of the major tasks in Medicinal Chemistry. Searching for all low energy conformations is possible with molecular modelling studies, since molecular modelling is concerned with the description of the atomic and molecular interactions that govern microscopic and macroscopic behaviours of physical systems. These molecular interactions are classified as: (a) bonded (stretching, bending and torsion), (b) non-bonded (electrostatic (including interactions with metals), van der Waals and π-stacking), and (c) derived, as they result from the previous ones (hydrogen bonds and hydrophobic effect).

Protein-ligand or, in general, molecule-molecule binding free energy differences can *a priori* be computed from first principles using free energy perturbation techniques and a full atomic detailed model with explicit solvent molecules using molecular dynamics simulations. However, these are computationally demanding. More affordable approaches use end-point molecular dynamic simulations and compute free energies accounting for solvent effects with continuum methods, such as MM-PBSA (molecular mechanics Poisson-Boltzman surface area) or MM-GBSA (generalized Born surface area) [Kollman et al., 2000; Wang et al., 2005]. One of the first approaches was comparative molecular field analysis (CoMFA) [Cramer et al., 1988], which enabled interpretation and understanding of enzyme active sites when the crystal structure was absent. However, this type of analysis was not possible until *in vitro* drug-drug interaction studies were widely used (through the 1990s).

## **2.1. Molecular mechanics, molecular dynamics and docking**

Molecular mechanics (MM) is often the only feasible means with which to model very large and non-symmetrical chemical systems such as proteins and polymers. Molecular mechanics is a purely empirical method that neglects explicit treatment of electrons, relying instead on the laws of classical physics to predict the chemical properties of molecules . As a result, MM calculations cannot deal with problems such as bond breakage or formation, where electronic or quantum effects dominate. Furthermore, MM models are wholly system-dependent. MM energy predictions tend to be meaningless as absolute quantities, as the zero or reference value depends on the number and types of atoms and their connectivity, and so they are generally useful only for comparative studies. A force field is an empirical approximation for expressing structure-energy relationships in molecules and is usually a compromise between speed and accuracy.

Molecular mechanics have been shown to produce more realistic geometry values for the majority of organic molecules, owing to the fact that they are highly parameterised. Parameterisation of structures should be performed with care and non-"standard" molecules will need to have new parameters. This is usually done by analogy for bonded terms and assigning charges by a procedure consistent with the used force field.

Using Molecular Modelling to Study Interactions Between Molecules with Biological Activity 189

so for other systems, although parameters that enable the simulation of other systems have

CHARMM (Chemistry at HARward Macromolecular Mechanics) developed by Karplus et al. [http://www.charmm.org] was originally devised for proteins and nucleic acids [Brooks et al., 1983], and is now used for a range of macromolecules, molecular dynamics, solvation, crystal packing, vibrational analysis and QM/MM (quantum mechanics/molecular mechanics) studies. It uses five valence terms, one of which is electrostatic and is a basis for

GROMOS (Groningen Molecular Simulation) developed at the University of Groningen and the ETH (Eidgenössische Technische Hochschule) of Zurich [http://www.igc.ethz. ch/GROMOS/index] is quite popular for predicting the dynamical motion of molecules and bulk liquids, also being used for modelling biomolecules. It uses five valence terms, one of which is electrostatic [van Gunsteren and Berendsen., 1977]. Its parameters are currently

MM1-4 (Molecular Mechanics) developed by Allinger [1976] are general purpose force fields for monofunctional organic molecules. The first version of this method was the MM1 [Allinger, 1976]. MM2 was parameterised for a lot of functional groups while MM3 [Allinger & Durkin, 2000; Allinger & Yan, 1993] is probably one of the most accurate ways of modelling hydrocarbons. MM4 is the latest version with several improvements [Allinger et al., 1996].

MMFF (Merck Molecular Force Field) developed by Halgren [1996] is also a general purpose force field mainly for organic molecules. MMFF94 [Halgren, 1996] was originally designed for molecular dynamics simulations, but has also been widely used for geometrical optimisation. It uses five valence terms, one of which is electrostatic and another is a cross term. MMFF was parameterised based on high level *ab initio* calculations. MMFF94 contains parameters for a wide variety of functional groups that arise in Organic and Medicinal

OPLS (Optimized Potential for Liquid Simulations) developed by Jorgensen at Yale [http://zarbi.chem.yale.edu] was designed for modelling bulk liquids [Jorgensen & Tirado-Rives, 1996] and has been extensively used for modelling the molecular dynamics of biomolecules. It uses five valence terms, one of which is an electrostatic term and none of

TRIPOS (Sybil force field) is a commercial method designed for modelling organics and biomolecules. It is often used for CoMFA analysis and uses five valence terms, one of which

CVFF (Consistent Valence Force Field) developed by Dauber-Osguthorpe is a method parameterised for small organic (amides and carboxylic acids, among others) crystals and gas phase structures [Dauber-Osguthorpe et al., 2004]. It handles peptides, proteins and a wide range of organic systems. It was primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies

been published [for examples see Doshi & Hamelberg, 2009; Zgarbová et al., 2011].

other force fields (e.g., MOIL [Elber et al., 1995]).

being updated [Horta et al., 2011].

Chemistry.

them is a cross term.

is an electrostatic term.

reasonably well.

There are many levels of theory in which computational models of 3D structures can be constructed. The overall aim of modelling methods is often to try to relate biological activity to structure. An important step towards this goal is to be able to compute the potential energy of the molecule as a function of the position of the constituent atoms. Once a method for evaluating the molecular potential energy is available, it is natural to search for an optimum molecular geometry by minimising the energy of the system. In a biological macromolecule, the potential energy surface is a complicated one, in which there are many local energy minima as well as a single overall energy minimum. All the energy minimisation algorithms commonly used have a marked tendency to locate only a local energy minimum that is close to the starting conformation. For a biological macromolecule, the number of conformations that have to be searched rises exponentially with the size of the molecule; hence, systematic searching is not a practical method for large molecules.

Molecular dynamics (MD) is a conformation space search procedure in which the atoms of a biological macromolecule are given an initial velocity and are then allowed to evolve in time according to the laws of Newtonian mechanics [van Gunsteren & Berendsen, 1977]. Depending on the simulated temperature of the system, the macromolecule can then overcome barriers at the potential energy surface in a way that is not possible with a minimisation procedure. One useful combination of molecular dynamics and minimisation schemes is a method known as simulated annealing [Kirkpatrick et al, 1983, Černý, 1985]. This method uses a molecular dynamics calculation in which the system temperature is raised to a high value to allow for a widespread exploration of the available conformational space. The system temperature is then gradually decreased as further dynamics are performed. Finally, a minimisation phase may be used to select a minimum energy molecular conformation.

One of the most important applications of molecular modelling techniques in structural biology is the simulation of the docking of a ligand molecule onto a receptor. These methods often search to identify the location of the ligand binding site and the geometry of the ligand in the active site, to get the correct ranking when considering a series of related ligands in terms of their affinity, or to evaluate the absolute binding free energy as accurately as possible. To select a force field and the adequate modelling methodology for a given task, it is important to appreciate the range of molecular systems to which it is applicable and the types of simulations that can be performed.

#### **2.2. Most used existing force fields**

AMBER (Assisted Model Building with Energy Refinement) developed by Kollman et al. [http://ambermd.org/] was originally parameterised specifically for proteins and nucleic acids [Weiner et al., 1984, 1986; Cornell et al., 1995], using 5 bonding and non-bonding terms along with a sophisticated electrostatic treatment and with no cross terms included. The results obtained with this method can be very good for proteins and nucleic acids, but less so for other systems, although parameters that enable the simulation of other systems have been published [for examples see Doshi & Hamelberg, 2009; Zgarbová et al., 2011].

188 Bioinformatics

molecular conformation.

types of simulations that can be performed.

**2.2. Most used existing force fields** 

molecules will need to have new parameters. This is usually done by analogy for bonded

There are many levels of theory in which computational models of 3D structures can be constructed. The overall aim of modelling methods is often to try to relate biological activity to structure. An important step towards this goal is to be able to compute the potential energy of the molecule as a function of the position of the constituent atoms. Once a method for evaluating the molecular potential energy is available, it is natural to search for an optimum molecular geometry by minimising the energy of the system. In a biological macromolecule, the potential energy surface is a complicated one, in which there are many local energy minima as well as a single overall energy minimum. All the energy minimisation algorithms commonly used have a marked tendency to locate only a local energy minimum that is close to the starting conformation. For a biological macromolecule, the number of conformations that have to be searched rises exponentially with the size of the molecule; hence, systematic searching is not a practical method for large molecules.

Molecular dynamics (MD) is a conformation space search procedure in which the atoms of a biological macromolecule are given an initial velocity and are then allowed to evolve in time according to the laws of Newtonian mechanics [van Gunsteren & Berendsen, 1977]. Depending on the simulated temperature of the system, the macromolecule can then overcome barriers at the potential energy surface in a way that is not possible with a minimisation procedure. One useful combination of molecular dynamics and minimisation schemes is a method known as simulated annealing [Kirkpatrick et al, 1983, Černý, 1985]. This method uses a molecular dynamics calculation in which the system temperature is raised to a high value to allow for a widespread exploration of the available conformational space. The system temperature is then gradually decreased as further dynamics are performed. Finally, a minimisation phase may be used to select a minimum energy

One of the most important applications of molecular modelling techniques in structural biology is the simulation of the docking of a ligand molecule onto a receptor. These methods often search to identify the location of the ligand binding site and the geometry of the ligand in the active site, to get the correct ranking when considering a series of related ligands in terms of their affinity, or to evaluate the absolute binding free energy as accurately as possible. To select a force field and the adequate modelling methodology for a given task, it is important to appreciate the range of molecular systems to which it is applicable and the

AMBER (Assisted Model Building with Energy Refinement) developed by Kollman et al. [http://ambermd.org/] was originally parameterised specifically for proteins and nucleic acids [Weiner et al., 1984, 1986; Cornell et al., 1995], using 5 bonding and non-bonding terms along with a sophisticated electrostatic treatment and with no cross terms included. The results obtained with this method can be very good for proteins and nucleic acids, but less

terms and assigning charges by a procedure consistent with the used force field.

CHARMM (Chemistry at HARward Macromolecular Mechanics) developed by Karplus et al. [http://www.charmm.org] was originally devised for proteins and nucleic acids [Brooks et al., 1983], and is now used for a range of macromolecules, molecular dynamics, solvation, crystal packing, vibrational analysis and QM/MM (quantum mechanics/molecular mechanics) studies. It uses five valence terms, one of which is electrostatic and is a basis for other force fields (e.g., MOIL [Elber et al., 1995]).

GROMOS (Groningen Molecular Simulation) developed at the University of Groningen and the ETH (Eidgenössische Technische Hochschule) of Zurich [http://www.igc.ethz. ch/GROMOS/index] is quite popular for predicting the dynamical motion of molecules and bulk liquids, also being used for modelling biomolecules. It uses five valence terms, one of which is electrostatic [van Gunsteren and Berendsen., 1977]. Its parameters are currently being updated [Horta et al., 2011].

MM1-4 (Molecular Mechanics) developed by Allinger [1976] are general purpose force fields for monofunctional organic molecules. The first version of this method was the MM1 [Allinger, 1976]. MM2 was parameterised for a lot of functional groups while MM3 [Allinger & Durkin, 2000; Allinger & Yan, 1993] is probably one of the most accurate ways of modelling hydrocarbons. MM4 is the latest version with several improvements [Allinger et al., 1996].

MMFF (Merck Molecular Force Field) developed by Halgren [1996] is also a general purpose force field mainly for organic molecules. MMFF94 [Halgren, 1996] was originally designed for molecular dynamics simulations, but has also been widely used for geometrical optimisation. It uses five valence terms, one of which is electrostatic and another is a cross term. MMFF was parameterised based on high level *ab initio* calculations. MMFF94 contains parameters for a wide variety of functional groups that arise in Organic and Medicinal Chemistry.

OPLS (Optimized Potential for Liquid Simulations) developed by Jorgensen at Yale [http://zarbi.chem.yale.edu] was designed for modelling bulk liquids [Jorgensen & Tirado-Rives, 1996] and has been extensively used for modelling the molecular dynamics of biomolecules. It uses five valence terms, one of which is an electrostatic term and none of them is a cross term.

TRIPOS (Sybil force field) is a commercial method designed for modelling organics and biomolecules. It is often used for CoMFA analysis and uses five valence terms, one of which is an electrostatic term.

CVFF (Consistent Valence Force Field) developed by Dauber-Osguthorpe is a method parameterised for small organic (amides and carboxylic acids, among others) crystals and gas phase structures [Dauber-Osguthorpe et al., 2004]. It handles peptides, proteins and a wide range of organic systems. It was primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies reasonably well.
