**Part 2**

**Dynamics of Biomolecules** 

82 Molecular Dynamics – Studies of Synthetic and Biological Macromolecules


## **M.Dyna***Mix* **Studies of Solvation, Solubility and Permeability**

Aatto Laaksonen¹, Alexander Lyubartsev¹ and Francesca Mocci¹,²

*¹Stockholm University, Division of Physical Chemistry, Department of Materials and Environmental Chemistry, Arrhenius Laboratory, Stockholm, Sweden*

*²Università di Cagliari, Dipartimento di Scienze Chimiche, Cittadella Universitaria di Monserrato, Monserrato, Cagliari, Italy*

### **1. Introduction**

Over the last four decades, molecular dynamics (MD) simulations have developed into a powerful discipline and have come very close to the vision of the early 1980s: that MD would mature into a computer laboratory for studying molecular systems under conditions similar to those of experiments, in which instruments give information about molecular structure, interactions and dynamics in condensed phases and at interfaces between phases. Today MD simulations are used more or less routinely by many scientists who were originally educated and trained for experimental work and who later found simulations (along with quantum chemistry and other computational methods) to be a powerful complement to their experimental studies, providing molecular insight and thereby helping to interpret their results.

In this chapter we introduce a powerful methodology for obtaining detailed and accurate information about the solvation and solubility of different categories of solute molecules and ions in water (and in other solvents and phases, including mixed solvents), and about the permeability and transport of solutes in different non-aqueous phases. The computation of free energies, and of the many quantities related to them, remains among the most challenging problems today. The methodology we use for free energy calculations is our Expanded Ensemble scheme, recently implemented in a general MD simulation package called "M.Dyna*Mix*".

Although several very powerful MD simulation programs exist today, we wish to highlight some of the more distinctive features of M.Dyna*Mix*, which is a general-purpose package that is also designed for applications that can be directly connected to spectroscopic experiments, with solution-state NMR as an example. Because MD is the only method that gives dynamical information on realistic (but still limited) time scales, we give examples of the motional characteristics of molecules in bulk solvents and in complex biomolecular systems (containing, for example, DNA, proteins and lipid membranes) that take part in specific noncovalent interactions of varying strength. In addition, we demonstrate the capabilities of MD simulations in the design of orally administered drugs, discussing the problems of developing force field models applicable both in physiological water phases and in barriers made of biological cell membranes. We also present theoretical models that couple simulation results with experiments such as NMR relaxation, as well as schemes for visualizing often very complex solvation structures and relating them to the intermolecular interactions within the solvation sphere and to the molecular dynamics underlying coordination and solvation. Finally, we consider applications of the M.Dyna*Mix* package to simulations of large biomolecular systems, such as DNA molecules in ionic solution and lipid bilayer membranes.

### **2. The tools: M.Dyna***Mix* **software**

In this section we briefly describe the capabilities of the M.Dyna*Mix* package and give a short guide to its usage, from the preparation of molecular structures to the analysis of simulation results.

The M.Dyna*Mix* MD simulation package was developed in our laboratory in the 1990s (Lyubartsev & Laaksonen, 1998a, 2000). It is a scalable, parallel, general-purpose MD program intended for simulations of arbitrary molecular mixtures. The program can employ most of the commonly used force fields whose potential energy function includes electrostatic and Lennard-Jones interactions, covalent bonds and angles, dihedral angles of various types, and some other optional terms. Long-range electrostatic interactions are treated by the Ewald summation method (Allen & Tildesley, 1987). Various kinds of temperature and pressure control are included, among them separate pressure control in different directions. Two types of MD integrators are implemented: one is based on the time-reversible double time step algorithm of Tuckerman et al. (1992) and is used for simulations of flexible molecular models; the other is based on constraint dynamics using the "SHAKE" algorithm (Ryckaert et al., 1977) and is used for simulations of rigid models. The program is written in standard Fortran-77 in a modular manner, and the source code is well commented, which makes it suitable for user modifications. M.Dyna*Mix* version 5.2, released in 2010, includes an option for free energy calculation within the Expanded Ensemble method (Lyubartsev et al., 1992, 1998), which allows automated calculation of free energies and chemical potentials in a single simulation run. The program is efficiently parallelized and scales well on up to a hundred processors or cores. It contains utilities for the preparation of molecular description files, as well as an analysis suite for performing various calculations on the saved simulation trajectories. The flow chart of the program is shown schematically in **Figure 1**.

In this chapter we give a brief "quick start" introduction to the program, concentrating on the new features that have appeared since the previous publication (Lyubartsev & Laaksonen, 2000). The full reference manual of the M.Dyna*Mix* software is available online at the distribution site (http://www.mmk.su.se/mdynamix).

#### **2.1 Preparation of molecular description files**

Before starting, the user must prepare, for each type of molecule used in the simulation, a file with the extension *.mmol* that describes the molecular topology and the force field parameters. This includes the list of atoms, their initial coordinates, Lennard-Jones parameters and charges, and the full list of covalent bonds, angles and torsion angles with the corresponding force field parameters.


Fig. 1. Flow chart of M.Dyna*Mix* simulation software

Although *.mmol* files can be created with a simple text editor from available data on the molecular structure and force field parameters, their preparation is significantly simplified by the *makemol* utility. This utility creates a *.mmol* file from two sources: a "simple molecular structure" (*.smol*) file and a "force field" (*.ff*) file. A *.smol* file contains the list of atoms with their initial coordinates, partial charges and force field atom types, and the list of bonds. An example of a *.smol* file for a methanol molecule is given in **Figure 2**. A force field file contains force field parameters for a number of atom types (which can be shared by different molecules). Using the atom types specified in the *.smol* file, *makemol* looks up the corresponding parameters in the force field file and writes the resulting *.mmol* file. It also automatically generates the lists of covalent angles and torsion angles from the list of bonds in the *.smol* file and assigns parameters to them. The M.Dyna*Mix* distribution contains example Amber94 and CHARMM27 force field files, as well as the utility *char2mdx*, which converts a general CHARMM parameter file into an M.Dyna*Mix* force field file.


Fig. 2. ".*smol*" file for a methanol molecule

A number of freely available molecular editors (for example, Kalzium, http://edu.kde.org/kalzium, included in the Fedora Linux distribution, or the Dundee ProDrg server, http://davapc1.bioch.dundee.ac.uk/prodrg/) allow one to draw a molecule with the mouse, generate a preliminary optimized molecular structure, and generate the list of bonds. Next, one needs to assign an atom type to each atom (the notation for atom types differs between force fields) and to specify partial atomic charges, which are usually not included in general force fields and have to be calculated with a quantum-chemical program for each specific molecule. Once the *.smol* file has been created, the *makemol* utility can be used to create the structure-parameter *.mmol* file for the given type of molecule.
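The automatic generation of angle and torsion lists from a bond list, mentioned above for *makemol*, can be illustrated with a short Python sketch. This is not the *makemol* source code (which is Fortran) and does not reproduce its file formats; the function name, the integer atom numbering and the methanol bond list are assumptions made for illustration only. Improper torsions are not considered.

```python
from collections import defaultdict

def angles_and_torsions(bonds):
    """Enumerate covalent angles (i-j-k) and proper torsions (i-j-k-l)
    implied by a list of bonds given as pairs of atom indices."""
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # An angle is any pair of distinct neighbors around a central atom j.
    angles = set()
    for j, nbrs in neighbors.items():
        ordered = sorted(nbrs)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                angles.add((ordered[a], j, ordered[b]))

    # A torsion is a neighbor of j, the bond j-k, and a neighbor of k.
    torsions = set()
    for j, k in bonds:
        for i in neighbors[j] - {k}:
            for l in neighbors[k] - {j}:
                if i != l:  # skip three-membered rings
                    t = (i, j, k, l)
                    torsions.add(min(t, t[::-1]))  # one canonical direction
    return sorted(angles), sorted(torsions)

# Hypothetical numbering for methanol: 1 = C, 2 = O, 3 = hydroxyl H, 4-6 = methyl H.
methanol_bonds = [(1, 2), (2, 3), (1, 4), (1, 5), (1, 6)]
angles, torsions = angles_and_torsions(methanol_bonds)
```

For this bond list the sketch yields the six angles around the carbon, the C-O-H angle, and the three H-C-O-H torsions; a real tool would then look up force field parameters for each of them by atom type.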

#### **2.2 Simulations**


Molecular dynamics simulation parameters are specified in the main input file. Since version 5.0, the format of this file has changed significantly compared with the previous publication of the program (Lyubartsev & Laaksonen, 2000). In the new format, all simulation parameters and features are given in the form "Keyword value(s)", where each keyword describes one specific parameter or property of the simulation (for example, time step, temperature, restart control, etc.). Default values are used for missing keywords. A complete description of the available keywords and their parameters is given in the manual at the M.Dyna*Mix* distribution site (http://www.mmk.su.se/mdynamix).
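The "Keyword value(s)" convention with defaults for missing keywords can be sketched generically in Python. The keyword names below (`Time_step`, `Temperature`, `Restart`) and the `#` comment character are hypothetical placeholders, not actual M.Dyna*Mix* keywords or syntax; the manual should be consulted for the real ones.

```python
def parse_keyword_input(text, defaults):
    """Parse lines of the form 'Keyword value(s)' into a dictionary.

    Keywords missing from the input keep their default values.
    Assumptions of this sketch: '#' starts a comment, keywords are
    case-sensitive, and values are kept as lists of strings.
    """
    params = dict(defaults)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        params[keyword] = values
    return params

# Hypothetical keywords, for illustration only.
defaults = {"Temperature": ["300"], "Restart": ["no"]}
example_input = """
Time_step    2.0      # hypothetical keyword
Temperature  298
"""
params = parse_keyword_input(example_input, defaults)
```

Keeping the defaults in one dictionary and overriding them from the file mirrors the behaviour described above, in which any keyword not given in the input falls back to its default.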

Finally, to start a simulation one needs to specify the start configuration of the simulated system. It is always possible to start from a configuration that has already been prepared (by any means) and written to a file in the xmol (xyz) format. This option may be preferable if one wishes to simulate a specifically organized system, e.g. a lipid bilayer or a folded protein. It is also possible to generate the start structure automatically. In this case, molecules are placed with random orientations and with their centers of mass on an FCC lattice. Clearly, in this case one can expect numerous overlaps between atoms belonging to different molecules. To deal with this problem, the following procedure can be used:
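Separately from the overlap-removal procedure, the automatic start structure itself (centers of mass on an FCC lattice, random orientations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the M.Dyna*Mix* implementation: a cubic box is assumed, the template coordinates are taken relative to the molecular center of mass, and all function names are hypothetical.

```python
import math
import random

# Fractional positions of the four sites in one cubic FCC unit cell.
FCC_BASIS = ((0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5))

def fcc_sites(n_molecules, box_length):
    """Return the first n_molecules sites of the smallest FCC lattice that fits them."""
    n_cells = 1
    while 4 * n_cells ** 3 < n_molecules:
        n_cells += 1
    a = box_length / n_cells  # lattice constant
    sites = []
    for ix in range(n_cells):
        for iy in range(n_cells):
            for iz in range(n_cells):
                for bx, by, bz in FCC_BASIS:
                    sites.append(((ix + bx) * a, (iy + by) * a, (iz + bz) * a))
    return sites[:n_molecules]

def random_rotation(rng):
    """Uniformly random rotation matrix built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)),
        (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)),
        (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)),
    )

def place_molecules(template, n_molecules, box_length, seed=0):
    """Copy a molecule template onto FCC sites with random orientations."""
    rng = random.Random(seed)
    system = []
    for cx, cy, cz in fcc_sites(n_molecules, box_length):
        rot = random_rotation(rng)
        molecule = []
        for atom in template:
            rx, ry, rz = (sum(rot[r][c] * atom[c] for c in range(3)) for r in range(3))
            molecule.append((cx + rx, cy + ry, cz + rz))
        system.append(molecule)
    return system

# Hypothetical diatomic template with unit bond length.
system = place_molecules([(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)], 4, 10.0)
```

Because the orientations are random and no distance check is made, atoms of neighboring molecules may overlap when molecules are large compared with the lattice spacing, which is exactly the problem noted in the text.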


### **2.3 Post-simulation analysis**

The main result of a simulation is a set of trajectory files containing configurations of the system (Cartesian coordinates and, optionally, velocities of all atoms) as a function of simulation time. The trajectory analysis suite Tranal includes a number of utilities for computing a wide variety of structural, thermodynamic and dynamical properties of the system, among them radial and spatial distribution functions, various time correlation functions, diffusion, residence times, dielectric permittivity, and order parameters and lateral diffusion for bilayer-like systems. The suite consists of a base module, which can read trajectories in various formats (including those of other simulation packages), and specialized modules that compute specific properties.

#### **2.4 Expanded ensemble mode**

This special feature allows solvation free energies to be computed by the expanded ensemble (EE) methodology (Lyubartsev et al., 1992, 1998). In the latest release (v. 5.2), the EE method is implemented together with the Wang-Landau algorithm for automatic optimization of the weighting factors of the expanded ensemble, which makes it possible to obtain accurate free energies in a single MD run. A short description of the algorithm is given below.

The EE method implies a gradual insertion of the studied solute molecule into the solvent, or its gradual deletion from it. An insertion parameter α describes the degree of insertion of the solute: α = 1 corresponds to a solute molecule fully interacting with the solvent, while α = 0 corresponds to a solute completely decoupled from the solvent (that is, α = 0 describes pure solvent plus the solute molecule in a gas phase of the same volume). The insertion parameter can take a number of fixed values $\{\alpha_i\}$, $i = 0, \ldots, M$, in the range between 0 and 1. During the simulation, which can be carried out by either a Monte Carlo (Lyubartsev et al., 1992) or a molecular dynamics algorithm (Lyubartsev et al., 1994), attempts are made to change the insertion parameter to another (normally neighboring) value, with acceptance probability:

$$P\left(i \to i \pm 1\right) = \exp\left(-\frac{\left(V^{Ss}\left(\alpha_{i\pm 1}\right) - V^{Ss}\left(\alpha_i\right)\right) + \left(\eta_{i\pm 1} - \eta_i\right)}{k_B T}\right) \tag{1}$$

where $V^{Ss}(\alpha)$ is the interaction energy of the solute particle with the solvent at the given insertion parameter α, and $\eta_i$ are so-called balancing (weighting) factors, introduced to make the distribution over subensembles close to uniform (Lyubartsev et al., 1992). During the simulation the probabilities $\rho_i$ of the different subensembles are determined, and the free energy difference between the states with fully inserted and fully deleted solute (the excess solvation free energy) is computed as:

$$F_M - F_0 = -k_B T \ln \frac{\rho_M}{\rho_0} - \eta_M + \eta_0 \tag{2}$$
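Equations (1) and (2) translate directly into two small functions. The Python sketch below is illustrative only: the function names are hypothetical, energies and balancing factors are assumed to be in the same units as $k_B T$, and the acceptance probability is capped at 1, as in a standard Metropolis test, which Eq. (1) leaves implicit.

```python
import math

def acceptance_probability(v_new, v_old, eta_new, eta_old, kT):
    """Eq. (1): probability of accepting a change of subensemble.

    v_new, v_old: solute-solvent energies V^Ss at the trial and current alpha.
    eta_new, eta_old: balancing factors of the trial and current subensembles.
    """
    delta = (v_new - v_old) + (eta_new - eta_old)
    if delta <= 0.0:
        return 1.0  # Metropolis cap (assumption of this sketch)
    return math.exp(-delta / kT)

def excess_free_energy(rho, eta, kT):
    """Eq. (2): F_M - F_0 from subensemble probabilities rho[0..M]
    and balancing factors eta[0..M]."""
    return -kT * math.log(rho[-1] / rho[0]) - eta[-1] + eta[0]

# If the balancing factors make the end subensembles equally populated,
# the free energy difference is simply eta_0 - eta_M.
example = excess_free_energy(rho=[0.5, 0.5], eta=[0.0, -3.0], kT=1.0)
```

The example shows why well-tuned balancing factors are useful: when the walk over subensembles is uniform, the logarithmic term vanishes and the balancing factors themselves carry the free energy difference.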

In previous work on the computation of solvation free energies (Lyubartsev et al., 1994, 2001), a linear scaling of the solute-solvent interaction with α was used, namely $V^{Ss}(\alpha) = \alpha V^{Ss}(1)$. The drawback of that scheme is that the repulsive core of the Lennard-Jones potential shrinks very slowly as α decreases, which makes it necessary to consider very small α values. A new scheme is now implemented in which the interaction $V^{Ss}(\alpha)$ between the solute and solvent atoms (assumed to be a sum of Lennard-Jones and electrostatic terms) is scaled as follows:

$$V^{Ss}\left(\alpha\right) = \alpha^4 V_{LJ}^{Ss}\left(1\right) + \alpha^2 V_{el}^{Ss}\left(1\right) \tag{3}$$

where $V_{LJ}^{Ss}$ and $V_{el}^{Ss}$ are the Lennard-Jones and electrostatic parts of the solute-solvent interaction, respectively. The rationale behind the $\alpha^4$ scaling of the Lennard-Jones interaction is that the effective core radius of the LJ potential then scales as $(\alpha^4)^{1/12} = \alpha^{1/3}$, so that the effective volume of the core (approximately proportional to the free energy of cavity formation) scales linearly with α. The $\alpha^2$ scaling of the electrostatic interactions is chosen so that they are switched off faster than the repulsive LJ interaction, to minimize problems with the hydrogen atoms of water molecules, which carry no LJ potential. With the scaling of Eq. (3), the points $\alpha_i$ can be distributed uniformly over the range [0, 1] while a reasonable acceptance probability of the transitions is maintained through the whole range of subensembles.
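The difference between the two scaling schemes can be made concrete with a few lines of Python. The sketch assumes that the core size is governed by the $r^{-12}$ repulsion alone, so that a prefactor $\alpha^p$ rescales the core radius by $\alpha^{p/12}$ and the core volume by $\alpha^{p/4}$; the function names are hypothetical.

```python
def scaled_interaction(alpha, v_lj, v_el):
    """Eq. (3): solute-solvent energy at insertion parameter alpha,
    given the full-strength (alpha = 1) LJ and electrostatic parts."""
    return alpha ** 4 * v_lj + alpha ** 2 * v_el

def core_volume_fraction(alpha, power):
    """Effective repulsive-core volume relative to alpha = 1 when the
    LJ term is multiplied by alpha**power (r^-12 repulsion assumed)."""
    radius_factor = (alpha ** power) ** (1.0 / 12.0)
    return radius_factor ** 3

for alpha in (0.5, 0.1, 0.01):
    linear = core_volume_fraction(alpha, power=1)   # old linear scheme
    quartic = core_volume_fraction(alpha, power=4)  # alpha^4 scheme of Eq. (3)
    print(f"alpha={alpha:5.2f}  linear: {linear:.3f}  alpha^4: {quartic:.3f}")
```

Under this assumption the linear scheme still leaves about a tenth of the core volume at α = 10⁻⁴, which is why very small α values were needed, whereas with the $\alpha^4$ prefactor the core volume is simply proportional to α.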


Another modification of the EE procedure implemented in the new version of M.Dyna*Mix* is the automatic choice of the balancing factors $\eta_i$ by the Wang-Landau algorithm (Wang & Landau, 2001). Applied to the expanded ensemble technique, the Wang-Landau algorithm can be formulated as follows. The simulation starts with all balancing factors equal to zero. Each time a subensemble *i* is visited, a small increment $\Delta\eta$ is added to the corresponding balancing factor, $\eta_i(t + dt) = \eta_i(t) + \Delta\eta$, which decreases the probability of returning to already visited states and thus favors a uniform distribution. After a certain number of steps (a sweep), once the system has visited all subensembles and has passed several times (by default, 2) between the end points, the increment $\Delta\eta$ is halved. After a certain number of sweeps (usually 10-12) the increment becomes very small, and at the same time the profile of balancing factors is tuned so as to provide a uniform walk in the space of subensembles. At that point the equilibration stage ends, and a production run with fixed balancing factors is carried out, yielding the solvation free energy.
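The procedure can be tried out on a toy model. In the Python sketch below each subensemble is represented by a single known free energy value, which replaces the solute-solvent energy in Eq. (1); there is no molecular system, and the function name, default parameters and random seed are assumptions of the sketch rather than M.Dyna*Mix* settings. After the Wang-Landau equilibration, a production run with fixed balancing factors gives the free energy difference through Eq. (2).

```python
import math
import random

def wang_landau_ee(free_energy, kT=1.0, d_eta=1.0, n_sweeps=12,
                   passes_per_sweep=2, n_production=200_000, seed=1):
    """Toy expanded-ensemble walk with Wang-Landau tuning of eta.

    free_energy[i] plays the role of the energy of subensemble i.
    Returns the estimate of F_M - F_0 and the tuned balancing factors.
    """
    rng = random.Random(seed)
    m = len(free_energy) - 1
    eta = [0.0] * (m + 1)

    def step(i):
        # Propose a neighboring subensemble; reject moves outside [0, M].
        j = i + rng.choice((-1, 1))
        if j < 0 or j > m:
            return i
        delta = (free_energy[j] - free_energy[i]) + (eta[j] - eta[i])
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            return j
        return i

    # Equilibration: add d_eta to the visited subensemble, halve it per sweep.
    i = 0
    for _ in range(n_sweeps):
        visited = [False] * (m + 1)
        passes, last_end = 0, None
        while passes < passes_per_sweep or not all(visited):
            i = step(i)
            eta[i] += d_eta
            visited[i] = True
            if i in (0, m):
                if last_end is not None and i != last_end:
                    passes += 1  # travelled from one end point to the other
                last_end = i
        d_eta /= 2.0

    # Production: fixed eta, collect subensemble populations, apply Eq. (2).
    counts = [0] * (m + 1)
    for _ in range(n_production):
        i = step(i)
        counts[i] += 1
    estimate = -kT * math.log(counts[m] / counts[0]) - eta[m] + eta[0]
    return estimate, eta

# Six subensembles whose free energy rises by 1 kT per step (exact answer: 5).
estimate, eta = wang_landau_ee([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
```

In this toy setting the estimate should land close to the exact value of 5 (in units of $k_B T$), with the remaining error coming from the finite length of the production run. Because the production stage uses Eq. (2), residual imperfections in the tuned balancing factors are corrected by the measured populations.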

### **3. Examples of M.Dyna***Mix* **applications**

In this chapter we concentrate on a few selected types of MD simulations, which represent only a small part of the applications for which M.Dyna*Mix* has been used in the past.

### **3.1 Solubility and permeability**

Solubility (or the lack of it), particularly in water but also in other solvents and solvent mixtures, is among the most important problems in the development of new drug molecules today, especially for poorly water-soluble, orally bioavailable drugs with a controllable release rate. Computer modelling and simulations (together with combinatorial chemistry, high-throughput screening and genomics/proteomics) have long been key tools in modern drug design, but they have not been used much in solubility studies. After drug molecules dissolve at the beginning of administration, they start their journey through the living organism to find their target. During this process they are absorbed and distributed, and in doing so they need to penetrate numerous obstacles, including the blood-brain barrier. MD simulations combined with free energy calculations are highly useful for studying the molecular details of solubility and permeability during this transport (Lyubartsev et al., 1998c, 2001). The solubility of drug molecules and their permeability across lipid membranes have been studied with M.Dyna*Mix* by calculating log*P* partition coefficients from solvation free energies (Lyubartsev et al., 1999; Åberg et al., 2004).
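The step from solvation free energies to a partition coefficient follows from the standard thermodynamic relation $\log P = (\Delta G_{\mathrm{water}} - \Delta G_{\mathrm{organic}}) / (RT \ln 10)$, where the two $\Delta G$ values are the solvation free energies of the solute in water and in the second phase. The Python sketch below applies this relation; the function name, the choice of kJ/mol, the default temperature and the numerical inputs are assumptions for illustration, and the exact protocol of the cited studies is not reproduced here.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def log_partition_coefficient(dg_water, dg_organic, temperature=298.15):
    """log10 of the organic/water partition coefficient from solvation
    free energies (kJ/mol) of the same solute in the two phases."""
    rt_ln10 = R_KJ_PER_MOL_K * temperature * math.log(10.0)
    return (dg_water - dg_organic) / rt_ln10

# Hypothetical numbers: a solute solvated 5.7 kJ/mol more favorably in the
# organic phase than in water is enriched there roughly tenfold (logP near 1).
log_p = log_partition_coefficient(dg_water=-20.0, dg_organic=-25.7)
```

A more negative solvation free energy in the organic phase gives a positive log*P*, that is, a solute that prefers the non-aqueous phase.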

#### **3.2 Solvation in mixed solvents**

Many substances are readily soluble in solvent mixtures (mixed solvents) but not in the pure components, and vice versa (Reichardt, 1988). Selective solvation of ions from salts in mixed solvents is another property that was discovered early. Both IR measurements and computer simulations have confirmed the existence of micro-heterogeneities and conglomerate water structures at higher concentrations of the more organic co-solvent. Many amphiphilic and biomolecular systems dissolve better in mixtures as a result of preferential solvation. This section highlights the advantage of using MD simulation to study preferential and selective solvation: it gives insight into the complex molecular processes behind them and provides a picture of, and rules for, the delicate balance of interactions between the different molecular species (Bergman & Laaksonen, 1998). We first use a small disaccharide as a solute to discuss preferential solvation. Before discussing solvation in mixed solvents, however, we highlight a number of key features of the hydration and solvation of the disaccharide in several pure solvents.

An important question in the study of carbohydrates is the role of the solvent, most often water, and in particular its influence on the structure. Several aspects are important in studying and describing the full three-dimensional structure of this category of molecules: conformational states, rotation barriers, intra- and intermolecular hydrogen bonds, flexibility or rigidity, internal dynamics, hydrophilicity or hydrophobicity, and electrostatic interactions. Even small differences in molecular geometry or conformation can easily affect the hydration around these inherently flexible molecules. Radial distribution functions (RDF) and angular distribution functions (ADF) are the most commonly used tools for studying solvent structure and conformational populations in computer simulations. Carbohydrates typically have a large number of hydroxyl groups ("half water molecules"), which readily take part in the hydrogen bond network of the surrounding water, leading to very distinct and sometimes rather spectacular solvation structures.

We have studied a small prototypic carbohydrate molecule, *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe, which contains a mannose and a glucose ring connected by a glycosidic bridge, by carrying out molecular dynamics simulations in water, methanol, dimethylsulfoxide (DMSO) and a mixture of water and DMSO as solvents (Vishnyakov et al., 1999, 2000a, 2000b). At a water:DMSO molar ratio of 3:1 the mixed solvent makes the solution more viscous and allows experimental studies at low temperature. We use the same composition in our MD simulations. In the text below we refer to the molecule simply as the "disaccharide". The conformations of the disaccharide are well described by the two torsional angles across its glycosidic linkage, φ and ψ (**Figure 3**).

Fig. 3. Molecule *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe containing both a mannose and a glucose ring connected with a glycosidic bridge.
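As an illustration of how φ and ψ can be extracted from a trajectory frame, the following minimal sketch computes a signed torsional angle from four atomic positions. The function name `dihedral_deg` is hypothetical, and the choice of the four atoms that define φ and ψ is not specified in the text, so it must be supplied by the user; this is not the M.Dyna*Mix* implementation.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsional angle (degrees) for atoms p0-p1-p2-p3.

    The angle is 0 for a cis (eclipsed) and +/-180 for a trans arrangement.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1          # bond pointing from the central bond to the first atom
    b1 = p2 - p1          # central bond
    b2 = p3 - p2          # bond pointing from the central bond to the last atom
    b1 /= np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Example with made-up coordinates (arbitrary units)
angle = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
```

Applied to every stored frame, such a function gives the time series of φ and ψ from which conformational populations can be histogrammed.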

Our simulations clearly exhibit two main minima (potential wells) on the adiabatic energy surface with respect to the φ torsional angle, one close to *g*- (-60°, which we call "well A") and the other close to *g*+ (+60°, "well B"). All simulations are carried out using the M.Dyna*Mix* package (Lyubartsev & Laaksonen, 2000). The solvation structure is presented using spatial distribution functions (SDF) (Kusalik et al., 1999) visualized with the gOpenMol software (Laaksonen 1992; Bergman et al. 1997; http://www.csc.fi/english/pages/g0penMol/tutorials). In all SDFs below, the iso-density threshold is three times the bulk density in the simulations unless stated otherwise.
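To make the notion of an SDF and its iso-density threshold concrete, here is a minimal sketch, assuming that the solvent-atom positions have already been transformed into a local coordinate frame fixed on the solute. The function names, the cubic grid and the argument list are illustrative assumptions and do not reproduce the M.Dyna*Mix* or gOpenMol code.

```python
import numpy as np

def spatial_distribution(local_coords, n_frames, bulk_density, half_width, n_bins):
    """Bulk-normalised 3D histogram of solvent-atom positions.

    local_coords : (N, 3) positions in the solute-fixed frame, pooled over frames
    n_frames     : number of trajectory frames that contributed
    bulk_density : number density of the same atom type in the bulk
    half_width   : the grid spans [-half_width, half_width] along each axis
    """
    edges = np.linspace(-half_width, half_width, n_bins + 1)
    counts, _ = np.histogramdd(np.asarray(local_coords, dtype=float),
                               bins=(edges, edges, edges))
    voxel_volume = (2.0 * half_width / n_bins) ** 3
    # A value of 1 means "same density as the bulk"
    return counts / (n_frames * voxel_volume * bulk_density)

def iso_mask(sdf, threshold=3.0):
    """Voxels enclosed by the iso-density surface (default: 3 x bulk density)."""
    return sdf >= threshold
```

The boolean mask returned by `iso_mask` marks the region that an iso-surface at the chosen threshold would enclose; a visualization program then only needs to draw its boundary.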

*In water solution* the disaccharide shows a clear preference for well A. The hydration is dominated by first-shell structuring of the solvent. The large number of hydrophilic centers in carbohydrates imposes a strongly anisotropic structure on the surrounding solvent. The hydroxyl groups were found to be extensively hydrogen-bonded to surrounding water molecules (**Figure 4**). For optimal hydration, the solute and the solvent should have compatible topologies with respect to the hydration requirements of the functional groups involved. The disaccharide molecule is found to be fairly rigid when dissolved in water, showing no rotation around the glycosidic linkage.

Fig. 4. Hydration structure around *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe showing a belt-like pattern.

*In methanol solution* (**Figure 5**), the solvation structure, as seen in the SDFs, reflects the ability of methanol to both donate and accept hydrogen bonds. Besides having, in principle, the same possibilities to form hydrogen bond combinations as water, methanol molecules can even hydrogen bond to the non-hydroxyl oxygens. The methyl groups in methanol prevent the formation of a solute–solvent hydrogen bonding pattern similar to that in water. However, the hydrogen bonding can be easily seen in the SDFs, e.g. the donation of hydroxyl protons in methanol around O2*m* (upper left), O6*m* (middle right), and O4*g* (lower left). Note that *m* denotes the mannose ring and *g* the glucose ring, while the numbering of the oxygens follows the standard rules. The solvation structure of methanol around the disaccharide is found to be less extended than in water, and resembles an average between that of water and that of DMSO, presented in the next paragraph.

*In DMSO solution* the most probable hydrogen-bonding sites for the DMSO oxygens around the disaccharide are depicted in **Figure 6**. Distinct features showing hydrogen bonding to the hydroxyl groups are present. Compared with water, where the corresponding SDFs show distinct belt-like shapes around the entire saccharide, DMSO behaves differently. While four combinations of hydrogen bonds between the hydroxyl groups and water are possible, DMSO can engage in only one type of hydrogen bond, as it cannot donate a hydrogen. Nevertheless, distinct regions are observed for HO4*g* (middle left in **Figure 4**), HO2*m* (upper right), and HO2*g* (lower right). Furthermore, the larger size and much heavier mass of DMSO reduce its mobility and contribute to a more distinct structure.

Fig. 5. Solvation structure around *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe dissolved in methanol. White – hydrogen, Red – oxygen. For details see the text.

Fig. 6. Solvation structure around *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe dissolved in DMSO. Yellow – sulphur, Red – oxygen. For details see the text.

*In Water-DMSO* mixture (**Figure 7**) we can see that the saccharide is selectively solvated. By combining the information obtained from the traditional RDFs and the three-dimensional SDFs, we can see that different parts of the disaccharide are preferentially solvated by either water or DMSO. We can also see that in some parts there is clear competition between the two components for different hydroxyl groups. For example, DMSO acts as a hydrogen bond acceptor and predominantly solvates the regions close to HO2g and HO2m (see **Figure 7**). Water, acting as a hydrogen bond donor, selectively hydrates the regions around O6m and O2g. The SDFs provide a very detailed picture of solvation. Different SDF representations and techniques to analyze solvation graphically are given by Bergman and co-workers. An interesting detail is that when DMSO acts as a hydrogen bond acceptor for the HO2g hydroxyl group, it can simultaneously act as an acceptor for a specific water molecule that is hydrogen bonded to O6m. This leads to a hydrogen-bonded complex which involves two hydroxyl groups of the disaccharide, one water molecule and one DMSO molecule.

We have studied preferential solvation of phenol in equimolar water/acetonitrile and water/ethanol mixtures (Dahlberg & Laaksonen, 2006). In that work we introduced a special "difference spatial distribution function" (ΔSDF) to illustrate the excess densities of the cosolvents at preferential solvation locations around the phenol molecule in these two solvent mixtures. The work was inspired by intermolecular 1H NOESY experiments carried out by Bagno (Bagno, 2002), which nicely show preferential solvation around organic molecules. Simulations and experiments agree well with each other but also reveal some differences. In fact, this is just another example where NMR and MD simulation techniques can be successfully combined (Odelius & Laaksonen, 1999).
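The exact definition of the ΔSDF is not given here; the sketch below shows one plausible reading, namely the voxel-wise difference of two bulk-normalised SDFs evaluated on the same grid in the same solute-fixed frame. Both function names and the tolerance `tol` are hypothetical and serve only as an illustration.

```python
import numpy as np

def delta_sdf(sdf_a, sdf_b):
    """Voxel-wise difference of two bulk-normalised SDFs on an identical grid.

    Positive values mark regions where component A is in excess over B,
    negative values regions where B is in excess.
    """
    sdf_a = np.asarray(sdf_a, dtype=float)
    sdf_b = np.asarray(sdf_b, dtype=float)
    if sdf_a.shape != sdf_b.shape:
        raise ValueError("both SDFs must be defined on the same grid")
    return sdf_a - sdf_b

def preferred_component(delta, tol=0.5):
    """Return +1 where A is preferred, -1 where B is preferred, 0 otherwise.

    `tol` is an assumed noise tolerance in units of the bulk density.
    """
    delta = np.asarray(delta, dtype=float)
    return np.where(delta > tol, 1, np.where(delta < -tol, -1, 0))
```

Because both SDFs are normalised by their own bulk densities, the difference is not biased by the mole fractions of the two cosolvents.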

Fig. 7. Preferential solvation structure around *α*-D-Manp-(1*→*3)-*β*-D-Glcp-OMe dissolved in 3:1 water-DMSO mixture. Purple – oxygen in DMSO, Green – oxygen in water. For details see the text.

### **3.2.1 Mixed solvents**

The Water-DMSO system is itself a commonly used mixed solvent. Other examples of such solvents are the binary mixtures of water, alcohols and acetonitrile, including both aqueous and nonaqueous mixtures. Mixed solvents are important, for example, in the chemical industry. We have studied the DMSO-water binary system, which is known to exhibit strongly non-ideal behavior, such as a high positive heat of mixing and a negative excess mixing volume. It was suggested earlier that two water molecules hydrogen-bond simultaneously to the sulfoxide oxygen, with a third water molecule hydrogen bonding to and bridging the two, thereby forming a 3:1 water-DMSO complex. Computer simulations offer a straightforward way to test this. For this purpose we introduced a "multi-particle SDF" (MP-SDF) in which the local coordinate system is defined by three oxygens: the sulfoxide oxygen and the oxygens of the two water molecules hydrogen bonded to it (Vishnyakov et al. 2001). This was possible because the residence times of these particular hydrogen bonds are long enough. The MP-SDF (**Figure 8**) clearly shows where the second-shell water molecules around the sulfoxide group are found. The simulations make clear that no third water molecule is hydrogen bonded to the first-shell water molecules.

Fig. 8. Multi-particle spatial distribution function showing first and second hydration shell around DMSO. For details see the text.
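A local coordinate system based on three atoms, such as the sulfoxide oxygen and the two hydrogen-bonded water oxygens, can be built as in the following minimal sketch. The particular axis convention (x along the bisector, z normal to the plane of the three atoms) is an assumption made for illustration; the convention used in the original work is not stated here.

```python
import numpy as np

def local_frame(origin, a, b):
    """Right-handed orthonormal frame defined by three points.

    Returns a 3x3 matrix whose rows are the x, y and z unit vectors:
    x bisects the directions origin->a and origin->b, z is normal to the
    plane of the three points, and y completes the right-handed set.
    """
    origin, a, b = (np.asarray(p, dtype=float) for p in (origin, a, b))
    u = (a - origin) / np.linalg.norm(a - origin)
    v = (b - origin) / np.linalg.norm(b - origin)
    normal = np.cross(u, v)
    if np.linalg.norm(normal) < 1e-8 or np.linalg.norm(u + v) < 1e-8:
        raise ValueError("the three points are (nearly) collinear")
    x = (u + v) / np.linalg.norm(u + v)
    z = normal / np.linalg.norm(normal)
    y = np.cross(z, x)
    return np.vstack((x, y, z))

def to_local(points, origin, frame):
    """Express lab-frame points in the local frame centred at `origin`."""
    shifted = np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)
    return shifted @ frame.T
```

Positions of surrounding water atoms transformed with `to_local` for every frame in which the three-atom complex exists can then be accumulated on a grid to give the multi-particle distribution.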

### **3.2.2 Hydration of Ni2+**

MD simulations of a Ni2+ ion in water have been carried out to study the hydration structure around nickel (Egorov et al., 2006). The analysis is extended to the second hydration shell around the divalent ion. The nickel aqua-complex has been treated without any constraints in order to analyze the structure and dynamics, as well as molecular mechanisms of water substitutions and exchange in the first and second shells around the cation. The simulations show that the structure of [Ni(H2O)6]2+ complex is very stable (**Figure 9**). The main molecular mechanisms contributing to reorientational motion of the whole complex are the "pushes and kicks" given by the second hydration shell water oxygens. The rotations of water molecules themselves in the second shell are only changing the hydrogen positions and do not affect the reorientation of the complex. In spite of frequent exchange of water molecules between the bulk and the second hydration shell the overall dynamical behavior in the first and second shell is very similar.

Fig. 9. Structure of [Ni(H2O)6]2+ complex showing also the second hydration shell around the complex. Green – oxygen in water, Red – hydrogen in water. For details see the text.

## **3.3 Hydration and coordination of counterions around DNA**

Nucleic acids are highly negatively charged polyelectrolytes that display considerable sensitivity to their ionic surroundings while undergoing different structural transitions and interacting with other charged species. Early studies of DNA structure already showed that variation in the nature or concentration of the counterion, and in the water content, leads to different conformational properties; these properties also depend on the base pair composition and sequence (Saenger, 1984; Leslie et al., 1980).

Computer simulations, and MD in particular, are a very powerful technique for obtaining detailed information about the processes behind hydration and counterion coordination, including the dynamical behaviour of cations around the charged surface of DNA. However, because of the highly charged nature of DNA, MD simulation was applied to it later than to proteins. In fact, one of the main problems in the simulation of these polyelectrolytes was the treatment of the long-range electrostatic interactions, which has been solved with the use of Ewald summation methods (Allen & Tildesley, 1987; Laaksonen et al., 1989). Proper treatment of these interactions, together with the development of force fields designed for explicit solvent models (Cornell et al., 1995; MacKerell et al., 1995; MacKerell, 2004) and advances in computer power, has led to wide use of molecular dynamics simulations in the study of nucleic acids. Here we focus on the modeling, at different levels of sophistication, of nucleic acids in physiological environments with different species of counterion, either mono- or multivalent (Lyubartsev & Laaksonen, 1998b; van Dam et al., 1998; Mocci et al., 2004; Bunta et al., 2007; Korolev et al., 2001, 2002, 2004a, 2004b), and we give some examples of how MD trajectories can be analyzed and the results checked against experimental data using the M.Dyna*Mix* software.

### **3.3.1 Interactions with monovalent counterions**

Until almost the end of the last century it was believed that monovalent counterions did not show any "sign of localization or dehydration due to interactions with double-stranded (ds) nucleic acids" (Anderson et al., 1995) and that the interactions were not affected by the base pair sequence. Since then, many investigations have led to a revision of this view, and it is now believed that monovalent cations bind to DNA with partial loss of their solvation shell, often in a sequence-specific way which also depends on the nature of the cation.

MD simulations have played a central role in revising this view, and have been of great help in verifying the role of these interactions in the conformational properties of DNA and in understanding the differences in the coordination of counterions around nucleic acids.

As examples, we discuss some of our studies of DNA interactions with alkali cations. In order to understand the differences in the behaviour of cations of different size, we performed MD simulations on DNA solutions containing either the physiologically relevant DNA counterion Na+ or the smallest or the largest of the alkali ions considered, Li+ and Cs+ (Lyubartsev & Laaksonen, 1998b; van Dam et al., 1998). The simulations mimicked an infinite array of parallel ordered DNA duplexes by placing a DNA decamer [d(ATGCAGTCAG)2] along the Z direction of the simulation box and applying periodic boundary conditions.

Important information on ion-DNA interactions can be obtained by inspecting the radial and spatial distribution functions (RDF and SDF). The RDF gives the probability of finding a counterion of a given type at a distance r from selected DNA atoms, or from the DNA axis, relative to the probability expected for a random distribution at the same density. The SDF is the three-dimensional distribution of a given atom species in a local coordinate system fixed on a molecule or a portion of it. In the SDF calculated using the Tranal utility of M.Dyna*Mix*, the value of the function at each point corresponds to the probability of finding an atom there relative to the bulk. A four-dimensional function can be represented in two or three dimensions in different ways (Bergman et al., 1998; Kusalik et al. 1999); one of the most widely used for visualizing solvation structure and ion binding modes is the iso-intensity representation, in which a surface connects the points where the SDF has a chosen value. RDFs calculated between the ions and selected atoms in the minor or major groove, or in the backbone, show at a glance important differences in the behaviour of the counterions. For example, **Figure 10** shows the RDFs between the counterions and the P atoms.
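The RDF definition given above can be illustrated with a minimal single-frame sketch for two different atom selections (for example, counterions and phosphorus atoms) in a cubic periodic box. This is not the Tranal implementation; the function name `rdf`, its arguments and the restriction to a cubic box are assumptions made for illustration.

```python
import numpy as np

def rdf(pos_a, pos_b, box_length, r_max, n_bins):
    """Radial distribution function g(r) of B atoms around A atoms.

    pos_a, pos_b : (N, 3) coordinates of two *different* atom selections
    box_length   : edge of the cubic periodic box
    r_max        : largest distance considered (at most box_length / 2)
    """
    if r_max > 0.5 * box_length:
        raise ValueError("r_max must not exceed half the box length")
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    # All A-B separation vectors, folded by the minimum image convention
    delta = pos_a[:, None, :] - pos_b[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    dist = np.linalg.norm(delta, axis=-1).ravel()
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(dist, bins=edges)
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    bulk_density_b = len(pos_b) / box_length ** 3
    # Ratio of observed pair counts to those of a random distribution
    g = counts / (len(pos_a) * bulk_density_b * shell_volume)
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, g
```

In practice the histogram is accumulated over many frames before normalisation; values of g(r) well above 1 at short distance, as for Li+ around the phosphates, indicate direct binding.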

The smallest ion has a remarkably higher probability of being directly bound to the phosphates than the other counterions. On the other hand, the RDFs calculated between the counterions and selected atoms in the minor and major grooves clearly indicate that the smallest ion does not penetrate into the grooves, while the largest has a marked preference for the minor groove. Analysis of the SDFs of the water and of the counterions around the binding site is very helpful for understanding these differences in the interactions and the conformational preference of DNA in the presence of different counterions. In the case of lithium, the SDFs of the water atoms and of the cation around the phosphate groups explain the higher preference of Li+ for this site compared to the other alkali ions. In fact, the average organization of the water molecules around the phosphate oxygen atoms contains a hole whose dimensions are ideal for the small Li+ ion to fit in while keeping a tetrahedral hydration shell, as shown by the SDF in **Figure 11**. The larger alkali ions cannot bind in the same position without strongly perturbing this solvation structure.

Fig. 10. RDFs between counterions and P atoms. Reproduced with permission from J. Biomol. Struct. Dyn., 16(3), 579-592 (1998). Copyright Adenine Press (1998)

Fig. 11. SDFs of water oxygen (violet) and hydrogen (cyan) around the DNA phosphate group. SDFs are drawn at the level 5. Reproduced with permission from J. Biomol. Struct. Dyn., 16(3), 579-592 (1998). Copyright Adenine Press (1998)

The observed difference in the binding modes also offered an important key to explaining why Li+ counterions promote the transition of DNA from the B conformation to the C-form. In the absence of experimental data that could directly confirm the binding preferences observed in the simulations, comparison with experimental data on the diffusion of the studied ions in the presence of DNA provided an important validation of the potential model used.


In the study of DNA interactions with monovalent counterions, much attention has been devoted over the last 15 years to the dependence of the binding mode of alkali counterions on the sequence of DNA base pairs. Among the most studied DNA sequences are the so-called A-tracts: uninterrupted sequences of at least four AT base pairs without 5'TpA3' steps. The interest in these sequences is motivated by their peculiar structural features: when inserted in phase with the helix turn, A-tracts induce macroscopic bending of DNA, and they are characterized by a narrow minor groove which is known to possess a very ordered hydration structure, called the hydration spine. It has been proposed that sequence-specific interactions of monovalent counterions in the minor groove of A-tracts are responsible for these structural features (McFail-Isom et al., 1999). In fact, an electrostatic collapse of the A-tract around ions that replace one or more water molecules of the hydration spine in the minor groove could induce both the bending and the narrowing of the minor groove. Unfortunately, experimental investigations of the sequence specificity of the binding to macromolecules of highly mobile ions, such as Na+ or K+, give results whose analysis is not straightforward and leaves room for multiple interpretations.

Interesting experimental results were obtained from NMR dispersion experiments (Denisov & Halle, 2000), which measured the NMR relaxation times of Na-23 in solutions of A-T rich sequences at several magnetic fields; we therefore modified the M.Dyna*Mix* code to calculate the NMR relaxation time of this ion. It is important to note that, even though the observed relaxation times are on the time scale of seconds, the motions responsible for this decay occur on the subpicosecond to nanosecond time scale and are thus fully accessible to MD simulations of DNA oligomers in aqueous solution (Mocci et al., 2004; Odelius & Laaksonen, 1999). The combination of these techniques was used to study the interactions between Na+ and a DNA oligomer containing both A-tract and non-A-tract regions, with the sequence [d(CTTTTAAAAG)2]. In agreement with previous simulations on A-T rich oligomers (Mocci & Saba, 2003), we observed direct binding with a long residence time only in the minor groove of one of the A-tracts. More specifically, a sodium ion, partially losing its hydration waters, intruded into the minor groove, replacing one water molecule of the hydration spine (see **Figure 12**), and remained there for almost the entire simulation.

Fig. 12. SDFs of Na+ (yellow), water's oxygen (violet), and hydrogen (cyan) around DNA calculated during the binding to adenine N3 and thymine O2 in the minor groove (ca. 7 ns). The intensity levels are 10 for H and O of water, 1000 for Na+. Reproduced with permission from J. Phys. Chem. B 108, 16295-16302 (2004). Copyright 2004 American Chemical Society

Combining the MD results with the experimental quadrupolar relaxation data offered a very good way to verify whether the high occupancy of that binding site observed in the simulation was reliable. In fact, the combined approach revealed that the occupancy of the binding sites in the minor groove of uninterrupted adenine sequences should actually be rather low, indicating that longer sampling would be required for a proper evaluation of the occupancy of those binding sites.

The MD-NMR combination also provided information at the microscopic level on the NMR relaxation processes, showing how the polyion affects the NMR parameters of Na-23. Importantly, it indicated that the speed-up of the Na-23 relaxation rate, compared to a simple electrolyte solution, is mainly due to the contribution of ions directly bound to the oligomer's surface, whereas it had long been thought that monovalent counterions showed no sign of dehydration due to interaction with DNA (Anderson et al., 1995).

The MD studies discussed above revealed that the preferential binding modes of each cation at the DNA surface depend on the hydration structure of DNA and of the cations; such binding modes could not have been investigated without explicit inclusion of the solvent in the simulations. Unfortunately, explicit modelling of the solvent is the most computationally demanding part of the calculations, and to extend MD simulations to spatial dimensions beyond those of oligomeric DNA fragments it is necessary to use simplified models for the solvent. One way to keep a realistic picture of the specific DNA-cation interactions while simplifying the description of the solvent is to use effective potentials for the interactions between solutes (Lyubartsev & Laaksonen, 1995, 1999). This method reconstructs effective, solvent-mediated interaction potentials between DNA atoms, or groups of atoms, and the cations from MD simulations with explicit solvent. In more detail, the construction of effective potentials starts from the RDFs between the solute species calculated from a fully atomistic MD simulation with explicit solvent; by means of a reverse Monte Carlo procedure (Lyubartsev & Laaksonen, 1995), the RDFs are used to construct a set of effective interaction potentials that, when used in an MD simulation without the solvent, should reproduce the same solute-solute RDFs. This approach yields potentials that retain the ion-specific information even when an implicit model of the solvent is used.
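The reverse Monte Carlo procedure itself is iterative and is not reproduced here. As a hedged illustration of the link between an RDF and an effective potential, the sketch below computes only the potential of mean force, W(r) = -k_B T ln g(r), which is commonly used as a starting guess for such inversion schemes and coincides with the effective pair potential only in the dilute limit. The function name and the choice of units (kJ/mol) are assumptions.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def potential_of_mean_force(g, temperature):
    """Potential of mean force W(r) = -k_B T ln g(r), in kJ/mol.

    Bins where g(r) is zero (never visited) are assigned +infinity.
    This is only a first approximation to an effective pair potential;
    an inverse procedure would refine it iteratively until the RDF
    obtained without solvent matches the reference RDF.
    """
    g = np.asarray(g, dtype=float)
    w = np.full(g.shape, np.inf)
    visited = g > 0
    w[visited] = -K_B * temperature * np.log(g[visited])
    return w
```

A peak in g(r) thus corresponds to an attractive well in W(r), and the depth of the well grows logarithmically with the peak height.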

### **3.3.2 Interactions with multivalent counterions**

The difficulties in experimentally determining the preferred binding modes with DNA are not limited to monovalent counterions; in fact, the study of interactions with multivalent molecular cations such as polyamines (PA) is affected by the same problem. Polyamines, which are positively charged organic compounds with two or more primary amino groups, are present in living cells and interact with cellular polyelectrolytes such as DNA. Although PA, especially spermine4+, are widely used as DNA crystallization agents, only a very limited number of crystallographic studies report their presence in the crystallographic cell. Simulations have been extremely useful in understanding why these components are "invisible" in X-ray studies. MD simulations mimicking an infinite array of parallel B-DNAs (Korolev et al., 2001, 2002, 2004a, 2004b) in the presence of either the natural polyamines spermine4+ (H3N+-(CH2)3-NH2+-(CH2)4-NH2+-(CH2)3-NH3+), spermidine3+ (H3N+-(CH2)3-NH2+-(CH2)4-NH3+) and putrescine2+ (NH3+-(CH2)4-NH3+), or the synthetic polyamine diaminopropane2+ (NH3+-(CH2)3-NH3+), revealed that these highly charged cations interact strongly with different groups on DNA. However, all of the polyamines adopt disparate binding modes, which makes their detection by X-ray diffraction rather difficult. The simulations also showed that PA and Na+ have a


(CH2)4-NH3+) or the synthetic polyamine diaminopropane2+ (NH3+-(CH2)3-NH3+) revealed that these highly charged cations interact strongly with different groups on DNA. However all of the polyamines adopt disparate binding modes that make their detection pretty difficult with x-ray diffraction. The simulations also showed that PA and Na+ have a

+-

occupancy of those binding site.

when using an implicit model of the solvent.

**3.3.2 Interactions with multivalent counterions** 

al., 1995).

different impact on DNA hydration and structure: while the Sodium cation attracts and organizes the water molecules around DNA, the polyamine pushes water away from the minor groove and induce a significant narrowing of the same. Differences in the binding preferences are observed also among the PA: while a small fraction of divalent polyamines can be found in the major groove, the other two PA are nearly absent. Furthermore differences in the binding modes where observed also between the synthetic and naturally occurring divalent PA, giving some hints on why nature selected only one of the two.

### **3.3.3 Molecular dynamics of lipid bilayers**

In this section we describe the application of the M.Dyna*Mix* package to the simulation of lipid bilayers. Lipid bilayers form the framework of biological membranes, which are very complex heterogeneous systems consisting of many different types of lipids, sterols, proteins, carbohydrates and various membrane-associated molecules involved in a variety of cellular processes. Biomembranes surround cells: a membrane separates the interior of a cell from the outside environment. Being selectively permeable, membranes participate in controlling the movement of various compounds into and out of cells. The ability of drug molecules to permeate membranes is one of the key properties defining the efficacy of a drug, since drug molecules have to penetrate numerous membrane barriers in order to reach their targets. Studies of drug solubility in both aqueous and lipophilic environments are thus important for understanding drug transport in living organisms.

Fig. 13. Dimyristoylphosphatidylcholine (DMPC) lipid

Lipid molecules constituting biomembranes differ in the type of hydrophilic head-group and occur with a wide variety of hydrophobic hydrocarbon chains. The most abundant phospholipid in animals and plants is usually phosphatidylcholine, which is the key building block of membrane bilayers. An example of such a lipid, dimyristoylphosphatidylcholine (DMPC), is shown in **Figure 13**. It is not surprising that lipid membrane bilayers composed of various phosphatidylcholine lipids have been extensively studied by molecular dynamics ever since computer hardware made such simulations possible. However, in many studies of the last decades, obtaining reliable information on the physical-chemical properties of lipid bilayers was seriously limited by two factors: 1) the necessity to simulate a large number of atoms (of the order of a few tens of thousands) over long simulation times (hundreds of nanoseconds), and 2) the unsatisfactory reliability of the commonly available force fields in describing the behaviour of lipid bilayers consistently with respect to variations in lipid chemical structure, composition and thermodynamic conditions (Lyubartsev & Rabinovich, 2011). While the solution of the first problem is determined by advances in computer hardware, proper parametrization of the force field requires extensive computational work, including numerous test simulations of bilayer systems and calculations of different experimentally measured properties.

In the last several years we have used the M.Dyna*Mix* package to improve the CHARMM force field (Feller & MacKerell, 2000), which is one of the most frequently used in biomolecular simulations. Being fully atomistic, this force field has potential advantages over the Gromos force field (Berger et al., 1997), which is based on the united-atom model. However, recent detailed investigations have shown that the CHARMM force field disagrees non-negligibly with experiment in the description of a number of important properties of lipid bilayers. For example, the CHARMM force field favours rigid gel-like structures for bilayers composed of saturated lipids, and to keep the bilayer in its natural liquid crystalline phase one needs to apply a surface tension (Lyubartsev & Rabinovich, 2011, and references therein). Högberg et al. (2008) suggested how to improve the CHARMM force field in order to simulate a DMPC lipid bilayer in constant-pressure tensionless simulations. The correction included two changes: 1) a scaling of the so-called 1-4 electrostatic interactions (between atoms separated by exactly 3 covalent bonds) was introduced and tuned to reproduce the experimentally measured ratio of trans and gauche conformations in the hydrocarbon chains, and 2) the partial atomic charges in the lipid headgroup were recalculated from high-quality ab initio calculations. It was demonstrated (Högberg et al., 2008) that with these modifications of the CHARMM force field, 100 ns constant-pressure simulations gave excellent agreement with experimentally measured properties of the DMPC bilayer, such as the area per lipid at zero tension, the electron density, the structure factor and the order parameters. An important feature of these simulations was that the long-range correction to the Lennard-Jones potential was included in the pressure calculations. This correction, which has its physical origin in the dispersion (van der Waals) forces, can be of the order of 100-200 bar in the typical range of cut-off distances, and is known to be important for the correct determination of the average bilayer area, as well as affecting the phase behaviour of the bilayer (Lyubartsev & Rabinovich, 2011).
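To give a feeling for where such a pressure correction comes from, the following minimal sketch evaluates the standard analytic Lennard-Jones tail correction for a homogeneous, single-component fluid, assuming the pair distribution function equals 1 beyond the cut-off. This is an illustration only: the function name and the reduced-unit example values are assumptions, and a real bilayer simulation would sum analogous terms over all pairs of atom types rather than use a single ε and σ.

```python
import math

def lj_pressure_tail(rho, epsilon, sigma, r_cut):
    """Long-range Lennard-Jones correction to the pressure.

    Assumes a homogeneous one-component fluid with g(r) = 1 for r > r_cut:
        P_tail = (16/3) * pi * rho^2 * eps * sigma^3
                 * [ (2/3) * (sigma/r_cut)^9 - (sigma/r_cut)^3 ]
    All quantities must be given in one consistent unit system.
    """
    sr3 = (sigma / r_cut) ** 3
    prefactor = (16.0 / 3.0) * math.pi * rho ** 2 * epsilon * sigma ** 3
    return prefactor * ((2.0 / 3.0) * sr3 ** 3 - sr3)

# Hypothetical example in reduced Lennard-Jones units.
p_tail_short = lj_pressure_tail(rho=0.8, epsilon=1.0, sigma=1.0, r_cut=2.5)
p_tail_long = lj_pressure_tail(rho=0.8, epsilon=1.0, sigma=1.0, r_cut=3.0)
print(p_tail_short, p_tail_long)
```

The correction is negative (the neglected attractive tail lowers the pressure) and shrinks in magnitude as the cut-off grows, which is why its inclusion matters most for the short cut-offs typical of bilayer simulations.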

In continuation of the work of Högberg et al. (2008), the modified CHARMM force field was used to simulate bilayers consisting of two similar lipids: DSPC (which differs from the DMPC lipid displayed in **Figure 13** in that it contains 18 carbon atoms in each tail) and DOPC (which, exactly like DSPC, contains 18 carbon atoms in each tail but has a double bond between the 9th and 10th carbons in the second tail). Despite their very similar chemical structures, bilayers composed of these lipids behave differently and have noticeably different temperatures of the gel to liquid-crystalline transition: 24 °C for DMPC, 53 °C for DSPC and 5 °C for DOPC. This is why, in the physiological range of temperatures, DMPC and DOPC bilayers exist in a liquid crystalline phase while DSPC is in a gel phase. Simulations of the three bilayers were carried out at 30 °C for 128 lipids in the presence of 3840 water molecules (which corresponds to the fully hydrated state of phosphatidylcholine lipids, with 30 water molecules per lipid). At the beginning of the simulations, the lipids were arranged in two leaflets, each leaflet being generated by translating the coordinates of one lipid molecule in the X and Y directions. The lipid spacing was chosen to correspond to a typical value of the area per lipid for the liquid-crystalline phase, 64 Å². The necessary number of water molecules was added outside the lipid bilayer, and periodic boundary conditions were applied to the system. The systems were simulated for 1 ns at constant volume and then for 1 ns at constant pressure with isotropic cell fluctuations. The resulting configurations were used as starting points for longer simulations with independent cell fluctuations in the Z and XY directions. All the systems were then simulated for 100 ns, with the first 20 ns considered as equilibration.
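The arithmetic behind this starting configuration can be sketched as follows. The sketch assumes a square lattice within each leaflet; the text only states that each leaflet was generated by translating one lipid in X and Y, so the lattice type, the function name and its parameters are illustrative assumptions, not the authors' actual setup script.

```python
import math

def leaflet_grid(n_lipids_per_leaflet=64, area_per_lipid=64.0):
    """Return (x, y) lattice positions in Angstrom and the lateral box edge.

    Assumes a square lattice, so the number of lipids per leaflet
    must be a perfect square.
    """
    n_side = math.isqrt(n_lipids_per_leaflet)
    if n_side * n_side != n_lipids_per_leaflet:
        raise ValueError("square lattice needs a perfect-square lipid count")
    spacing = math.sqrt(area_per_lipid)      # lattice constant in Angstrom
    box_edge = n_side * spacing              # lateral box length in Angstrom
    positions = [(i * spacing, j * spacing)
                 for i in range(n_side) for j in range(n_side)]
    return positions, box_edge

n_lipids = 128                    # total lipids, two leaflets of 64
waters_per_lipid = 30             # full hydration, as stated in the text
n_waters = n_lipids * waters_per_lipid

positions, box_edge = leaflet_grid(n_lipids // 2, area_per_lipid=64.0)
print(len(positions), box_edge, n_waters)
```

With 64 lipids per leaflet at 64 Å² each, the lateral box edge is 64 Å and the lattice spacing is 8 Å; the water count follows directly from the hydration level.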

Fig. 14. DSPC bilayer (a) and DOPC bilayer (b), simulated for 100 ns at 30 °C and constant pressure (1 bar).

The simulations gave a picture that corresponds well to the behaviour expected from experimental observations: while in the DMPC and DOPC bilayers the lipids quickly formed a liquid-crystalline phase, with areas per lipid of 59.5 and 62.7 Å² respectively, the DSPC lipids became clearly ordered in a tilted structure with a much lower area per lipid (51 Å²), which is a typical value for a gel phase. Final snapshots of the DOPC and DSPC bilayers are shown in **Figure 14**. The structures of the two bilayers are strikingly different, bearing in mind that the only difference between the two kinds of lipids is the presence of one double bond in the middle of the second tail of a DOPC lipid. Nevertheless, this behaviour is what one expects from experimental observations.

Since our exploratory simulations show that we can obtain reliable behaviour, consistent with experiment, for different bilayer systems, the developed methodology can be used to address more challenging problems: the permeability of different substances across membranes, simulations of membrane proteins and ion channels, and the effects of other membrane-associated molecules (cholesterol, polypeptides, anaesthetics) on membrane properties. Relating the observations from such studies to the features essential for biological function implements the idea of a "computer laboratory" for biomembrane research.

### **4. Acknowledgements**

The authors wish to thank SNIC & SNAC for generous allocations of computer time throughout all the years.

#### **5. References**

Allen, M. & Tildesley, D. (1987). *Computer Simulation of Liquids.* Oxford University Press, ISBN 0-19-855375-7, New York

Anderson, C.F. & Record, M.T. Jr (1995). Salt-nucleic acid interactions. *Annual Review of Physical Chemistry*, Vol. 46, pp. 657-700

Bagno, A. (2002). Probing the solvation shell of organic molecules with intermolecular 1H NOESY. *Journal of Physical Organic Chemistry*, Vol. 15, pp. 790-795

Berger, O.; Edholm, O. & Jähnig, F. (1997). Molecular dynamics simulations of a fluid bilayer of dipalmitoylphosphatidylcholine at full hydration, constant pressure and constant temperature. *Biophysical Journal*, Vol. 72, pp. 2002-2013

Bergman, D. & Laaksonen, A. (1998). Topological and spatial structure in the liquid-water–acetonitrile mixture. *Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics*, Vol. 58, No. 4, pp. 4706-4715

Bergman, D. L.; Laaksonen, L. & Laaksonen, A. (1998). Visualization of solvation structures in liquid mixtures. *Journal of Molecular Graphics and Modelling*, Vol. 15, pp. 301-306

Bunta, J.; Dahlberg, M.; Erikson, L.; Korolev, N.; Laaksonen, A.; Lohikoski, R.; Lyubartsev, A.; Pinak, M. & Schyman, P. (2007). Solvating, manipulating, damaging, and repairing DNA in a computer. *International Journal of Quantum Chemistry*, Vol. 107, No. 2, pp. 279-291

Cornell, W.; Cieplak, P.; Bayly, C.; Gould, I.; Merz, K. J.; Ferguson, D.; Spellmeyer, D.; Fox, T.; Caldwell, J. & Kollman, P. A. (1995). Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. *Journal of the American Chemical Society*, Vol. 117, pp. 5179-5197

Dahlberg, M. & Laaksonen, A. (2006). Preferential Solvation of Phenol in Binary Solvent Mixtures. A Molecular Dynamics Study. *Journal of Physical Chemistry A*, Vol. 110, pp. 2253-2258

Denisov, V. & Halle, B. (2000). Sequence-specific binding of counterions to B-DNA. *Proceedings of the National Academy of Sciences of the United States of America*, Vol. 97, pp. 629-633

Egorov, A.V.; Komolkin, A.V.; Lyubartsev, A.P. & Laaksonen, A. (2006). First and second hydration shell of Ni2+ studied by molecular dynamics simulations. *Theoretical Chemistry Accounts*, Vol. 115, pp. 170-176

Feller, S.E. & MacKerell, A.D. (2000). An improved empirical potential energy function for molecular simulations of phospholipids. *Journal of Physical Chemistry B*, Vol. 104, pp. 7510-7515

Högberg, C.-J.; Nikitin, A.M. & Lyubartsev, A.P. (2008). Modification of the CHARMM force field for DMPC lipid bilayer. *Journal of Computational Chemistry*, Vol. 29, No. 14, pp. 2359-2369

Korolev, N.; Lyubartsev, A.P.; Nordenskiöld, L. & Laaksonen, A. (2001). Spermine: An "invisible" component in the crystals of B-DNA: A grand canonical Monte Carlo and Molecular Dynamics simulation study. *Journal of Molecular Biology*, Vol. 308, pp. 907-917

Korolev, N.; Lyubartsev, A.P.; Laaksonen, A. & Nordenskiöld, L. (2002). On the competition between water, sodium ions and spermine in binding to DNA. A molecular dynamics computer simulation study. *Biophysical Journal*, Vol. 82, pp. 2860-2875

Korolev, N.; Lyubartsev, A.P.; Laaksonen, A. & Nordenskiöld, L. (2004a). A molecular dynamics simulation study of polyamine- and sodium-DNA. Interplay between polyamine binding and DNA structure. *European Biophysics Journal*, Vol. 33, No. 8, pp. 671-682

MacKerell, A. D. (2004). Empirical force fields for biological macromolecules: Overview and issues. *Journal of Computational Chemistry*, Vol. 25, pp. 1584-1604

McFail-Isom, L.; Sines, C. C. & Williams, L. D. (1999). DNA structure: cations in charge? *Current Opinion in Structural Biology*, Vol. 9, pp. 298-304

Mocci, F. & Saba, G. (2003). Molecular Dynamics Simulations of AT-Rich Oligomers: Sequence-Specific Binding of Na+ in the Minor Groove of B-DNA. *Biopolymers*, Vol. 68, pp. 471-485

Mocci, F.; Laaksonen, A.; Lyubartsev, A. & Saba, G. (2004). Molecular Dynamics Investigation of 23Na NMR Relaxation in Oligomeric DNA Aqueous Solution. *Journal of Physical Chemistry B*, Vol. 108, pp. 16295-16302

Odelius, M. & Laaksonen, A. (1999). Combined MD simulation and NMR relaxation studies of molecular motion and intermolecular interactions. In: *Molecular Dynamics. From Classical to Quantum Methods.* Balbuena, P., Seminario, J., Eds.; Theoretical and Computational Chemistry, Vol. 7, pp. 281-324, Elsevier Science, ISBN 10: 0-444-82910-5, ISBN 13: 978-0-444-82910-8, Amsterdam

Reichardt, C. (1988). *Solvents and Solvent Effects in Organic Chemistry*, VCH, Weinheim

Ryckaert, J.-P.; Ciccotti, G. & Berendsen, H.J.C. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. *Journal of Computational Physics*, Vol. 23, pp. 327-341

Saenger, W. (1984). *Principles of Nucleic Acid Structure*, C.R. Cantor, Editor, Springer Advanced Text in Chemistry, Springer-Verlag, New York

Tuckerman, M.; Berne, B.J. & Martyna, G.J. (1992). Reversible multiple time scale molecular dynamics. *Journal of Chemical Physics*, Vol. 97, pp. 1990-2001

van Dam, L.; Lyubartsev, A.P.; Laaksonen, A. & Nordenskiöld, L. (1998). Self-diffusion and association of Li+, Cs+, and H2O in oriented DNA fibers. An NMR and MD simulation study. *Journal of Physical Chemistry B*, Vol. 102, pp. 10636-10642

Vishnyakov, A.; Widmalm, G.; Kowalewski, J. & Laaksonen, A. (1999). Molecular Dynamics Simulation of the α-D-Manp-(1→3)-β-D-Glcp-OMe Disaccharide in Water and Water/DMSO Solution. *Journal of the American Chemical Society*, Vol. 121, No. 23, pp. 5403-5412

Vishnyakov, A.; Widmalm, G. & Laaksonen, A. (2000). Carbohydrates exhibit a distinct preferential solvation pattern in binary aqueous solvent mixtures. *Angewandte Chemie, International Edition*, Vol. 39, No. 1, pp. 140-142

Vishnyakov, A.; Widmalm, G. & Laaksonen, A. (2001). Molecular dynamics simulations of α-D-Manp-(1→3)-β-D-Glcp-OMe in methanol and in dimethylsulfoxide solutions. *Journal of Molecular Graphics and Modelling*, Vol. 19, No. 2, pp. 338-342

Vishnyakov, A.; Lyubartsev, A. & Laaksonen, A. (2001). Molecular Dynamics Simulations of Dimethyl Sulfoxide and Dimethyl Sulfoxide-Water Mixture. *Journal of Physical Chemistry A*, Vol. 105, pp. 1702-1710

Wang, F. & Landau, D.P. (2001). Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States. *Physical Review Letters*, Vol. 86, pp. 2050-2053

Åberg, K.M.; Lyubartsev, A.P.; Jacobsson, S.P. & Laaksonen, A. (2004). Determination of solvation free energies by adaptive expanded ensemble molecular dynamics. *Journal of Chemical Physics*, Vol. 120, pp. 3770-3776

## **Practical Estimation of TCR-pMHC Binding Free-Energy Based on the Dielectric Model and the Coarse-Grained Model**

Hiromichi Tsurui¹ and Takuya Takahashi² *¹Juntendo University, ²Ritsumeikan University, Japan*

### **1. Introduction**

To evaluate free energy changes of bio-molecules in a water solution, *ab initio* molecular dynamics (MD) simulations, such as quantum mechanical/molecular mechanical (QM/MM) MD, are the most theoretically rigorous methods (Car and Parrinello 1985; Kuhne, Krack et al. 2007), although their calculation cost is far too large for large molecular systems that contain many electrons. Therefore, all-atom MD simulations based on classical mechanics (i.e., Newton's equations) are used for typical bio-molecular systems. The conventional free energy perturbation (FEP) method based on all-atom MD simulation is a rigorous method. To elucidate the molecular principles on which the selectivity of a TCR is based, FEP simulations were used to analyse the binding free energy difference of a particular TCR (A6) for a wild-type peptide (Tax) and a mutant peptide (Tax P6A), both presented by HLA-A2. The computed free energy difference was 2.9 kcal mol⁻¹, in good agreement with the experimental value, although the calculation is very time-consuming and the simulation time was still insufficient for fully sampling the phase space. This FEP calculation indicated that better solvation of the mutant peptide when bound to the MHC molecule is important to the greater affinity of the TCR for the latter, which suggests that an exact and efficient evaluation of solvation is important for the affinity calculation (Michielin and Karplus 2002). Other FEP calculations for the wild-type and the variant human T cell lymphotropic virus type 1 Tax peptide presented by the MHC to the TCR have been performed using large-scale massively parallel molecular dynamics simulations; the free energy difference computed by alchemical mutation-based thermodynamic integration agrees semi-quantitatively with experimental data (Wan, Coveney et al. 2005). However, conventional FEP is still very time-consuming when searching among many unknown docking structures, because all-atom MD for a large molecular system is computationally demanding and MD simulations must be carried out not only for the initial and final states but also for many intermediate states.
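The reason intermediate states are needed can be seen from the basic FEP estimator (the Zwanzig exponential average), in which the free energy change of each window is obtained from sampled energy differences between neighbouring states and the windows are then summed. The sketch below is a generic illustration under stated assumptions: the function names, the 300 K default and the toy input data are hypothetical and are not taken from the studies cited above.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def zwanzig_delta_f(delta_u, temperature=300.0):
    """Free energy change for one window, in kcal/mol.

    delta_u holds samples of U_next - U_current (kcal/mol) collected while
    simulating the current state:
        dF = -kT * ln < exp(-dU / kT) >
    """
    beta = 1.0 / (K_B * temperature)
    shift = min(delta_u)  # shift so that every exponent is <= 0
    total = sum(math.exp(-beta * (du - shift)) for du in delta_u)
    return shift - math.log(total / len(delta_u)) / beta

def fep_total(windows, temperature=300.0):
    """Sum the per-window free energy changes along the alchemical path."""
    return sum(zwanzig_delta_f(w, temperature) for w in windows)

# Toy data: two windows with made-up energy-difference samples.
windows = [[0.4, 0.5, 0.6, 0.5], [1.1, 0.9, 1.0, 1.0]]
print(fep_total(windows))
```

Each window requires its own equilibrated simulation, and the exponential average converges only when neighbouring states overlap well, which is why the cost grows quickly for large systems.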

Recently, the energy representation (ER) method, in which only the initial and final states of a molecular system need to be considered and the sampling cost is drastically decreased, was developed for the molecular solvation process by Matubayasi et al. (Matubayasi and Nakahara 2000; Matubayasi and Nakahara 2002), and it should also be applicable to the calculation of binding free energy. The ER method can be combined with the approximate models described below. Monte Carlo simulations can also be used instead of MD simulations to sample the configurations. An approach of this type, which considers only the initial and final states, is called an endpoint method.

Most of the calculation cost in all-atom MD comes from sampling the configurations of the solvent atoms, because the number of solvent atoms (water and co-solvent ions) is much larger than that of the target bio-molecules. The long-range electrostatic potential is especially time-consuming, although efficient algorithms such as the fast multipole method and several Ewald methods have been developed for all-atom MD. To decrease the calculation cost of the long-range electrostatic term, a continuum dielectric model, which can calculate the electrostatic free energy term of the system very efficiently, is widely used for many bio-molecular systems and is described in the next section.

In the case of large molecules, the entropy term of the solvation change becomes important (Asakura and Oosawa 1954), and calculation methods based on the solvent accessible surface area (SA) become insufficient because the excluded volume effect increases. Therefore, integral equation (IE) theories developed in molecular liquid theory, such as the Ornstein-Zernike equation with its closures, are promising for evaluating the entropy change, including solvation and de-solvation processes (Kinoshita 2006; Kinoshita 2009; Yasuda, Yoshidome et al. 2010). The recent MD software package AMBER also contains such an IE algorithm, 3D-RISM, which is a reference interaction site model employing Cartesian coordinates (Luchko, Gusarov et al. 2010). In particular, the simple morphological theory obtained from this IE approach has been applied to the elucidation of protein folding (Yasuda, Yoshidome et al. 2010) and of F1-ATPase mechanisms, and has proven to be useful (Yoshidome, Ito et al. 2011).

Fig. 1. The relationship among theoretical models and approaches


Another way of decreasing the calculation cost of the solvent molecules is to use coarse-grained (CG) solvent models. The protein-dipole Langevin-dipole (PDLD) model, which can efficiently calculate the electrostatic interaction among the permanent and induced dipoles of proteins and solvent atoms, is one such coarse-grained solvent model. As the PDLD model is usually used in the region outside the all-atom region, it is a hybrid of CG and all-atom models (Warshel and Levitt 1976; Xu, Wang et al. 1992). Hybrid approaches combining all-atom, CG and continuum solvent models are evolving. The smoothly decoupled particle interface (SDPI) model has a switching region that gradually transitions from fully interacting particles to a continuum solvent. The resulting SDPI model allows the use of an implicit solvent model based on a simple theory that needs only to reproduce the behaviour of the bulk solvent rather than the more complex features of local interactions (Wagoner and Pande 2011). CG models for solute molecules, which are described in the third section, are also promising for understanding protein folding (Liwo, He et al. 2011) and for predicting ligand-receptor docking structures.

The relationship among the theoretical models and approaches is summarised in Fig. 1.

### **2. The dielectric model and the MM-PBSA (GBSA) method**

In this section, we briefly describe the principles behind the methods, the differences between PBSA and GBSA, and the differences between explicit and implicit solvent treatments.

### **2.1 Principles of the method**

A molecule has an atomic polarisability due to its electrons and, when it is polar and has a permanent electric dipole moment, an orientational polarisability. The high relative dielectric constant of water (εr = 78.4 at 298 K) is mainly due to its orientational polarisation, where the electric dipole moment is 2.95 Debye. Moreover, under the usual physiological conditions the solution in our body contains several co-solvent ions such as Na+, Cl-, K+ and so on. Therefore, electrostatic interactions among bio-molecules are greatly weakened by water and solvent ions, in a very complicated manner, when compared with the *in vacuo* case (Koehl 2006).

To obtain the electrostatic contribution to free energy change, the dielectric model is a good approximation and is widely used to calculate the electrostatic potential of molecular systems in many scientific and technological fields.

First, the use of a simple function of the effective relative dielectric constant is the easiest way to reduce the calculation time required in obtaining electrostatic potentials.

A simple distance-dependent function, 4.5 r, which is proposed by Pickersgill (Pickersgill 1988), can well explain site-directed mutagenesis experiments. Warwicker showed that the simple Debye–Hückel shielding function with a uniform effective relative dielectric constant of 50 was sufficient to explain experimental results when compared with a continuum model (Warwicker 1999). Mehler et al. challenged this problem and proposed a sigmoid function considering the local hydrophobicity and hydrophilicity of protein molecules whose results were also in good agreement with pKa shift measurements (Mehler and Guarnieri 1999). These methods are simple and very fast; however, they all require parameter readjustment for each new system to be studied. Unfortunately, a universal function applicable to all macromolecular systems does not yet exist.

Empirically obtained effective dielectric functions that depend on the inter-atomic distance, *r*, such as the linear functions (εr=*r* or 4*r*) and the sigmoid function, are simple and cheap to evaluate. They are still used in recent drug design studies for the docking simulations of large molecular systems so as to save on the calculation cost, although the calculation error is large (Takahashi, Sugiura et al. 2002).
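As a rough illustration of how such effective dielectric functions enter a pairwise Coulomb term, the sketch below compares a uniform dielectric with the linear forms mentioned above. The function names, the example charges and the value of the Coulomb constant (in kcal Å mol<sup>-1</sup> e<sup>-2</sup>) are assumptions made for this example and are not taken from the cited studies.

```python
# Minimal sketch: Coulomb pair energy with an effective dielectric function.
# COULOMB_KCAL and all function names are illustrative assumptions.
COULOMB_KCAL = 332.06  # approx. e^2 / (4*pi*eps0) in kcal * Angstrom / mol


def eps_constant(value):
    """Uniform effective dielectric, e.g. 78.4 for bulk water."""
    return lambda r: value


def eps_linear(slope):
    """Linear distance-dependent dielectric, eps(r) = slope * r (r in Angstrom)."""
    return lambda r: slope * r


def coulomb_energy(q1, q2, r, eps_func):
    """Pair energy in kcal/mol for charges in units of e separated by r Angstrom."""
    return COULOMB_KCAL * q1 * q2 / (eps_func(r) * r)


if __name__ == "__main__":
    # A hypothetical +1/-1 ion pair at 5 Angstrom under different screening models.
    for label, eps in [("eps = 1", eps_constant(1.0)),
                       ("eps = 4r", eps_linear(4.0)),
                       ("eps = 4.5r", eps_linear(4.5)),
                       ("eps = 78.4", eps_constant(78.4))]:
        print(label, coulomb_energy(1.0, -1.0, 5.0, eps))
```

With a linear function the energy decays as 1/*r*<sup>2</sup> rather than 1/*r*, which is why these forms mimic screening at long range without any explicit solvent.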

#### **2.1.1 PB approach**

On the other hand, the typical dielectric model solves the Poisson equation and treats biomolecules and water as continuum media which have a specific dielectric constant, although the position-dependent local dielectric constant - which is calculated from the electronic polarisation of atoms and the orientational polarisation of local dipoles - is also possible for a finite difference equation (Nakamura, Sakamoto et al. 1988; Pitera, Falta et al. 2001).

Moreover, the Poisson-Boltzmann (PB) equation, which was first proposed by Gouy in 1910 and was complemented by Chapman in 1913, is widely used for considering the contribution of solvent ions. The Gouy-Chapman theory, which solves a simple one-dimensional nonlinear PB equation, is often used for membrane-electrolyte systems that have electrical double layers (Forsten, Kozack et al. 1994).
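For a planar charged surface in a symmetric electrolyte, the one-dimensional nonlinear problem has a well-known closed-form solution. The sketch below evaluates it in reduced variables, assuming a reduced potential *y* = *zq*ψ/*kBT* and a reduced distance ξ = κ*x*, where κ is the inverse screening length; the function names are hypothetical, and the weak-field exponential is included only for comparison.

```python
import math


def gouy_chapman_reduced(y0, xi):
    """Reduced potential y at reduced distance xi from a planar surface
    held at reduced potential y0 (symmetric z:z electrolyte)."""
    gamma = math.tanh(y0 / 4.0)
    decay = gamma * math.exp(-xi)
    return 2.0 * math.log((1.0 + decay) / (1.0 - decay))


def linearised_reduced(y0, xi):
    """Weak-field counterpart: a simple exponential decay."""
    return y0 * math.exp(-xi)


if __name__ == "__main__":
    # Compare the two profiles for a weak and a strong surface potential.
    for y0 in (0.1, 4.0):
        for xi in (0.0, 1.0, 3.0):
            print(y0, xi, gouy_chapman_reduced(y0, xi), linearised_reduced(y0, xi))
```

For small surface potentials the two profiles coincide, whereas for large ones the nonlinear solution decays faster near the surface.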

The PB equation is a differential equation that describes electrostatic interactions between molecules in ionic solutions by using a mean-field approximation, where the correlations among the solvent ions are neglected. The equation in SI units can be written as:

$$\nabla\cdot\left[\varepsilon(\vec{r})\,\nabla\psi(\vec{r})\right] = -\rho(\vec{r}) - \sum_{i} c_{i}^{\infty} z_{i} q\,\lambda(\vec{r}) \exp\left[\frac{-z_{i} q\,\psi(\vec{r})}{k_{B}T}\right] \tag{1}$$

where ∇· is the divergence operator and ε(*r*) is the position-dependent dielectric, which is set to be constant within each of the solvent, bio-molecule and boundary regions in continuum dielectric models. ∇ψ(*r*) is the gradient of the electrostatic potential, ρ(*r*) represents the charge density of the solute (i.e., the fixed charges of the bio-molecule), *c*<sub>i</sub><sup>∞</sup> represents the concentration of the ion *i* at a distance of infinity from the solute, *zi* is the valence of the solvent ion, *q* is the charge of a proton, *kB* is the Boltzmann constant, *T* is the temperature and λ(*r*) is a factor for the position-dependent accessibility of position *r* to the ions in the solution. If the potential is small and the electrostatic energy is negligible compared to the thermal energy, *kBT*, the equation can be linearised and solved more efficiently.

$$\nabla^{2}\psi(\vec{r}) = \kappa^{2}\psi(\vec{r}) \,. \tag{2}$$

Here, κ is the Debye shielding parameter, defined as follows:

$$\kappa^2 = \sum_{i}^{m} \frac{z_i^2 q^2 c_i^{\infty}}{\varepsilon k_B T} \,. \tag{3}$$

This weak field limit approach is called the Debye–Hückel approximation (Fogolari, Brigo et al. 2002).
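To give a feeling for the magnitude of κ, the sketch below evaluates Eq. (3) in SI units. It assumes that ε in Eq. (3) is the absolute permittivity (the vacuum permittivity times the relative dielectric constant) and that concentrations are converted from mol L<sup>-1</sup> to number densities; the function name and the 0.15 M salt example are illustrative assumptions.

```python
import math

# Physical constants in SI units.
E_CHARGE = 1.602176634e-19     # elementary charge, C
K_BOLTZMANN = 1.380649e-23     # Boltzmann constant, J/K
EPS_VACUUM = 8.8541878128e-12  # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23       # 1/mol


def debye_kappa(ions, eps_r=78.4, temperature=298.0):
    """Return kappa in 1/m from Eq. (3).

    ions: iterable of (valence, bulk concentration in mol/L).
    """
    total = 0.0
    for valence, conc_molar in ions:
        number_density = conc_molar * 1000.0 * AVOGADRO  # ions per m^3
        total += valence ** 2 * E_CHARGE ** 2 * number_density
    return math.sqrt(total / (EPS_VACUUM * eps_r * K_BOLTZMANN * temperature))


if __name__ == "__main__":
    # Hypothetical example: 0.15 M of a 1:1 salt in water at 298 K.
    kappa = debye_kappa([(+1, 0.15), (-1, 0.15)])
    print("Debye length in nm:", 1e9 / kappa)
```

Under these assumptions the screening length 1/κ at near-physiological salt concentration comes out a little below one nanometre, and it shrinks with the square root of the ionic concentration.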

The PB equation is typically solved with one of three numerical methods. The finite difference (FD) method is relatively time-consuming, but it is simple and applicable to complex systems that have a position-dependent local dielectric constant. The FD method was therefore the first to be applied to the calculation of the electrostatic potential in a protein-solvent system: the pKa shifts of the ionisable residues of proteins are well explained (Gilson and Honig 1987), and the effect of the salt concentration on the pKa is also reproduced (Takahashi, Nakamura et al. 1992). The finite element method (FEM) and the boundary element method (BEM) are more powerful and their calculation cost is smaller than that of the FD method, although a uniform dielectric constant must be set in each region (Lu, Zhou et al. 2008).
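The finite difference idea can be sketched on a deliberately simplified case: the linearised equation in one dimension with a uniform dielectric, ψ'' = κ<sup>2</sup>ψ, with a fixed potential at *x* = 0 and zero potential at *x* = *L*. This is only a toy model under those assumptions and is not the three-dimensional grid scheme used in the cited work; the function names are hypothetical.

```python
import math


def solve_linear_pb_1d(kappa, length, psi0, n):
    """Solve psi'' = kappa^2 * psi on [0, length] with psi(0) = psi0 and
    psi(length) = 0, using n interior grid points and central differences.

    Returns (xs, psis) including both boundary points.
    """
    h = length / (n + 1)
    # Discretised equation: psi[i-1] - (2 + (kappa*h)^2) * psi[i] + psi[i+1] = 0
    diag = [-(2.0 + (kappa * h) ** 2)] * n
    rhs = [0.0] * n
    rhs[0] = -psi0  # the known boundary value is moved to the right-hand side

    # Thomas algorithm for a tridiagonal system whose off-diagonals are all 1.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = 1.0 / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom

    psi = [0.0] * n
    psi[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        psi[i] = d_prime[i] - c_prime[i] * psi[i + 1]

    xs = [i * h for i in range(n + 2)]
    return xs, [psi0] + psi + [0.0]


def analytic_linear_pb_1d(kappa, length, psi0, x):
    """Exact solution of the same boundary-value problem, for comparison."""
    return psi0 * math.sinh(kappa * (length - x)) / math.sinh(kappa * length)


if __name__ == "__main__":
    xs, psis = solve_linear_pb_1d(kappa=1.0, length=5.0, psi0=1.0, n=499)
    worst = max(abs(p - analytic_linear_pb_1d(1.0, 5.0, 1.0, x)) for x, p in zip(xs, psis))
    print("largest deviation from the analytic solution:", worst)
```

In three dimensions the same discretisation leads to a large sparse system, and a position-dependent dielectric simply changes the coefficients of each grid point, which is why the FD method handles it so naturally.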

#### **2.1.2 GB approach**

One other powerful way to obtain the electrostatic potential based on the dielectric model is the Generalised Born (GB) model, which solves the linearised PB equation by approximating such bio-molecules as proteins and nucleic acids as a set of spheres whose internal dielectric constant differs from the external solvent (Koehl 2006). The functional form of the model is written as:

$$G_{s} = \frac{1}{8\pi} \left( \frac{1}{\varepsilon_0} - \frac{1}{\varepsilon} \right) \sum_{i,j}^{N} \frac{q_i q_j}{f_{ij}} \tag{4}$$

where

$$f_{ij} = \sqrt{r_{ij}^2 + a_{ij}^{2}e^{-D}}\tag{5}$$

and

$$D = \left(\frac{r_{ij}}{2a_{ij}}\right)^2, \quad a_{ij} = \sqrt{a_i a_j} \tag{6}$$

where ε<sub>0</sub> is the dielectric constant *in vacuo*, ε is the dielectric constant of the solvent, *q*i is the electrostatic charge on particle i, *r*ij is the distance between particles i and j, and *a*i is a length defined as the effective Born radius (Still, Tempczyk et al. 1990).

The effective Born radius of an atom represents its degree of burial inside the solute and corresponds to the distance from the atom to the molecular surface. The exact evaluation of the effective Born radii is the central issue for the GB model (Onufriev, Bashford et al. 2004).

To account for the electrostatic shielding effect of the solvent ions, a simplified function based on the Debye-Hückel approximation is added to the function *G*s in the AMBER software package (Case, Cheatham et al. 2005), which is one of the most widely used packages for bio-molecular simulations. Following Srinivasan et al. (Srinivasan, Trevathan et al. 1999), it is written as:

$$G_{s} = \frac{1}{8\pi} \sum_{i,j}^{N} \left( \frac{1}{\varepsilon_0} - \frac{\exp(-\kappa f_{ij})}{\varepsilon} \right) \frac{q_i q_j}{f_{ij}} \,. \tag{7}$$

They calculated the solvation free energies, *G*s, with this GB model for proteins and nucleic acids, which agreed very well with those of the PB model. The salt-dependence of the electrostatic binding free energy based on the Debye-Hückel approximation is still under investigation (Harris, Bredenberg et al. 2011).
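The GB expressions of Eqs. (4) to (7) can be sketched in a few lines. The version below is a minimal illustration rather than the AMBER implementation: it assumes that the effective Born radii are already known, uses Still's interpolation function with *a*ij<sup>2</sup> under the square root, and adopts the common convention of an explicit factor of -1/2 together with a Coulomb constant in kcal Å mol<sup>-1</sup> e<sup>-2</sup>, so that the solvation energy of an isolated ion is negative. The prefactor, units and function names are therefore assumptions and not a transcription of Eq. (4).

```python
import math

COULOMB_KCAL = 332.06  # approx. e^2 / (4*pi*eps0) in kcal * Angstrom / mol (assumed units)


def f_gb(r_ij, a_i, a_j):
    """Still's interpolation function, Eqs. (5) and (6)."""
    a_ij_sq = a_i * a_j                 # a_ij^2 = a_i * a_j
    d = r_ij ** 2 / (4.0 * a_ij_sq)     # D = (r_ij / (2 * a_ij))^2
    return math.sqrt(r_ij ** 2 + a_ij_sq * math.exp(-d))


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_out=78.4, kappa=0.0):
    """Polar solvation energy in kcal/mol.

    coords in Angstrom, charges in e, born_radii in Angstrom.
    kappa (1/Angstrom) switches on the Debye-Hueckel salt term of Eq. (7).
    The double sum includes i == j, which gives the Born self-energy.
    """
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])
            f = f_gb(r, born_radii[i], born_radii[j])
            prefactor = 1.0 / eps_in - math.exp(-kappa * f) / eps_out
            total += prefactor * charges[i] * charges[j] / f
    return -0.5 * COULOMB_KCAL * total


if __name__ == "__main__":
    # Hypothetical ion pair: +1 and -1 charges, 4 Angstrom apart, radii of 2 Angstrom.
    coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    print(gb_energy(coords, [1.0, -1.0], [2.0, 2.0]))
    print(gb_energy(coords, [1.0, -1.0], [2.0, 2.0], kappa=0.1))
```

For a single charge the sum reduces to the Born formula, and for two distant charges *f*ij approaches *r*ij, so the cross term reduces to the screened Coulomb interaction.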

#### **2.1.3 The GBSA (PBSA) approach**

GBSA (PBSA) is simply a GB (PB) model augmented with a hydrophobic solvent accessible surface area (SA) term. This is the most commonly used implicit solvent model combination and is widely used in MD simulations of large bio-molecules. In the context of molecular mechanics, the approach is known as MM/GBSA. This formulation can successfully identify the native states of short peptides with a well-defined three-dimensional structure (Ho and Dill 2006), although the conformational ensembles produced by GBSA models in other studies differ significantly from those produced by an explicit solvent and do not identify the protein's native state (Zhou 2003). In particular, strong charge-charge interactions such as salt bridges are overstabilised due to insufficient electrostatic screening, and the alpha helix population becomes higher than the native one. These problems are also common in PBSA. Variants of the GB model have also been developed to approximate the electrostatic environment of membranes, and these have had some success in folding the transmembrane helices of integral membrane proteins (Im, Feig et al. 2003).

Several software packages contain the GB algorithm. For example, the AMBER software package has three types of GBSA models as well as the PBSA model.

The MM-PBSA and GBSA approaches are endpoint methods and usually consider only the initial unbound state and the final bound state. The binding free energy change, *dG*bind, is written as:

$$dG_{\text{bind}} = dG_{\text{gas}} + dG_{\text{solv}} = \left(dH_{\text{gas}} + dH_{\text{tr/ro}} - T\,dS\right) + \left(dG_{\text{elsolv}} + dG_{\text{npsolv}}\right). \tag{8}$$

The term *dG*gas refers to the total free energy change in the gas phase, and the term *dH*gas contains the van der Waals and electrostatic interaction energies as well as the internal energy variation, such as bond, angle and torsional angle energies *in vacuo* (i.e., gas phase). The term *dH*tr/ro denotes the energy difference due to the translational and rotational degrees of freedom, and becomes 3*RT* in the classical limit (i.e., when the thermal energy is large enough). The term *dS* refers to the conformational entropy change (Tidor and Karplus 1994; Ben-Tal, Honig et al. 2000). The term *dG*solv is the difference between the initial and final solvation free energies and is divided into the electrostatic contribution, *dG*elsolv, and the nonpolar contribution, *dG*npsolv. The term *dG*npsolv, which is the sum of a cavity term and a solute-solvent van der Waals term, is calculated from the SA as follows:

$$dG_{\text{npsolv}} = \gamma\,\mathrm{SA} + b. \tag{9}$$

The surface tension γ and the constant *b* are 0.00542 kcal mol<sup>-1</sup> Å<sup>-2</sup> and 0.92 kcal mol<sup>-1</sup>, respectively, for the MM-PBSA model (Sitkoff, Sharp et al. 1994). For GB models, 0.0072 kcal mol<sup>-1</sup> Å<sup>-2</sup> and 0 kcal mol<sup>-1</sup> (Jayaram, Sprous et al. 1998), or else 0.005 kcal mol<sup>-1</sup> Å<sup>-2</sup> and 0 kcal mol<sup>-1</sup> (Gohlke, Kuhn et al. 2004), are used. The SA in AMBER is calculated by using the LCPO algorithm (Weiser, Shenkin et al. 1999; Still, Tempczyk et al. 1990), which computes an analytical approximation to the solvent accessible area of the molecule.
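The bookkeeping of Eqs. (8) and (9) is simple enough to sketch directly. In the example below the (γ, *b*) pairs are the values quoted above, while the dictionary keys, the function names and the endpoint difference (complex minus receptor minus ligand) are assumptions made for illustration; it is not the MM-PBSA script distributed with AMBER.

```python
# (gamma in kcal mol^-1 Angstrom^-2, b in kcal mol^-1), as quoted in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
NONPOLAR_PARAMS = {
    "PB": (0.00542, 0.92),
    "GB_Jayaram": (0.0072, 0.0),
    "GB_Gohlke": (0.005, 0.0),
}


def nonpolar_solvation(sasa, model="PB"):
    """Eq. (9): nonpolar solvation term from the solvent accessible area (Angstrom^2)."""
    gamma, b = NONPOLAR_PARAMS[model]
    return gamma * sasa + b


def endpoint_difference(complex_value, receptor_value, ligand_value):
    """Change of any energy term between the unbound and the bound state."""
    return complex_value - receptor_value - ligand_value


def binding_free_energy(dh_gas, dh_trro, t_ds, dg_elsolv, dg_npsolv):
    """Eq. (8): gas-phase part plus solvation part, all in kcal/mol."""
    return (dh_gas + dh_trro - t_ds) + (dg_elsolv + dg_npsolv)


if __name__ == "__main__":
    # Hypothetical surface areas (Angstrom^2) of complex, receptor and ligand.
    dg_np = endpoint_difference(nonpolar_solvation(9000.0),
                                nonpolar_solvation(9200.0),
                                nonpolar_solvation(600.0))
    print("nonpolar contribution to binding:", dg_np)
```

Note that with a non-zero *b* the constant does not cancel in the endpoint difference, which is one reason why the nonpolar contribution differs between the PB and GB parameter sets.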

The several types of GBSA model have been applied not only to many protein folding simulations (Zhou 2003), but also to nucleic acid conformational dynamics using massively parallel stochastic simulations, in which the ubiquitous helical hairpin conformation is reproduced and the folding pathway is investigated (Sorin, Rhee et al. 2003).

### **2.2 Review of recent work**

As mentioned in Section 2.1, the dielectric models and the hybrid approaches are widely used in many scientific and technological fields, such as protein folding, molecular docking and drug design. In particular, calculating the binding free energy (BFE) and predicting the binding affinity and binding structure between ligands and proteins are the most important aims (Gilson and Zhou 2007), because the major purpose of molecular docking (Zacharias and Fiorucci 2010; Leis and Zacharias 2011) is to predict the experimentally obtained BFE and the binding site of a receptor for a specific ligand molecule, and drug design is usually supported by suitable molecular docking methods.

For example, the linear interaction energy (LIE) method (Rastelli, Rio et al. 2010), combined with two different continuum solvent models, is applied to calculate protein-ligand BFEs for a set of inhibitors against the malarial aspartic protease plasmepsin II, and the explicit solvent LIE calculations and LIE-PB reproduce absolute experimental BFEs with average unsigned errors of 0.5 and 0.7 kcal mol<sup>-1</sup>, respectively (Carlsson, Ander et al. 2006). Moreover, the ligand-water interaction energies, which are calculated with both the PB and GB models using snapshots from explicit solvent MD simulations of the ligand and the protein-ligand complex, are compared with the explicit solvent MD results. The energy obtained from the explicit water MD agrees well with that from the PB model, whereas the GB model overestimates the change in solvation energy; this overestimation is caused by a consistent underestimation of the effective Born radii in the protein-ligand complex.

Xu and Wang applied the MM-PBSA method to FK506-binding proteins (Xu and Wang 2006), which are important targets of pharmaceutical interest, and calculated the binding of a set of 12 non-immunosuppressive small-molecule inhibitors to FKBP12 through MD simulations, where each complex is subjected to a 1-ns MD simulation conducted in an explicit solvent environment under constant temperature and pressure. The BFE of each complex is then calculated with the MM-PBSA method in the AMBER program, and the MM-PBSA results agree very well with the experimentally determined BFEs, with a correlation coefficient (R<sup>2</sup>) of 0.93 and a standard deviation as low as 0.30 kcal mol<sup>-1</sup>. The vibrational entropy term given by the normal mode analysis is necessary for achieving this correlation. Moreover, an adjustment to one weight factor in the PBSA model is essential to correct the absolute values of the final binding free energies to a reasonable range, which suggests that the very good correlation is due to the similar properties of the ligand molecules and that this artificial weight factor is not universal. A comparison of the MM-PBSA model with a Linear Response Approximation model suggests that the MM-PBSA method is robust in binding affinity prediction for this class of compounds (Lamb, Tirado-Rives et al. 1999).

To systematically evaluate the performance of MM-PBSA and several versions of the MM-GBSA models, extensive calculations of BFEs are done for 59 ligands interacting with six different proteins with the AMBER 9.0 software (Hou, Wang et al. 2011). First, the effects of the length of the MD simulation are explored, ranging from 400 to 4800 ps, and the simulation length has an obvious impact on the predictions. Interestingly, a longer MD simulation is not always necessary for achieving better predictions. Second, the effect of the solute dielectric constant (1, 2 or 4) on the BFEs of MM-PBSA is also checked, and the predictions are quite sensitive to it. Therefore, this parameter should be carefully determined according to the characteristics of the protein/ligand binding interface. Third, conformational entropy often shows large fluctuations in MD trajectories, and a large number of snapshots are necessary to achieve stable predictions. Next, the accuracy of the BFEs is compared for three GB models: (1) GB-HCT, the pairwise model by Hawkins et al. (Hawkins, Cramer et al. 1996) parameterised by Tsui and Case (Tsui and Case 2000); (2) GB-OC1; and (3) GB-OC2, the parameters of the latter two being modified by Onufriev et al. (Onufriev, Bashford et al. 2004). The GB-OC1 model gives better results than the other two GB models in ranking the binding affinities of the studied inhibitors. This may be explained by the better agreement of GB-OC1 with PBSA. The better performance of MM-PBSA compared with MM-GBSA in calculating absolute - but not necessarily relative - BFEs is confirmed. This is not surprising because GBSA is an approximation of PBSA, but it supports the reliability of the dielectric continuum model itself. Considering its computational efficiency, MM-GBSA gives good relative BFEs and is much faster than MM-PBSA, and it can serve as a powerful tool in drug design, where the correct ranking of inhibitors is often emphasised and obtaining the absolute value of the BFEs is less important.

Interestingly, a subsequent study by Hou et al., which used MM-PBSA and MM-GBSA-OC1 on 98 protein-ligand complexes to develop a scoring function, shows that MM-GBSA (success rate 69.4%) outperformed MM-PBSA (45.5%) and many popular scoring functions in identifying the correct binding conformations. The best prediction of the MM-GBSA model, with an internal dielectric constant of 2.0, produced a Spearman correlation coefficient of 0.66, which is better than MM/PBSA (0.49) and almost all the scoring functions used in molecular docking (Hou, Wang et al. 2011). However, the reason why PBSA underperforms GBSA is not clear. One possibility is the difference in the SA term and another is insufficient conformational sampling of the proteins, as the authors also emphasise the importance of the MD calculation time. In any case, MM-GBSA performs well for both binding pose predictions and binding free-energy estimations, and it is efficient at re-scoring the top-hit poses produced by other less-accurate scoring functions.
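Because ranking quality in these comparisons is reported as a Spearman correlation coefficient, a minimal sketch of how such a coefficient is obtained from predicted and experimental BFEs may be helpful. The function names are hypothetical and the code is a plain-Python illustration of the standard definition (the Pearson correlation of the ranks, with tied values sharing their mean rank); it is not the evaluation code of the cited study.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, experimental):
    """Spearman rank correlation between predicted and experimental values."""
    return pearson(average_ranks(predicted), average_ranks(experimental))
```

A coefficient of 1 means that the predicted ordering of the ligands is identical to the experimental one, regardless of how far the absolute values are off, which is why it is a natural measure when only the ranking of inhibitors matters.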

As AMBER and other software packages that include the PB and GB models are widely used and drug design is an important issue, many studies of ligand-protein docking based on the dielectric model have been carried out (Rastelli, Rio et al. 2010). The above calculation results of the GB and PB dielectric models are limited to relatively small ligand molecules and receptor proteins, and the size of the complex is not so large compared to so-called super-molecules, such as immune complexes and membrane proteins. To calculate and analyse the BFE of a large complex of the T-cell receptor (TCR) and immunogenic peptides (p) presented by class I major histocompatibility complexes (MHC), binding free energy decomposition (BFED) calculations based on the MM–GBSA approach including entropic terms were done on the 2C TCR/SIYR/H-2Kb system and provided a detailed description of the energetics of the interaction (Zoete and Michielin 2007), since this BFED method can detect the important individual side chains for the stability of a protein fold
with computational alanine scanning of the insulin monomer (Zoete and Meuwly 2006). A correlation between the decomposition results and experimentally-determined activity differences for alanine mutants of the TCR-pMHC complex is 0.67 when the conformational entropy is neglected, and 0.72 when the entropy is considered. Similarly, a comparison of experimental activities with variations in the BFEs determined by computational alanine scanning yields correlations of 0.72 and 0.74 when the entropy is neglected or taken into account, respectively. In addition, a comparison of the two theoretical approaches for estimating the role of each side chain in the complex formation is given, and a new *ad hoc* approach for decomposing the vibrational entropy term into atomic contributions - the linear decomposition of vibrational entropy (LDVE) - is introduced. The latter allows the rapid calculation of the entropic contribution of interesting side chains to the binding. This approach is justified by the idea that the most important contributions to the vibrational entropy of a molecule originate from residues that contribute most to the vibrational amplitude of the normal modes. The results of the LDVE are very similar to those of the exact but highly computationally demanding method. The BFED approach is also applicable to the design of rational TCR by calculating each amino acid contribution in mutated TCR. As melanoma patients frequently show unusually positive clinical outcomes, it represents an interesting target for adoptive transfer with modified TCR. Sequence modifications of TCR which potentially increase the affinity for this epitope have been proposed and tested *in vitro*. T-cells expressing some of the proposed TCR mutants showed better T-cell functionality, with the improved killing of peptide-loaded T2 cells and better proliferative capacity compared to the wild type TCR expressing cells (Zoete, Irving et al. 2010).

As there are still not many applications for massive simulations with dielectric models to large bio-molecules like the TCR-pMHC complex, more extensive studies are necessary to evaluate the validity of the method and improve its accuracy and performance because the excluded volume effect due to water entropy change in binding will become larger in the larger systems.

### **2.3 The correlation between calculation-cost and accuracy**

It is not easy to state the calculation cost and accuracy exactly because the methods are still under development and the accuracy depends on the system size.

Previous studies have shown a very good correlation between PB and GB results because the GB parameter is modified to achieve better agreement with that of PB (Gohlke, Kuhn et al. 2004; Onufriev, Bashford et al. 2004). Moreover, GB and PB methods also enable the rapid scoring of protein structures when they are combined with physics-based energy functions. The direct comparison of these two approaches on large protein data sets is done with a scoring function based on a GB and PB solvation model and short MD simulations. Against seven publicly available decoy sets, the results of the MM-PBSA approach are comparable to the GB-based scoring function (Lee, Yang et al. 2005).

We also compared the MM-PBSA and MM-GBSA methods. Table 1 shows the comparison of the binding electrostatic free energies of the PB and GB methods for two TCR-pMHC complexes (2gj6 and 3pwp), which are complexes of A6 with Tax peptide-HLA A2 and of A6 with Hud-A2, respectively. Constant regions of TCR were removed (Gregoire, Lin et al. 1996), hydrogen atoms were added and the complexes were neutralised and solvated with TIP3P water. The numbers of atoms in the systems were 130,545 for 2GJ6 and 127,023 for 3PWP. Calculations were performed with Sander of AMBER 11 for 5 ns. The *G*elsolv, which is always largely negative in each case, represents the electrostatic energy contribution due to the solvent. The *G*npsolv represents the hydrophobic and van der Waals contributions, which were calculated from the solvent accessible surface area (SA). The difference between the PB and GB results in each case is 3-4%, although the total binding free energy, *dG*bind, differs by almost 20% because the binding energy *in vacuo*, *dG*gas, and the contribution of the solvent, *dG*solv=*dG*elsolv+*dG*npsolv, have different signs and cancel each other. We must note that the ratio of the SA contributions between GBSA and PBSA is larger than that of the *G*elsolv, although the absolute contribution is 1/10th of the *G*elsolv.


| Component | Term | 2gj6 GBSA | 2gj6 PBSA | 2gj6 GB/PB | 3pwp GBSA | 3pwp PBSA | 3pwp GB/PB |
|---|---|---|---|---|---|---|---|
| Complex | *G*elsolv | -7502.3 | -7233.6 | 1.037 | -7367.2 | -7112.6 | 1.036 |
| Complex | *G*npsolv | 193.3691 | 152.9235 | 1.2645 | 188.3416 | 151.8496 | 1.2403 |
| Ligand | *G*elsolv | -2512 | -2421.1 | 1.038 | -2358.79 | -2259.25 | 1.044 |
| Ligand | *G*npsolv | 73.4152 | 58.0731 | 1.2642 | 71.2015 | 57.4698 | 1.2389 |
| Receptor protein | *G*elsolv | -5340.8 | -5151.5 | 1.037 | -5340.08 | -5162.4 | 1.034 |
| Receptor protein | *G*npsolv | 133.8713 | 105.4387 | 1.2697 | 131.2283 | 104.4543 | 1.2563 |
| Difference = Complex-(Receptor + Ligand) | *dG*elsolv | 350.479 | 338.916 | 1.034 | 331.6509 | 309.0868 | 1.073 |
| Difference | *dG*npsolv | -13.9175 | -10.5883 | 1.314 | -14.0882 | -10.0745 | 1.398 |
| Difference | *dE*el\_solv+*dE*SA | 336.561 | 328.328 | 1.025 | 317.563 | 299.012 | 1.062 |
| Total | *dG*gas | -388.76 | -388.76 | 1 | -380.55 | -380.55 | 1 |
| Total | *dG*solv | 336.561 | 328.328 | 1.025 | 317.563 | 299.012 | 1.062 |
| Total | *dG*bind | -52.199 | -60.432 | 0.864 | -62.99 | -81.54 | 0.773 |

Table 1. A comparison of the solvation and binding free energies obtained with the GBSA and PBSA methods for two TCR-pMHC complexes (PDB ID: 2gj6 and 3pwp). *G*elsolv, which is strongly negative in every case, represents the electrostatic energy contribution due to the solvent. *G*npsolv, the hydrophobic and van der Waals contribution, is calculated from the solvent accessible surface area (SA). The total binding free energy, *dG*bind, is the sum of the binding energy *in vacuo*, *dG*gas, and the contribution of the solvent, *dG*solv=*dG*elsolv+*dE*SA. All energies in the table are given in kcal mol<sup>-1</sup>.

### **3. Coarse-grained (CG) simulation**

#### **3.1 The limits of all-atom simulations**

Even though all-atom simulations provide the most detailed information about the system of interest, their calculation costs are quite high. A system containing a large protein of several 10<sup>5</sup> to 10<sup>6</sup> Da comes to several 10<sup>5</sup> atoms when solvated in explicit water molecules, and occupies a simulation box whose volume is measured in nm<sup>3</sup>; hence, the accessible simulation time is less than a microsecond, even on a recent multi-core PC. These time and length scales are still too short and too small to reproduce such biologically interesting phenomena as protein folding, protein assembly and enzymatic reactions. Therefore, an increase in calculation efficiency is an urgent requirement. The calculation cost increases approximately in proportion to the square of the number of atoms, and the usable time step is approximately proportional to the square root of the mean mass of the particles. The number of atoms constituting an amino acid (AA), when polymerised in a peptide, is 7 (Gly) to 24 (Trp), and the residue mass is between 57 (Gly) and 186 (Trp), which is about 5 to 15 times that of a C, N or O atom. When an AA is coarse-grained to 2 to 4 pseudo-atoms, the calculation cost decreases by 2 to 3 orders of magnitude, and the time for a step increases by 2 or 3 orders. In most CG models, the interactions between pseudo-atoms separated by fewer than five bonds are described as follows:

Extension potential between two beads

$$
U^{\text{bond}} = \frac{1}{2}k \left(r - r_0\right)^2 \tag{10}
$$

Angle potential between three beads

$$
U^{\text{angle}} = \frac{1}{2}k\left(\theta - \theta_0\right)^2 \tag{11}
$$

Dihedral angle potential between four beads

$$
U^{\text{dihedral}} = \frac{1}{2}k \left[1 - \cos\left(n\phi - \phi_0\right)\right] \tag{12}
$$

The non-bonded potential between two beads can be expressed as

$$
U^{\text{LJ}} = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \tag{13}
$$
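As a minimal sketch, the four elemental potentials (10) to (13) can be written as plain functions. The function and argument names are illustrative assumptions; units are whatever the caller uses consistently, and angles are in radians.

```python
import math


def u_bond(r, k, r0):
    """Harmonic extension potential between two beads, eq. (10)."""
    return 0.5 * k * (r - r0) ** 2


def u_angle(theta, k, theta0):
    """Harmonic angle potential between three beads, eq. (11)."""
    return 0.5 * k * (theta - theta0) ** 2


def u_dihedral(phi, k, n, phi0):
    """Periodic dihedral potential between four beads, eq. (12)."""
    return 0.5 * k * (1.0 - math.cos(n * phi - phi0))


def u_lj(r, eps, sigma):
    """Lennard-Jones potential between two non-bonded beads, eq. (13)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)


if __name__ == "__main__":
    sigma, eps = 1.0, 1.0
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # position of the LJ minimum
    print(u_lj(sigma, eps, sigma))   # zero crossing at r = sigma
    print(u_lj(r_min, eps, sigma))   # well depth, equal to -eps
```

The Lennard-Jones term crosses zero at r = σ and reaches its minimum of -ε at r = 2<sup>1/6</sup>σ, which is a convenient sanity check for any implementation.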

The total energy of the system is described as a combination of these elemental potentials. For example, the model of Head-Gordon et al. (Brown, Fawzi et al. 2003) is described as:

$$\begin{split} H &= \sum_{\theta} \frac{1}{2} K_{\theta} \left( \theta - \theta_{0} \right)^{2} + \sum_{\phi} \left[ A \left( 1 + \cos \phi \right) + B \left( 1 - \cos \phi \right) + C \left( 1 + \cos 3\phi \right) + D \left( 1 + \cos \left[ \phi + \frac{\pi}{4} \right] \right) \right] \\ &+ \sum_{i, j \ge i+3} 4 \varepsilon_{H} S_{1} \left[ \left( \frac{\sigma}{r_{ij}} \right)^{12} - S_{2} \left( \frac{\sigma}{r_{ij}} \right)^{6} \right] \end{split} \tag{14}$$

where θ, φ and the pairs i, j are summed over all the AAs contained in the peptide. The interaction between non-bonded pseudo-atoms is usually described by the Lennard-Jones potential. The methods for configuration sampling, usually MD (Shih, Arkhipov et al. 2006) and Monte Carlo simulation (Levy, Karplus et al. 1980; Horejs, Mitra et al. 2011), are the same as those used in all-atom simulations. The equation of motion for MD is in principle the same as that used in all-atom simulations, i.e.,

$$
m \frac{d^2 \vec{r}}{dt^2} = \vec{F} - \Gamma \frac{d\vec{r}}{dt} + \vec{W} \tag{15}
$$

where *F*, Γ and *W* are the external force, the friction coefficient and the thermal noise, respectively. Modifications are made according to the kind of ensemble adopted.
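To make equation (15) concrete, the following is a minimal one-dimensional sketch of a Langevin integrator. It assumes a simple Euler-type discretisation and a Gaussian noise whose variance follows the fluctuation-dissipation relation (2ΓkT per unit time); the function names, parameters and the harmonic test force are illustrative assumptions, and production MD codes use more elaborate schemes.

```python
import math
import random


def langevin_step(x, v, force, mass, gamma, kT, dt, rng):
    """Advance position x and velocity v by one time step dt (eq. 15)."""
    # Random impulse with variance 2*gamma*kT*dt (fluctuation-dissipation).
    noise = math.sqrt(2.0 * gamma * kT * dt) * rng.gauss(0.0, 1.0)
    # m dv = (F - gamma*v) dt + noise
    v_new = v + ((force(x) - gamma * v) * dt + noise) / mass
    # Position is updated with the new velocity (semi-implicit Euler).
    x_new = x + v_new * dt
    return x_new, v_new


def simulate(x0, v0, force, mass, gamma, kT, dt, n_steps, seed=0):
    """Return the list of positions visited, including the starting point."""
    rng = random.Random(seed)
    x, v = x0, v0
    trajectory = [x]
    for _ in range(n_steps):
        x, v = langevin_step(x, v, force, mass, gamma, kT, dt, rng)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    def harmonic(x):
        # Hypothetical test force: a bead in a harmonic well, F = -k x, k = 1.
        return -1.0 * x

    traj = simulate(1.0, 0.0, harmonic, mass=1.0, gamma=1.0, kT=0.5,
                    dt=0.01, n_steps=5000)
    print(min(traj), max(traj))
```

With kT set to zero the noise term vanishes and the bead simply relaxes to the bottom of the well, which separates the friction term from the thermal term when testing.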

#### **3.2 The difference of CG models between proteins and other molecules**

As might be guessed from Fig. 2A and B, it is easier to treat a homopolymer with a CG model than to treat a polypeptide or a protein. A homopolymer can be described with rather few parameters and, under certain circumstances, several monomer units can be coarse-grained into one pseudo-atom (the 4 styrenes in a dotted circle are treated as one bead). Rheological features such as phase transitions, diffusion coefficients, compressibility, ductility, elasticity and viscosity have been reproduced fairly well (Yaoita, Isaki et al. 2008; Harmandaris and Kremer 2009; Kalra and Joo 2009; Posel, Lísal et al. 2009). On the other hand, peptides and proteins consist of 20 diverse AAs, and the particular functions of proteins, such as specific binding and enzymatic activity, are based on a unique configuration of those characteristic AAs. Therefore, evaluating interactions with a CG model is especially difficult, because specific properties are averaged out and the components are anisotropic. Notwithstanding this state of affairs, some CG models have come to predict the docking and binding of proteins fairly well. In this section, representative protein CG models are reviewed and the application of the CG model to the evaluation of the TCR-pMHC interaction is considered.

#### **3.3 Representative CG models**

#### **3.3.1 The one-bead model**

Many one-bead models (Taketomi, Ueda et al. 1975; Brown, Fawzi et al. 2003; Jang, Hall et al. 2004) can be regarded as descendants of the Go-model. Go-like models, even though extremely simplified in their form, succeeded in principle in reproducing several aspects of protein folding. This is presumably because the protein-folding rate and mechanism are largely determined by a protein's topology rather than by its inter-atomic interactions (Baker 2000). The descendant models each have their own features, but still retain a bias towards a reference configuration. This might be due to the difficulty of incorporating the geometric and physicochemical aspects of all the AAs in only a few parameters. Recently, it has become accepted that the physicochemical principles underlying the interaction between domains in protein folding are similar to those between the binding sites in protein assembly (Haliloglu, Keskin et al. 2005; Levy, Cho et al. 2005; Turjanski, Gutkind et al. 2008; Baxter, Jennings et al. 2011). This fact will probably open another route for applying the CG model to issues of protein binding.

Miyazawa and Jernigan (MJ) extracted inter-residue potentials from the crystal structures of 1168 proteins (Miyazawa and Jernigan 1996). The principle adopted in this method is that the number of residue-residue contacts observed in a large number of protein crystals represents the actual intrinsic inter-residue interactions. In other words, the effect (contacts in the observed structures) is regarded in the same light as the cause (interaction energy), based on "the principle of structural consistency" or "the principle of minimal frustration".
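The idea of turning contact statistics into energies can be illustrated with a deliberately simplified Boltzmann-inversion sketch. This is not the MJ procedure itself (which uses a quasi-chemical approximation with solvent as a reference state); it only assumes that the energy of a residue pair, in units of kT, is the negative logarithm of the ratio between the observed number of contacts and the number expected if contacts were formed at random. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def contact_energies(contacts):
    """Estimate pair energies (in kT) from a list of observed contacts.

    contacts: iterable of (residue_type_a, residue_type_b) tuples.
    Returns a dict keyed by the alphabetically sorted pair.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(pair_counts.values())

    # How often each residue type appears at either end of a contact.
    end_counts = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n
    n_ends = 2 * total

    energies = {}
    for (a, b), n in pair_counts.items():
        f_a = end_counts[a] / n_ends
        f_b = end_counts[b] / n_ends
        # Random-mixing expectation; unlike pairs can form in two orders.
        expected = total * f_a * f_b * (1 if a == b else 2)
        energies[(a, b)] = -math.log(n / expected)
    return energies


if __name__ == "__main__":
    # Toy data: Leu-Leu contacts are over-represented, Leu-Lys under-represented.
    toy = [("LEU", "LEU")] * 6 + [("LEU", "LYS")] * 2 + [("LYS", "LYS")] * 2
    for pair, e in sorted(contact_energies(toy).items()):
        print(pair, round(e, 3))
```

Pairs observed more often than expected receive a negative (favourable) energy, and pairs observed less often a positive one, which is the sense in which the observed contacts are read as the interaction energy.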


Fig. 2. A homopolymer such as polystyrene can be described with rather few parameters and, in some cases, several units are mapped to one bead (A). A protein consists of heterogeneous components and hence requires a more detailed and complicated description (B). The main chain is represented by Cα and each side chain is mapped to one bead, which retains its original geometric and physicochemical features (Liwo, Pincus et al. 1993) (C). The MARTINI force field maps more beads to a side chain (Marrink, Monticelli et al. 2008), which made it possible to simulate the release of inner water molecules through a stress-sensitive channel embedded in a vesicle membrane (Louhivuori, Risselada et al. 2010) (D). The OPEP model retains all the atoms of the main chain and maps one bead to each side chain. As can be guessed, this model is suitable for dealing with issues where backbone structures such as the α-helix and β-sheet play essential roles (Chebaro et al. 2009, Laghaei et al. 2011, Nasica-Labouze et al. 2011) (E).

Adopting this model for the parameters of the LJ potentials, Kim and Hummer constructed a one-bead model combined with a Debye-Hückel type potential, performed configuration sampling by replica exchange MC - applied to ubiquitin binding - and obtained good agreement with experiments (Kim and Hummer 2008). Chakraborty's group applied the MJ matrix to estimate TCR-pMHC interactions and explained the effect of the HLA class I haplotype on TCR repertoire formation (Kosmrlj, Read et al. 2010). The above-mentioned CG models are tabulated in Table 2.

#### **3.3.2 UNRES**

Scheraga's group described a CG model in which each residue consists of a Cα, a side chain centroid (SC) and one dihedral angle (Liwo, Pincus et al. 1993). They searched the conformational space of this model, using the compactness of the protein as an indicator. The structure obtained was then converted into an all-atom backbone with SC model and searched further for the lowest-energy structure. Finally, an all-atom model was reconstructed from the structure obtained, and the lowest-energy structure was searched for by an electrostatically driven Monte Carlo (EDMC) simulation based on the ECEPP/2 potential. They succeeded in the *ab initio* prediction of moderately sized proteins (53-235 residues) (Oldziej, Czaplewski et al. 2005). This hybrid method - the sampling of configurations on a CG model and the estimation of binding energy on an atomistic model - presents quite a reasonable combination of efficiency and accuracy. Their recent accomplishment was a 1 msec simulation of proteins of more than 500 AAs through massive parallelisation (Scheraga, Maisuradze et al. 2010).

#### **3.3.3 ATTRACT**

Zacharias described a method for protein-protein or protein-ligand docking that uses a reduced protein model and a docking algorithm, ATTRACT (Zacharias 2003). An AA is represented by 2 to 3 (Zacharias 2003) or 2 to 4 (Zacharias and Fiorucci 2010) pseudo-atoms, and the interactions of specific pseudo-atom pairs, including their size and physicochemical character, are incorporated into the parameters of the Lennard-Jones potential. ATTRACT treats both interacting molecules as rigid bodies; docking of the smaller molecule is attempted from thousands of starting sites with 6 degrees of freedom, 3 translational and 3 rotational. Docking includes the minimisation of side chains described as rotamers, so that a total minimisation is performed. They applied this CG model and ATTRACT to the Critical Assessment of PRedicted Interactions (CAPRI) (Janin 2002) and obtained acceptable predictions for two out of 6 targets (May and Zacharias 2007), and later a better result (4 out of 6 targets) by improving the scoring function and docking method (Zacharias and Fiorucci 2010). The estimation of TCR and pMHC binding deals not only with the binding energy of a predetermined configuration, but also with the determination of the binding configuration, because the TCR-pMHC complex has several binding modes (Wucherpfennig, Call et al. 2009). They showed that it is possible to uncover a binding site by using an electrostatic desolvation profile (Zacharias and Fiorucci 2010) based on the ODA method (Fernandez-Recio, Totrov et al. 2005).
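The six rigid-body degrees of freedom mentioned above can be sketched as a pose made of three translations and three rotation angles applied to a set of bead coordinates. This is an illustrative sketch only, not the ATTRACT implementation: the z-y-x Euler-angle convention and the function names are assumptions.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation R = Rz(alpha) * Ry(beta) * Rx(gamma), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def apply_pose(coords, pose):
    """Rotate, then translate, every bead of a rigid molecule.

    coords: list of (x, y, z) tuples.
    pose:   (tx, ty, tz, alpha, beta, gamma), the 6 degrees of freedom.
    """
    tx, ty, tz, alpha, beta, gamma = pose
    rot = rotation_matrix(alpha, beta, gamma)
    moved = []
    for x, y, z in coords:
        moved.append((
            rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx,
            rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty,
            rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz,
        ))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    pose = (1.0, 2.0, 3.0, math.pi / 2, 0.0, 0.0)
    print(apply_pose(ligand, pose))
```

Because the transformation is rigid, all intramolecular distances are preserved; a docking search then only has to score the interface between the two molecules for each trial pose.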

#### **3.3.4 The MARTINI force field**

The MARTINI force field was originally devised for describing lipids and surfactants, such as dipalmitoylphosphatidylcholine (DPPC), dicapryloyl-PC (DCPC), dodecylphosphocholine (DPC) and cholesterol (Marrink, de Vries et al. 2004; Marrink, Risselada et al. 2007). The adoption of a very limited set of atom types and of short-range potentials made the computation very efficient, giving access to micrometre length scales and millisecond time scales, and the model succeeded in simulating the spontaneous aggregation of DPPC lipids into a bilayer and the formation of DPC micelles in water. Hydrogen atoms are neglected in this model. On average, four heavy atoms are represented as one pseudo-atom (four-to-one mapping), with an exception for ring-like molecules, which are mapped with a higher resolution (up to two-to-one mapping). Interaction sites are classified into 4 types: polar (P), nonpolar (N), apolar (C) and charged (Q). Within a main type, subtypes are distinguished either by a letter denoting the hydrogen-bonding capabilities (d = donor, a = acceptor, da = both, 0 = none) or by a number indicating the degree of polarity (from 1 = lower polarity to 5 = higher polarity). The interaction of each pair of atom types was parameterised at five levels: attractive (ε = 5 kJ/mol), semi-attractive (ε = 4.2 kJ/mol), intermediate (ε = 3.4 kJ/mol), semi-repulsive (ε = 2.6 kJ/mol) and repulsive (ε = 1.8 kJ/mol). Non-bonded interactions between the interaction sites i and j are described by the Lennard-Jones potential:

$$U_{LJ}\left(r\right) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r}\right)^{12} - \left(\frac{\sigma_{ij}}{r}\right)^{6}\right] \tag{16}$$

with σij representing the effective minimum distance of approach between two particles and εij representing the strength of their interaction. This model was extended to deal with proteins (Marrink, Monticelli et al. 2008). The basic parameters are the same as those used in the lipid model. Bonded interactions are described by the following set of potential energy functions acting between the bonded sites i, j, k and l, with an equilibrium distance *d*b, an equilibrium angle φa, and equilibrium dihedral angles ψd and ψid:

$$V_b = \frac{1}{2} K_b \left( d_{ij} - d_b \right)^2 \tag{17}$$

$$V_a = \frac{1}{2} K_a \left[ \cos \left( \varphi_{ijk} \right) - \cos \left( \varphi_a \right) \right]^2 \tag{18}$$

$$V_d = K_d \left[ 1 + \cos \left( n\psi_{ijkl} - \psi_d \right) \right] \tag{19}$$

$$V_{id} = K_{id} \left(\psi_{ijkl} - \psi_{id}\right)^2 \tag{20}$$

where *V*b, *V*a, *V*d and *V*id represent the potentials for bonding, stiffness (angle bending), dihedral angles and improper dihedral angles, respectively. The total energy of the system is obtained by summing (16) to (20). All AAs are mapped onto the 4 types of beads or a combination of them. In this mapping, Leu, Pro, Ile, Val, Cys and Met are classified as apolar (C-type), whereas Thr, Ser, Asn and Gln are polar (P-type). Glu and Asp are charged (Q-type), and Arg and Lys are modelled by a combination of a Q and an uncharged particle (N-type). The bulky ring-based side chains are modelled by three (His, Phe and Tyr) or four (Trp) beads. Gly and Ala residues are represented by the backbone particle only. The type of the backbone particle depends on the protein secondary structure: free in solution, or in a coil or bend, the backbone has a strong polar character (P-type); as part of a helix or strand, the inter-backbone hydrogen bonds reduce the polar character significantly (N-type). Proline is less polar owing to its lack of hydrogen-donor capabilities. A more detailed geometrical representation is given in Fig. 2D, illustrating the bond distance, angle, dihedral angle, improper angle and bead configuration. This CG protein model retains, to some extent, the directional specificity and heterogeneity of the side chains, and hence the features of secondary structure (α-helix and β-strand) and gross physicochemical properties such as being charged, hydrophilic or hydrophobic.

The model reproduced the partitioning of AAs in a DOPC bilayer, the association constants of AA pairs (Leu-Leu, Lys-Glu) in water, and the partitioning and orientation of pentapeptides at the water-cyclohexane interface. The tilt and orientation of hexapeptides in the DOPC bilayer were also reproduced after sub-μs to μs MD simulations with the GROMACS software (van der Spoel, Lindahl et al. 2005). The group recently accomplished the simulation of the rapid release of content from a pressurised liposome through a particular mechano-sensitive protein channel, MscL, embedded in the liposomal membrane (Louhivuori, Risselada et al. 2010). The behaviour of this tiny functional organelle, which consists of 5 MscL molecules, 2108 DOPC lipids and 5,444 water beads, with an additional 54,649 water beads forming a 4 nm layer around the vesicle, was described in almost atomistic detail. In response to the increase in internal pressure, this vesicle released water molecules by opening the MscL channel. MD was performed for 40 μs, which corresponds to 160 μs in an all-atom model. This model demonstrated that CG-MD makes possible the computer-aided design of super-molecules and organelles of a practically usable size.
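The residue-to-bead mapping described above can be tabulated in a small lookup, sketched below. Only the main bead types and bead counts stated in the text are encoded; the subtypes (hydrogen-bonding letter or polarity number) are not reproduced, the label "ring" is a placeholder for ring beads whose types are not given here, and the special treatment of the proline backbone is omitted. All names are illustrative.

```python
# Side-chain beads per residue, following the mapping described in the text.
SIDE_CHAIN_BEADS = {
    # Backbone particle only
    "GLY": [], "ALA": [],
    # Apolar (C-type)
    "LEU": ["C"], "PRO": ["C"], "ILE": ["C"],
    "VAL": ["C"], "CYS": ["C"], "MET": ["C"],
    # Polar (P-type)
    "THR": ["P"], "SER": ["P"], "ASN": ["P"], "GLN": ["P"],
    # Charged (Q-type)
    "GLU": ["Q"], "ASP": ["Q"],
    # Uncharged particle (N-type) combined with a charged one (Q-type)
    "ARG": ["N", "Q"], "LYS": ["N", "Q"],
    # Ring-based side chains: three or four beads ("ring" is a placeholder)
    "HIS": ["ring"] * 3, "PHE": ["ring"] * 3, "TYR": ["ring"] * 3,
    "TRP": ["ring"] * 4,
}


def backbone_type(secondary_structure):
    """Backbone bead type: N-type in a helix or strand, P-type otherwise."""
    return "N" if secondary_structure in ("helix", "strand") else "P"


def count_beads(sequence):
    """Total number of CG beads: one backbone bead plus side-chain beads."""
    return sum(1 + len(SIDE_CHAIN_BEADS[res]) for res in sequence)


if __name__ == "__main__":
    peptide = ["GLY", "TRP", "LYS"]
    print(count_beads(peptide))           # backbone and side-chain beads
    print(backbone_type("helix"), backbone_type("coil"))
```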

#### **3.3.5 The optimised potential for efficient peptide-structure representation (OPEP) model**

OPEP is, as shown in Fig. 2E, a CG protein model that uses a detailed representation of all backbone atoms (N, H, Cα, C and O) and reduces each side chain to one single bead with appropriate geometrical parameters and physicochemical properties (Derreumaux and Forcellino 2001). The OPEP energy function, which includes the implicit effects of an aqueous solution, is formulated as a sum of local potentials (*E*local), non-bonded potentials (*E*nonbonded) and a hydrogen-bonding potential (*E*H-bond), as expressed in (21):

$$E = E_{\text{local}} + E_{\text{nonbonded}} + E_{\text{H-bond}} \tag{21}$$

Local potentials are expressed by:

$$E_{\text{local}} = w_b \sum_{\text{bonds}} K_b (r - r_{\text{eq}})^2 + w_a \sum_{\text{angles}} K_a (\alpha - \alpha_{\text{eq}})^2 + w_\Omega \sum_{\text{imp-torsions}} k_\Omega (\Omega - \Omega_{\text{eq}})^2 + w_{\phi, \psi} \left(\sum_{\phi} E_\phi + \sum_{\psi} E_\psi\right) \tag{22}$$

*K*b, *K*a and *k*Ω represent the force constants associated with changes in the bond lengths, in the bond angles of all particles and in the improper torsions of the side chains, respectively. The dihedral potentials associated with the N-Cα bond (φ) and the Cα-C bond (ψ) are expressed as (23) and (24), respectively:

$$E_{\phi} = k_{\phi\psi} \left(\phi - \phi_0\right)^2 \tag{23}$$

$$E_{\psi} = k_{\phi\psi} \left(\psi - \psi_0\right)^2 \tag{24}$$

The non-bonded functions are expressed by:

$$\begin{aligned} E_{\text{nonbonded}} &= w_{1,4} \sum_{1,4} E_{\text{VdW}} + w_{C\alpha,C\alpha} \sum_{C\alpha,C\alpha} E_{\text{VdW}} + w_{1>4} \sum_{M', M'} E_{\text{VdW}} \\ &+ w_{1>4} \sum_{M', C\alpha} E_{\text{VdW}} + w_{1>4} \sum_{M, Sc} E_{\text{VdW}} + \sum_{Sc, Sc} w_{Sc, Sc} E_{\text{VdW}} \end{aligned} \tag{25}$$

which includes all the interactions acting through more than 3 bonds. All these functions are expressed as Van der Waals potentials, as shown in (26):

$$E_{\text{VdW}} = \varepsilon_{ij} \left( \left( \frac{r_{ij}^0}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{ij}^0}{r_{ij}} \right)^6 \right) H(\varepsilon_{ij}) - \varepsilon_{ij} \left( \frac{r_{ij}^0}{r_{ij}} \right)^6 H(-\varepsilon_{ij}) \tag{26}$$

Here, the Heaviside function *H*(x) = 1 if x >= 0 and 0 if x < 0, $r_{ij}$ is the distance between particles i and j, and $r_{ij}^0 = (r_i^0 + r_j^0)/2$, with $r_i^0$ as the Van der Waals radius of particle i.
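A short sketch of equation (26) shows how the sign of εij switches between the two branches: a positive εij gives a 12-6 well of depth εij at r = r0ij, whereas a negative εij leaves only a purely repulsive r<sup>-6</sup> term. The function names and the numerical values in the usage example are illustrative assumptions, not OPEP parameters.

```python
def heaviside(x):
    """H(x) = 1 if x >= 0, otherwise 0, as defined in the text."""
    return 1.0 if x >= 0 else 0.0


def combined_radius(r0_i, r0_j):
    """Arithmetic mean of the Van der Waals radii of particles i and j."""
    return 0.5 * (r0_i + r0_j)


def e_vdw(r_ij, eps_ij, r0_ij):
    """Van der Waals term of eq. (26) for one pair of particles."""
    ratio6 = (r0_ij / r_ij) ** 6
    # Branch selected when eps_ij >= 0: attractive 12-6 well.
    well = eps_ij * (ratio6 ** 2 - 2.0 * ratio6) * heaviside(eps_ij)
    # Branch selected when eps_ij <= 0: purely repulsive r^-6 term.
    repulsion = -eps_ij * ratio6 * heaviside(-eps_ij)
    return well + repulsion


if __name__ == "__main__":
    r0 = combined_radius(1.8, 2.2)   # hypothetical radii
    print(e_vdw(r0, 1.5, r0))        # well depth: -eps_ij
    print(e_vdw(r0, -1.5, r0))       # repulsive branch: +|eps_ij|
```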

The hydrogen-bonding potential (*E*H-bond) consists of two-body and three-body terms (Derreumaux, Maupetit et al. 2007).

This model was originally devised for predicting the structure and folding of proteins (Derreumaux 1999; Derreumaux and Forcellino 2001) and, in combination with Monte Carlo simulation, was fairly successful in predicting basic supersecondary structures. This model, containing all the protein-backbone components, excels in issues where secondary-structure features play an essential role. The potential was then combined with MD, which reproduced the aggregation of Alzheimer's Aβ16-22 (Derreumaux and Mousseau 2007; Wei, Song et al. 2008). By adopting Replica Exchange MD (REMD) for sampling, they obtained an accurate structural description of Alzheimer's amyloid-β, β-hairpin and Trp-cage peptides (Derreumaux, Chebaro et al. 2009; Derreumaux, Chebaro et al. 2009). A detailed atomic characterisation of oligomer formation was obtained by combining OPEP, the atomistic model and REMD (Nasica-Labouze, Meli et al. 2011). Their reduced model with REMD enabled simulations of several tens of μs in 40 replicas and a full assessment of convergence to the equilibrium ensemble, demonstrating the possibility of determining the thermodynamic features of large proteins and assemblies (Laghaei, Mousseau et al. 2011).

As was mentioned above, the main CG models are tabulated in Table 2.

#### **3.4 The trial for the TCR-pMHC and larger systems**

At the starting point of the whole immunological synapse (IS) simulation, Wan, Flower and Coveney constructed a ternary complex of TCR-pMHC-CD4 between opposing membranes, consisting of 329,265 atoms, and performed molecular dynamics for 10 ns on 128 processors of an SGI Altix (Wan, Flower et al. 2008). Each nanosecond of simulation took 23 hours. This run was not sufficient to calculate the binding free energy by MM/PBSA, because the simulation time was too short and the entropy was not evaluated. They intended to simulate a system consisting of four sets of the TCR-pMHC-CD4 complex, made up of about one million atoms. They pointed out that simulating the whole IS with an all-atom model is difficult because of the heavy computational load, and that a hybrid atomistic/CG simulation is a feasible way to accomplish the project (Diestler, Zhou et al. 2006).

At present, there have been only very limited attempts to evaluate the TCR-pMHC binding energy with CG models. The evaluation of TCR-pMHC binding consists of at least three steps: 1) determining the binding site, 2) determining the binding configuration, and 3) calculating the binding energy. Several works have provided methods not only to determine the binding configuration but also to detect the binding site from the properties of the protein surface itself (Fernandez-Recio, Totrov et al. 2005; Burgoyne and Jackson 2006; Fiorucci and Zacharias 2010). Two factors govern the evaluation of TCR-pMHC binding: 1) the evaluation of the energy of a particular configuration, and 2) the sampling of independent configurations. In most CG models, the binding energy as a function of the configuration is calculated from the model's own parameters (Liwo, Pincus et al. 1993; Miyazawa and Jernigan 1996; Derreumaux 1999; Zacharias 2003; Buchete, Straub et al. 2004; Oldziej, Czaplewski et al. 2005; Zhou, Thorpe et al. 2007; Kim and Hummer 2008; Marrink, Monticelli et al. 2008). The sampling of independent configurations is the most time-consuming but a critically important step. If sampling on the CG model reflects the distribution of the atomistic model with reasonable fidelity, an efficient strategy is to sample configurations on the CG model (Chebaro, Dong et al. 2009), reconstruct the atomic-scale structures, and then calculate the binding energy on these reconstructed atomistic structures using MM/PBSA. From this point of view, a general method for reconstructing all-atom structures from Cα atom positions, RACOGS, was devised, and the energy landscapes of the CG and all-atom models were shown to be quite similar, suggesting the validity of this principle (Heath, Kavraki et al. 2007).

### **4. Application of GPGPU in molecular dynamics**

As mentioned above, all-atom simulation is very expensive and is therefore restricted in both time and scale. There have been attempts to break through these limitations, not only by improving algorithms but also by devising novel hardware. Special-purpose machines for MD have been developed (Susukita, Ebisuzaki et al. 2003; Shaw, Deneroff et al. 2008) and have shown fairly good performance (Kikugawa, Apostolov et al. 2009). However, such purpose-specific machines are very expensive and their continuous development is difficult. The recent development of the general-purpose graphics processing unit (GPGPU) has had much influence on high-performance computing (Giupponi, Harvey et al. 2008). In 2011, three of the top five supercomputers were built mainly on NVIDIA GPGPUs (http://www.top500.org/). Many applications are now being adapted in response to this momentum, and representative molecular dynamics packages such as Amber, CHARMM, GROMACS and NAMD are being prepared to run on GPGPUs. Recent representative GPGPUs, such as the Tesla C2075, have a performance of 1.03 TFLOPS in single precision. We calculated the binding energy of two TCR-pMHC complexes, 2GJ6 and 3PWP, on a C2075 and compared the results with those calculated on a Xeon processor. After heating, density equilibration and equilibration, 10 production runs were performed, corresponding to 5 ns in total. The results are shown in Table 3. As can be seen from the table, the performance of one Tesla C2075 is equivalent to about 40 to 50 cores of a current Xeon CPU. The binding energies obtained by GPGPU are 8% to 30% larger in magnitude than those obtained by CPU. This may be due to differences in the detailed algorithms adopted for the CPU and GPGPU calculations.
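The "40 to 50 cores" estimate can be checked from the run times listed in Table 3. The sketch below is a back-of-the-envelope calculation, not the authors' benchmark procedure: it assumes ideal linear scaling over the four Xeon cores, and the function and variable names are hypothetical.

```python
XEON_CORES = 4  # Xeon E5620 (4 cores), as listed in Table 3

# Wall-clock hours per production run, transcribed from Table 3.
RUN_HOURS = {
    "2GJ6": {"xeon": 35.7, "tesla": 3.09},
    "3PWP": {"xeon": 34.9, "tesla": 3.03},
}


def core_equivalents(xeon_hours: float, tesla_hours: float,
                     xeon_cores: int = XEON_CORES) -> float:
    """Number of Xeon cores that would match one GPU.

    Assumes the CPU run scales linearly with core count, which is an
    idealisation made only for this estimate.
    """
    speedup = xeon_hours / tesla_hours
    return speedup * xeon_cores


for pdb_id, hours in RUN_HOURS.items():
    cores = core_equivalents(hours["xeon"], hours["tesla"])
    print(f"{pdb_id}: one Tesla C2075 ~ {cores:.1f} Xeon cores")
```

Both complexes give a speed-up of roughly 11.5 over the four-core run, which corresponds to about 46 cores under this assumption.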


| Processor | PDB ID | Production run time (per run) | Method | Δ*H* |
|---|---|---|---|---|
| Xeon E5620 (4 cores) | 2GJ6 | 35.7 hours | GB | -50.3 ± 8.7 |
| | | | PB | -54.1 ± 11.8 |
| | 3PWP | 34.9 hours | GB | -51.3 ± 8.9 |
| | | | PB | -60.1 ± 11.2 |
| Tesla C2075 | 2GJ6 | 3.09 hours | GB | -53.8 ± 7.64 |
| | | | PB | -63.4 ± 12.5 |
| | 3PWP | 3.03 hours | GB | -61.5 ± 6.8 |
| | | | PB | -78.0 ± 10.9 |

Table 3. Comparison of CPU and GPGPU. All energies in the table are given in kcal mol<sup>-1</sup>.

### **5. Conclusion**

Physically meaningful models are advancing rapidly and are being applied to large macromolecular systems, helped by the rapid evolution of parallel computation and hardware such as multi-core processors and GPGPUs. Although exact models are becoming practical for calculations on large biomolecules, continuum dielectric models are still useful for binding free energy calculation and bound-complex structure prediction, as well as for structure prediction of biomolecules such as proteins and nucleic acids, because of their high cost-performance and fairly good accuracy. In future, hybrid approaches will become promising, in which the QM model, the all-atom model, the CG model and continuum models are combined with a good conformational sampling technique such as the ER method, so that the optimal hybrid approach can be chosen according to the purpose and the system size.

It has become clear that the calculation of TCR-pMHC binding energy with reasonable efficiency and accuracy is feasible. MMPBSA/GBSA seems quite promising. The sampling method affects both the efficiency and the accuracy of the calculation. Combining sampling on the CG model with energy calculation on the atomistic model is a very reasonable approach. GPGPUs will be important resources. A combination of these factors will allow valid simulation of biologically interesting phenomena over adequately long times.

### **6. Acknowledgment**

This work was partially supported by JSPS KAKENHI Grant Number 22590194 for HT.

### **7. References**

Asakura, S. & Oosawa, F. (1954). "On interaction between two bodies immersed in a solution of macromolecules." *Journal of Chemical Physics* 22: 1255-1256.

Baker, D. (2000). "A surprising simplicity to protein folding." *Nature* 405(6782): 39-42.


Fernandez-Recio, J.; Totrov, M.; Skorodumov, C. & Abagyan, R. (2005). "Optimal docking area: a new method for predicting protein-protein interaction sites." *Proteins* 58(1): 134-143.

Fiorucci, S. & Zacharias, M. (2010). "Prediction of protein-protein interaction sites using electrostatic desolvation profiles." *Biophys J* 98(9): 1921-1930.

Fogolari, F.; Brigo, A. & Molinari, H. (2002). "The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology." *Journal of Molecular Recognition* 15(6): 377-392.

Forsten, K. E.; Kozack, R. E.; Lauffenburger, D. A. & Subramaniam, S. (1994). "Numerical solution of the nonlinear Poisson-Boltzmann equation for a membrane-electrolyte system." *J Phys Chem B* 98(21): 5580-5586.

Gilson, M. K. & Honig, B. H. (1987). "Calculation of electrostatic potentials in an enzyme active site." *Nature* 330(6143): 84-86.

Gilson, M. K. & Zhou, H. X. (2007). "Calculation of protein-ligand binding affinities." *Annual Review of Biophysics and Biomolecular Structure* 36: 21-42.

Giupponi, G.; Harvey, M. J. & De Fabritiis, G. (2008). "The impact of accelerator processors for high-throughput molecular modeling and simulation." *Drug Discov Today* 13(23-24): 1052-1058.

Gohlke, H.; Kuhn, L. A. & Case, D. A. (2004). "Change in protein flexibility upon complex formation: Analysis of Ras-Raf using molecular dynamics and a molecular framework approach." *PROTEINS: Structure, Function, and Bioinformatics* 56(2): 322-337.

Gregoire, C.; Lin, S. Y.; Mazza, G.; Rebai, N.; Luescher, I. F. & Malissen, B. (1996). "Covalent assembly of a soluble T cell receptor-peptide-major histocompatibility class I complex." *Proc Natl Acad Sci U S A* 93(14): 7184-7189.

Haliloglu, T.; Keskin, O.; Ma, B. & Nussinov, R. (2005). "How similar are protein folding and protein binding nuclei? Examination of vibrational motions of energy hot spots and conserved residues." *Biophys J* 88(3): 1552-1559.

Harmandaris, V. A. & Kremer, K. (2009). "Dynamics of Polystyrene Melts through Hierarchical Multiscale Simulations." *Macromolecules* 42(3): 791-802.

Harris, R. C.; Bredenberg, J. H.; Silalahi, A. R.; Boschitsch, A. H. & Fenley, M. O. (2011). "Understanding the physical basis of the salt dependence of the electrostatic binding free energy of mutated charged ligand-nucleic acid complexes." *Biophys Chem* 156(1): 79-87.

Hawkins, G. D.; Cramer, C. J. & Truhlar, D. G. (1996). "Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium." *J Phys Chem B* 100(51): 19824-19839.

Heath, A. P.; Kavraki, L. E. & Clementi, C. (2007). "From coarse-grain to all-atom: toward multiscale analysis of protein landscapes." *Proteins* 68(3): 646-661.

Ho, B. K. & Dill, K. A. (2006). "Folding very short peptides using molecular dynamics." *PLoS Comput Biol* 2(4): e27.

Horejs, C.; Mitra, M. K.; Pum, D.; Sleytr, U. B. & Muthukumar, M. (2011). "Monte Carlo study of the molecular mechanisms of surface-layer protein self-assembly." *J Chem Phys* 134(12): 125103.


Lamb, M. L.; Tirado-Rives, J. & Jorgensen, W. L. (1999). "Estimation of the binding affinities of FKBP12 inhibitors using a linear response method." *Bioorganic & medicinal chemistry* 7(5): 851-860.

Lee, M. C.; Yang, R. & Duan, Y. (2005). "Comparison between Generalized-Born and Poisson-Boltzmann methods in physics-based scoring functions for protein structure prediction." *Journal of molecular modeling* 12(1): 101-110.

Leis, S. & Zacharias, M. (2011). "Efficient inclusion of receptor flexibility in grid-based protein-ligand docking\*." *J Comput Chem* 32(16): 3433-3439.

Levy, R. M.; Karplus, M. & McCammon, J. A. (1980). "Molecular dynamics studies of NMR relaxation in proteins." *Biophys J* 32(1): 628-630.

Levy, Y.; Cho, S. S.; Onuchic, J. N. & Wolynes, P. G. (2005). "A survey of flexible protein binding mechanisms and their transition states using native topology based energy landscapes." *J Mol Biol* 346(4): 1121-1145.

Liwo, A.; He, Y. & Scheraga, H. A. (2011). "Coarse-grained force field: general folding theory." *Phys Chem Chem Phys* 13(38): 16890-16901.

Liwo, A.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S. & Scheraga, H. A. (1993). "Prediction of protein conformation on the basis of a search for compact structures: test on avian pancreatic polypeptide." *Protein Sci* 2(10): 1715-1731.

Louhivuori, M.; Risselada, H. J.; van der Giessen, E. & Marrink, S. J. (2010). "Release of content through mechano-sensitive gates in pressurized liposomes." *Proc Natl Acad Sci U S A* 107(46): 19856-19860.

Lu, B. Z.; Zhou, Y. C.; Holst, M. J. & McCammon, J. A. (2008). "Recent progress in numerical methods for the Poisson-Boltzmann equation in biophysical applications." *Communications in Computational Physics* 3(5): 973-1009.

Luchko, T.; Gusarov, S.; Roe, D. R.; Simmerling, C.; Case, D. A.; Tuszynski, J. & Kovalenko, A. (2010). "Three-dimensional molecular theory of solvation coupled with molecular dynamics in Amber." *Journal of Chemical Theory and Computation* 6(3): 607-624.

Marrink, S. J.; de Vries, A. H. & Mark, A. E. (2004). "Coarse grained model for semiquantitative lipid simulations." *Journal of Physical Chemistry B* 108(2): 750-760.

Marrink, S. J.; Monticelli, L.; Kandasamy, S. K.; Periole, X.; Larson, R. G. & Tieleman, D. P. (2008). "The MARTINI coarse-grained force field: Extension to proteins." *Journal of Chemical Theory and Computation* 4(5): 819-834.

Marrink, S. J.; Risselada, H. J.; Yefimov, S.; Tieleman, D. P. & de Vries, A. H. (2007). "The MARTINI force field: coarse grained model for biomolecular simulations." *J Phys Chem B* 111(27): 7812-7824.

Matubayasi, N. & Nakahara, M. (2000). "Theory of solutions in the energetic representation. I. Formulation." *Journal of Chemical Physics* 113(15): 6070-6081.

Matubayasi, N. & Nakahara, M. (2002). "Theory of solutions in the energy representation. II. Functional for the chemical potential." *Journal of Chemical Physics* 117(8): 3605-3616.

May, A. & Zacharias, M. (2007). "Protein-protein docking in CAPRI using ATTRACT to account for global and local flexibility." *Proteins* 69(4): 774-780.


Shih, A. Y.; Arkhipov, A.; Freddolino, P. L. & Schulten, K. (2006). "Coarse grained protein-lipid model with application to lipoprotein particles." *J Phys Chem B* 110(8): 3674-3684.

Sorin, E. J.; Rhee, Y. M.; Nakatani, B. J. & Pande, V. S. (2003). "Insights into nucleic acid conformational dynamics from massively parallel stochastic simulations." *Biophys J* 85(2): 790-803.

Srinivasan, J.; Trevathan, M. W.; Beroza, P. & Case, D. A. (1999). "Application of a pairwise generalized Born model to proteins and nucleic acids: inclusion of salt effects." *Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta)* 101(6): 426-434.

Still, W. C.; Tempczyk, A.; Hawley, R. C. & Hendrickson, T. (1990). "Semianalytical treatment of solvation for molecular mechanics and dynamics." *Journal of the American Chemical Society* 112(16): 6127-6129.

Susukita, R.; Ebisuzaki, T.; Elmegreen, B. G.; Furusawa, H.; Kato, K.; Kawai, A.; Kobayashi, Y.; Koishi, T.; McNiven, G. D.; Narumi, T. & Yasuoka, K. (2003). "Hardware accelerator for molecular dynamics: MDGRAPE-2." *Computer Physics Communications* 155(2): 115-131.

Takahashi, T.; Nakamura, H. & Wada, A. (1992). "Electrostatic forces in two lysozymes: calculations and measurements of histidine pKa values." *Biopolymers* 32(8): 897-909.

Takahashi, T.; Sugiura, J. & Nagayama, K. (2002). "Comparison of all atom, continuum, and linear fitting empirical models for charge screening effect of aqueous medium surrounding a protein molecule." *Journal of Chemical Physics* 116(18): 8232-8237.

Taketomi, H.; Ueda, Y. & Go, N. (1975). "Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions." *Int J Pept Protein Res* 7(6): 445-459.

Tidor, B. & Karplus, M. (1994). "The Contribution of Vibrational Entropy to Molecular Association: The Dimerization of Insulin." *Journal of Molecular Biology* 238(3): 405-414.

Tsui, V. & Case, D. A. (2000). "Theory and applications of the generalized Born solvation model in macromolecular simulations." *Biopolymers* 56(4): 275-291.

Turjanski, A. G.; Gutkind, J. S.; Best, R. B. & Hummer, G. (2008). "Binding-induced folding of a natively unstructured transcription factor." *PLoS Comput Biol* 4(4): e1000060.

Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E. & Berendsen, H. J. C. (2005). "GROMACS: fast, flexible and free." *J Comput Chem* 26(16): 1701-1718.

Wagoner, J. A. & Pande, V. S. (2011). "A smoothly decoupled particle interface: New methods for coupling explicit and implicit solvent." *J Chem Phys* 134: 214103.

Wan, S.; Coveney, P. V. & Flower, D. R. (2005). "Molecular basis of peptide recognition by the TCR: affinity differences calculated using large scale computing." *J Immunol* 175(3): 1715-1723.

Zoete, V. & Meuwly, M. (2006). "Importance of individual side chains for the stability of a protein fold: computational alanine scanning of the insulin monomer." *J Comput Chem* 27(15): 1843-1857.

Zoete, V. & Michielin, O. (2007). "Comparison between computational alanine scanning and per-residue binding free energy decomposition for protein-protein association using MM-GBSA: application to the TCR-p-MHC complex." *Proteins* 67(4): 1026-1047.

## **An Assessment of the Conformational Profile of Neuromedin B Using Different Computational Sampling Procedures**

Parul Sharma<sup>1</sup>, Parvesh Singh<sup>1</sup>, Krishna Bisetty<sup>1</sup> and Juan J Perez<sup>2</sup>
*<sup>1</sup>Department of Chemistry, Durban University of Technology, Steve Biko campus, Durban, South Africa*
*<sup>2</sup>Department d'Enginyeria Quimica, UPC, ETS d'Enginyers Industrials, Barcelona, Spain*

### **1. Introduction**

Neuromedin B (NMB) (Minamino et al., 1983), a ten-residue neuropeptide (GNLWATGHFM-NH2, Figure 1), belongs to the ranatensin subfamily of bombesin-like peptides (Erspamer, 1980), which exhibit a wide range of biological responses in the central nervous system and gastrointestinal tract, including thermoregulation (Marki et al., 1981), stimulation of the secretion of gastrointestinal hormones (Ghatei et al., 1982), regulation of smooth muscle contraction (Erspamer, 1988), and the ability to function as a growth factor in small cell lung cancer cells and murine 3T3 cells (Corps et al., 1985; Cuttitta et al., 1985; Moody et al., 1985). Its mechanism of action involves initial binding to three cell-surface receptors (Ohki-Hamazaki, 2000) with different pharmacological profiles: the neuromedin B receptor (NMB-R or bb1) (Wada et al., 1991), the gastrin-releasing peptide receptor (GRP-R, or bb2) (Corjay et al., 1991), and bombesin receptor subtype 3 (BRS-3, or bb3) (Gorbulev et al., 1992). NMB binds to NMB-R with the highest affinity, to GRP-R with lower affinity and to BRS-3 with the lowest affinity (Mantey et al., 1997).

Fig. 1. Extended structure of Neuromedin B showing different residues

A number of spectroscopic studies of NMB, including nuclear magnetic resonance (NMR) (Lee & Kim, 1999), infrared (IR) (Erne & Schwyzer, 1987), circular dichroism (CD) and fluorescence spectroscopy (Polverini et al., 1998), have been reported in the literature. A recent structure-activity relationship (SAR) study of bombesin (Glp-Gln-Arg-Leu-Gly-Asn-Gln-Trp-Ala-Val-Gly-His-Leu-Met-NH2) used an alanine scan, which determines the contribution of specific residues to a protein's function by mutating them to alanine (Horwell et al., 1996), and suggested that the Trp4, His8 and Leu12 residues, corresponding to Trp4, His8 and Phe9 respectively in NMB, are important for binding to the NMB receptors (Sainz et al., 1998). In the phospholipid bilayer, NMB is reported to adopt an α-helical conformation in the C-terminal region (Erne & Schwyzer, 1987). Recent studies have demonstrated that small peptides are able to exist in a dynamic equilibrium between folded and unfolded structures, depending on the solvent polarity and their interaction with the membrane phase (Erne et al., 1985). In aqueous solution, small peptides are known to adopt many conformations, since hydrogen bonding between water and the polar backbone carbonyl and amide groups competes effectively with intramolecular hydrogen-bond formation (Erne et al., 1985; Kaiser & Kezdy, 1987; Zhong & Johnson, 1992). CD, fluorescence and molecular dynamics (MD) studies (Polverini et al., 1998) indicate that NMB adopts an α-helical structure in an apolar environment, whereas in aqueous solution it adopts unordered and very flexible structures. In vacuum, 50% of the NMB structures are helix-like, with a right-handed chirality beginning at the tryptophan residue and extending to the C-terminus, and this result was found to be independent of the initial conformation.

Moreover, two-dimensional (2D) NMR studies of NMB suggest that the peptide adopts a relaxed helical conformation from Trp4 to Met10 in 50% aqueous trifluoroethanol (TFE) solution and in 150 mM sodium dodecyl sulfate (SDS) micelles. Several reports have also suggested that there might be a conformational change to a β-turn type structure upon binding to the receptor (Coy et al., 1988; Rivier & Brown, 1978). Despite their importance, spectroscopic methods alone cannot provide all the structural details necessary to fully understand the conformational profile of peptides in solution, because of the flexibility of these molecules. Therefore, despite its great biological and pathological significance, the unique native conformation of NMB has not yet been clearly elucidated on the basis of the available spectroscopic results.

Computational studies, on the other hand, can provide detailed complementary information about the intrinsic conformational features of the peptide. The methodologies now available for investigating the propensity of a peptide to adopt different conformations are robust enough to provide a reasonable picture of its conformational features and of the way the solvent affects them. Recently, Generalized Born surface area implicit solvent models (Calimet et al., 2001; Dominy & Brooks, 1999) have been used in biomolecular simulations. This methodology has become popular, especially in molecular dynamics applications, because of its relative simplicity and computational efficiency compared with the more standard numerical solution of the Poisson–Boltzmann (PB) equation. Recent modifications to the standard GB implementations extend its applicability to the entire range from low- to high-dielectric environments and thus play an important role in reproducing the environment induced by different explicit solvents (Feig & Brooks, 2004; Sigalov et al., 2005).

The present work employs different computational procedures to explore the configurational space of NMB and to provide an adequate atomic description of the peptide, compatible with the aggregated information provided by different experimental techniques. Specifically, the configurational space of the NMB peptide has been explored using standard molecular dynamics (MD), multi-canonical replica exchange molecular dynamics (REMD) and simulated annealing (SA) sampling techniques with the Langevin thermostat. The Onufriev, Bashford, and Case (OBC) implicit water model (Onufriev et al., 2004) has been employed for the current investigations, as this solvent model in combination with AMBER ff96 is reported to give a better description of the extent of helical and β-sheet conformations in peptides (Terada & Shimizu, 2008).

### **2. Computational methods**


### **2.1 Replica Exchange Molecular Dynamics (REMD)**

The leap module of AMBER 9 (Case et al., 2006) was used to generate the extended conformation of NMB with its N-terminus protonated and its C-terminus amidated. The extended structure of NMB was energy-minimized until a convergence criterion of 0.005 kcal mol<sup>-1</sup> Å<sup>-1</sup> was achieved. REMD was subsequently performed on the minimized structure, with the Generalized Born implicit solvent model (solvent dielectric constant 78.5, surface tension 0.005 cal mol<sup>-1</sup> Å<sup>-2</sup>) used to model the effects of solvation (Sitkoff et al.; Tsui & Case, 2001). The internal dielectric constant around the peptide was set to 1. The SHAKE algorithm with a relative geometric tolerance of 10<sup>-5</sup> was used to constrain all bond lengths to their equilibrium distances. Prior to the REMD simulations, standard MD simulations were performed for 5 ns at temperatures ranging from 200 to 900 K in steps of 100 K. In the present study, twelve replicas were used, with temperatures set to 277, 300, 326, 354, 385, 419, 457, 498, 544, 595, 651, and 713 K and a time step of 0.2 fs. The temperature during the MD simulations was regulated by the Langevin thermostat (Wu & Brooks, 2003; Andersen, 1980). The replicas were simulated simultaneously and independently, each at its own temperature. Replica exchange was attempted every 2 ps for 50,000 steps during the REMD simulations.
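The twelve temperatures listed above are spaced almost geometrically, which is the usual way to keep exchange acceptance similar between neighbouring replicas. The sketch below checks the spacing and evaluates the standard Metropolis criterion for a swap between two replicas. The acceptance formula is the textbook REMD criterion and is assumed here rather than quoted from the chapter; the function names and the example energies are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1

# Replica temperatures (K) as listed in the text.
TEMPS = [277, 300, 326, 354, 385, 419, 457, 498, 544, 595, 651, 713]


def adjacent_ratios(temps):
    """Ratio T[k+1] / T[k] for each neighbouring pair of replicas."""
    return [hi / lo for lo, hi in zip(temps, temps[1:])]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance for swapping configurations between two replicas.

    e_i is the potential energy (kcal/mol) of the configuration currently at
    temperature t_i, and e_j that of the configuration at t_j.
    p = min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i = 1.0 / (KB_KCAL * t_i)
    beta_j = 1.0 / (KB_KCAL * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Return 1 directly for non-negative exponents to avoid overflow in exp().
    return 1.0 if delta >= 0.0 else math.exp(delta)


ratios = adjacent_ratios(TEMPS)
print(f"spacing ratios: min {min(ratios):.3f}, max {max(ratios):.3f}")

# Hypothetical energies: the colder replica holds the lower-energy structure.
print(f"p(swap) = {exchange_probability(-110.0, -100.0, 300, 326):.3f}")
```

A swap that would move the lower-energy configuration to the colder replica is always accepted, while the reverse swap is accepted with a probability that decays with both the energy gap and the temperature gap.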

### **2.2 Molecular Dynamics (MD)**

The MD trajectory was generated using the Generalized Born (GB) approximation at 300 K with the Langevin coupling algorithm. The internal dielectric constant of the peptide was set to 1, and an external dielectric constant of 78.5, corresponding to water, was employed. To mimic physiological conditions, a salt concentration of 0.2 M was used. SHAKE was applied to all bonds involving hydrogen atoms, with a time step of 2 fs.
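
Implicit-solvent models generally account for dissolved salt through Debye-Hückel-type screening. As a rough illustration of the length scale implied by the parameters above (0.2 M, 300 K, dielectric constant 78.5), the sketch below evaluates the textbook Debye length for a 1:1 electrolyte. Treating the salt as monovalent is an assumption, and the function name is hypothetical; the way AMBER applies the screening internally may differ in detail.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
K_BOLTZMANN = 1.380649e-23           # Boltzmann constant, J/K
AVOGADRO = 6.02214076e23             # Avogadro constant, 1/mol
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, C


def debye_length_nm(ionic_strength_molar, temperature_k, dielectric):
    """Debye screening length (nm) for a 1:1 electrolyte (assumed)."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    numerator = EPSILON_0 * dielectric * K_BOLTZMANN * temperature_k
    denominator = 2.0 * AVOGADRO * ELEMENTARY_CHARGE ** 2 * ionic_strength_si
    return math.sqrt(numerator / denominator) * 1e9  # m -> nm


# Parameters taken from the MD set-up described in the text.
print(round(debye_length_nm(0.2, 300.0, 78.5), 3))
```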

### **2.3 Simulated Annealing (SA)**

The extended conformation of the NMB peptide was energy minimized using the steepest descent method followed by the conjugate gradient method, until a convergence of less than 0.001 kcal mol⁻¹ Å⁻¹ between successive steps was achieved, using the SANDER module of AMBER 9 (Case et al., 2006). The SA calculation was performed under implicit solvent conditions using the GB-OBC continuum solvent model (Onufriev et al., 2004). All electrostatic calculations throughout this study were done with a relative permittivity of 80. The minimized starting structure was heated to 900 K at a rate of 100 K ps⁻¹: the structure was first heated to 200 K, allowed to equilibrate and then heated to 300 K, and this process was repeated until a temperature of 900 K was reached. The high temperature provides the molecule with sufficient kinetic energy to cross the energy barriers between different conformations as quickly as possible. The structure was then slowly cooled from 900 K to 200 K at a rate of 50 K ps⁻¹, with the simulation temperature decreased at regular time intervals in steps of 50 K. As the temperature approaches 200 K, the molecule becomes trapped in the nearest local minimum conformation. At the end of the annealing cycle, the geometry of the structure was minimized at 200 K in order to remove the internal strain of the molecule. The coordinates and minimized energy at 200 K were saved to a separate data file, which completes a single cycle of simulated annealing. The optimized structure was then used as the starting conformation for the next cycle of SA. The 8000 cycles of iterative simulated annealing produced a library of 8000 structures (one per cycle), which were ranked according to their energy values.
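
The temperature ladder of one annealing cycle can be written down directly from this description. The sketch below is only an illustration of the schedule (heating in 100 K steps, cooling in 50 K steps); the function names are hypothetical and the equilibration time at each step is not modelled because it is not given in the text.

```python
def annealing_schedule(t_low=200, t_high=900, heat_step=100, cool_step=50):
    """Return (heating, cooling) lists of target temperatures in K."""
    heating = list(range(t_low, t_high + 1, heat_step))
    # Cooling starts one step below the maximum and ends at t_low.
    cooling = list(range(t_high - cool_step, t_low - 1, -cool_step))
    return heating, cooling


def ramp_time_ps(t_start, t_end, rate_k_per_ps):
    """Time (ps) needed to ramp between two temperatures at a fixed rate."""
    return abs(t_end - t_start) / rate_k_per_ps


heating, cooling = annealing_schedule()
print(heating)
print(cooling)
# Nominal ramp times implied by the quoted rates.
print(ramp_time_ps(200, 900, 100.0), ramp_time_ps(900, 200, 50.0))
```
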

The primary objective of a conformational analysis is the identification of low energy structures, which forms an important part of understanding the relationship between the structure and the biological activity of a molecule. The biological activity of a drug molecule depends on a single unique conformation hidden amongst all the low energy conformations (Ghose et al., 1989). The search for this so-called bioactive conformation for sets of compounds is one of the major tasks in medicinal chemistry. Only the bioactive conformation can bind to the specific macromolecular environment at the active site of the receptor protein (Jörgensen, 1991). An understanding of how the conformations of peptides can be manipulated using highly restricted segments ultimately leads to the design of bioactive peptides that fit the three-dimensional requirements of the receptor site. The SA strategy employed here is widely used in the characterization of low energy conformations (Ghose et al., 1989), and the following protocol was used. First, the structures were rank-ordered by energy every 100 cycles and checked for uniqueness. The efficiency of this process was monitored according to equation 1:

$$\mathcal{A}(N) = \frac{100\,\xi(N)}{N\,\xi(100)}\tag{1}$$

The efficiency parameter, $\mathcal{A}(N)$, was computed every 100 cycles of SA. It is defined as the number of unique conformations found after N cycles of SA, $\xi(N)$, divided by N and scaled by a coefficient so that the efficiency parameter is unity after the first 100 cycles; this provides the stopping criterion of the iterative process (Corcho et al., 1999). The procedure was terminated in all cases when the calculated efficiency of the process, $\mathcal{A}(N)$, was at least 10% below the starting value. The evolution of this parameter was monitored along the conformational profile for the peptide.
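
As a worked illustration of equation 1, the sketch below evaluates the efficiency parameter from a table of cumulative unique-conformation counts recorded every 100 cycles. The counts are invented for illustration and are not data from this study; the function name is hypothetical.

```python
def efficiency(unique_counts):
    """Evaluate A(N) = 100 * xi(N) / (N * xi(100)) for each recorded N.

    unique_counts maps N (number of SA cycles) to xi(N), the cumulative
    number of unique conformations found after N cycles.
    """
    xi_100 = unique_counts[100]
    return {
        n: 100.0 * xi / (n * xi_100)
        for n, xi in sorted(unique_counts.items())
    }


# Hypothetical counts, for illustration only.
counts = {100: 90, 200: 150, 300: 180}
for n, a in efficiency(counts).items():
    print(n, round(a, 3))
```

By construction the parameter equals 1 at N = 100 and falls as fewer new unique conformations are found per cycle.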

#### **3. Results and discussion**

The sampling efficiency of the MD and REMD trajectories was monitored by recording the different conformational patterns attained by NMB as the simulations progressed. For this purpose, the CLASICO program (Corcho et al., 2004) was used to compute the pattern profile for every snapshot of the MD and REMD trajectories, as depicted in Figure 2a and Figure 2b, respectively. Accordingly, 105 439 patterns (52.7%; Figure 2a) were obtained for the 200 000 snapshots of MD, whereas 68 753 patterns (68.7%; Figure 2b) were identified for the 100 000 snapshots of the REMD trajectory. These plots provide a broad estimate of the performance of the two protocols in sampling new patterns. A closer inspection of Figure 2a reveals that new patterns appear in a uniform fashion during the initial 10 ns of the trajectory, probably due to folding of the peptide. A sharp increase in the number of patterns was observed over the next 10 ns, followed by a slow but regular increase throughout the rest of the trajectory. However, the peptide conformations seem to become trapped (dark areas) in regions of the conformational space at certain intervals, which suggests restricted exploration of new patterns. In the case of REMD (Figure 2b), conformations with new patterns were sampled from the start of the simulation and continued to appear in a uniform fashion as the trajectory progressed. Moreover, the presence of fewer dark regions in the plot (Figure 2b) indicates that new patterns are explored with less restriction, suggesting better sampling performance of REMD over MD. Convergence also seems to be attained in the case of REMD after 100 ns, as the appearance of new patterns was almost negligible at the end of the trajectory.
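
The pattern count used here as a sampling measure amounts to tracking how many distinct patterns have been seen up to each snapshot. The sketch below shows that bookkeeping for a generic sequence of pattern labels. How CLASICO encodes a pattern is not described in the text, so the string labels are purely hypothetical.

```python
def cumulative_new_patterns(patterns):
    """Return, for each snapshot, the number of distinct patterns seen so far."""
    seen = set()
    counts = []
    for pattern in patterns:
        seen.add(pattern)
        counts.append(len(seen))
    return counts


def unique_fraction(n_patterns, n_snapshots):
    """Percentage of snapshots that contributed a new pattern."""
    return 100.0 * n_patterns / n_snapshots


# Hypothetical per-snapshot pattern labels.
print(cumulative_new_patterns(["HHT", "HHT", "EET", "HHT", "TTT"]))
# MD figures quoted in the text: 105 439 patterns in 200 000 snapshots.
print(round(unique_fraction(105439, 200000), 1))
```

A curve that keeps rising indicates that new patterns are still being discovered, while a plateau indicates either convergence or trapping.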

Fig. 2. Evaluation of new patterns for NMB in (a) MD and (b) REMD trajectories.

Secondary structure analysis was performed for every snapshot of the MD and REMD calculations using the CLASICO program (Corcho et al., 2004) with a three-residue window. Figures 3a-3b show the statistics of the conformational motifs for each residue of the NMB peptide in the MD and REMD trajectories, respectively. Figure 3a shows the classification of secondary structures obtained in the MD trajectory, where the peptide predominantly exhibits β-turns (~25%) between residues 3 and 9, with a stronger propensity between residues 5 and 6. Additionally, an α-helical region (4-5%) flanked by residues 3 to 6 was observed in some of the sampled structures (Figure 3a), with the complete absence of 3₁₀-helical conformations.

Fig. 3. Motif abundance for NMB in (a) MD and (b) REMD trajectories. Conformational motifs are labeled: H (α-helix), PI (π-helix), PP2 (polyproline II), Ext (extended), S (β-strand).


To some extent, β-strands (2-3%) in the region between residues 2 and 6 were also found in some of the conformations. The REMD protocol (Figure 3b), on the other hand, was more efficient at inducing β-turns (~33%) in the sampled conformations, flanked by residues 2 to 9 with a strong propensity between residues 5 and 6. The second major conformational motif attained by the sampled structures in the REMD trajectory was an α-helical region (8-10%) flanked by residues 3 to 9, with a good propensity between residues 3 and 6 and a very low propensity between residues 7 and 9. All α-helical conformations were observed to be right-handed, and to a minor extent conformations exhibiting a 3₁₀-helical region between residues 4 and 6 were also obtained (Figure 3b). Structures with β-strands between residues 2 and 6 were also observed in the REMD calculations.

The β-turn motifs attained by residues of the NMB peptide were further classified into different types using the two-residue window of the CLASICO program (Corcho et al., 2004), and are depicted in Figures 4a-4b. The motifs obtained in the MD trajectory (Figure 4a) show the predominance of β-turn type I between residues 4 and 9, with a high propensity of type II between residues 6 and 7. To some extent, β-turn type III was also observed between residues 3 and 9, with a high propensity between residues 5 and 6 and a low propensity between residues 3 to 4 and 7 to 9. In addition, β-turn type ii (the mirror conformation of β-turn type II) was also observed between residues 5 and 8 (Figure 4a). On the other hand, conformations obtained from the REMD trajectory preferentially attain β-turn type I between residues 3 and 9, with a strong propensity between residues 3 and 6 (Figure 4b). Structures displaying β-turn type III flanked by residues 3 to 9, with a strong propensity between residues 5 and 6, were also sampled in the REMD trajectory. It should be noted that the CLASICO program does not include the first and last residues of the peptide in the secondary structure calculations, which accounts for the absence of any secondary structural features for these residues in Figures 3-4.

Since the computational analysis described above provides an estimate of the average structure of NMB, it was considered worthwhile to compare the results of the different protocols with the NMR experiments reported in the literature (Lee & Kim, 1999). The average distances corresponding to the NMR NOEs were computed independently using the CLASTERIT algorithm of the CLASICO program (Corcho et al., 2004). Distances are reported as the average of the distance computed for each snapshot with a tolerance of ±1.96 standard deviations, which covers 95% of the values assuming that they are normally distributed. For long distances (LD), the NOEs considered in the present study include: i) Cβ-N (bN2) between residues 2-4 (m), 4-6 (w), 5-7 (w) and 6-8 (w); ii) Cα-N (aN3) between residues 4-7 (m), 5-8 (w), 6-9 (m), and 7-10 (m); iii) Cα-N (aN2) between residues 1-3 (w), 2-4 (m), 4-6 (m), 5-7 (w), 6-8 (m), 7-9 (w), 8-10 (w), and 9-N (w) [14]. For short distances (SD), the NOEs include: i) Cα-N (aN1) between residues 1-2 (s), 2-3 (s) and 7-8 (s); ii) N-N (NN1) between residues 2-3 (w), 3-4 (s), 6-7 (s), 7-8 (s), 8-9 (s), 9-10 (s), and 10-N (w). In both cases, m, w and s stand for medium, weak and strong NOEs, respectively.
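
The mean ± 1.96 standard deviation interval described above, and the check of whether an experimental distance falls inside it, can be sketched as follows. The use of the sample standard deviation is an assumption (the text does not say which estimator was used), and the distances and function names are hypothetical.

```python
import statistics


def distance_interval(distances, z=1.96):
    """Return (low, high) = mean -/+ z * sample standard deviation."""
    mean = statistics.mean(distances)
    sd = statistics.stdev(distances)  # sample estimator, assumed
    return mean - z * sd, mean + z * sd


def overlaps(interval, reference_distance):
    """True if the reference (e.g. NMR-derived) distance lies in the interval."""
    low, high = interval
    return low <= reference_distance <= high


# Hypothetical per-snapshot distances (in Å) for one atom pair.
snapshots = [3.0, 3.5, 4.0, 4.5, 5.0]
interval = distance_interval(snapshots)
print(interval, overlaps(interval, 5.0), overlaps(interval, 6.0))
```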

The overlaps between the distances obtained from NMR experiments and those computed in the present study are depicted in Figures 5a-d. The comparison covers both long distances (i to i+2 and i to i+3 interactions) and short distances (i to i+1 interactions) between the atoms. Specifically, Figure 5a reveals that only 12 long distances (LD) in the case of MD are in agreement with the corresponding NMR distances, which suggests the absence of the NMR structure in this simulation. Since only LD account for the secondary structure, the good agreement of short distances (SD) in the case of MD (Figure 5c) does not contribute to the helicity. On the other hand, all computed LD (Figure 5b) and SD (Figure 5d) from the REMD calculations correspond to the NMR distances, revealing the presence of the NMR structure. However, the elongation of the computed distances (Figures 5b and 5d) reveals a rapid exchange between NMR and unordered structures in this segment of the trajectory. Overall, these results indicate that the peptide is in rapid equilibrium between ordered and unordered conformations and suggest a low conformational energy barrier between them, which accounts for the high flexibility of NMB. Moreover, the REMD method is more efficient at inducing helicity and β-turns in the peptide, and was also successful in sampling the NMR structures that were completely absent in the MD trajectory.

Fig. 4. Types of β-turns attained by the NMB peptide in **(a)** MD and **(b)** REMD trajectories.


Fig. 5. Comparison of the NMR-derived long distances (LD) from Lee & Kim (1999), shown in orange, with the computed average distances in an interval containing 95% of the structures for **(a)** MD and **(b)** REMD trajectories. The same comparison for short distances (SD) in **(c)** MD and **(d)** REMD trajectories.

The conformational search for NMB presented in this study was also performed with the simulated annealing (SA) protocol, applied iteratively as a sampling technique. The sampling procedure was stopped after 8000 cycles of the iterative SA process, at which point the sampling efficiency (see Eqn. 1) was found to be less than 10% of the starting value. The evolution of this parameter was computed using Eqn. 1 and monitored along the conformational profile, as shown in Figure 6.

The shape of the curve in Figure 6 suggests that the search procedure employed was acceptable both in quality and in computer time. Low levels of performance were reached in finding new low energy conformations ($\mathcal{A}$ = 0.1), which is the expected result for most peptide analogues (Filizola et al., 1998). After completion of 8000 cycles of SA, the resulting 8000 structures were stored in a library. The criterion described in Methods Section 2.3 was used to compute the total number of unique conformations for the NMB peptide. Of the total of 5521 unique conformations, only 205 were low energy structures (<5 kcal mol⁻¹) (Table 1).

Fig. 6. Normalised efficiency values for the NMB peptide search.


Table 1. Summary of conformational analysis obtained from SA

| NMB sequence | Total number of unique conformations | Number of classes obtained from cluster analysis | Number of unique conformations within 5 kcal mol⁻¹ |
|---|---|---|---|
| H-GNLWATGHFM-NH2 | 5521 | 10 | 205 |

To describe the preferred conformational domains exhibited by the peptide, the low energy structures were clustered into groups according to the root mean square deviation (RMSD) of the distances between the backbone atoms of every structure.

The RMSDs for the unique conformations were then subjected to hierarchical cluster analysis using the Kleiweg clustering method (Kleiweg et al., 2004). The clustering can be represented visually by constructing a dendrogram, which indicates the relationship between the items in the data set (i.e. RMSD) and is shown in Figure 7. The dendrogram makes it possible to identify how many clusters there are at any stage and which structures belong to each cluster. It is a useful tool for showing the underlying structure of the data and for suggesting an appropriate number of clusters. A line drawn horizontally across the dendrogram enables one to read off how many clusters there are at any particular distance, as shown in Figure 7.

Since the objective of cluster analysis is to determine the representative structures of the conformational space explored, a carefully selected cut-off value is important, because it avoids choosing subclusters. Accordingly, an RMSD value of 0.8 Å was chosen as the cut-off. Since another goal of this work was to gain a better understanding of the structural motifs of the NMB peptide, further conformational analysis was carried out on the low energy structures. The 205 unique conformations were thereby classified into ten clusters (D1 to D10), summarized in Table 2. For each of the ten clusters, a representative was chosen on the basis of having the lowest relative energy (designated Er min in Table 2).
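
Cutting a dendrogram at a fixed RMSD and picking the lowest-energy member of each cluster can be sketched in a few lines. The example below uses generic complete-linkage agglomerative clustering as a stand-in; it is not the Kleiweg implementation used in this work, whose linkage rule is not stated here. The distance matrix, energies and function names are hypothetical.

```python
def cluster_by_cutoff(dist, cutoff):
    """Complete-linkage agglomerative clustering stopped at `cutoff`.

    dist is a symmetric matrix (list of lists) of pairwise RMSD values.
    Returns a list of clusters, each a sorted list of structure indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break  # equivalent to cutting the dendrogram at `cutoff`
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


def representatives(clusters, energies):
    """Pick the lowest-energy member of each cluster."""
    return [min(c, key=lambda i: energies[i]) for c in clusters]


# Hypothetical RMSD matrix (Å) and relative energies (kcal/mol).
rmsd = [
    [0.0, 0.3, 1.5, 1.5],
    [0.3, 0.0, 1.5, 1.5],
    [1.5, 1.5, 0.0, 0.5],
    [1.5, 1.5, 0.5, 0.0],
]
groups = cluster_by_cutoff(rmsd, cutoff=0.8)
print(groups, representatives(groups, [2.0, 0.0, 4.1, 3.3]))
```

Raising the cut-off merges more structures into fewer clusters, which is the trade-off the choice of 0.8 Å is meant to balance.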

Fig. 7. Dendrogram showing different clusters for NMB classified according to their RMSD (shown along y-axis) using the Kleiweg clustering method.


Table 2. Cluster analysis for H-GNLWATGHFM-NH2.

The five most abundant clusters, D2, D4, D6, D7 and D8 in Table 2, correspond to 10.2%, 41.6%, 21.9%, 6.3% and 9.2% of the total number of structures, respectively, which shows that the bulk of the structures are restricted to a small number of clusters. Inspection of Table 2 reveals that 89% of the structures are represented by these five clusters.

Fig. 8. Conformational motif abundance attained by the NMB peptide in **(a)** cluster D2, **(b)** cluster D4, **(c)** cluster D6, **(d)** cluster D7 and **(e)** cluster D8. Conformational motifs are labeled: H (α-helix), 3₁₀ (3₁₀-helix) and Ext (extended).

All the structures in each of the five most abundant clusters were analyzed to determine the conformational motifs attained by the NMB peptide, using the CLASTERIT algorithm of the CLASICO program (Corcho et al., 2004).

The statistics of all the motifs found in clusters D2, D4, D6, D7 and D8 are depicted in Figures 8a–8e, respectively. A closer inspection of Figure 8a reveals that most of the structures in cluster D2 (Table 2) predominantly exhibit an α-helical region between residues 3 and 6, while most of the residues (2-4, 8-9, Figure 8a) prefer to stay in the extended form. The structures of the most populated cluster, D4, on the other hand, display a β-turn type II between residues 6 and 7, while the rest show only an extended region (Figure 8b). The conformations of the second most abundant cluster, D6 (Figure 8c), predominantly display an α-helical region between residues 3 and 6, with a stronger propensity between residues 4 and 5. To some extent, β-turns of type III (3₁₀-helical) and type I between residues 3 and 6 were also observed in some of the structures. However, residue 7 did not show any secondary structural feature, as most of the structures in cluster D7 were extended; some structures displaying an α-helical region between residues 4-6, and β-turns of type II and type I between residues 6-7 and 5-6, respectively, were also part of the cluster (Figure 8d). Almost 60% of the structures in cluster D8 did not display any ordered conformation except a β-turn type III (3₁₀-helical) between residues 4 and 5 (Figure 8e).

### **4. Conclusion**


The present results suggest that the peptide adopts different folded and unfolded conformations regardless of the protocol used. REMD under GB conditions samples new patterns in a uniform fashion and appears to have readily reached convergence (Figure 2b), whereas conformations within the MD simulation seem to get trapped in certain regions of the conformational space, making it less efficient. Moreover, the results obtained from the REMD and MD protocols were compared with the NMR results for NMB available in the literature. The comparison indicates that REMD shows good agreement with the reported NMR results. The MD results, on the other hand, do not correspond with the reported NMR NOEs, indicating the absence of NMR-derived structures in this simulation. The results obtained from SA are also in agreement with the corresponding REMD results, suggesting the probable existence of both turns and helicity in the NMB peptide, which may be responsible for the binding of NMB at its receptor site. Hence, the present work provides comprehensive information about the conformational preferences of NMB, explored using three different techniques, which could help to better understand its native conformation in future investigations.

### **5. Acknowledgement**

Dr P Singh gratefully acknowledges the financial support from the Durban University of Technology, and the National Research Foundation (NRF). KB gratefully acknowledges the experiences and insights gained from the Spanish collaborators through the SA-Spain bilateral agreement. The authors would like to express their acknowledgement to the Centre for High Performance Computing, an initiative supported by the Department of Science and Technology of South Africa.

#### **6. References**


Minamino, N.; Kangawa, K. & Matsuo, H. (1983). Neuromedin B: a novel bombesin-like peptide identified in porcine spinal cord, *Biochemical and Biophysical Research Communications*, Vol.114, pp. 541–548.

Ohki-Hamazaki, H. (2000). Neuromedin B, *Progress in Neurobiology*, Vol.62, pp. 297-312.

Erspamer, V. (1980). Comprehensive Endocrinology (Glass, G.B.J., Ed.). 343-361, Raven Press, New York.

Marki, W.; Brown, M. & Rivier, J. E. (1981). Bombesin analogs: Effects on thermoregulation and glucose metabolism, *Peptides Supplement*, Vol.2, pp. 169-177.

Ghatei, M. A.; Jung, R. T.; Stevenson, J. C.; Hillyard, C. J.; Adrian, T. E.; Lee, Y. C.; Christofides, N. D.; Sarson, D. L.; Mashiter, K.; MacIntyre, I. & Bloom, S. R. (1982). Bombesin - Action on gut hormones and calcium in man, *The Journal of Clinical Endocrinology and Metabolism*, Vol.54, pp. 980-985.

Erspamer, V. (1988). Discovery, isolation and characterization of bombesin-like peptides, *Annals of the New York Academy of Science*, Vol.547, pp. 3-9.

Cuttitta, F.; Carney, D. N.; Mulshine, J.; Moody, T. W.; Fedorko, J.; Fischler, A. & Minna, J. D. (1985). Bombesin-like peptides can function as autocrine growth factors in human small-cell lung cancer, *Nature*, Vol.316, pp. 823-826.

Corps, A. N.; Rees, L. H. & Brown, K. D. (1985). A peptide that inhibits the mitogenic stimulation of Swiss 3T3 cells by bombesin or vasopressin, *Biochemical Journal*, Vol.231, pp. 781-784.

Moody, T. W.; Carney, D. N.; Cuttitta, F.; Quattrocchi, K. & Minna, J. D. (1985). High affinity receptors for bombesin/GRP-like peptides on human small cell lung cancer, *Life Sciences*, Vol.37, pp. 105-113.

Wada, E.; Way, J.; Shapira, H.; Kusano, K.; Lebacq-Verheyden, A. M.; Coy, D.; Jensen, R. & Battey, J. (1991). cDNA cloning, characterization, and brain region-specific expression of a neuromedin-B-preferring bombesin receptor, *Neuron*, Vol.6, pp. 421-430.

Corjay, M. H.; Dobrzanski, D. J.; Way, J. M.; Viallet, J.; Shapira, H.; Worland, P.; Sausville, E. A. & Battey, J. F. J. (1991). Two distinct bombesin receptor subtypes are expressed and functional in human lung carcinoma cells, *Journal of Biological Chemistry*, Vol.266, pp. 18771-18779.

Gorbulev, V.; Akhundova, A.; Büchner, H. & Fahrenholz, F. (1992). Molecular cloning of a new bombesin receptor subtype expressed in uterus during pregnancy, *European Journal of Biochemistry*, Vol.208, pp. 405-410.

Mantey, S.; Weber, C.; Sainz, E.; Akeson, M.; Ryan, R.; Pradhan, T.; Searles, R.; Spindel, E.; Battey, J. F.; Coy, D. H. & Jensen, R. T. (1997). Discovery of a high affinity radioligand for the human orphan receptor, bombesin receptor subtype 3, which demonstrates it has a unique pharmacology compared to other mammalian bombesin receptors, *The Journal of Biological Chemistry*, Vol.272, pp. 26062-26071.

Lee, S. & Kim, Y. (1999). Solution structure of neuromedin B by 1H nuclear magnetic resonance spectroscopy, *FEBS Letters*, Vol.460, pp. 263-269.

Erne, D. & Schwyzer, R. (1987). Membrane structure of bombesin studied by infrared spectroscopy. Prediction of membrane interactions of gastrin-releasing peptide, neuromedin B, and neuromedin C, *Biochemistry*, Vol.26, pp. 6316-6319.

Case, D. A.; Darden, T. A.; Cheatham, T. E. III; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Merz, K. M.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang, W.; Wang, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathews, D. H.; Schafmeister, C.; Ross, W. S. & Kollman, P. A. (2006). AMBER 9. University of California, San Francisco.

Sitkoff, D.; Sharp, K. A. & Honig, B. (1994). Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models, *Journal of Physical Chemistry*, Vol.98, pp. 1978-1988.

Tsui, V. & Case, D. A. (2001). Theory and applications of the generalized Born solvation model in macromolecular simulation, *Biopolymers*, Vol.56, pp. 275-291.

Wu, X. & Brooks, B. R. (2003). Self-guided Langevin dynamics simulation method, *Chemical Physics Letters*, Vol.381, pp. 512-518.

Andersen, H. C. (1980). Molecular dynamics simulations at constant pressure and/or temperature, *Journal of Chemical Physics*, Vol.72, pp. 2384-2393.

Ghose, A. K.; Crippen, G. M.; Revankar, G. R.; Smee, D. F.; McKernan, P. A. & Robins, R. K. (1989). Analysis of the in Vitro Antiviral Activity of Certain Ribonucleosides against Parainfluenza Virus Using a Novel Computer Aided Receptor Modeling Procedure, *Journal of Medicinal Chemistry*, Vol.32, pp. 746-756.

Jörgensen, W. L. (1991). Rusting of the lock and key model for protein-ligand binding, *Science*, Vol.254, pp. 954-955.

Corcho, F. J.; Filizola, M. & Perez, J. J. (1999). Assessment of the bioactive conformation of the farnesyltransferase protein binding recognition motif by computational methods, *Journal of Biomolecular Structure & Dynamics*, Vol.5, pp. 1043-1052.

Corcho, F.; Canto, J. & Perez, J. J. (2004). Comparative analysis of the conformational profile of substance P using simulated annealing and molecular dynamics, *Journal of Computational Chemistry*, Vol.25, pp. 1937-1952.

Filizola, M.; Centeno, N. B.; Farina, M. C. & Perez, J. J. (1998). Conformational analysis of the highly potent bradykinin antagonist Hoe-140 by means of two different computational methods, *Journal of Biomolecular Structure & Dynamics*, Vol.15, pp. 639-652.

Kleiweg, P.; Nerbonne, J. & Bosveld, L. (2004). Geographical Projection of cluster composites. In: A. Blackwell, K. Marriott and A. Shimojima (Eds.).

Jain, A. K.; Murty, M. N. & Flynn, P. J. (1999). Data Clustering: A Review. *ACM Computing Surveys*, Vol.31, No.3, pp. 264-323.

LaFargaCPL: CLASTERIT: Project Info. Available at: https://lafarga.cpl.upc.edu/projects/clusterit [Accessed June 12, 2008].

## **Essential Dynamics on Different Biological Systems: Fis Protein, tvMyb1 Transcriptional Factor and BACE1 Enzyme**

Lucas J. Gutiérrez1,2, Ricardo D. Enriz1,2 and Héctor A. Baldoni1,3 *1Área de Química General e Inorgánica Universidad Nacional de San Luis (UNSL), San Luis 2Instituto Multidisciplinario de Investigaciones Biológicas de San Luis (IMIBIO-SL, CONICET), San Luis 3Instituto de Matemática Aplicada San Luis (IMASL, CONICET), San Luis Universidad Nacional de San Luis (UNSL) Argentina*

### **1. Introduction**

Proteins and enzymes possess a three-dimensional structure stabilized by non-covalent interactions, and their intrinsic flexibility therefore allows an ensemble of different conformers, separated by low-energy barriers, to coexist. For proteins in solution, this range of accessible conformers is due to the relative movements among the different domains. Domain motions are important for a variety of protein functions, including catalysis, regulation of activity, transport of metabolites, formation of protein assemblies, and cellular locomotion.

Given the importance of these conformational changes, techniques that can characterize them are essential for understanding their biological effects. In the present chapter we report molecular dynamics (MD) trajectories analyzed by the essential dynamics method for three molecular systems of biological interest: i) the DNA-bending protein Fis (Factor for Inversion Stimulation), ii) DNA-tvMyb1 (a *Trichomonas vaginalis* transcriptional factor) and iii) BACE1 (β-site amyloid cleaving enzyme 1). Although the general structural characteristics of these systems are well known, comparatively little information is available about their flexibility and dynamics, in part because such information is difficult to obtain experimentally. Our primary interest was therefore the comparison between the unligated and the complexed states, because it may reveal motions of functional relevance.

### **2. Methodology**

#### **2.1 Molecular dynamics simulations**

Twenty-nanosecond MD simulations were performed for the three systems under study (Fis protein, DNA-tvMyb1 protein and BACE1 enzyme) in order to relax them and investigate their dynamical behaviour. All simulations were performed with the Amber program (Case et al., 2008). The experimental structure of each system was obtained from the Protein Data Bank (PDB) and used as the initial model for the corresponding simulation. The PDB entries were 3IV5, 2KDZ and 1M4H for the DNA-bending protein Fis, DNA-tvMyb1 and BACE1, respectively. An all-atom force field was used: FF99SB-ILDN (Lindorff-Larsen et al., 2010) for the protein and FF99bsc0 (Perez et al., 2007) for the DNA.

Each system was solvated in a truncated octahedral periodic box of TIP3P water molecules, with a distance of at least 10 Å between the edges of the box and the closest solute atom. Counterions were added to neutralize the charge of each system. The entire system was then subjected to energy minimization in two stages to remove bad contacts between the complex and the solvent molecules. First, the water molecules were minimized while the solute was held fixed with a harmonic restraint of 100 kcal/(mol·Å²). Second, conjugate gradient minimization was repeated four times with positional restraints of 15, 10, 5 and 0 kcal/(mol·Å²) on all heavy atoms of the receptor. The system was then heated from 0 K to 300 K over 300 ps and equilibrated at 300 K for another 200 ps. After minimization, heating and equilibration, 20 ns production simulations were performed in the NPT ensemble (temperature of 300 K and pressure of 1 atm). During minimization and MD, the particle mesh Ewald (PME) method was used to treat the long-range electrostatic interactions under periodic boundary conditions. The SHAKE method was used to constrain bonds involving hydrogen atoms, allowing a time step of 2 fs for all MD runs. The direct-space non-bonded cut-off was 8 Å, and initial velocities were assigned from a Maxwellian distribution at the initial temperature.
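As a quick consistency check on the protocol above, the phase lengths and the 2 fs time step can be converted into numbers of integration steps. The sketch below only restates the figures quoted in the text; the names `PHASES_FS` and `steps_per_phase` are illustrative and are not part of any Amber input.

```python
# Phase lengths quoted in the text, expressed in femtoseconds.
TIME_STEP_FS = 2  # time step made possible by SHAKE

PHASES_FS = {
    "heating": 300_000,         # 300 ps, 0 K -> 300 K
    "equilibration": 200_000,   # 200 ps at 300 K
    "production": 20_000_000,   # 20 ns in the NPT ensemble
}


def steps_per_phase(phases_fs, time_step_fs):
    """Return the number of integration steps required for each phase."""
    return {name: length // time_step_fs for name, length in phases_fs.items()}


STEPS = steps_per_phase(PHASES_FS, TIME_STEP_FS)
```

Under these values the production run corresponds to ten million integration steps per system.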

#### **2.2 Essential dynamics**

The essential dynamics (ED) method, also called Principal Component Analysis (PCA) (Amadei et al., 1993), was used to extract the low-dimensional subspace in which all biologically relevant motions occur (the so-called essential subspace) (De Groot et al., 1996).

The ED method is based on the diagonalization of the covariance matrix built from atomic fluctuations in a trajectory from which the overall translation and rotations have been removed:

$$\mathbf{C}\_{\mathrm{ij}} = \left\langle \left( \mathbf{x}\_{\mathrm{i}} - \mathbf{x}\_{\mathrm{i,0}} \right) \left( \mathbf{x}\_{\mathrm{j}} - \mathbf{x}\_{\mathrm{j,0}} \right) \right\rangle \tag{1}$$

Here **x** denotes the separate x, y, and z coordinates of the atoms fluctuating around their average positions **x**₀, and ⟨...⟩ denotes the time average over the entire trajectory. To construct the protein covariance matrix we used the Cα atom trajectory; it has been shown that the Cα atoms contain all the information needed for a reasonable description of the large concerted motions of a protein (Amadei et al., 1993). Diagonalization of the covariance matrix yields a set of eigenvalues and eigenvectors. The motion along a single eigenvector corresponds to a concerted fluctuation of atoms, and the associated eigenvalue represents the total mean-square fluctuation of the system along that eigenvector.
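The construction and diagonalization of the covariance matrix in equation (1) can be illustrated with a few lines of NumPy. This is a minimal sketch and not the PCAZIP implementation used in this work: it assumes a hypothetical array `coords` of shape (frames, atoms, 3) holding Cα coordinates that have already been superposed on a reference structure, so that overall translation and rotation are removed.

```python
import numpy as np


def essential_dynamics(coords):
    """Diagonalize the coordinate covariance matrix of a fitted trajectory.

    coords: array of shape (n_frames, n_atoms, 3), already superposed.
    Returns the eigenvalues in decreasing order and the matching
    eigenvectors as columns.
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)        # (n_frames, 3N) coordinate matrix
    dx = x - x.mean(axis=0)                 # fluctuations around the average
    cov = dx.T @ dx / n_frames              # C_ij = <(x_i - x_i0)(x_j - x_j0)>
    evals, evecs = np.linalg.eigh(cov)      # eigh: the matrix is symmetric
    order = np.argsort(evals)[::-1]         # largest fluctuation first
    return evals[order], evecs[:, order]
```

The sum of the eigenvalues equals the trace of the covariance matrix, that is, the total mean-square fluctuation of the selected atoms.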

#### **2.3 Collective movements**

To examine domain motions in a protein we calculated the cross-correlation (normalized covariance) matrix, Cij, of the fluctuations of the x, y and z coordinates of the Cα atoms from their average positions during the last ten nanoseconds of the simulation. For the displacement vectors Δri and Δrj of atoms i and j, the matrix element Cij is given by (Ichiye & Karplus, 1991):

$$\mathbf{C\_{ij}} = \frac{\left\langle \Delta \mathbf{r\_i} \cdot \Delta \mathbf{r\_j} \right\rangle}{\left( \left\langle \Delta \mathbf{r\_i}^2 \right\rangle \left\langle \Delta \mathbf{r\_j}^2 \right\rangle \right)^{1/2}} \tag{2}$$

where Δri is the displacement of the i-th atom from its mean position and the angle brackets denote the time average over the trajectory. The elements of the cross-correlation matrix take values from -1 to 1. Positive values of Cij indicate that atoms i and j move in the same direction, and negative values indicate that they move in opposite directions. When Cij is close to zero, the atomic motions are uncorrelated and the atoms move randomly with respect to each other.
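Equation (2), read with a scalar product between the displacement vectors in the numerator, can be sketched as follows. The function is an illustration only, not the ptraj routine used here, and it assumes a hypothetical, already superposed array `coords` of shape (frames, atoms, 3).

```python
import numpy as np


def cross_correlation(coords):
    """Normalized covariance C_ij between atomic displacement vectors.

    coords: array of shape (n_frames, n_atoms, 3), already superposed.
    Returns an (n_atoms, n_atoms) matrix with values between -1 and 1.
    """
    n_frames = coords.shape[0]
    dr = coords - coords.mean(axis=0)                  # displacement of each atom
    dots = np.einsum("fia,fja->ij", dr, dr) / n_frames  # <dr_i . dr_j>
    msf = np.diag(dots)                                # <dr_i^2>
    return dots / np.sqrt(np.outer(msf, msf))
```

Two atoms that move rigidly together give Cij = 1, and two atoms whose displacements are exactly opposite give Cij = -1.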

PCA was carried out using the PCAZIP software (Meyer et al., 2006). Geometrical analysis was performed using the ptraj module in AmberTools (Case et al., 2008).

#### **2.4 Binding energy decomposition**

The MM-GBSA method (Kollman et al., 2000) was applied to the last ten nanoseconds of the simulation within the one-trajectory approximation. Briefly, the binding affinity of a complex corresponds to the free energy of association, written as:

$$\Delta\mathbf{G}\_{\text{bind}} = \mathbf{G}\_{\text{complex}} - \left(\mathbf{G}\_{\text{receptor}} + \mathbf{G}\_{\text{ligand}}\right) \tag{3}$$

In the MM-GBSA protocol, the binding affinity in equation (3) is typically calculated using

$$
\Delta \mathbf{G} = \Delta \mathbf{E}\_{\mathbf{MM}} + \Delta \mathbf{G}\_{\text{solv}} - \text{T} \Delta \mathbf{S} \tag{4}
$$

where ΔEMM represents the change in molecular mechanics potential energy upon formation of the complex, calculated using all bonded and non-bonded interactions. The solvation free energy penalty, ΔGsolv, is composed of an electrostatic component (ΔGGB) and a nonpolar component (ΔGNP):

$$
\Delta \mathbf{G}\_{\text{solv}} = \Delta \mathbf{G}\_{\text{GB}} + \Delta \mathbf{G}\_{\text{NP}} \tag{5}
$$

ΔGGB is the polar solvation contribution, calculated with the generalized Born (GB) model. Dielectric constants of 1 and 80 were used for the solute interior and the exterior solvent, respectively.

The hydrophobic contribution to the solvation free energy, ΔGNP, is estimated using the equation:

$$\Delta \mathbf{G}\_{\text{NP}} = \alpha \cdot \text{SASA} + \beta \tag{6}$$

where SASA is the solvent-accessible surface area computed by means of the Linear Combination of Pairwise Overlap (LCPO) method (Onufriev et al., 2000) with a solvent probe radius of 1.4 Å. The surface tension proportionality constant α and the free energy of nonpolar solvation for a point solute β were set to their standard values, 0.00542 kcal/(mol·Å²) and 0.92 kcal/mol, respectively (Sitkoff et al., 1994).
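The way the individual terms combine can be made explicit with a short sketch. It uses the two constants quoted above and takes the binding free energy as the complex minus the sum of receptor and ligand, as in the standard MM-GBSA definition. The function names are illustrative, the entropy term defaults to zero purely for convenience, and any energies passed in are the user's own inputs rather than results from this chapter.

```python
ALPHA = 0.00542  # kcal/(mol A^2), surface tension constant quoted in the text
BETA = 0.92      # kcal/mol, nonpolar offset quoted in the text


def nonpolar_solvation(sasa, alpha=ALPHA, beta=BETA):
    """Nonpolar solvation term of one species: alpha * SASA + beta (kcal/mol)."""
    return alpha * sasa + beta


def state_free_energy(e_mm, g_gb, sasa, t_s=0.0):
    """Free energy of one species: E_MM + G_GB + G_NP - T*S (kcal/mol)."""
    return e_mm + g_gb + nonpolar_solvation(sasa) - t_s


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Binding free energy: G_complex - (G_receptor + G_ligand)."""
    return g_complex - (g_receptor + g_ligand)
```

In the one-trajectory approximation the three species are taken from the same snapshots of the complex simulation, so the bonded contributions to ΔEMM cancel.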

### **3. Results and discussion**

#### **3.1 Fis-protein**

DNA-binding proteins can broadly be divided into those that recognize their DNA-binding sites with high sequence discrimination and those that bind DNA with little or no obvious sequence preference (Stefano et al., 2010). Examples of the latter class are the nucleoid-associated proteins Fis (Factor for Inversion Stimulation), IHF, HU, and H-NS in eubacteria, and chromatin-associated proteins like the HMGB family and histones in eukaryotes.

Fis protein participates in a wide array of cellular activities, such as modulation of DNA topology during growth (Schneider et al., 1997, 1999), regulation of certain site-specific DNA recombination events (Betermier et al., 1989; Dorgai et al., 1993; Finkel & Johnson, 1992; Johnson et al., 1986; Kahmann et al., 1985; Weinreich & Reznikoff, 1992) and regulation of the transcription of a large number of genes during different stages of growth (Kelly et al., 2004; Xu & Johnson, 1995), including ribosomal RNA and tRNA genes and genes involved in virulence and biofilm formation (Bosch et al., 1990; Ross et al., 1990; Falconi et al., 2001; Goldberg et al., 2001; Prosseda et al., 2004; Sheikh et al., 2001; Wilson et al., 2001). In addition, Fis protein can affect various biological processes involved in site-specific DNA recombination, DNA replication, or transcription (Drlica & Rouviere-Yaniv, 1987; Finkel & Johnson, 1992; Schmid, 1990). In some cases, two or more proteins cooperate in the same process. For example, Fis and HU participate in Hin-mediated DNA recombination (Johnson et al., 1986), and Fis and IHF aid in promoting site-specific recombination of DNA (Ball & Johnson, 1991; Johnson et al., 1986; Schneider et al., 1997; Thompson et al., 1987). In other cases, these proteins play opposing roles, as with Fis and H-NS in the transcription of *hns* (Falconi et al., 1996) or IHF and Fis in the transcription of *fis* (Pratt et al., 1997).

Regarding its structure, Fis protein is a homodimer composed of two identical 98-amino-acid subunits. Each subunit contains a β-hairpin (residues 11 to 26) followed by four α-helices (A, B, C, and D) separated by short turns, which form a helix–turn–helix (HTH, residues 74-95) DNA-binding motif. It is currently accepted that Fis protein binds to non-specific DNA sequences.

The bending that these proteins induce in the DNA helix increases the number of phosphate groups that interact with the flanks of the protein. The DNA chain bends because there are two contact regions between the Fis protein and the DNA: one remains practically fixed during the bending process, while the other slides along the DNA chain. This causes the DNA to bend, and the twist motions of the DNA helix allow the base-pair step motions. Two different kinds of motion therefore appear to be induced by protein–DNA binding: one is the bending or unzipping that proceeds along the DNA helical axis, and the other is mainly the base-pair opening process, which occurs in the direction orthogonal to the helical axis.

The deviation of the simulated structures from the crystal structure was monitored through the time evolution of the root-mean-square deviation (RMSD) of the Cα atoms. This analysis provides a measure of the structural drift from the initial coordinates as well as of the atomic fluctuations over the course of an MD simulation. Most large-scale changes in the overall Cα RMSD occurred within the first 7 ns. From this point on, our results indicate that the molecular system is equilibrated, and we therefore analysed the last 10 ns of the simulation. The RMSD values (mean, with the standard deviation in parentheses) obtained for the unligated and complexed forms were 2.40 (0.41) and 1.55 (0.28) Å, respectively.
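For reference, the RMSD of one frame with respect to a reference structure is usually computed after an optimal rigid-body superposition. The sketch below uses the Kabsch algorithm for that superposition, which is our assumption for illustration and not necessarily the procedure applied by ptraj. Both inputs are hypothetical (atoms, 3) arrays of Cα coordinates with the atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """Least-squares RMSD between two (n_atoms, 3) coordinate sets."""
    p = mobile - mobile.mean(axis=0)          # remove translation
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)         # SVD of the 3x3 correlation matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q                      # residual after optimal rotation
    return np.sqrt((diff ** 2).sum() / len(p))
```

Applying the function to every frame of the trajectory gives the time series from which a mean and a standard deviation such as those quoted above can be obtained.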

The local protein mobility was analyzed by calculating the average Cα B-factors of the unligated and complexed Fis protein and comparing them with the values previously reported for the crystal. The B-factors reflect the atomic fluctuations in a protein and thus give information about the flexibility of the structure.
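Simulated B-factors are commonly obtained from the mean-square fluctuation of each atom through the isotropic relation B = (8π²/3)⟨Δr²⟩. The text does not state the exact procedure, so the following is only a sketch under that assumption, again for a hypothetical superposed array `coords` of shape (frames, atoms, 3) in Å.

```python
import numpy as np


def b_factors(coords):
    """Isotropic B-factors (A^2) from a fitted trajectory.

    coords: array of shape (n_frames, n_atoms, 3), already superposed.
    """
    dr = coords - coords.mean(axis=0)             # displacement from the mean
    msf = (dr ** 2).sum(axis=2).mean(axis=0)      # <dr_i^2> for each atom
    return 8.0 * np.pi ** 2 / 3.0 * msf
```

An atom with a mean-square fluctuation of 1 Å² thus corresponds to a B-factor of about 26.3 Å².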

Figure 1a shows that the B-factors obtained for the unligated Fis protein are very similar to the experimental data except in certain regions. These regions comprise the amino acids involved in turns, loops and the 1D and 2D helices.

Figure 1b gives the B-factor analysis for the DNA-bound Fis protein. Residues 89-91 and 92-102 show a high mobility because they correspond to the terminal residues of the first subunit and the initial residues of the second one, respectively. In the remaining regions only small differences are observed, and these might be due to the restricted motion in the crystal.

Fig. 1. Thermal factor analysis. MD (red line), experimental (black line).

To analyze the internal motions, figure 2 displays the eigenvalues obtained from the diagonalization of the covariance matrix of the atomic fluctuations, ranked in decreasing order. The first two eigenvalues account for about 58% and 60% of the collective movement for the unligated and complexed forms, respectively, whereas the last eigenvalues correspond to small-amplitude vibrations.
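The share of the total motion captured by the leading eigenvectors follows directly from the eigenvalue spectrum. A minimal sketch, with an illustrative function name and no connection to the actual eigenvalues of this work:

```python
import numpy as np


def cumulative_fraction(eigenvalues):
    """Cumulative fraction of the total fluctuation, largest eigenvalue first."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return np.cumsum(ev) / ev.sum()
```

The second element of the returned array is the fraction of the collective movement described by the first two eigenvectors.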

In order to evaluate the collective movement, which is dominated by the first eigenvector, we projected the last ten ns of the trajectory on the first two eigenvectors. From the resulting trajectory a new root-mean-square fluctuation (RMSF) can be calculated for each residue in order to identify the residues responsible for this movement. Figure 3 shows the RMSF obtained for each residue projected on the first eigenvector for the unligated and complexed forms of Fis protein. The HTH motifs in both subunits displayed a higher mobility in the unligated form than in the complexed form. This is the expected behaviour, considering that the HTH motifs are the DNA-binding regions. The presence of DNA makes the HTH motifs much more rigid and increases the mobility in the loops 40-49 and 51-64 (subunit 1) and 58-70 (subunit 2), as well as in helix 2B.
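The per-residue fluctuation along a single eigenvector can be obtained by filtering the trajectory through that eigenvector and computing the RMSF of the filtered coordinates. The sketch below shows one way to do this with NumPy. It assumes a hypothetical superposed Cα array `coords` of shape (frames, atoms, 3) and is not the PCAZIP workflow itself.

```python
import numpy as np


def rmsf_along_mode(coords, mode_index=0):
    """Per-atom RMSF of the motion restricted to one eigenvector.

    coords: array of shape (n_frames, n_atoms, 3), already superposed.
    mode_index: 0 selects the eigenvector with the largest eigenvalue.
    """
    n_frames, n_atoms, _ = coords.shape
    x = coords.reshape(n_frames, -1)
    dx = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(dx.T @ dx / n_frames)
    v = evecs[:, np.argsort(evals)[::-1][mode_index]]   # selected eigenvector
    proj = dx @ v                                       # projection p(t)
    filtered = np.outer(proj, v)                        # fluctuation along the mode
    per_atom = (filtered.reshape(n_frames, n_atoms, 3) ** 2).sum(axis=2)
    return np.sqrt(per_atom.mean(axis=0))
```

Residues with a large value contribute most to the collective motion described by the chosen eigenvector.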

Fig. 2. Eigenvalues (Å²) plotted against the corresponding eigenvector index, obtained from the Cα covariance matrix for the unligated and complexed forms.

Fig. 3. Residue displacement (Å) in the subspace spanned by the first eigenvector, plotted against the residue index of subunits 1 and 2. Unligated (black line) and complexed (red line).

Figure 4 shows the movement described by the first eigenvector. The red porcupine needles indicate the direction of the displacements, and the length of each needle is proportional to the amplitude of the displacement. This representation shows the direction in which the previously discussed domains move. In short, the 1D and 2D domains, which are mainly responsible for DNA recognition, move in opposite directions (negatively correlated). In addition, the 1D domain displays a higher mobility than the 2D domain. Thus, we can argue that the DNA glides along one of the binding regions (the HTH motif with high mobility), whereas the other binding region (the HTH motif with low mobility) remains fixed during the bending of the DNA.

To study the interdomain motions of the residues that make up the dimer, we examined the correlation maps for the unligated and complexed forms of Fis protein. Figures 5a and 5b display the dynamic cross-correlation maps (DCCM) of the Cα atoms calculated from the MD simulations of the unligated and complexed Fis protein, respectively. In both forms the movements are negatively correlated, indicating an expansion and contraction of the binding site located between the HTH motifs of the two subunits. On the basis of our results it seems reasonable to argue that this movement is responsible for the bending of the DNA.

Figure 6 gives the per-residue decomposition of the binding interaction energy obtained for the complexed Fis protein, from which the strength of each interaction can be readily appreciated. Our results are in almost complete agreement with those reported by Stefano et al., showing that Asn73, Gln74, Thr75, Arg76, Ile83, Asn84, Arg85, Thr87 and Arg89 make the major contributions to the binding of DNA to Fis protein. Only Lys90 is missing from our results in comparison with the experimental data (Stefano et al., 2010).

Fig. 4. Porcupine plots obtained for the unligated (a) and complexed (b) forms of Fis protein.

Fig. 5. Cross-correlation matrix obtained for the fluctuations of unligated (a) and complexed (b) forms of Fis protein.

Fig. 6. DNA-Fis protein residue interaction spectrum. The y-axis denotes the interaction energy (kcal/mol) between the DNA and specific residues, and the x-axis denotes the residue number. The same pattern was observed in both subunits.

#### **3.2 tvMyb1 transcriptional factor**

The transcription regulator tvMyb1 was the first Myb family protein identified in *Trichomonas vaginalis* (*T. vaginalis*), a flagellated protozoan parasite of humans and the causative agent of trichomoniasis, the most common non-viral sexually transmitted infection worldwide (WHO, 1995). *T. vaginalis* infection may cause adverse health consequences, including preterm delivery and pelvic inflammatory disease in women, as well as infertility and an increased incidence of human immunodeficiency virus transmission in women and men (Cotch et al., 1997; Laga et al., 1993; Martinez-Garcia et al., 1996; Moodley et al., 2002; Sherman et al., 1987; Sorvillo et al., 1998).

Like other pathogens, *T. vaginalis* requires iron for its metabolism. This cation is essential for its growth in the human vagina, where the iron concentration changes constantly throughout the menstrual cycle. Iron also favours the adherence of the parasite to vaginal epithelial cells, as well as its metabolism and multiplication in culture (Gorrell, 1985; Lehker and Alderete, 1992). In spite of the high prevalence of trichomoniasis and the complications associated with the disease, *T. vaginalis* remains one of the most poorly studied parasites with respect to virulence properties and pathogenesis.

Myb proteins contain DNA-binding domains composed of one, two or three repeated motifs (called R1, R2 and R3, respectively) of approximately 50 amino acids, each containing three conserved tryptophan residues (Lipsick, 1996). These tryptophans play a role in the folding of the hydrophobic core of the Myb domains and are generally conserved in all Myb proteins. This hydrophobic core generates the helix-turn-helix (HTH) structure of the DNA-binding domain (Sakura et al., 1989). The HTH motifs recognize the major groove of the target DNA sequences.

Recently, Lou et al. reported the structural basis of the tvMyb35–141/DNA interaction, investigated using nuclear magnetic resonance, chemical shift perturbations, residual dipolar couplings, and DNA specificity (Lou et al., 2009). These authors also showed that the tvMyb35–141 fragment is the minimal DNA-binding domain, encompassing two Myb-like DNA-binding motifs designated R2 and R3. Both motifs consist of three helices adopting an HTH conformation and are connected by a long loop. The published experimental data indicate that the relative orientation of the R2 and R3 motifs changes dramatically upon DNA binding, through a number of key contacts between residues in helices 3 and 6 and the DNA major groove.

A useful measure of protein flexibility is the order parameter S² (Peter et al., 2001; Showalter & Brüschweiler, 2007). This quantity, derived from the normalized autocorrelation function of the backbone N-H bond vector, was evaluated from the equilibrated MD trajectories. S² takes the value 1 for a completely rigid system and 0 for total flexibility, where all possible conformations are sampled. Figure 7 shows the S² values calculated over the last 10 ns of simulation for both the complexed and unligated forms of the tvMyb1 protein. The profiles agree with those obtained from NMR measurements (Lou et al., 2009). On the basis of this similarity, we can argue that the simulated system follows the same trend as that observed experimentally.

The S² values obtained for the R2 domain (residues 40-80) are 0.74 and 0.79 for the complexed and unligated structures, respectively, whereas those obtained for the R3 domain (residues 92-135) are 0.81 and 0.74. From these results it may be argued that the R2 motif is more rigid in the unligated structure than in the complex, whereas the R3 domain is somewhat more rigid in the complexed structure than in the unligated form.

Fig. 7. tvMyb1 backbone N-H bond order parameter profiles from the last 10 ns of trajectory. Complexed form (red circles) and unligated form (black circles).
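As an illustration of how such an order parameter can be estimated, the following is a minimal sketch and not the authors' actual protocol. It uses the common long-time-limit expression S² = (3/2) Σ<sub>ij</sub> ⟨u<sub>i</sub>u<sub>j</sub>⟩² − 1/2 for the unit N-H bond vector u, and assumes that all frames have already been superposed on a reference structure so that overall tumbling is removed. The function name `order_parameter_s2` and the input layout are hypothetical.

```python
import numpy as np

def order_parameter_s2(nh_vectors):
    """Estimate S2 for one N-H bond.

    nh_vectors: array of shape (n_frames, 3) holding the N-H bond vector
    in each frame (assumed: frames already superposed on a reference).
    """
    u = np.asarray(nh_vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)  # normalize to unit vectors
    # Time-averaged second-moment tensor <u_i u_j>
    m = np.einsum("ti,tj->ij", u, u) / u.shape[0]
    # S2 = 3/2 * sum_ij <u_i u_j>^2 - 1/2
    return 1.5 * np.sum(m * m) - 0.5

# A bond that never changes orientation is completely rigid.
rigid = np.tile([0.0, 0.0, 1.0], (100, 1))
print(order_parameter_s2(rigid))
```

Averaging the per-residue values over the residues of a domain gives a single number per domain, comparable in spirit to the R2 and R3 values quoted above.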

Figure 8 displays the eigenvalues obtained from the diagonalization of the covariance matrix of the atomic fluctuations, ranked in decreasing order. The first two eigenvalues account for about 60% and 65% of the collective motion for the unligated and complexed forms, respectively.
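The diagonalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in this work: the input is assumed to be an array of Cα coordinates already fitted to a reference structure, and the function name `essential_dynamics` is hypothetical.

```python
import numpy as np

def essential_dynamics(coords):
    """Principal component analysis of atomic fluctuations.

    coords: array (n_frames, n_atoms, 3) of superposed positions.
    Returns eigenvalues in decreasing order, the matching eigenvectors
    (as columns) and the fraction of total variance carried by each mode.
    """
    x = np.asarray(coords, dtype=float)
    x = x.reshape(x.shape[0], -1)           # flatten to (n_frames, 3N)
    dx = x - x.mean(axis=0)                 # fluctuations about the mean
    cov = dx.T @ dx / (dx.shape[0] - 1)     # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # symmetric eigenproblem
    order = np.argsort(evals)[::-1]         # rank in decreasing order
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, evals / evals.sum()
```

The sum of the first two entries of the returned fractions corresponds to the kind of percentage quoted in the text for the first two eigenvalues.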

To analyse which residues are responsible for the different motions, we calculated the root-mean-square fluctuation (RMSF) from the equilibrated trajectory. Figure 9a shows the RMSF of each residue projected on the first eigenvector for the unligated and complexed forms of the tvMyb1 protein. Residues 83-90 fluctuate more in the unligated form than in the complexed form, and the same trend is observed for residues 117-130. In contrast, residues 96-101 fluctuate more in the complexed form than in the unligated form.

Figure 9b shows the RMSF of each residue projected on the second eigenvector. There is no significant difference in the R2 domain. However, residues 79-86 are more flexible in the unligated form than in the complexed form, whereas residues 93-96, 99-104 and 119-121 are more flexible in the complexed form.
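A per-residue RMSF restricted to a single eigenvector can be obtained by filtering the trajectory along that mode and then computing the fluctuation of each atom. The sketch below is one way to do this, assuming one atom (e.g. Cα) per residue and pre-superposed frames; `mode_rmsf` is a hypothetical helper, not part of any package used here.

```python
import numpy as np

def mode_rmsf(coords, mode=0):
    """RMSF of each atom along one principal mode.

    coords: array (n_frames, n_atoms, 3) of superposed positions.
    mode:   0 for the first eigenvector, 1 for the second, and so on.
    """
    x = np.asarray(coords, dtype=float)
    n_frames, n_atoms, _ = x.shape
    flat = x.reshape(n_frames, -1)
    dx = flat - flat.mean(axis=0)
    cov = dx.T @ dx / (n_frames - 1)
    evals, evecs = np.linalg.eigh(cov)
    k = np.argsort(evals)[::-1][mode]       # index of the requested mode
    v = evecs[:, k]
    proj = dx @ v                           # projection of every frame on the mode
    # Displacements of the trajectory filtered along this mode only
    filtered = np.outer(proj, v).reshape(n_frames, n_atoms, 3)
    return np.sqrt((filtered ** 2).sum(axis=2).mean(axis=0))
```

Plotting the returned array against residue number for the unligated and complexed trajectories would give profiles of the kind shown in Figure 9.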

Fig. 8. Eigenvalues plotted against the corresponding eigenvector indices obtained from the Cα covariance matrix.

Fig. 9. Residue displacement in the subspace spanned by the first eigenvector (a) and the second eigenvector (b). Unligated and complexed forms are denoted in black and red lines, respectively.

Fig. 10. Cross-correlation matrix obtained for the fluctuations of unligated (a) and complexed (b) forms of tvMyb1 protein.


Fig. 11. DNA-tvMyb1 protein residue interaction spectrum. The y-axis denotes the interaction energy (kcal/mol) between the DNA and specific residues.

To better understand the relationship between the R2 and R3 domains, we plotted the cross-correlation maps for the complexed and unligated forms of the tvMyb1 protein (Figure 10). Figure 10a gives the cross-correlation map obtained for the unligated form. The R2 domain (approximately residues 44-72) moves in a negatively correlated fashion with respect to the R3 domain (approximately residues 97-134), whereas within the R3 domain helices H4 and H5 (residues 96-112) move in a positively correlated fashion with H6 (residues 118-134). Figure 10b shows the same behaviour for the complexed form, although in this case the motion is somewhat attenuated.
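For reference, a cross-correlation (normalized covariance) map of this kind can be computed as C<sub>ij</sub> = ⟨Δr<sub>i</sub>·Δr<sub>j</sub>⟩ / (⟨Δr<sub>i</sub>²⟩⟨Δr<sub>j</sub>²⟩)<sup>1/2</sup>. The following is a minimal sketch assuming one atom per residue and frames already superposed; the function name `cross_correlation` is hypothetical.

```python
import numpy as np

def cross_correlation(coords):
    """Normalized covariance matrix of atomic displacements.

    coords: array (n_frames, n_atoms, 3) of superposed positions.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1]:
    +1 for fully correlated motion, -1 for fully anticorrelated motion.
    Atoms with zero fluctuation would give a division by zero.
    """
    x = np.asarray(coords, dtype=float)
    d = x - x.mean(axis=0)                             # displacements
    # <dr_i . dr_j> averaged over frames
    cov = np.einsum("tik,tjk->ij", d, d) / d.shape[0]
    norm = np.sqrt(np.diag(cov))                       # sqrt(<dr_i^2>)
    return cov / np.outer(norm, norm)
```

Negative blocks between the residue ranges of two domains in the resulting matrix indicate the kind of anticorrelated domain motion described above.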

The per-residue decomposition of the binding interaction energy obtained for the complexed tvMyb1 protein is shown in Figure 11. Our results indicate that residues Lys35, Arg71 and Arg74 (R2 domain), Arg86 (loop L1), and Pro89, Asn110, Lys114, Asn123, Arg125, Arg127 and His134 (R3 domain) make the strongest interactions with the DNA molecule, in agreement with the reported experimental data (Lou et al., 2009).

### **3.3 BACE1 enzyme**

An estimated 24 million people worldwide have dementia, the majority of whom are thought to have Alzheimer's disease (AD). The two core pathological hallmarks of Alzheimer's disease are amyloid plaques and neurofibrillary tangles. The amyloid cascade hypothesis suggests that deposition of amyloid β (Aβ) triggers neuronal dysfunction and death in the brain

Aβ42 induces neuronal lipid peroxidation and protein oxidation in vivo and in vitro, possibly by generating radicals (Butterfield, 2001, 2003; Hensley et al., 1995). Although the neurotoxicity of Aβ42 is related to the generation of H2O2 (Behl et al., 1994), the chemistry by which Aβ42 generates the oxidation products remains unclear. Aβ is generated by cleavage of the amyloid precursor protein (APP) by two proteases: β-site amyloid cleaving enzyme (β-secretase) and γ-secretase. Thus, β-secretase and γ-secretase are attractive targets for the treatment of AD.


The flexibility of the free BACE1 enzyme has already been reported by Caflisch et al. and by Chakraborty et al., and our simulations agree with these previously reported results. Figures 12a and 12b give the experimental and calculated B-factors obtained from the last 10 ns of simulation for the free and complexed enzyme, respectively. Both figures show the same general trend. However, the mobility of the complexed enzyme is reduced with respect to the free enzyme, owing to the presence of the inhibitor.

Fig. 12. Thermal factor analysis. MD simulations (red line), experimental data (black line).
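Calculated thermal factors are commonly derived from the simulated mean-square fluctuations through B<sub>i</sub> = (8π²/3)⟨Δr<sub>i</sub>²⟩. The sketch below illustrates this conversion; it assumes coordinates in Å and frames already superposed on a reference structure, and the function name `b_factors` is hypothetical rather than taken from the software used in this work.

```python
import numpy as np

def b_factors(coords):
    """Isotropic B-factors (in Å^2) from a superposed trajectory.

    coords: array (n_frames, n_atoms, 3) of positions in Å.
    """
    x = np.asarray(coords, dtype=float)
    # Mean-square fluctuation of each atom about its average position
    msf = ((x - x.mean(axis=0)) ** 2).sum(axis=2).mean(axis=0)
    return (8.0 * np.pi ** 2 / 3.0) * msf
```

The resulting per-atom values can be compared directly with the B-factor column of the crystal structure, keeping in mind that experimental values also contain contributions such as lattice disorder that a simulation of a single molecule does not capture.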

In both the free and complexed forms of the BACE1 enzyme, several regions show significant mobility. The most representative are the 10s loop (residues 9-14), located between two strands at the bottom of the S3 subpocket; the β-hairpin flap (residues 67-77); the A loop (residues 158-167); the F loop (residues 311-318); and the D loop (residues 270-273). Note that the 10s loop, β-hairpin flap, A loop and F loop are part of the catalytic cavity.


the presence of an inhibitor in the exosite.


Figure 13 shows the eigenvalues obtained from the diagonalization of the covariance matrix. Comparing the two models, the plot indicates that the presence of the inhibitor in the complexed enzyme drastically decreases the first two eigenvalues with respect to the free form, in accordance with the results shown in Figures 12a and 12b. The first two eigenvalues account for approximately 59% and 56% of the collective motion of the complexed and free forms of the enzyme, respectively; we therefore focused our study on these two eigenvectors.

Fig. 13. Eigenvalues plotted against the corresponding eigenvector indices obtained from the Cα covariance matrix.

The motion along any eigenvector can be visualized by projecting all trajectory frames onto that eigenvector. From this projected trajectory we calculated the RMSF for the first two eigenvectors (Figure 14). The dynamic behaviour of the two forms of the enzyme (i.e. free and complexed) differs in specific areas. In the free form, high mobility was observed for the β-hairpin flap and loops A and F. In the complexed form these regions behaved similarly, but loops A and F displayed attenuated flexibility with respect to the free enzyme. Another difference was found in residues 87-93 and 325-330. This behaviour may be relevant because residues 325-330 in the flap have been reported to be responsible for regulating the ingress and egress of the BACE1 substrate (Chakraborty et al., 2011). The collective motion of the free and complexed forms of the enzyme can be visualized from the porcupine plots in Figures 15 and 16. The collective motion represented by the eigenvectors can be analysed in two parts: a) the motion between the N-terminal lobe (residues 1-150) and the C-terminal lobe (residues 151-385), and b) the motion of specific regions that are part of the catalytic cleft.

Figure 14 shows the representation of the first eigenvector for the free and complexed forms of the enzyme. Figure 14a shows that the C-terminal and N-terminal lobes move in opposite directions. In addition, the β-hairpin flap moves toward the catalytic cleft while loops A and F move away from it, allowing a hinge motion between the lobes. The concerted motions of these regions dominate the transition between the open and closed conformations of BACE1, in agreement with the results reported by Chakraborty et al. The first eigenvector for the complexed form of the enzyme (Figure 14b) also displays an opening between the lobes, in which the C-terminal lobe moves away from the N-terminal lobe. The β-hairpin flap moves in the opposite direction to the loop formed by residues 325-330, a motion that is not evident in the free form of the enzyme. This finding is important because these two loops are directly responsible for the ingress and egress of the substrate.

The complexed form of the enzyme (Figure 15) displayed diminished motion due to the presence of the inhibitor. Loops A, D and F, which form part of the exosite, move less than in the free enzyme because the inhibitor makes strong interactions with residues Gln163 (A loop), Trp270 (D loop), and Asp311, Thr314 and Asp317 (F loop) in the complexed form of the enzyme, as reported by Gutierrez et al. (2010).


Fig. 14. Porcupine plots obtained for the first a) and second b) eigenvector of the unligated form of BACE1.


Fig. 15. Porcupine plots obtained for the first a) and second b) eigenvector of the complexed form of BACE1.

### **4. Conclusion**

In the present chapter we reported MD simulations performed on three molecular systems of biological interest: i) the DNA-bending protein Fis (Factor for Inversion Stimulation), ii) DNA-tvMyb1 (*Trichomonas vaginalis* transcriptional factor) and iii) BACE1 (β-site amyloid cleaving enzyme 1). The proposed model structures accounted for different experimental biological data for these systems, indicating that the essential dynamics (ED) method, also called Principal Component Analysis (PCA), is a very fruitful tool for evaluating the conformational behaviour of molecular systems like those reported here. In addition, by examining the cross-correlation (normalized covariance) matrix we were able to obtain collective motions that allow domain motions to be examined for these models. The MM-GBSA method, in turn, allowed us to determine binding hot-spot residues as well as the binding energy decomposition, giving additional information that is in general very difficult to obtain from experimental data. Clearly these MD simulations must, whenever possible, be corroborated with accurate experimental data in order to determine their real scope and limitations. Nevertheless, these theoretical techniques, properly applied, are very useful and give additional information for determining the conformational behaviour of complex biological systems.

### **5. Acknowledgments**

This work was supported by Universidad Nacional de San Luis (UNSL), Instituto Multidisciplinario de Investigaciones Biológicas de San Luis (IMIBIO-SL, CONICET), Instituto de Matemática Aplicada San Luis (IMASL−CONICET) and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET−Argentina). L.J.G. gratefully acknowledges a CONICET fellowship; R.D.E. and H.A.B. are staff members of CONICET.

### **6. References**



Amadei, A., Linssen, A.B.M., & Berendsen, H.J.C. (1993). Essential dynamics of proteins. *Proteins: Structure, Function and Genetics*, Vol.17, (1993), pp.412-425.

Ball, C. A., & Johnson, R. C. (1991). Efficient excision of phage lambda from the *Escherichia coli* chromosome requires the Fis protein. *Journal of Bacteriology*, Vol.173, No.13, (July 1991), pp.4027–4031.

Ball, C. A., & Johnson, R. C. (1991). Multiple effects of Fis on integration and the control of lysogeny in phage λ. *Journal of Bacteriology*, Vol.173, No.13, (July 1991), pp.4032–4038.

Behl, C., Davis, J. B., Lesley, R., & Schubert, D. (1994). Hydrogen peroxide mediates amyloid beta protein toxicity. *Cell*, Vol.77, No.6, (June 1994), pp.817–827.

Betermier, M., Lefrere, V., Koch, C., Alazard, R., & Chandler, M. (1989). The *Escherichia coli* protein, Fis: specific binding to the ends of phage Mu DNA and modulation of phage growth. *Molecular Microbiology*, Vol.3, No.4, (April 1989), pp.459–468, ISSN 0950-382X.

Bosch, L., Nilsson, L., Vijgenboom, E., & Verbeek, H. (1990). FIS-dependent trans-activation of tRNA and rRNA operons of *Escherichia coli*. *Biochimica et Biophysica Acta*, Vol.1050, No.1, (August 1990), pp.293–301, ISSN 0006-3002.

Butterfield, D. A., Drake, J., Pocernich, C., & Castegna, A. (2001). Evidence of oxidative damage in Alzheimer's disease brain: central role for amyloid β-peptide. *Trends in Molecular Medicine*, Vol.7, No.12, (December 2001), pp.548–554, ISSN 1471-4914.

Butterfield, D. A. (2003). Amyloid β-peptide (1–42)-associated free radical-induced oxidative stress and neurodegeneration in Alzheimer's disease brain: mechanisms and consequences. *Current Medicinal Chemistry*, Vol.10, No.24, (December 2003), pp.2651–2659, ISSN 0929-8673.


Hampel, H., Shen, Y., Walsh, D. M., Aisen, P., Shaw, L. M., Zetterberg, H., Trojanowski, J. Q., & Blennow, K. (2009). Biological markers of amyloid β-related mechanisms in Alzheimer's disease. *Experimental Neurology*, Vol.223, No.3, (June 2010), pp.334-346, ISSN 0014-4886.

Hensley, K., Hall, N., Subramaniam, R., Cole, P., Harris, M., Aksenov, M., Aksenova, M., Gabbita, S. P., Wu, J. F., Carney, J. M., Lovell, M., Markesbery, W. R., & Butterfield, D. A. (1995). Brain regional correspondence between Alzheimer's disease histopathology and biomarkers of protein oxidation. *Journal of Neurochemistry*, Vol.65, No.5, (November 1995), pp.2146–2156, ISSN 1471-4159.

Ichiye, T., & Karplus, M. (1991). Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. *Proteins: Structure, Function, and Bioinformatics*, Vol.11, No.3, (November 1991), pp.205-217, ISSN 1097-0134.

Johnson, R. C., Bruist, M. F., & Simon, M. I. (1986). Host protein requirements for *in vitro* site-specific DNA inversion. *Cell*, Vol.46, No.4, (1986), pp.531–539, ISSN 0092-8674.

Kahmann, R., Rudt, F., Koch, C., & Mertens, G. (1985). G inversion in bacteriophage Mu DNA is stimulated by a site within the invertase gene and a host factor. *Cell*, Vol.41, No.3, (July 1985), pp.771–780.

Kelly, A., Goldberg, M. D., Carroll, R. K., Danino, V., Hinton, J. C., & Dorman, C. J. (2004). A global role for Fis in the transcriptional control of metabolism and type III secretion in *Salmonella enterica* serovar Typhimurium. *Microbiology*, Vol.150, No.7, (July 2004), pp.2037–2053.

Kollman, P. A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Lee, T., Duan, Y., Wang, W., Donini, O., Cieplak, P., Srinivasan, J., Case, D. A., & Cheatham, T. E. (2000). Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. *Accounts of Chemical Research*, Vol.33, No.12, (April 2000), pp.889–897, ISSN 0001-4842.

Kornacker, M. G., Copeland, R. A., Hendrick, J., Lai, Z., Mapelli, C., Witmer, M. R., Marcinkeviciene, J., Metzler, W., Lee, V., & Riexinger, D. J. (2008). Beta secretase exosite binding peptides and methods for identifying beta secretase modulators. *US Patent 7314726*, Available from: http://www.freepatentsonline.com/7314726.html

Kornacker, M. G., Lai, Z., Witmer, M., Ma, J., Hendrick, J., Lee, V. G., Riexinger, D. J., Mapelli, C., Metzler, W., & Copeland, R. A. (2005). An inhibitor binding pocket distinct from the catalytic active site on human β-APP cleaving enzyme. *Biochemistry*, Vol.44, No.34, (August 2005), pp.11567-11573.

Krishnaswamy, S., & Betz, A. (1997). Exosites Determine Macromolecular Substrate Recognition by Prothrombinase. *Biochemistry*, Vol.36, No.40, (1997), pp.12080-12086.

Laga, M., Manoka, A., Kivuvu, M., Malele, B., Tuliza, M., Nzila, N., Goeman, J., Behets, F., Batter, V., & Alary, M. (1993). Nonulcerative sexually transmitted diseases as risk factors for HIV-1 transmission in women: results from a cohort study. *AIDS*, Vol.7, No.1, (January 1993), pp.95-102, ISSN 0269-9370.

Lehker, M. W., & Alderete, J. F. (1992). Iron regulates growth of *Trichomonas vaginalis* and the expression of immunogenic trichomonad proteins. *Molecular Microbiology*, Vol.6, No.1, (January 1992), pp.123–132, ISSN 0950-382X.

Lin, X., Koelsch, G., Wu, S., Downs, D., Dashti, A., & Tang, J. (2000). Human aspartic protease memapsin 2 cleaves the beta-secretase site of beta-amyloid precursor protein. *Proceedings of the National Academy of Sciences*, Vol.97, No.4, (February 2000), pp.1456-1460, ISSN 1091-6490.


Schneider, R., Travers, A., & Muskhelishvili, G. (1997). Fis modulates growth phase-dependent topological transitions of DNA in *Escherichia coli*. *Molecular Microbiology*, Vol.26, No.3, (October 1997), pp.519–530, ISSN 0950-382X.

Schneider, R., Travers, A., Kutateladze, T., & Muskhelishvili, G. (1999). A DNA architectural protein couples cellular physiology and DNA topology in *Escherichia coli*. *Molecular Microbiology*, Vol.34, No.5, (December 1999), pp.953–964, ISSN 0950-382X.

Sheikh, J., Hicks, S., Dall'Agnol, M., Phillips, A. D., & Nataro, J. P. (2001). Roles for Fis and YafK in biofilm formation by enteroaggregative *Escherichia coli*. *Molecular Microbiology*, Vol.41, No.5, (September 2001), pp.983–997, ISSN 0950-382X.

Sherman, K. J., Daling, J. R., & Weiss, N. A. (1987). Sexually transmitted diseases and tubal infertility. *Journal of the American Sexually Transmitted Diseases*, Vol.14, No.1, (1987), pp.12-6, ISSN 0148-5717.

Showalter, S.A., & Brüschweiler, R. (2007). Validation of molecular dynamics simulations of biomolecules using NMR spin relaxation as benchmarks: Application to the AMBER99SB force field. *Journal of Chemical Theory and Computation*, Vol.3, No.3, (January 2007), pp.961-975.

Sitkoff, D., Sharp, K., & Honig, B. (1994). *The Journal of Physical Chemistry*, Vol.98, No.7, (February 1994), pp.1978–1988.

Sorvillo, F., & Kerndt, P. (1998). *Trichomonas vaginalis* and amplification of HIV-1 transmission. *The Lancet*, Vol.351, No.9097, (January 1998), pp.213-4, ISSN 0140-6736.

Stella, S., Cascio, D., & Johnson, R. C. (2010). The shape of the DNA minor groove directs binding by the DNA-bending protein Fis. *Genes & Development*, Vol.24, (August 2010), pp.814-826, ISSN 0890-9369/10.

Thompson, J. F., Moitoso de Vargas, L., Koch, C., Kahmann, R., & Landy, A. (1987). Cellular factors couple recombination with growth phase: characterization of a new component in the lambda site-specific recombination pathway. *Cell*, Vol.50, No.11, (September 1987), pp.901–908.

Vassar, R., Bennett, B. D., Babu-Khan, S., Kahn, S., Mendiaz, E. A., Denis, P., Teplow, D. B., Ross, S., Amarante, P., Loeloff, R., Luo, Y., Fisher, S., Fuller, J., Edenson, S., Lile, J., Jarosinski, M. A., Biere, A. L., Curran, E., Burgess, T., Louis, J. C., Collins, F., Treanor, J., Rogers, G., & Citron, M. (1999). β-Secretase Cleavage of Alzheimer's Amyloid Precursor Protein by the Transmembrane Aspartic Protease BACE. *Science*, Vol.286, No.5440, (October 1999), pp.735-741, ISSN 0036-8075.

Weinreich, M. D., & Reznikoff, W. S. (1992). Fis plays a role in Tn5 and IS50 transposition. *Journal of Bacteriology*, Vol.174, No.14, (July 1992), pp.4530–4537.

Wilson, R. L., Libby, S. J., Freet, A. M., Boddicker, J. D., Fahlen, T. F., & Jones, B. D. (2001). Fis, a DNA nucleoid-associated protein, is involved in *Salmonella typhimurium* SPI-1 invasion gene expression. *Molecular Microbiology*, Vol.39, No.1, (February 2001), pp.79–88, ISSN 0950-382X.

World Health Organization. (1995). An Overview of Selected Curable Sexually Transmitted Disease. In: World Health Organization (ed), Global program of AIDS. Geneva, Switzerland, pp.2–27.

Xu, J., & Johnson, R. C. (1995). Identification of genes negatively regulated by Fis: Fis and RpoS comodulate growth-phase-dependent gene expression in *Escherichia coli*. *Journal of Bacteriology*, Vol.177, No.4, (February 1995), pp.938–947.

## **MM-GB(PB)SA Calculations of Protein-Ligand Binding Free Energies**

Joseph M. Hayes¹ and Georgios Archontis²

*¹Institute of Organic & Pharmaceutical Chemistry, National Hellenic Research Foundation, Athens, Greece*

*²Department of Physics, University of Cyprus, Nicosia, Cyprus*

#### **1. Introduction**

The importance of computational chemistry in modern scientific research is well established. Continuous improvement in software and algorithms for the modeling of chemical interactions has transformed molecular modeling into a powerful tool for many current research projects. From a medical perspective, one of the ultimate goals in computer-aided drug design (CADD) is the accurate prediction of ligand-binding affinities to a macromolecular target, which can facilitate and speed up the routine identification of new candidates in early-stage drug discovery projects (Gilson & Zhou, 2007; Hayes & Leonidas, 2010). In particular, structure-based modeling provides an efficient pathway towards exploiting known three-dimensional structural data in the design and proposal of new molecules for experimental evaluation. Docking calculations are now widely used in high-throughput virtual screening of structurally diverse molecules from available compound libraries/databases against specific targets. Once initial "hits" or "lead" molecules are identified (normally low µM inhibitors), modification of their chemical features in the "lead optimization" phase can improve their binding affinities and fine-tune other desirable drug-like properties. However, docking calculations currently have limited success beyond the lead identification stage, where more accurate lower-throughput computational methods are needed. In this regard, the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) and Molecular Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) methods calculate binding free energies using molecular mechanics (forcefields) and continuum (implicit) solvation models (Kollman et al., 2000). They have been successfully applied across a range of targets and are implemented in software programs such as Amber (Case et al., 2005), DelPhi (Rocchia et al., 2001) and Schrödinger (Du et al., 2011).

With a target readership from beginner to expert, the current chapter provides an extensive and critical overview of MM-GB(PB)SA methods and their applications. The theoretical foundation of the MM-GB(PB)SA method is first described. We then discuss key aspects which improve the accuracy of results, and highlight potential caveats due to the approximations inherent in the methods. The chapter concludes with a review of recent representative applications, which illustrate both successes and limitations. The emphasis of this chapter is on structure-based drug design (SBDD) efforts; however, the methods have widespread applicability in other areas such as supramolecular chemistry.

### **2. Theoretical background – "Pathway" and "Endpoint" methods**

Low-throughput computational approaches for the calculation of ligand binding free energies can be divided into "pathway" and "endpoint" methods (Deng & Roux, 2009; Gilson & Zhou, 2007). In pathway methods, the system is converted from one state (e.g., the complex) to the other (e.g., the unbound protein/ligand). This can be achieved by introducing a set of finite or infinitesimal "alchemical" changes to the energy function (the Hamiltonian) of the system through free-energy perturbation (FEP) or thermodynamic integration (TI), respectively (Kollman, 1993; Straatsma & McCammon, 1992). The fundamentals of FEP and TI methods were introduced many decades ago by John Kirkwood (Kirkwood, 1935) and Robert Zwanzig (Zwanzig, 1954). In recent years, their use in the computation of absolute binding affinities has become feasible due to increases in computational power, the development of more accurate models of atomic interactions (Cornell et al., 1995; MacKerell et al., 1998; van Gunsteren et al., 1996), the clarification of the underlying theoretical framework and the introduction of methodological advances (Boresch et al., 2003; Bowers et al., 2006; Deng and Roux, 2009; Gilson et al., 1997; Gilson & Zhou, 2007; Lee & Olson, 2006; Mobley et al., 2007). Combined with atomistic molecular dynamics (MD) or Monte Carlo (MC) simulations in explicit water solvent models, they are arguably the most accurate methods for calculating absolute or relative ligand binding affinities.
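To make the two pathway estimators concrete, the following minimal sketch implements the standard Zwanzig exponential-averaging formula for FEP, ΔG = −k<sub>B</sub>T ln⟨exp(−ΔU/k<sub>B</sub>T)⟩<sub>0</sub>, and a trapezoidal thermodynamic integration of ⟨∂U/∂λ⟩ over λ. It is an illustration only: the function names, the default temperature and the assumption that energies are given in kcal/mol are choices made here, and production calculations rely on dedicated simulation packages.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def fep_zwanzig(delta_u, temperature=300.0):
    """Free-energy difference from exponential averaging.

    delta_u: energy differences U1 - U0 (kcal/mol) evaluated on
             configurations sampled in the reference state 0.
    """
    kt = KB * temperature
    a = -np.asarray(delta_u, dtype=float) / kt
    a_max = a.max()                       # shift for numerical stability
    return -kt * (a_max + np.log(np.mean(np.exp(a - a_max))))

def thermodynamic_integration(lambdas, mean_du_dlambda):
    """Integrate <dU/dlambda> over lambda with the trapezoid rule."""
    lam = np.asarray(lambdas, dtype=float)
    g = np.asarray(mean_du_dlambda, dtype=float)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(lam)))
```

In practice a transformation is split into several λ windows; the FEP estimate is then the sum of the window contributions, while TI uses the ensemble average of ∂U/∂λ collected in each window.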

The "alchemical" computation of *differences* in binding affinities (rather than *absolute* affinities) among a set of related ligands for the same target protein is more accurate and technically simpler. A thermodynamic cycle illustrating the basic principles is shown in Figure 1 (Tembe & McCammon, 1984). The horizontal legs describe the experimentally accessible actual binding processes, with free energies ΔGbind(L1) and ΔGbind(L2). Since the free energy is a state function, the relative binding free energy ΔΔGbind is exactly equal to the difference of the free energies in the horizontal or vertical legs:

$$
\Delta\Delta G_{\text{bind}} = \Delta G_{\text{bind}}(L2) - \Delta G_{\text{bind}}(L1) \tag{1}
$$

$$
= \Delta G_{\text{complex}}(L1 \rightarrow L2) - \Delta G_{\text{free}}(L1 \rightarrow L2) \tag{2}
$$

The simulations follow the vertical steps (Eq. (2)) or unphysical processes, by simulations in water solution that gradually change the energy-function of the system from one "endpoint" to the other through a series of intermediate hybrid states.From Figure 1, this involves the stepwise "alchemical" transformation of ligand L1 to L2 both in its 'free' state (unbound) and in the bound complex, through gradual changes in the forcefield parameters describing the ligand interactions. This leads to the free energy changes Gfree(L1→L2) and Gcomplex(L1→L2), respectively. Averaging over both transformation directions is often used to improve the free-energy estimates, although this is not always the case (Lu & Woolf, 2007). These calculations can be accurate, if conducted with the appropriate care. An overview of current state-of-the-art methods for absolute and relative affinity calculations is in (Chodera et al., 2011).

based drug design (SBDD) efforts, however, the methods have widespread applicability in

Low-throughput computational approaches for the calculation of ligand binding free energies can be divided into "pathway" and "endpoint" methods (Deng & Roux, 2009; Gilson & Zhou, 2007). In pathway methods, the system is converted from one state (e.g., the complex) to the other (e.g., the unbound protein/ligand). This can be achieved by introducing a set of finite or infinitesimal "alchemical" changes to the energy function (the Hamiltonian) of the system through free-energy perturbation (FEP) or thermodynamic integration (TI), respectively (Kollman, 1993; Straatsma & McCammon, 1992). The fundamentals of FEP and TI methods were introduced many decades ago by John Kirkwood (Kirkwood, 1935) and Robert Zwanzig (Zwanzig, 1954). In recent years, their use in the computation of absolute binding affinities has become feasible due to increases in computational power, the development of more accurate models of atomic interactions (Cornell et al., 1995; MacKerell et al., 1998; van Gunsteren et al., 1996), the clarification of the underlying theoretical framework and the introduction of methodological advances (Boresch et al., 2003; Bowers et al., 2006; Deng and Roux, 2009; Gilson et al., 1997; Gilson & Zhou, 2007; Lee & Olson, 2006; Mobley et al., 2007). Combined with atomistic molecular dynamics (MD) or Monte Carlo (MC) simulations in explicit water solvent models, they are arguably the most accurate methods for calculating absolute or relative ligand binding

The "alchemical" computation of *differences* in binding affinities (rather than *absolute*  affinities) among a set of related ligands for the same target protein is more accurate and technically simpler. A thermodynamic cycle illustrating the basic principles is shown in Figure 1 (Tembe & McCammon, 1984). The horizontal legs describe the experimentally accessible actual binding processes, with free energies Gbind(L1) and Gbind(L2). Since the free energy is a state function, the relative binding free energy Gbind is exactly equal to the

The simulations follow the vertical steps (Eq. (2)) or unphysical processes, by simulations in water solution that gradually change the energy-function of the system from one "endpoint" to the other through a series of intermediate hybrid states.From Figure 1, this involves the stepwise "alchemical" transformation of ligand L1 to L2 both in its 'free' state (unbound) and in the bound complex, through gradual changes in the forcefield parameters describing the ligand interactions. This leads to the free energy changes Gfree(L1→L2) and Gcomplex(L1→L2), respectively. Averaging over both transformation directions is often used to improve the free-energy estimates, although this is not always the case (Lu & Woolf, 2007). These calculations can be accurate, if conducted with the appropriate care. An overview of current state-of-the-art methods for absolute and relative affinity calculations is

bind bind bind *G GL GL* ( 2) ( 1) (1)

complex free *G L L GL L* ( 1 2) ( 1 2) (2)

difference of the free energies in the horizontal or vertical legs:

**2. Theoretical background – "Pathway" and "Endpoint" methods** 

other areas such as in supramolecular chemistry.

affinities.

in (Chodera et al., 2011).

Fig. 1. Thermodynamic cycle linking the binding of two ligands L1 and L2 to a protein in solution.
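The bookkeeping behind the cycle of Figure 1 and Eq. (2) can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and serve only to show how the two vertical legs combine; the leg free energies themselves would come from FEP or TI simulations.

```python
def relative_binding_free_energy(dg_complex, dg_free):
    """Eq. (2): ddG_bind = dG_complex(L1->L2) - dG_free(L1->L2), in kcal/mol."""
    return dg_complex - dg_free


def average_forward_backward(dg_forward, dg_backward):
    """Combine an L1->L2 estimate with the sign-reversed L2->L1 estimate."""
    return 0.5 * (dg_forward - dg_backward)


# Hypothetical leg free energies (kcal/mol), for illustration only.
dg_complex = average_forward_backward(dg_forward=-3.2, dg_backward=3.0)
dg_free = average_forward_backward(dg_forward=-1.1, dg_backward=1.3)

ddg = relative_binding_free_energy(dg_complex, dg_free)
print(f"ddG_bind(L1->L2) = {ddg:.2f} kcal/mol")
```

With the sign convention of Eq. (1), a negative value indicates that L2 binds more strongly than L1.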

To conclude, the emerging implementation of biomolecular codes on GPU architectures (Harvey et al., 2009; Stone et al., 2011) and the development of simple free-energy protocols (Boresch & Bruckner, 2011) make atomistic calculations of absolute or relative affinities very promising for larger-scale applications in the near future. Nevertheless, at present they are still relatively time-consuming and require considerable expertise and planning; they preclude the consideration of more than a few complexes per day on a dedicated CPU cluster with a few tens of nodes. A trade-off between computational expense and accuracy is therefore required when the goal is to investigate and compare the binding strengths of a structurally diverse and/or larger set of ligands via MD simulations. For this purpose, much less computationally demanding "endpoint" methods are often successfully applied, such as the "linear interaction energy" (LIE) method (Åqvist et al., 2002), the molecular mechanics – Poisson-Boltzmann surface area (MM-PBSA) approximation (Kollman et al., 2000) and the related molecular mechanics – generalised Born surface area (MM-GBSA) approximation (Gohlke et al., 2003). All these methods compute binding free energies along the horizontal legs of Figure 1, but use models only for the "endpoints" (the bound and unbound states). The MM-PB(GB)SA methodology is now described in more detail.

#### **3. The MM-GB(PB)SA methodology**

Using the MM-GBSA and MM-PBSA methods, relative binding affinities for a set of ligands to a given target can often be reproduced with good accuracy and considerably less computational effort than with full-scale molecular dynamics FEP/TI simulations. Furthermore, the free energies can be decomposed into insightful interaction and desolvation components (Archontis et al., 2001; Hayes et al., 2011; Polydoridis et al., 2007).

#### **3.1 Thermodynamics & calculation framework**

In the MM-GB(PB)SA formulation, the binding free energy of a ligand (L) to a protein (P) to form the complex (PL) is obtained as the difference (Pearlman, 2005):

$$
\Delta G\_{\text{bind}} = G(PL) - G(P) - G(L) \tag{3}
$$

The free energy of each of the three molecular systems P, L, and PL is given by the expression:

$$\text{G(X)} = E\_{\text{MM}}(X) + G\_{\text{solv}}(X) - TS(X) \tag{4}$$

In Eq. (4), EMM is the total molecular mechanics energy of molecular system X in the gas phase, Gsolv is a correction term (solvation free energy) accounting for the fact that X is surrounded by solvent, and S is the entropy of X.

To apply the MM-GB(PB)SA formulation, a representative set of equilibrium conformations for the complex, free protein and free ligand is first obtained by atomistic MD simulations in explicit solvent. In the subsequent post-processing phase, the solvent is discarded and replaced by a dielectric continuum. Changes (∆) in the individual terms (∆EMM, ∆Gsolv, -T∆S) of Eq. (4) between the unbound states and the bound (complex) state are calculated, and contribute to the binding free energy according to Eq. (3). The computation of each of the terms in Eq. (4) is now described in more detail.

EMM is the sum of the bonded (internal) energy and the non-bonded electrostatic and van der Waals energies:

$$E\_{\rm MM} = E\_{\rm bonded} + E\_{\rm elec} + E\_{\rm vw} \tag{5}$$

These energy contributions are computed from the atomic coordinates of the protein, ligand and complex using the (gas phase) molecular mechanics energy function (or forcefield). The solvation free energy term Gsolv contains both polar and non-polar contributions. The polar contributions are accounted for by the generalized Born, Poisson, or Poisson-Boltzmann model, and the non-polar contributions are assumed to be proportional to the solvent-accessible surface area (SASA), cf. Section 3.2:

$$G\_{\rm solv} = G\_{\rm PB(GB)} + G\_{\rm SASA} \tag{6}$$

Finally, the entropy S is decomposed into translational, rotational and vibrational contributions. The first two are computed from standard statistical-mechanical expressions, and the last is typically estimated from a normal-mode (harmonic or quasiharmonic) analysis (Brooks et al., 1995; Karplus & Kushick, 1981; Tidor & Karplus, 1994). In practice, current software implementations normally determine all three contributions to S as part of a normal-mode analysis.

To improve the accuracy of the computed binding free energies, the various terms of Eq. (4) are averaged over multiple conformations or MD snapshots (typically a few hundred for the EMM and Gsolv contributions). Depending on the extent of conformational fluctuations in the system under consideration, convergence to stable values may require relatively long (multi-ns) simulations. The computation of the entropy term, however, requires extensive minimization of the trajectory conformations for the protein, ligand and complex to local minima on the potential energy surface, followed by normal-mode analysis. This procedure is costly and prevents the consideration of a large number of conformations; insufficient sampling can therefore sometimes be an issue. To decrease the computational cost, the protein can be truncated beyond a certain cutoff distance and the system minimized using a distance-dependent dielectric, which simulates the deleted surroundings (Kongsted & Ryde, 2009). However, a large variation of the entropy term often results from these 'free' minimizations. Including a fixed buffer region (with water molecules) beyond the cut-off can lead to more stable entropy predictions (Kongsted & Ryde, 2009).
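As a minimal sketch of how Eqs. (3) and (4) combine with snapshot averaging, the following example averages each term over its own set of snapshots (fewer for the entropy term, as discussed above). The function names and all energies are hypothetical placeholders; in practice the per-snapshot values would be produced by a molecular mechanics and continuum-solvent post-processing program.

```python
from statistics import mean


def free_energy(e_mm, g_solv, t_s):
    """Eq. (4): G(X) = <E_MM> + <G_solv> - T<S>, each term averaged over its own snapshots."""
    return mean(e_mm) + mean(g_solv) - mean(t_s)


def binding_free_energy(terms):
    """Eq. (3): dG_bind = G(PL) - G(P) - G(L)."""
    g = {name: free_energy(**t) for name, t in terms.items()}
    return g["PL"] - g["P"] - g["L"]


# Hypothetical per-snapshot values in kcal/mol; t_s holds T*S and uses fewer snapshots.
terms = {
    "PL": dict(e_mm=[-5210.0, -5190.0, -5200.0],
               g_solv=[-1505.0, -1525.0, -1515.0],
               t_s=[1210.0, 1214.0]),
    "P": dict(e_mm=[-5100.0, -5080.0, -5090.0],
              g_solv=[-1510.0, -1530.0, -1520.0],
              t_s=[1190.0, 1194.0]),
    "L": dict(e_mm=[-62.0, -58.0, -60.0],
              g_solv=[-24.0, -26.0, -25.0],
              t_s=[36.0, 40.0]),
}

dg_bind = binding_free_energy(terms)
print(f"dG_bind = {dg_bind:.1f} kcal/mol")
```

The example makes the numerical difficulty visible: the binding free energy is a small difference between totals of several thousand kcal/mol, so the uncertainties of the individual averages matter.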

The internal energy terms (Ebonded) of the protein and complex can be on the order of a few thousand kcal/mol, and can introduce large uncertainties in the computed binding free energies. This is avoided in the "single-trajectory" approximation (Gohlke & Case, 2004; Page & Bates, 2006), which employs simulations of a single state (the complex) to generate conformations for all three states (complex, protein and ligand). For each MD conformation sampled, the internal energy terms of the protein and ligand are then identical in the bound and unbound states and cancel exactly in Eq. (3). Hence, effectively only the protein-ligand (non-bonded) interaction energies of the EMM term in Eq. (5) contribute to ∆Gbind. 'Single-trajectory' simulations significantly reduce the computational effort and are generally sufficiently accurate for most applications. The downside of the approximation is that any explicit structural relaxation of the protein and ligand upon binding is ignored. Although charge reorganization can be partly taken into account implicitly by setting the protein/ligand (internal) dielectric constants (Section 3.2) to values larger than εin = 1-2 (Archontis et al., 2001; Archontis & Simonson, 2005; Schutz & Warshel, 2001; Simonson, 2003), the neglect of explicit structural relaxation may introduce errors, depending on the system (Tamamis et al., 2010). Separate MD simulations for the complex and for the unbound receptor and ligand may also be performed (the "three-trajectory" approximation); these require greater computational effort but should, in theory, yield more accurate results. Indeed, Yang and co-workers have recently shown that including separate simulations for the ligand and accounting for the "ligand reorganization" free energy led to significant improvements in binding affinity predictions for a set of ligands targeting XIAP (Yang et al., 2009). In certain cases, therefore, the added expense of separate simulations may be justified.
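The cancellation that underlies the single-trajectory approximation can be checked with a small numerical sketch. All energies below are hypothetical: the protein and ligand energies are assumed to be evaluated on coordinates cut out of the complex snapshots, so that the energy of the complex is the sum of the two intramolecular energies and the protein-ligand interaction energy.

```python
def delta_e_mm(e_complex, e_protein, e_ligand):
    """Per-snapshot dE_MM = E_MM(PL) - E_MM(P) - E_MM(L)."""
    return [c - p - l for c, p, l in zip(e_complex, e_protein, e_ligand)]


# Hypothetical per-snapshot energies (kcal/mol) for three complex snapshots.
bonded_p = [3050.0, 3120.0, 2980.0]        # large, strongly fluctuating
bonded_l = [41.0, 44.5, 39.0]
nonbonded_p = [-8100.0, -8150.0, -8060.0]  # intra-protein non-bonded energy
nonbonded_l = [-12.0, -10.5, -11.0]        # intra-ligand non-bonded energy
interaction = [-48.0, -52.5, -50.0]        # protein-ligand non-bonded energy

e_protein = [b + n for b, n in zip(bonded_p, nonbonded_p)]
e_ligand = [b + n for b, n in zip(bonded_l, nonbonded_l)]
e_complex = [p + l + i for p, l, i in zip(e_protein, e_ligand, interaction)]

d_e_mm = delta_e_mm(e_complex, e_protein, e_ligand)
print(d_e_mm)  # only the protein-ligand interaction energies survive
```

In a three-trajectory calculation the protein and ligand energies would instead come from separate simulations, and the large intramolecular terms would no longer cancel snapshot by snapshot.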

### **3.2 Solvation models – GBSA and PBSA**

Proteins usually function in aqueous solution or in membrane environments, which are themselves in the vicinity of an aqueous medium. The surrounding solvent can influence protein stability and function, ligand binding and protein-protein association. Since the solvent modifies the intramolecular and intermolecular interactions in a non-trivial manner, the accurate inclusion of solvent effects in biomolecular modeling and simulation is a challenging task.

Currently, the most accurate treatment in molecular simulations is achieved by atomic-detail models that explicitly represent the biomolecule and its surrounding environment. Several water models are used successfully to describe the aqueous environment in atomistic simulations; examples include SPC (Berendsen et al., 1981), SPC/E (Berendsen et al., 1987), TIP3P and TIP4P (Jorgensen et al., 1983), and TIP5P (Mahoney & Jorgensen, 2000). In practice, the explicit inclusion of water leads to a considerable increase in both the size of the simulation system and the computational cost of the simulation itself. Furthermore, the computation of solvation or binding free energies requires an exhaustive sampling of the solvent degrees of freedom. A much less costly approach is to represent the solvent implicitly in the simulation, through the incorporation of additional "potential of mean force" terms (Roux, 2001; Roux & Simonson, 1999) in the gas-phase energy function (e.g., Eq. (7) below). These terms depend only on the atomic coordinates of the solute, and express the solute free energy for a given configuration after the solvent degrees of freedom have been "integrated out" (Roux, 2001; Roux & Simonson, 1999). Thus, the simulation system has the same number of degrees of freedom as in the gas phase and there is no need for explicit sampling over solvent degrees of freedom. The MM-PB(GB)SA methods considered here combine atomistic simulations in explicit solvent, for the generation of representative biomolecular conformations, with an implicit-solvent estimation of the binding free energies in a post-processing step.

Conceptually, most implicit solvent models decompose the solvation process into three sequential steps (Cramer & Truhlar, 1999): i) creation of a cavity in solution to accommodate the biomolecule; ii) switching-on dispersion interactions between the biomolecule and surrounding medium, while all atomic charges are set to zero; and iii) switching-on the biomolecular charges. The solvation free energies of steps i) and ii) are normally assumed to be proportional to the SASA of the biomolecule and represent the *non-polar* contributions (GSASA) to Gsolv in Eq. (6), although the validity of this approximation has been questioned for step ii) (Levy et al., 2003). With a positive coefficient of proportionality, an increase in the SASA is associated with an unfavorable increase in solvation free energy, which is partly accounted for by the tendency of non-polar residues to be solvent-excluded. The equation typically used is of the form:

$$G\_{\rm SASA}(X) = \gamma \cdot {\rm SASA}(X) + \beta \tag{7}$$

with the γ and β parameter values dependent on the method and solvation model (PBSA or GBSA) used (Rastelli et al., 2010).
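A minimal sketch of Eq. (7) and of its contribution to the binding free energy is given below. The two parameter sets are values often quoted in the literature for PBSA-type and GBSA-type calculations and are included here only as assumptions; the surface areas are hypothetical, and the function names are not taken from any particular program.

```python
# Assumed parameter sets: gamma in kcal/(mol*A^2), beta in kcal/mol.
PBSA_PARAMS = {"gamma": 0.00542, "beta": 0.92}
GBSA_PARAMS = {"gamma": 0.0072, "beta": 0.0}


def g_sasa(sasa, gamma, beta):
    """Eq. (7): non-polar solvation free energy from the SASA (in A^2)."""
    return gamma * sasa + beta


def delta_g_sasa(sasa_pl, sasa_p, sasa_l, gamma, beta):
    """Non-polar contribution to dG_bind: G_SASA(PL) - G_SASA(P) - G_SASA(L)."""
    return (g_sasa(sasa_pl, gamma, beta)
            - g_sasa(sasa_p, gamma, beta)
            - g_sasa(sasa_l, gamma, beta))


# Hypothetical surface areas (A^2): 300 A^2 are buried upon binding.
sasa = dict(sasa_pl=9000.0, sasa_p=8700.0, sasa_l=600.0)

dg_np_gbsa = delta_g_sasa(**sasa, **GBSA_PARAMS)
dg_np_pbsa = delta_g_sasa(**sasa, **PBSA_PARAMS)
print(f"GBSA-type: {dg_np_gbsa:.3f} kcal/mol, PBSA-type: {dg_np_pbsa:.3f} kcal/mol")
```

Because β enters once for the complex and is subtracted once each for the protein and the ligand, the binding contribution equals γ·∆SASA − β; a non-zero β therefore shifts all computed affinities by the same constant.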

Meanwhile, step iii) yields the contribution to the solvation free energy due to the electrostatic interactions of the solute charges with the surrounding solvent, i.e. the *polar* contributions (GPB(GB)) to Gsolv in Eq. (6). In continuum-electrostatics models such as PB and GB, the solute is treated as a low-dielectric cavity embedded in a high-dielectric medium. In the simplest and most common approximation, the solute charges are centered on the individual atoms. The resulting solvation free energy of a molecule X is expressed as (Simonson, 2003):

$$G\_{\rm PB(GB)}(X) = \frac{1}{2} \sum\_{i,j \in X} q\_i q\_j g\_{ij}^{\rm PB(GB)} \tag{8}$$

where the summation is over all the atomic charges {qi}. The quantity gijPB(GB) is determined using the PB model by numerical solution of the Poisson or Poisson-Boltzmann equation (depending on the existence of salt), or using the GB model by an analytical expression with the functional form (Simonson, 2003; Still et al., 1990):

$$g\_{ij}^{\rm GB} = \left(\frac{1}{\varepsilon} - 1\right) \left[r\_{ij}^{n} + B\_{ij} \exp\left(-\frac{r\_{ij}^{n}}{A B\_{ij}}\right)\right]^{-1/n} \tag{9}$$

The parameters Bij depend on the position (distance from the solute-solvent dielectric boundary) of atoms *i* and *j*, and the shape of the entire biomolecule; ε is the solvent dielectric constant, and rij is the distance between *i* and *j*. The constants n and A were set to n=2 and A=4 in the original formulation of Still and coworkers (Still et al., 1990).

In the PB model, the solute dielectric constant (εin) affects the computed functions gijPB and hence Eq. (8). In the GB model, by contrast, the solute dielectric constant drops out of the final expression in Eq. (9), owing to the approximations used to arrive at an analytic formula. An εin value other than 1 can still be used; in this case, the first parenthesis on the right-hand side of Eq. (9) becomes (1/ε − 1/εin) and the GB expression yields the free energy of transferring the solute from an infinite reference medium with dielectric constant εin into solution. A careful discussion of this point is given by Bashford & Case (2000).
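Eqs. (8) and (9) can be sketched in a few lines for the original choice n = 2 and A = 4. Several ingredients below are assumptions that do not appear in the text: Bij is taken as the product of two effective Born radii (as in Still-type GB models), the Coulomb conversion factor puts the result in kcal/mol for charges in units of e and distances in Å, and the function names are illustrative only. Real implementations differ mainly in how the effective Born radii are obtained.

```python
import math

COULOMB = 332.0637  # kcal*A/(mol*e^2); assumed unit conversion, implicit in Eq. (8)
N_EXP = 2           # n in Eq. (9)
A_CONST = 4.0       # A in Eq. (9)


def g_gb(r_ij, b_i, b_j, eps_solvent=80.0, eps_in=1.0):
    """Eq. (9) with B_ij = b_i * b_j (assumption); the prefactor is (1/eps - 1/eps_in)."""
    b_ij = b_i * b_j
    r_n = r_ij ** N_EXP
    f = (r_n + b_ij * math.exp(-r_n / (A_CONST * b_ij))) ** (1.0 / N_EXP)
    return (1.0 / eps_solvent - 1.0 / eps_in) / f


def polar_solvation(coords, charges, born_radii, **eps):
    """Eq. (8): 0.5 * sum over all i, j (including i = j) of q_i q_j g_ij."""
    total = 0.0
    for xi, qi, bi in zip(coords, charges, born_radii):
        for xj, qj, bj in zip(coords, charges, born_radii):
            total += qi * qj * g_gb(math.dist(xi, xj), bi, bj, **eps)
    return 0.5 * COULOMB * total


# Sanity check: a single monovalent ion with a (hypothetical) Born radius of 2 A.
g_ion = polar_solvation([(0.0, 0.0, 0.0)], [1.0], [2.0])
g_born = -0.5 * COULOMB * (1.0 - 1.0 / 80.0) / 2.0  # Born formula for comparison
print(f"GB: {g_ion:.2f} kcal/mol, Born: {g_born:.2f} kcal/mol")
```

For a single ion the pair function reduces to the self term, so the sketch recovers the Born expression; for well-separated charges the exponential vanishes and gij approaches a solvent-screened Coulomb form.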

Application of Eq. (8) to a protein:ligand complex (PL) and the dissociated protein (P) and ligand (L) yields the electrostatic (polar) solvation free energy contribution to Eq. (3):

$$
\Delta G\_{\rm PB(GB)} = G\_{\rm PB(GB)}(PL) - G\_{\rm PB(GB)}(P) - G\_{\rm PB(GB)}(L) \tag{10}
$$

An advantage of these methods is that they facilitate the decomposition of the total solvation free energy into insightful components. Indeed, the summation over atomic charges in Eq. (8) implies that the electrostatic free energy of Eq. (10) can be partitioned into interaction and desolvation components (Archontis et al., 2001; Hendsch & Tidor, 1999):

$$\Delta G\_{\rm PB(GB)} = \sum\_{i \in P, j \in L} q\_i g\_{ij}^{PL} q\_j + \left[ \frac{1}{2} \sum\_{i,j \in P} q\_i g\_{ij}^{PL} q\_j - \frac{1}{2} \sum\_{i,j \in P} q\_i g\_{ij}^{P} q\_j \right] + \left[ \frac{1}{2} \sum\_{i,j \in L} q\_i g\_{ij}^{PL} q\_j - \frac{1}{2} \sum\_{i,j \in L} q\_i g\_{ij}^{L} q\_j \right] \tag{11}$$

The notation gijX indicates that the interaction between charges *i* and *j*, as determined by the function gij, is in general different in the complex (X = PL), the free protein (X = P) and the free ligand (X = L). The first term on the right-hand side of Eq. (11) is the "interaction term" and arises from direct interactions between the protein and ligand charges, which are only present in the complex. The next term (first bracket) is a protein "desolvation" term, arising from the replacement of high-dielectric solvent by the low-dielectric ligand in the protein vicinity, as well as from structural relaxation and changes in the charge distribution of the protein. The last term (second bracket) corresponds to the desolvation of the ligand.
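The partitioning into interaction and desolvation components can be verified numerically with a small sketch. The charges and the gij matrices below are hypothetical numbers in arbitrary units, all matrices are indexed by a common atom numbering for simplicity, and the function names are illustrative. The check relies on gij being symmetric, so that the protein-ligand cross terms of Eq. (8) appear twice and cancel the factor 1/2.

```python
def pair_sum(q, g, idx_a, idx_b):
    """Sum of q_i * g_ij * q_j over i in idx_a and j in idx_b."""
    return sum(q[i] * g[i][j] * q[j] for i in idx_a for j in idx_b)


def decompose_polar(q, g_pl, g_p, g_l, protein, ligand):
    """Interaction, protein-desolvation and ligand-desolvation terms of the polar free energy."""
    interaction = pair_sum(q, g_pl, protein, ligand)
    desolv_p = 0.5 * (pair_sum(q, g_pl, protein, protein)
                      - pair_sum(q, g_p, protein, protein))
    desolv_l = 0.5 * (pair_sum(q, g_pl, ligand, ligand)
                      - pair_sum(q, g_l, ligand, ligand))
    return interaction, desolv_p, desolv_l


# Hypothetical three-atom system: atoms 0 and 1 form the protein, atom 2 the ligand.
q = [0.5, -0.5, 1.0]
protein, ligand = [0, 1], [2]
g_pl = [[-80.0, -20.0, -10.0], [-20.0, -70.0, -5.0], [-10.0, -5.0, -60.0]]
g_p = [[-90.0, -25.0, 0.0], [-25.0, -85.0, 0.0], [0.0, 0.0, 0.0]]
g_l = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, -100.0]]

parts = decompose_polar(q, g_pl, g_p, g_l, protein, ligand)

# Direct evaluation: G(PL) - G(P) - G(L), each from 0.5 * sum_ij q_i g_ij q_j.
everything = protein + ligand
direct = 0.5 * (pair_sum(q, g_pl, everything, everything)
                - pair_sum(q, g_p, protein, protein)
                - pair_sum(q, g_l, ligand, ligand))
print(parts, sum(parts), direct)
```

In this made-up example both desolvation terms are positive (unfavorable) and outweigh the favorable interaction term, a pattern the decomposition makes easy to spot.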

The "interaction term" can be further decomposed into contributions from specific residues (Archontis et al., 2001; Hendsch & Tidor, 1999). For example, the contribution of a protein residue R to this term is given by the following expression:

$$\Delta G\_{\rm PB(GB)}^{R} = \sum\_{i \in R, j \in L} q\_i g\_{ij}^{PL} q\_j \tag{12}$$

These components provide useful insights into the origin of the binding free energy values. They can help interpret differences in binding affinities for a series of related complexes and guide the design of modified proteins or ligands. For example, in a study of amino acid binding to native and mutant aspartyl-tRNA synthetase, residue decomposition identified the protein residues discriminating between the cognate ligand (aspartic acid) and the analogue asparagine (Archontis et al., 2001). In another study, of RNase A recognition by dinucleotidic inhibitors, a similar decomposition attributed the stronger binding of the most potent inhibitor to interactions with two active-site lysines (Polydoridis et al., 2007).
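The per-residue contribution of Eq. (12) amounts to restricting the protein index of the interaction term to the atoms of one residue. A minimal sketch, with hypothetical residue labels, charges and gij values in arbitrary units, might look as follows.

```python
def residue_contributions(q, g_pl, residues, ligand):
    """Eq. (12) for each residue R: sum over i in R, j in L of q_i g_ij^PL q_j."""
    return {
        name: sum(q[i] * g_pl[i][j] * q[j] for i in atoms for j in ligand)
        for name, atoms in residues.items()
    }


# Hypothetical system: residue R1 = atoms 0, 1; residue R2 = atom 2; ligand = atom 3.
q = [0.5, 0.5, -1.0, -1.0]
residues = {"R1": [0, 1], "R2": [2]}
ligand = [3]

# Symmetric g_ij^PL matrix; only the protein-ligand entries are needed here.
g_pl = [[0.0] * 4 for _ in range(4)]
for i, j, value in [(0, 3, -2.0), (1, 3, -1.0), (2, 3, -3.0)]:
    g_pl[i][j] = g_pl[j][i] = value

contrib = residue_contributions(q, g_pl, residues, ligand)
print(contrib, sum(contrib.values()))
```

The residue contributions add up to the full interaction term, so ranking them shows which residues dominate it.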

Finally, Hou and co-workers very recently evaluated the performance of MM-GBSA and MM-PBSA for predicting binding free energies based on molecular dynamics simulations (Hou et al., 2011a). Their results showed that MM-PBSA performed better than MM-GBSA in calculating absolute binding free energies, but not necessarily in calculating relative binding free energies, which are sufficient for most applications in computational drug design. Interestingly, in a study of the accuracy of continuum solvation models for drug-like molecules, GB methods were typically more stable and gave more accurate results than the widely used PB methods (Kongsted et al., 2009).

### **3.3 Selection of MD trajectory conformations/snapshots**

A recent study of the binding of seven biotin analogues to avidin suggested that, to obtain statistically converged MM-GBSA results, running several independent simulations, each with a sampling time of 20-200 ps, and averaging the results is more effective than running a single long simulation (Genheden & Ryde, 2010). 'Single-trajectory' simulations of the complex are generally sufficiently accurate for most applications, and while the MD simulation length does have an obvious impact on the accuracy of predictions, longer MD simulations do not necessarily yield better predictions (Hou et al., 2011a). For the calculation of the ∆EMM and ∆Gsolv terms, a large ensemble of (e.g., several hundred) conformations is typically extracted at small intervals from the single MD trajectory of the complex. Alternatively, averaging over a select few receptor-ligand binding conformations obtained from the MD trajectory via clustering has proved effective (Section 4.1), as well as more time-efficient (Hayes et al., 2011). MM-GB(PB)SA calculations on single (minimized) structures have also recently been proposed and validated (Kuhn et al., 2005; Rastelli et al., 2010), but not necessarily for structures generated from MD simulations (Section 4.2). Meanwhile, for the entropy term calculated using normal-mode analysis, fewer snapshots (typically fewer than 100) are employed, due to the computational cost involved. As already highlighted (Section 3.1), a larger number of snapshots may be required for more stable and accurate predictions. These calculations, however, are computationally expensive and often not feasible with limited computational resources. Consequently, neglecting the entropy term can in some cases lead to sufficiently accurate, or even more accurate, predictions for the ranking of ligand binding affinities in certain macromolecular systems (Hayes et al., 2011; Hou et al., 2011a; Rastelli et al., 2010).
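One simple way to combine several independent simulations is to treat each replica average as a single independent estimate and to report the mean and standard error over replicas. The sketch below follows this assumption; it is not necessarily the statistical protocol of the studies cited above, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def replica_average(replicas):
    """Mean of the replica means and its standard error (replicas assumed independent)."""
    means = [mean(values) for values in replicas]
    return mean(means), stdev(means) / sqrt(len(means))


# Hypothetical per-snapshot dG estimates (kcal/mol) from four independent simulations.
replicas = [
    [-10.2, -9.8],
    [-11.0, -11.4],
    [-9.5, -9.9],
    [-10.6, -10.0],
]

dg_mean, dg_sem = replica_average(replicas)
print(f"dG = {dg_mean:.2f} +/- {dg_sem:.2f} kcal/mol")
```

Snapshots within one trajectory are correlated in time, so a standard error computed from replica means is usually a more honest measure of precision than one computed from all snapshots pooled together.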

### **3.4 Limitations & caveats of MM-GB(PB)SA calculations**

MM-GB(PB)SA methods are widely recognised as valuable tools in CADD applications. However, as with any method, they have limitations and caveats that need to be considered. First, while useful for ranking relative ligand binding affinities, these methods lack the accuracy required for absolute binding free energy predictions (Hou et al., 2011a; Singh & Warshel, 2010). The inclusion of entropic contributions brings the MM-GB(PB)SA values somewhat closer to experimental absolute affinities (Gilson & Zhou, 2007). However, such entropic terms are costly and contain large uncertainties. Force-field inconsistencies may also be an issue: PB and GB results depend strongly on adequate atomic charges and van der Waals radii, which are often optimized for MD simulations. The MM-GB(PB)SA results may be influenced by system-dependent properties, such as the features of the binding site, the extent of protein and ligand conformational relaxation upon association, and the protein and ligand charge distribution (Kuhn et al., 2005; Hou et al., 2011a). Continuum electrostatics models ignore the molecular structure of the solvent; in some cases this might affect the results, particularly when key receptor-ligand interactions are bridged by water molecules, cf. Section 4.1 (Hayes et al., 2011). Furthermore, the value of the protein/ligand dielectric constant is chosen empirically, and accounts not only for the protein and ligand structural relaxation, but also for other error-introducing factors such as the ones mentioned above (Archontis & Simonson, 2005; Schutz & Warshel, 2001). Hou and coworkers suggested in a recent MM-PBSA study that the use of εin = 4 for a highly charged protein-ligand binding interface, εin = 2 for a moderately charged binding interface and εin = 1 for a hydrophobic binding interface may improve ligand ranking (Hou et al., 2011a). The lack of a consistent optimum dielectric constant for MM-PBSA calculations has been noted by other workers (see, e.g., Aleksandrov et al., 2010), although a value of εin = 4 often gives satisfactory results (Aleksandrov et al., 2010; Archontis et al., 2001; Thompson et al., 2006). As a final note, MM-GB(PB)SA calculations require some degree of user expertise and planning, from the initial set-up and analysis of the MD simulations through to the binding free energy calculations themselves.

### **3.5 Variations & extensions of MM-GB(PB)SA**


**3.4 Limitations & caveats of MM-GB(PB)SA calculations** 

**3.3 Selection of MD trajectory conformations/snapshots** 

While molecular docking algorithms are computationally efficient methods used to screen a large number of ligands against a given target in reasonable time, generating and postprocessing MD ensembles for more than a few receptor-ligand structures in MM-GB(PB)SA calculations is currently impractical. Recently, however, MM-GB(PB)SA methods are receiving plaudits as post-docking methods in virtual screening experiments. Postprocessing of single docking poses using MM-GB(PS)SA algorithms can improve correlations between predicted and experimental binding affinities with a number of successes reported, *c.f.* for example (Du et al., 2011; Hou et al., 2011b; Lyne et al.; 2006). This is consistent with the previously mentioned discovery that select single receptor-ligand structures can prove as accurate as sampling over large numbers of MD trajectory snapshots in MM-GB(PB)SA applications (Kuhn et al., 2005; Rastelli et al., 2010). Post-docking MM-GBSA is implemented in Schrödinger software using the program Prime, with options to include receptor and ligand flexibility; the entropy term is neglected by default. Other recent extensions of MM-PBSA exploit quantum mechanics (QM) methods in QM/MM-PBSA calculations (Gräter et al., 2005; Manta et al., 2012; Wang & Wong, 2007). Here, a "hybrid" gas phase energy term (EQM/MM) effectively replaces the pure molecular mechanics energy (EMM) term in Eq. (5). Representing the ligand by QM has the advantage, for example, to eliminate a frequently encountered problem of deficient ligand forcefield parameters. However, calculations of this sort are significantly more expensive than MM-GB(PB)SA, and therefore are typically only viable for binding predictions on relatively few ligands.
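Whether post-docking rescoring improves a ranking can be checked with a rank correlation against experiment. The following is a minimal sketch, assuming Spearman's coefficient as the measure; the function names and all numerical values are hypothetical illustrations and are not taken from the cited studies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical data for five ligands (kcal/mol or arbitrary score units).
experimental_dg = [-11.2, -9.8, -8.5, -7.1, -6.0]
docking_score = [-7.9, -8.4, -7.0, -7.5, -6.1]
mmgbsa_rescored = [-45.3, -41.0, -38.2, -30.5, -33.1]

rho_docking = spearman(experimental_dg, docking_score)
rho_rescored = spearman(experimental_dg, mmgbsa_rescored)
print(f"docking: {rho_docking:.2f}  rescored: {rho_rescored:.2f}")
```

A rank-based measure is used here because these methods are judged on ligand ordering rather than on absolute free energies.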

### **4. Recent applications of MM-GB(PB)SA**

In this section, we review recent examples of the use of MM-GB(PB)SA calculations to estimate ligand binding free energies, highlighting both successes and limitations.

### **4.1 Example 1: Phosphorylase kinase ATP-binding site inhibitors**

With an aim towards glycogenolysis control in type 2 diabetes, indirubin (IC50 > 50 µM), indirubin-3'-oxime (IC50 = 144 nM), KT5720 (Ki = 18.4 nM) and staurosporine (Ki = 0.37 nM) were investigated as phosphorylase kinase (PhKγtrnc) ATP-binding site inhibitors (Hayes et al., 2011).

Fig. 2. The ATP-binding site inhibitors of phosphorylase kinase, as studied in Ref. (Hayes et al., 2011).

Due to the lack of experimental structural information for binding of these ligands (Figure 2), MD simulations in explicit solvent using Desmond 2.0 (Bowers et al., 2006) were performed for each receptor-inhibitor complex, with 4 ns production runs following an initial equilibration period. A computationally efficient multiple-timestep RESPA integration algorithm was employed with timesteps of 2, 2 and 6 fs for bonded, "near" and "far" non-bonded interactions, respectively. Energy and trajectory atomic coordinate data were recorded every 1.2 and 2.1 ps, respectively. For the trajectory analysis preceding the 'single-trajectory' based MM-GBSA calculations, both VMD (Humphrey et al., 1996) and the Desmond simulation analysis tools in Maestro were employed. The extracted MD trajectory binding site conformations (inhibitors + residues within 7 Å) of each complex were clustered into 10 groups based on atomic root-mean-square distances (RMSDs). Every second frame (snapshot) of the last 3 ns of the production run (analysis phase) was used in the hierarchical clustering algorithm employed by the Desmond Trajectory Clustering module in Maestro. The representative complex of each of the 10 binding site cluster families (waters and counterions deleted) was then used in MM-GBSA calculations of binding free energies using Eq. (3). Schrödinger software with the MacroModel 9.7 Embrace module was used for the *∆EMM* and *∆Gsolv* calculations, while the entropy change, *∆S*, was calculated for the minimized representatives using Rigid Rotor Harmonic Oscillator (RRHO) calculations. Using this algorithm, the change in vibrational, rotational and translational (VRT) entropy of the ligands on binding was estimated. Finally, the thermodynamic average *∆Gbind* values (300 K) were estimated using the corresponding values for the 10 cluster representatives:

$$
\Delta G_{bind} = \sum_{i=1}^{10} p_i \, \Delta G_{bind}(i) \tag{13}
$$

The sum runs over the 10 cluster representatives *i*, with *pi* defined as the cluster frequency:

$$
p_i = \frac{N_i}{N_{total}} \tag{14}
$$

where *Ni* is the number of frames in cluster *i*, and *Ntotal* is the total number of frames.
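Eqs. (13) and (14) amount to a frequency-weighted mean over the cluster representatives. A minimal sketch of that average is given below; the function name, the cluster populations and the per-cluster free energies are hypothetical and serve only to illustrate the arithmetic (four clusters are used instead of ten for brevity).

```python
def cluster_weighted_dg(frame_counts, dg_values):
    """Frequency-weighted binding free energy, following Eqs. (13) and (14).

    frame_counts: number of frames N_i in each cluster
    dg_values:    binding free energy of each cluster representative (kcal/mol)
    """
    if len(frame_counts) != len(dg_values):
        raise ValueError("one free energy value is needed per cluster")
    n_total = sum(frame_counts)
    if n_total <= 0:
        raise ValueError("total number of frames must be positive")
    # Eq. (14): p_i = N_i / N_total
    weights = [n / n_total for n in frame_counts]
    # Eq. (13): sum over clusters of p_i * dG_bind(i)
    return sum(p * dg for p, dg in zip(weights, dg_values))


# Hypothetical example with four clusters.
counts = [4, 3, 2, 1]
dg_per_cluster = [-10.0, -8.0, -6.0, -4.0]
dg_average = cluster_weighted_dg(counts, dg_per_cluster)
print(f"weighted dG_bind = {dg_average:.2f} kcal/mol")
```

Weighting by cluster population means that sparsely populated binding-site conformations contribute little to the estimate, even when their individual free energies are extreme.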


Fig. 3. Predicted binding of staurosporine (gray) at the ATP-binding site of PhKγtrnc. Key interacting residues are colored by type – polar/charged residues shown in red (D104, E110 and E153) and hydrophobic in green (M106). Hydrogen bonds are formed with M106 and D104 backbones (hinge region), E110 and E153.

The relative binding affinities from the final *∆Gbind* values were generally in agreement with experiment, except that the ranking of KT5720 and staurosporine (representative complex shown in Figure 3) was reversed, by 0.7 kcal/mol. The discrepancy was attributed to the neglect of certain key contributions to the binding free energies in the MM-GBSA algorithm. Notably, estimating and including the loss of conformational entropy for the more flexible KT5720 ligand yielded *∆Gbind* values 1.2 – 2.6 kcal/mol in favour of staurosporine binding, and hence in agreement with the experimental ranking. Further, whereas staurosporine had no key receptor-ligand bridging waters, the MD simulations revealed a key role of bridging waters in KT5720 binding. The entropy loss associated with a bound water molecule in a protein-ligand complex is sometimes important (but typically neglected in MM-GB(PB)SA calculations), with an upper bound on the free energy cost of 2 kcal/mol at 300 K suggested (Dunitz, 1994).

### **4.2 Example 2: Interpretation of species specificity of compstatin, a peptidic inhibitor of the complement system**

The complement system provides the first line of defense against the invasion of foreign pathogens (Mastellos et al., 2003). Its inappropriate activation may cause or aggravate several pathological conditions, including asthma, macular degeneration, rheumatoid arthritis, and rejection of xenotransplants. The 13-residue peptide compstatin is a promising candidate for the therapeutic treatment of unregulated complement activation (Janssen et al., 2007; Morikis & Lambris, 2005). The conformation of the human C3 – compstatin complex is shown in Figure 4. Compstatin interacts closely with four protein sectors, indicated by thick yellow tubes. An important conclusion, drawn from experimental studies with a large number of species, was that compstatin inhibits the key complement component protein C3 from several primate species, but is inactive against the corresponding protein from lower mammals, precluding the testing of compstatin-based analogues in animal models (Sahu et al., 2003). To understand this species specificity, Tamamis and co-workers compared the stabilities of compstatin complexes with human or rat C3 (Tamamis et al., 2010) by atomistic MD simulations and an MM-GBSA analysis. In the simulations of the rat C3 complex, specific protein sectors near the compstatin binding site underwent reproducible localized displacements, which eliminated or weakened critical protein-ligand interactions. In agreement with the simulations and the lack of compstatin activity against rat C3, an MM-GBSA analysis in the "single-trajectory" approximation estimated the binding free energy of the human C3 complex to be stronger by -9 kcal/mol relative to the rat C3 complex. When protein and ligand relaxation were taken into account by a "three-trajectory" approximation, the relative binding free energy increased (in absolute value) to -19 kcal/mol. Thus, in this system the neglect of relaxation effects introduced a significant error, even though the qualitative conclusion was correct in both cases.
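The difference between the two approximations can be made concrete with a short sketch. It assumes that a per-snapshot free energy (molecular mechanics plus solvation terms) is already available for each species; the function names and all numbers are hypothetical and are not taken from Tamamis et al. (2010).

```python
from statistics import mean


def single_trajectory_dg(g_complex, g_receptor_bound, g_ligand_bound):
    """Receptor and ligand energies come from the SAME complex snapshots,
    so any relaxation of the free species on dissociation is ignored."""
    if not (len(g_complex) == len(g_receptor_bound) == len(g_ligand_bound)):
        raise ValueError("all three series must come from the same snapshots")
    return mean(c - r - l for c, r, l in
                zip(g_complex, g_receptor_bound, g_ligand_bound))


def three_trajectory_dg(g_complex, g_receptor_free, g_ligand_free):
    """Each species is averaged over its own, separate simulation,
    so relaxation of the free receptor and ligand is included."""
    return mean(g_complex) - mean(g_receptor_free) - mean(g_ligand_free)


# Hypothetical per-snapshot free energies (kcal/mol).
g_complex = [-520.0, -518.0, -522.0]
g_receptor_bound = [-400.0, -399.0, -401.0]   # receptor cut from complex frames
g_ligand_bound = [-90.0, -89.0, -91.0]        # ligand cut from complex frames
g_receptor_free = [-405.0, -403.0, -404.0, -404.0]  # separate receptor run
g_ligand_free = [-93.0, -95.0]                       # separate ligand run

dg_single = single_trajectory_dg(g_complex, g_receptor_bound, g_ligand_bound)
dg_three = three_trajectory_dg(g_complex, g_receptor_free, g_ligand_free)
print(f"single-trajectory: {dg_single:.1f}  three-trajectory: {dg_three:.1f}")
```

A relative affinity between two complexes, such as the human and rat values discussed above, would be the difference between two such estimates. The three-trajectory form typically carries more statistical noise because the large receptor energies no longer cancel snapshot by snapshot.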

Fig. 4. Representation of the human C3c-compstatin complex. The compstatin main chain is shown in a red tube and licorice representation. The protein C3c is shown as a gray tube; only residues 329-534 and 607-620 are included, corresponding to the simulation system of Ref. (Tamamis et al., 2010). The thick yellow tubes show four protein sectors in proximity to compstatin (488-492, 454-462, 344-349 and 388-393, from left to right). The right-most sector 388-393 moves away from compstatin in the simulations of non-primate C3 complexes (Tamamis et al., 2010; Tamamis et al., 2011).


Guided by the above MM-GBSA study, subsequent simulations investigated the stabilities of compstatin complexes with "transgenic" variants of mouse C3, containing 6-9 substitutions from the human C3 sequence near the compstatin binding site. The MM-GBSA binding affinities of the resulting transgenic complexes were comparable to that of the human C3 complex, and stronger by -8 to -9 kcal/mol relative to the mouse C3:compstatin complex (Tamamis et al., 2011). More recent simulations have investigated the affinities of a large number of human and mouse or rat C3 complexes with compstatin analogues, producing promising results (Tamamis et al., 2012). Thus, the combined study of a series of related protein-ligand complexes by atomistic simulations and an efficient evaluation of the corresponding affinities, such as in the MM-GB(PB)SA formulation, provides a powerful way to design new proteins or ligands.

### **4.3 Example 3: Fast predictions of binding free energies using MM-GB(PB)SA**

Rastelli and co-workers (Rastelli et al., 2010) explored the reliability of using a single energy minimized receptor–ligand complex in MM-GB(PB)SA calculations to estimate ligand binding affinities for a series of structurally diverse inhibitors (Figure 5) of *Plasmodium falciparum* DHFR (Figure 6) with known binding modes and affinities.

Fig. 5. Sample of the structurally diverse *Pf*DHFR inhibitors studied in Ref. (Rastelli et al., 2010).

Fig. 6. Wild-type *Pf*DHFR in complex with NADPH (yellow) and the inhibitor WR99210 (green). This protein structure, taken from PDB code 1J3I, was used for the calculations in Ref. (Rastelli et al., 2010), as described in Example 3.

They obtained excellent correlations between MM-PBSA or MM-GBSA binding affinities and experimental values, similar to those obtained in the traditional way by averaging over multiple snapshots from periodic-boundary MD simulations in explicit water, but with significant savings in computational time and effort. Several protocols were used to generate the structures for the MM-GB(PB)SA calculations: minimization in implicit and explicit solvent models, minimization using a distance-dependent dielectric function, and minimization followed by a short MD simulation and then re-minimization. The approach has been implemented in an automated workflow called BEAR (Binding Estimation After Refinement), which produces both MM-GBSA and MM-PBSA predictions of binding free energies and is fast enough to be suitable for virtual screening applications (Degliesposti et al., 2011; Rastelli et al., 2009).
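The single-structure idea can be sketched as follows, assuming that the molecular mechanics energy and the solvation free energy of each minimized species are already available and that the entropy term is neglected. This is an illustration only and does not reproduce the BEAR workflow; the class, the function names and all energies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPointEnergy:
    """E_MM and G_solv (kcal/mol) for one minimized structure."""
    e_mm: float
    g_solv: float

    @property
    def total(self) -> float:
        return self.e_mm + self.g_solv


def single_structure_score(complex_e, receptor_e, ligand_e):
    """End-point estimate from one structure per species, entropy neglected."""
    return complex_e.total - receptor_e.total - ligand_e.total


def rank_ligands(scores):
    """Ligand names ordered from most to least favourable score."""
    return sorted(scores, key=scores.get)


# Hypothetical energies; the same receptor structure is used for both ligands.
receptor = EndPointEnergy(e_mm=-5050.0, g_solv=-3150.0)
systems = {
    "lig_A": (EndPointEnergy(-5200.0, -3100.0), EndPointEnergy(-60.0, -15.0)),
    "lig_B": (EndPointEnergy(-5190.0, -3095.0), EndPointEnergy(-60.0, -15.0)),
}

scores = {name: single_structure_score(cplx, receptor, lig)
          for name, (cplx, lig) in systems.items()}
ranking = rank_ligands(scores)
print(scores, ranking)
```

Because only one structure per species is evaluated, the cost per ligand is dominated by the minimization, which is what makes this kind of scoring practical for large ligand sets.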

### **5. Conclusion**

MM-GBSA and MM-PBSA are computationally efficient, end-point free energy methods that have been widely used to study protein-ligand binding affinities. Even though they lack the sound theoretical foundations of recently developed, computationally demanding absolute-affinity free-energy methods (Boresch et al., 2003; Deng & Roux, 2009; Gilson & Zhou, 2007; Lee & Olson, 2006), their connection with statistical thermodynamics has been established (Swanson et al., 2004). Due to the approximations inherent in MM-GB(PB)SA methods, they are better suited to ranking ("scoring") ligand binding affinities than to quantitative prediction of absolute binding free energies. They should be regarded as approximate, as they combine a molecular mechanics energy function with a continuum-electrostatics treatment of solvation effects; they include solute conformational entropy effects in an approximate manner (Singh & Warshel, 2010); and they ignore the molecular structure of the solvent. Accurate incorporation of solute entropy (Foloppe & Hubbard, 2006) and solvent effects in binding affinity calculations is challenging, but future extensions and development of MM-GB(PB)SA methods will undoubtedly serve to address these limitations.

### **6. Acknowledgment**

JMH acknowledges funding from the SP4-Capacities Coordination and Support Action, Support Actions, EUROSTRUCT project (CSA-SA\_FP7-REGPOT-2008-1 Grant Agreement N° 230146).

### **7. References**



clusters, *Proceedings of the ACM/IEEE Conference on Supercomputing (SC06)*, ISBN 0-7695-2700-0, Tampa, Florida, USA, Nov 11-17, 2006

Brooks, B. R.; Janezic, D. & Karplus, M. (1995). Harmonic-analysis of large systems. 1. Methodology. *Journal of Computational Chemistry*, Vol.16, No.12, pp. 1522-1542, ISSN 0192-8651

Case, D. A.; Cheatham, T. E., III; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr.; Onufriev, A.; Simmerling, C.; Wang, B. & Woods, R. (2005). *Journal of Computational Chemistry*, Vol.26, No.16, pp. 1668-1688, ISSN 0192-8651

Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.; Branson, K. & Pande, V. S. (2011). Alchemical free energy methods for drug discovery: progress and challenges. *Current Opinion in Structural Biology*, Vol.21, No.2, pp. 150-160, ISSN 0959-440X

Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W. & Kollman, P. A. (1995). A second generation forcefield for the simulation of proteins, nucleic acids, and organic molecules. *Journal of the American Chemical Society*, Vol.117, No.19, pp. 5179-5197, ISSN 0002-7863

Cramer, C. J. & Truhlar, D. G. (1999). Implicit solvation models: Equilibria, structure, spectra and dynamics. *Chemical Reviews*, Vol.99, No.8, pp. 2161-2200, ISSN 0009-2665

Degliesposti, G.; Portioli, C.; Parenti, M. D. & Rastelli, G. (2011). BEAR, a novel virtual screening methodology for drug discovery. *Journal of Biomolecular Screening*, Vol.16, No.1, pp. 129-133, ISSN 1087-0571

Deng, Y. & Roux, B. (2009). Computations of standard binding free energies with molecular dynamics simulations. *Journal of Physical Chemistry B*, Vol.113, No.8, pp. 2234-2246, ISSN 1520-6106

Du, J.; Sun, H.; Xi, L.; Li, J.; Yang, Y.; Liu, H. & Yao, X. (2011). Molecular modeling study of Checkpoint Kinase 1 inhibitors by multiple docking strategies and Prime/MM-GBSA. *Journal of Computational Chemistry*, Vol.32, No.13, pp. 2800-2808, ISSN 0192-8651

Dunitz, J. D. (1994). The entropic cost of bound waters in crystals and biomolecules. *Science*, Vol.264, No.5159, pp. 670-670, ISSN 0036-8075

Foloppe, N. & Hubbard, R. (2006). Towards predictive ligand design with free-energy based computational methods. *Current Medicinal Chemistry*, Vol.13, No.29, pp. 3583-3608, ISSN 0929-8673

Genheden, S. & Ryde, U. (2010). How to obtain statistically converged MM/GBSA results. *Journal of Computational Chemistry*, Vol.31, No.4, pp. 837-846, ISSN 0192-8651

Gilson, M. K.; Given, J. A.; Bush, B. L. & McCammon, J. A. (1997). The statistical-thermodynamical basis for computation of binding affinities: A critical review. *Biophysical Journal*, Vol.72, No.3, pp. 1047-1069, ISSN 0006-3495

Gilson, M. K. & Zhou, H-X. (2007). Calculation of protein-ligand binding affinities. *Annual Review of Biophysics and Biomolecular Structure*, Vol.36, pp. 21–42, ISSN 1056-8700

Gohlke, H.; Kiel, C. & Case, D. A. (2003). Insight into protein-protein binding by binding free energy calculation and free energy decomposition for the Ras-Raf and Ras-RalGDS complexes. *Journal of Molecular Biology*, Vol.330, No.4, pp. 891-913, ISSN 0022-2836

Gohlke, H. & Case, D. (2004). Converging Free Energy Estimates: MM-PB(GB)SA Studies on the Protein-Protein Complex Ras-Raf. *Journal of Computational Chemistry*, Vol.25, No.2, pp. 238-250, ISSN 0192-8651


Kongsted, J. & Ryde, U. (2009). An improved method to predict the entropy term with the MM/PBSA approach. *Journal of Computer Aided Molecular Design*, Vol.23, No.2, pp. 63-71, ISSN 0920-654X

Kongsted, J.; Söderhjelm, P. & Ryde, U. (2009). How accurate are continuum solvation models for drug-like molecules. *Journal of Computer Aided Molecular Design*, Vol.23, No.7, pp. 395-409, ISSN 0920-654X

Kuhn, B.; Gerber, P.; Schulz-Gasch, T. & Stahl, M. (2005). Validation and use of the MM-PBSA approach for drug discovery. *Journal of Medicinal Chemistry*, Vol.48, No.12, pp. 4040-4048, ISSN 0022-2623

Lee, M. S. & Olson, M. A. (2006). Calculation of absolute protein-ligand binding affinity using path and endpoint approaches. *Biophysical Journal*, Vol.90, No.3, pp. 864-877, ISSN 0006-3495

Levy, R. M.; Zhang, L. Y.; Gallicchio, E. & Felts, A. K. (2003). On the nonpolar hydration free energy of proteins: Surface area and continuum solvent models for the solute-solvent interaction energy. *Journal of the American Chemical Society*, Vol.125, No.31, pp. 9523-9530, ISSN 0002-7863

Lu, N. & Woolf, T. B. (2007). Chapter 6: Understanding and improving free energy calculations in molecular simulations: Error analysis and reduction methods. In: *Free Energy Calculations: Theory & Applications in Chemistry & Biology (Springer Series in Chemical Physics Vol.86)*, Chipot, C. & Pohorille, A., Springer-Verlag, ISBN 978-3-540-38447-2, Berlin-Heidelberg.

Lyne, P. D.; Lamb, M. L. & Saeh, J. C. (2006). Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. *Journal of Medicinal Chemistry*, Vol.49, No.16, pp. 4805-4808, ISSN 0022-2623

MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, Jr., R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, III, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D. & Karplus, M. (1998). All-atom empirical potential for molecular modeling and dynamics studies of proteins. *Journal of Physical Chemistry B*, Vol.102, No.18, pp. 3586-3616, ISSN 1089-5647

Mahoney, M. W. & Jorgensen, W. L. (2000). A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. *Journal of Chemical Physics*, Vol.112, No.20, pp. 8910-8922, ISSN 0021-9606

Manta, S.; Xipnitou, A.; Kiritsis, C.; Kantsadi, A. L.; Hayes, J. M.; Skamnaki, V. T.; Lamprakis, C.; Kontou, M.; Zoumpoulakis, P.; Zographos, S. E.; Leonidas, D. D. & Komiotis (2012). Contrary to docking and SAR analysis, a 3'-axial CH2OH substitution on glucopyranose does not increase glycogen phosphorylase inhibitory potency. QM/MM-PBSA calculations suggest why. *Chemical Biology & Drug Design*, in press.

Mastellos, D.; Morikis, D.; Isaacs, S. N.; Holland, M. C.; Strey, C. W. & Lambris, J. D. (2003). Complement – Structure, functions, evolution and viral molecular mimicry. *Immunologic Research*, Vol.27, No.2-3, pp. 367-385, ISSN 0257-277X

Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K. & Dill, K. A. (2007). Predicting absolute binding free energies to a simple model site. *Journal of Molecular Biology*, Vol.371, No.4, pp. 1118-1134, ISSN 0022-2836

Morikis, D. & Lambris, J. D. (2005). Structure, dynamics, activity and function of compstatin and design of more potent analogs. In: *Structural biology of the complement system*, Morikis, D. & Lambris, J. D., Editors, pp. 317-340, CRC Press/Taylor & Francis Group, ISBN 0824725409, Boca Raton, FL


Straatsma, T. P. & McCammon, J. A. (1992). Computational alchemy. *Annual Review of Physical Chemistry*, Vol.43, pp. 407-435, ISSN 0066-426X

Swanson, J. M.; Henchman, R. H. & McCammon, J. A. (2004). Revisiting free energy calculations: a theoretical connection to MM/PBSA and direct calculation of the association free energy. *Biophysical Journal*, Vol.86, No.1, pp. 67-74, ISSN 0006-3495

Tamamis, P.; Morikis, D.; Floudas, C. A. & Archontis, G. (2010). Species specificity of the complement inhibitor compstatin investigated by all-atom molecular dynamics simulations. *Proteins: Structure, Function & Bioinformatics*, Vol.78, No.12, pp. 2655-2667, ISSN 0887-3585

Tamamis, P.; Pierou, P.; Mytidou, C.; Floudas, C. A.; Morikis, D. & Archontis, G. (2011). Design of a modified mouse protein with ligand binding properties of its human analog by molecular dynamics simulations: The case of C3 inhibition by compstatin. *Proteins: Structure, Function & Bioinformatics*, Vol.79, No.12, pp. 3166-3179, ISSN 0887-3585

Tamamis, P.; de Victoria, A. L.; Gorham, R. D.; Bellows-Peterson, M. L.; Pierou, P.; Floudas, C. A.; Morikis, D. & Archontis, G. (2012). Molecular dynamics in drug design: New generations of Compstatin analogs. *Chemical Biology & Drug Design*, in press, ISSN 1747-0285

Tembe, B. L. & McCammon, J. A. (1984). Ligand-receptor interactions. *Computers & Chemistry*, Vol.8, No.4, pp. 281-283, ISSN 0097-8485

Thompson, D.; Plateau, P. & Simonson, T. (2006). Free-energy simulations and experiments reveal long-range electrostatic interactions and substrate-assisted specificity in an aminoacyl-tRNA synthetase. *Chembiochem*, Vol.7, No.2, pp. 337-344, ISSN 1439-4227

Tidor, B. & Karplus, M. (1994). The contribution of vibrational entropy to molecular association – the dimerization of insulin. *Journal of Molecular Biology*, Vol.238, No.3, pp. 405-414, ISSN 0022-2836

Van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hünenberger, P. H.; Krueger, P.; Mark, A. E.; Scott, W. R. P. & Tironi, I. G. (1996). Biomolecular Simulation: The GROMOS96 Manual and User Guide; vdf Hochschulverlag AG an der ETH Zürich and BIOMOS b.v. Zürich, Groningen

Wang, M. & Wong, C. F. (2007). Rank-ordering protein-ligand binding affinity by a quantum mechanics/molecular mechanics/Poisson Boltzmann-surface area model. *Journal of Chemical Physics*, Vol.126, No.2, 026101, ISSN 0021-9606

Yang, C-Y.; Sun, H.; Chen, J.; Nikolovska-Coleska, Z. & Wang, S. (2009). Importance of ligand reorganization free energy in protein-ligand binding affinity prediction. *Journal of the American Chemical Society*, Vol.131, No.38, pp. 13709-13721, ISSN 0002-7863

Zhou, H-X. & Gilson, M. K. (2009). Theory of free energy and entropy in noncovalent binding. *Chemical Reviews*, Vol.109, No.9, pp. 4092-4107, ISSN 0009-2665

Zwanzig, R. W. (1954). High-temperature equation of state by a perturbation method. I. Nonpolar gases. *Journal of Chemical Physics*, Vol.22, No.8, pp. 1420-1426, ISSN 0021-9606

## **Part 3**

**Dynamics of Plasmas**

## **Micro-Heterogeneity in Complex Liquids**

Aurélien Perera<sup>1</sup>, Bernarda Kežić<sup>1,2</sup>, Franjo Sokolić<sup>2</sup> and Larisa Zoranić<sup>2</sup>

<sup>1</sup>*Laboratoire de Physique Théorique de la Matière Condensée (UMR CNRS 7600), Université Pierre et Marie Curie, 4 Place Jussieu, F75252, Paris cedex 05, France* <sup>2</sup>*Department of Physics, Faculty of Sciences, University of Split, Nikole Tesle 12, 21000, Split, Croatia*

### **1. Introduction**

What is micro-heterogeneity, and why is it important enough to the theory of liquids in particular, and to physical chemistry and even physics in general, to deserve a full chapter?

Physics was initially about discovering the laws that describe the motion of single objects, such as planets or, more casually, a stone in the gravitational field of the Earth. The focus was a *single object*. Statistical thermodynamics was a revolution in the sense that it could describe matter as a collection of multitudes of objects, on a par with thermodynamics, which was all about heat and energy. It could describe the various states of matter: gases, liquids and solids. However, while the statistical description of gases and solids was still about describing single particles in the field created by the surrounding particles, liquids stood in a very special place, since it was recognized very early in the 20th century that it was necessary to consider *correlations* between particles. This was thought to be such a serious obstacle that the great Soviet physicist Lev Landau once said that a true "theory of liquids could be neither convincing nor useful" (deGennes, 1977). It was the founders of a true statistical theory of liquids (Frish & Lebowitz, 1974) who showed how the formalism of correlation functions could be used to advantage in understanding various properties of the liquid phase. We now know how to calculate various physical properties, related to the continuum description of matter in its various disordered phases, from the statistical collection of microscopic objects and their correlations. In other words, we know how to go from single objects to a huge collection of them.

Now, we can ask a new question: can new "objects" emerge from such a statistical description of a discretized continuum? This type of question is partially answered by the highbrow theories of fundamental particles, quantum field theories and the like, wherein the photon and the electron can "emerge" from a peculiarly structured vacuum, by analogy with how phonons emerge from structured solids (Wen, 2004). However, all these theories rely heavily on quantum mechanics, which is a whole different story. Here, we are particularly interested in the *classical* version of this question. What we intend to show is that ordinary aqueous mixtures are the theater of the emergence of new objects, which are themselves made of the same constituents, but grouped in a particular way. This is what we call here "micro-heterogeneity" (MH). A very good example of micro-heterogeneity is the micelle, which emerges at the critical micellar concentration (CMC) from a disordered assembly of water and surfactant molecules (Floriano et al., 1999; Poland & Scheraga, 1965). This same example also serves to illustrate the difficulty of the concept of micro-heterogeneity: what is the role played by the constituents in the appearance of such a new object, and how is the stability of such an object affected by various microscopic and macroscopic parameters?

The peculiar nature of aqueous mixtures, as seen from various thermodynamical properties, has been acknowledged since the sixties by Frank et al. (Frank & Yves, 1966) and more recently by many authors such as Desnoyer (de Visser et al., 1977), Davies (Davies, 1993), and more systematically by Koga (Koga, 2007). They observed that, for binary mixtures, many physical quantities, such as the vapour pressure, showed a few weak kink-type "anomalies" when the mole fraction of one of the components was varied. This was in contrast with the monotonic behaviour of the same quantities for mixtures of ordinary polar substances, such as benzene-toluene (Rowlinson & Swinton, 1982). For example, McAlister reported in 1960 the viscosities of benzene-toluene and methanol-toluene mixtures (McAlister, 1960). The first has a monotonic, almost linear variation as a function of mole fraction, while that of the mixture containing the H-bonding substance has an "S" shape with 2 slope changes. These changes in slope were not signs of any known type of phase transition: no discontinuities in the first or second derivatives of the Gibbs free energy were observed (Koga, 2007). The question was then: what could be the nature of the changes at the molecular level that produce these changes in slope? Such typical changes in slope are shown in Fig.1 below for very different physical properties. The arrows indicate the position of the slope changes.

Fig. 1. Left: vapour pressure of the aqueous tert-butanol mixture (Koga et al., 1990). Middle: excess enthalpy of the aqueous ethanol mixture (Lama & Lu, 1965). Right: absorbance frequencies of the aqueous tert-butanol mixture (Pradhan et al., 2008). Each quantity is shown as a function of the respective alcohol mole fraction. The arrows indicate the slope changes mentioned in the text.

Since this is all about physical chemistry, these changes were traced back to the corresponding changes in enthalpy *H* or entropy *S*, and it was noticed that rather large compensating changes in these quantities were responsible for small changes in the Gibbs free energy *G* = *H* − *TS*. This is the so-called enthalpy-entropy compensation that is often invoked in the "bio" context (Wiggins, 2008). Subsequent investigations did not lead to any particular clarification based on strictly thermodynamical arguments, and research in this direction has stalled.

Computer simulations could be, in principle, the ideal tool to observe the microscopic molecular arrangements that lead to kink-like variations in macroscopic quantities. However, computer simulations are severely restricted by system size considerations, which stem from the computational cost of describing the individual motions of millions of particles. The prevailing paradigm of computer simulations is that it is not necessary to study systems of the size of the Avogadro number, and that a much smaller size of thousands of particles is often more than enough to obtain a reliable statistical estimate of major thermodynamical quantities, such as the enthalpy. However, recent advances in computer studies have confirmed that this paradigm fails for mixtures of complex liquids such as aqueous mixtures (Kezic et al., 2011; Mijakovic et al., 2011; Perera & Sokolic, 2004), as we will show later. The origin of this failure is still under debate, and can be attributed to two sources. The first is the accuracy of the force fields that describe the interaction between molecules, which can be questioned. Accordingly, the force fields should be modified to account for these specific features under each set of conditions (Chitra & Smith, 2002; Lee & van der Vegt, 2005; Smith, 2004; Weerasinghe & Smith, 2003; 2005). The second is the existence of intrinsic phenomena, such as micro-heterogeneity, that cannot be described properly with the system sizes currently in use (about a few thousand particles), and that may require computational resources out of proportion to the modest scientific interest such systems would *a priori* seem to deserve (Kezic et al., 2011; Mijakovic et al., 2011; Perera & Sokolic, 2004).

These two explanations differ from a fundamental point of view. The first attitude is to modify the interactions in order to *account for* special extended correlations that can be associated with the appearance of large scale structures. However, since correlations are the consequence of interactions, this appears at first as a kind of bootstrap procedure. The second attitude is to relax the computational constraint so that the large scale structures can be seen without artifacts related to periodicity. This second point of view is inspired by the simple idea that, if micro-heterogeneity is associated with a new object with no precise shape, then current system sizes are simply too small to accommodate enough such "objects" for proper statistical sampling, not to mention the fact that periodic boundary conditions may distort these "objects" artificially if the system is too small. From this perspective, the first attitude looks like introducing pseudo-potentials to adapt the system to small scales. If this is the case, is it possible to detect the artifacts that they might introduce? We will show below how this discussion is illustrated by the case of acetone-water mixtures.

For now, it is instructive to compare these two points of view for the case of spontaneous micelle formation of long alcohol chains in water. Although the exact value of the CMC will depend on the force field chosen for water, properly accounting for the formation of a *single* small micelle of diameter 50Å, which is on a par with experimental values (Tanford, 1974), would need a cubic simulation box of size 0.3 *μ*m, which would contain 2048 × 10<sup>3</sup> ≈ 2 million water molecules. Therefore, simulating realistic micelle formation for, say, a few tens of such micelles is beyond the reach of current computational power. Needless to say, simulating realistic micro-emulsions is totally out of reach, except through simplified models (Floriano et al., 1999). The other point of view would require modifying the various interactions in order to enforce micelle formation within reasonable system sizes. It turns out that such calculations are possible, but often at the cost of eliminating a realistic solvent (Floriano et al., 1999). This example illustrates the dilemma posed by the two points of view.

What is the origin of this whole problem, as sketched above for the case of micelle formation? Water and most solutes tend to micro-segregate, which means that solutes and water occupy different partitions in space, each partition being a few molecular sizes across, but of no specific shape. This was suspected almost half a century ago from thermodynamical arguments (Frank & Yves, 1966). A recent paper by the group of Soper in *Nature* (Dixit et al., 2002) acknowledged that even a system as simple as the methanol-water mixture exhibits micro-segregation. Following this re-discovery, many authors, and in particular our group, investigated different aqueous mixtures by computer simulations. It was quickly acknowledged that, unlike simple liquids and their mixtures, such as argon or carbon dioxide, molecules made of very different subgroups have a strong tendency to self-segregate. Alcohol molecules are a perfect example: they contain both the hydrophilic OH group and the hydrophobic CH<sub>n</sub> methyl groups. Liquids made of such molecules exhibit a strong local order, due to the tendency of OH and methyl groups to self-segregate, which distinguishes them from ordinary Lennard-Jonesium liquids, or even polar liquids such as benzene. The source of the difficulty mentioned at the beginning of this paragraph is that complex liquids contain *two* scales of description: the original microscopic scale related to molecular size, just as in simple liquids, and the newly emerged scale related to the size of the segregated domains, together with the corresponding time scales. Therefore, computer simulations need some rescaling to accommodate a full statistical description of the larger scale phenomena. This problem in fact hides another fundamental problem, that of the description of disorder.

Liquids are fundamentally thought to be disordered systems (with a few exceptions such as liquid crystals). To be more specific about this issue, we need to explain how order differs from disorder at the level of the statistical description of liquids. We will do this in the next section. For now, we note that while order has a very precise statistical microscopic description, disorder is considered as generic. We feel that aqueous mixtures belong to a very special type of disorder, characterised by micro-heterogeneity, in which fluctuations in the number of particles in a given volume play an important role. It is this problem that is not well described by finite size simulations, for the very simple reason that such systems contain *two* scales of description, the original microscopic scale related to molecular size and the newly emerged mesoscopic scale related to the segregated domains. It needs special approaches that we will describe later in this chapter. In the next section, we will try to answer the question posed above: how to distinguish between different types of local order in simple and complex liquids.

### **2. Statistical description of liquids**

There are very good textbooks about the theory of liquids, among which Hansen and McDonald's celebrated *Theory of Simple Liquids* (Hansen & McDonald, 2006) is the reference we follow here to elaborate the theoretical description. If a liquid is made of N molecules in a volume V, each molecule *i* being described by its position *r<sub>i</sub>* and orientation **Ω**<sub>*i*</sub>, the latter being a set of Euler angles, then we use the shorthand notation *i* = (*r<sub>i</sub>*, **Ω**<sub>*i*</sub>) to denote a generalised "position". The instantaneous microscopic density of such a liquid at position 1 is given by:

$$\rho(1) = \sum\_{i=1}^{N} \delta(1 - i) \tag{1}$$

where *δ*() is the Dirac delta function. Each term of the sum is non-zero only when molecule *i* sits at position 1, so this expression simply states that a given molecule among the N in the sample is found at position 1. As such, it is an instantaneous snapshot of the system as the position 1 is varied through the sample. In order to have a statistical estimate, one must average this microscopic density over some statistical ensemble, here the canonical ensemble, in which N, V and the temperature T are fixed. One then has the first observable of the system, the one-body density, defined as:

$$
\rho^{(1)}(1) = <\rho(1)>\tag{2}
$$

where the bracket <> denotes the statistical average defined for any microscopic quantity *A* as:

$$<A> = \frac{1}{Z\_N} \int d1 \ldots dN \, A \, \exp[-\beta V(N)] \tag{3}$$

where the integral is carried over the generalized positions of the N molecules, *V*(*N*) is the total interaction energy of the N particles, and *β* = 1/*k<sub>B</sub>T* is the inverse temperature, with *T* the temperature and *k<sub>B</sub>* the Boltzmann constant. *Z<sub>N</sub>* = ∫ *d*1...*dN* exp[−*βV*(*N*)] is the canonical ensemble partition function.
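As a minimal numerical illustration of Eq.(3), the sketch below evaluates a canonical average for a toy system that is not part of this chapter: a single coordinate *x* in a harmonic well *V*(*x*) = *kx*<sup>2</sup>/2. The function name `canonical_average`, the integration grid and the parameter values are assumptions made for illustration only. For this toy case the average of *x*<sup>2</sup> is known analytically to be 1/(*βk*), which provides a check on the quadrature.

```python
import math

def canonical_average(observable, potential, beta, x_min, x_max, n_points=20001):
    """Boltzmann-weighted average of observable(x), i.e. Eq.(3) for one coordinate."""
    dx = (x_max - x_min) / (n_points - 1)
    numerator = 0.0
    partition = 0.0  # plays the role of Z_N (the grid spacing cancels in the ratio)
    for n in range(n_points):
        x = x_min + n * dx
        weight = 0.5 if n in (0, n_points - 1) else 1.0  # trapezoidal end points
        boltzmann = math.exp(-beta * potential(x))
        numerator += weight * observable(x) * boltzmann
        partition += weight * boltzmann
    return numerator / partition

# Toy parameters (assumed, not taken from the chapter)
k_spring = 2.0
beta = 1.5
mean_x2 = canonical_average(lambda x: x * x,
                            lambda x: 0.5 * k_spring * x * x,
                            beta, -10.0, 10.0)
exact_x2 = 1.0 / (beta * k_spring)  # equipartition result for the harmonic well
```

For a real liquid the integral runs over all N generalised positions and cannot be done on a grid, which is why it is sampled by simulation instead.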

If *ρ*(1) is considered as a random variable, then one can construct a whole family of statistical correlations, such as the pair correlation function, which is defined as:

$$
\rho^{(2)}(1,2) = <\rho(1)\rho(2)>\tag{4}
$$

and more generally, the n-body correlation function

$$\rho^{(n)}(1,\ldots,n) = <\prod\_{i=1}^{n} \rho(i) > \tag{5}$$

In practice, only *ρ*<sup>(1)</sup>(1) and *ρ*<sup>(2)</sup>(1, 2) are important quantities. Indeed, most thermodynamical quantities, such as the pressure or the enthalpy, can be expressed solely as weighted averages of these two functions. For the matter that concerns us here, a detailed discussion of these two functions is in order.

#### **2.1 One-body and two-body functions**

The one-body function *ρ*<sup>(1)</sup>(1) describes the order of the system. If the system is subject to a symmetry breaking field, such as an electric or magnetic field, or is in the presence of a wall, then it is spatially and/or orientationally inhomogeneous, and the distribution of each particle needs to be specified with respect to the field. Hence, *ρ*<sup>(1)</sup>(1) depends explicitly on the variable 1. For example, if the liquid is next to a wall and the z-axis is chosen to be perpendicular to the wall, one has *ρ*<sup>(1)</sup>(1) = *ρ*<sup>(1)</sup>(*z*<sub>1</sub>), where *z*<sub>1</sub> is the z-coordinate of particle 1 measured from the wall. If the symmetry breaking field is a magnetic field **B**, then *ρ*<sup>(1)</sup>(1) = *ρ*<sup>(1)</sup>(**Ω**<sub>1</sub> · **B**), which depends on the orientation of molecule 1 with respect to the field. The mathematical reason for this is to be found in Eq.(3): the interaction energy V(N) includes the interaction of each of the particles with the field, hence the average in Eq.(2) also depends on where particle 1 sits with respect to the field. In contrast, for a disordered system, the interaction energy depends only on the relative positions and orientations of the particles. Therefore, the integral in Eq.(2) integrates out all coordinates, and one simply has the number density of the system:

$$
\rho^{(1)}(1) = \rho = \frac{N}{V} \tag{6}
$$

This is a very important result, because it states that the usual statistical description of a liquid cannot differentiate a Lennard-Jonesium liquid from an alcohol at the level of the one-body distribution. In other words, the strong local order of the second type of liquid is not captured by this function. A *crystal* of either of the two liquids will have a non-trivial *ρ*<sup>(1)</sup>(1), which will contain the specificities of the crystalline order. But for both *liquids*, it will be the *same* boring number density *ρ* = *N*/*V*.
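The featureless character of the one-body density in Eq.(6) can be visualised with a toy configuration. The sketch below is an illustration under stated assumptions, not a simulation of either liquid: N points are placed uniformly at random in a cubic box (an ideal gas), and the density is estimated in slabs along z. The helper name `density_profile_z`, the seed and the box parameters are all hypothetical choices.

```python
import random

def density_profile_z(positions, box_length, n_bins):
    """Histogram estimate of the one-body density rho(z) in slabs of the box."""
    counts = [0] * n_bins
    for (_, _, z) in positions:
        index = min(int(z / box_length * n_bins), n_bins - 1)
        counts[index] += 1
    slab_volume = box_length ** 3 / n_bins
    return [c / slab_volume for c in counts]

random.seed(0)  # fixed seed so that the sketch is reproducible
box_length = 10.0
n_particles = 20000
positions = [(random.uniform(0, box_length),
              random.uniform(0, box_length),
              random.uniform(0, box_length)) for _ in range(n_particles)]

rho = n_particles / box_length ** 3  # Eq.(6): N/V
profile = density_profile_z(positions, box_length, n_bins=10)
# Largest relative departure of any slab from the flat value N/V
max_relative_deviation = max(abs(p - rho) / rho for p in profile)
```

The profile only fluctuates statistically around N/V. Placing the same points next to a wall or on a lattice would instead produce a structured profile, which is the sense in which *ρ*<sup>(1)</sup>(1) measures order.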

This has a very strong implication: if we want to differentiate the two liquids according to their local order, we need to look one step higher in the correlation functions, namely at the two-body correlation function. It is convenient to introduce the distribution function *g*(1, 2) as a measure of the correlation between *ρ*<sup>(2)</sup>(1, 2) and *ρ*<sup>(1)</sup>(1)*ρ*<sup>(1)</sup>(2):

$$
\rho^{(2)}(1,2) = \rho^{(1)}(1)\rho^{(1)}(2)g(1,2)\tag{7}
$$

More generally, one defines the n-body distribution functions as

$$\rho^{(n)}(1,\ldots,n) = \left[\prod\_{i=1}^{n} \rho^{(1)}(i)\right] g^{(n)}(1,\ldots,n) \tag{8}$$

For the case of a disordered system, from Eq.(6) one has

$$
\rho^{(2)}(1,2) = \rho^2 g(1,2) \tag{9}
$$

which indicates that the two functions differ only by the scalar factor *ρ*<sup>2</sup>. Since correlations decouple when particles are infinitely far apart, the relation above contains a very important limiting law, where *r* is the distance between particles 1 and 2:

$$\lim\_{r \to \infty} g(1, 2) = 1 \tag{10}$$

We will see that this law is violated in computer simulations for intrinsic reasons.
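To make Eqs.(9) and (10) concrete, the following sketch estimates the radial distribution function from a single configuration in a periodic cubic box, using the minimum-image convention. It is a minimal illustration with assumed ingredients: the configuration is an ideal gas of random points, and the names `radial_distribution`, `box_length` and `n_bins` are hypothetical. For such an ideal gas at fixed N, the exact plateau is 1 − 1/N rather than 1, because each particle has only N − 1 possible partners; this is an elementary example of a finite-size departure from Eq.(10).

```python
import math
import random

def radial_distribution(positions, box_length, n_bins):
    """g(r) up to half the box length, using the minimum-image convention."""
    n = len(positions)
    r_max = 0.5 * box_length
    dr = r_max / n_bins
    counts = [0] * n_bins
    for a in range(n - 1):
        xa, ya, za = positions[a]
        for b in range(a + 1, n):
            xb, yb, zb = positions[b]
            dx, dy, dz = xa - xb, ya - yb, za - zb
            # minimum image: wrap each component into [-L/2, L/2]
            dx -= box_length * round(dx / box_length)
            dy -= box_length * round(dy / box_length)
            dz -= box_length * round(dz / box_length)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    rho = n / box_length ** 3
    r_mid, g = [], []
    for k in range(n_bins):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        ideal_pairs = 0.5 * n * rho * shell  # each pair is counted once
        r_mid.append((k + 0.5) * dr)
        g.append(counts[k] / ideal_pairs)
    return r_mid, g

random.seed(1)  # fixed seed so that the sketch is reproducible
box_length = 10.0
positions = [(random.uniform(0, box_length),
              random.uniform(0, box_length),
              random.uniform(0, box_length)) for _ in range(400)]
r_mid, g = radial_distribution(positions, box_length, n_bins=5)
```

With a single configuration of 400 points the statistical noise is larger than the 1/N shift, so many configurations would have to be averaged to resolve it.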

In much the same way, one can define pair correlation functions between individual sites on each molecule. This is particularly convenient when considering molecules made of sites, which is often the case for realistic molecules. The force fields *v*(1, 2) between 2 molecules that we will consider here are expressed entirely in terms of site-site interactions:

$$v(1,2) = \sum\_{i,j} v\_{ij}(r\_{ij}) \tag{11}$$

where the sum runs over all the sites i on molecule 1 and j on molecule 2, with *r<sub>ij</sub>* being the radial distance between 2 such sites. In such a case, one defines site-site correlation functions as *g<sub>ij</sub>*(*r<sub>ij</sub>*). In practice, we will consider site-site interactions made of 2 terms, a Lennard-Jones (LJ) term that accounts for repulsive and dispersion interactions, and a Coulomb term that handles the electrostatic interaction between the partial charges *q* located at the center of each site:

$$v\_{ij}(r\_{ij}) = 4 k\_B T \epsilon\_{ij} \left[ \left(\frac{\sigma\_{ij}}{r\_{ij}}\right)^{12} - \left(\frac{\sigma\_{ij}}{r\_{ij}}\right)^{6} \right] + k\_B T \frac{q\_i q\_j}{r\_{ij}} \tag{12}$$

The diameters *σ<sub>ij</sub>* and energy wells *ε<sub>ij</sub>* are handled as usual in terms of the individual site diameters *σ<sub>i</sub>* and energy wells *ε<sub>i</sub>* by the Lorentz rule *σ<sub>ij</sub>* = (*σ<sub>i</sub>* + *σ<sub>j</sub>*)/2 and the Berthelot rule *ε<sub>ij</sub>* = √(*ε<sub>i</sub>ε<sub>j</sub>*).
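The site-site form of Eqs.(11) and (12), together with the Lorentz and Berthelot rules, can be sketched in a few lines. This is an illustration under stated assumptions: energies are returned in units of *k<sub>B</sub>T*, so that *ε* and the product *q<sub>i</sub>q<sub>j</sub>*/*r* are treated as dimensionless reduced quantities, and the `Site` container, the function names and the numerical values are hypothetical rather than actual force field parameters.

```python
import math
from typing import NamedTuple, Sequence

class Site(NamedTuple):
    x: float
    y: float
    z: float
    sigma: float  # site diameter
    eps: float    # reduced well depth (units of k_B T)
    q: float      # reduced partial charge

def lorentz_berthelot(sigma_i, sigma_j, eps_i, eps_j):
    """Arithmetic mean for diameters, geometric mean for well depths."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)

def site_site_energy(r, sigma_ij, eps_ij, q_i, q_j):
    """Eq.(12) divided by k_B T: Lennard-Jones term plus Coulomb term."""
    sr6 = (sigma_ij / r) ** 6
    return 4.0 * eps_ij * (sr6 * sr6 - sr6) + q_i * q_j / r

def molecule_pair_energy(sites_1: Sequence[Site], sites_2: Sequence[Site]) -> float:
    """Eq.(11): sum over all sites i of molecule 1 and j of molecule 2."""
    total = 0.0
    for a in sites_1:
        for b in sites_2:
            r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            sigma_ij, eps_ij = lorentz_berthelot(a.sigma, b.sigma, a.eps, b.eps)
            total += site_site_energy(r, sigma_ij, eps_ij, a.q, b.q)
    return total

# Two hypothetical one-site, uncharged molecules at the LJ minimum distance
r_min = 2.0 ** (1.0 / 6.0) * 3.5
mol_1 = [Site(0.0, 0.0, 0.0, sigma=3.0, eps=0.1, q=0.0)]
mol_2 = [Site(r_min, 0.0, 0.0, sigma=4.0, eps=0.4, q=0.0)]
energy = molecule_pair_energy(mol_1, mol_2)  # expected to be close to -eps_ij = -0.2
```

Setting all charges to zero in such a function corresponds to the kind of "stripped" model that keeps only the Lennard-Jones part of the interaction.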

Many thermodynamical properties can be expressed in terms of these correlation functions. For example, the isothermal compressibility *κ<sub>T</sub>* = (1/*ρ*)(*∂ρ*/*∂P*)<sub>*T*</sub> can be expressed with any of the correlations between pairs of sites (i,j) (Hansen & McDonald, 2006):

$$\kappa\_T = \frac{1 + \rho \int d\vec{r} \, [g\_{ij}(r) - 1]}{\rho k\_B T} \tag{13}$$
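For an isotropic *g*(*r*), the volume integral in Eq.(13) reduces to a radial one, 4π∫*r*<sup>2</sup>[*g*(*r*) − 1]*dr*. The sketch below evaluates the reduced compressibility *ρk<sub>B</sub>Tκ<sub>T</sub>* from a tabulated *g*(*r*) with the trapezoid rule. The test function is an assumption chosen only because it can be checked by hand: a step function that vanishes inside a diameter *σ* and equals 1 outside, for which the result is 1 − 4π*ρσ*<sup>3</sup>/3. The function name and the parameter values are hypothetical.

```python
import math

def reduced_compressibility(r_values, g_values, rho):
    """rho * k_B * T * kappa_T from Eq.(13), for an isotropic tabulated g(r)."""
    integral = 0.0
    for n in range(len(r_values) - 1):
        f0 = r_values[n] ** 2 * (g_values[n] - 1.0)
        f1 = r_values[n + 1] ** 2 * (g_values[n + 1] - 1.0)
        integral += 0.5 * (f0 + f1) * (r_values[n + 1] - r_values[n])
    return 1.0 + 4.0 * math.pi * rho * integral

# Assumed toy input: step-function g(r), not a simulated liquid
sigma = 1.0
rho = 0.1
dr = 0.001
r_values = [n * dr for n in range(5001)]
g_values = [0.0 if r < sigma else 1.0 for r in r_values]

kappa_reduced = reduced_compressibility(r_values, g_values, rho)
kappa_exact = 1.0 - 4.0 * math.pi * rho * sigma ** 3 / 3.0
```

Because the integrand is weighted by *r*<sup>2</sup>, any small error in the long-range plateau of *g*(*r*) is strongly amplified, which is why the limiting law of Eq.(10) matters in practice.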

Similarly, the configurational part of the internal energy per particle is given by

$$E/N = \frac{\rho k\_B T}{2} \sum\_{ij} x\_i x\_j \int d\vec{r} \, g\_{ij}(r) v\_{ij}(r) \tag{14}$$

Structure factors can be defined by Fourier transforms of the site-site correlation functions

$$S\_{ij}(k) = 1 + \rho \sqrt{x\_i x\_j} \int d\vec{r} \, [g\_{ij}(r) - 1] \exp(-i\vec{k}\cdot\vec{r})\tag{15}$$

where it is assumed that *x<sub>i</sub>* stands for the mole fraction of the species to which site *i* belongs, which is a convenient shorthand notation. These quantities can be directly compared with those extracted from the scattering intensity obtained by neutron or X-ray scattering techniques (Hansen & McDonald, 2006), and are a useful alternative description of the microscopic structure of liquids.
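For an isotropic correlation function, the angular part of the Fourier transform in Eq.(15) can be done analytically, leaving *S<sub>ij</sub>*(*k*) = 1 + 4π*ρ*√(*x<sub>i</sub>x<sub>j</sub>*) ∫*r*<sup>2</sup>[*g<sub>ij</sub>*(*r*) − 1] sin(*kr*)/(*kr*) *dr*. The sketch below implements this radial form for a tabulated *g*(*r*). The input is an assumed step function (zero inside *σ*, one outside), chosen because its transform is known in closed form; the function name, grid and parameters are illustrative only.

```python
import math

def structure_factor(k, r_values, g_values, rho, x_i=1.0, x_j=1.0):
    """Radial form of Eq.(15) for an isotropic g(r), by the trapezoid rule."""
    def integrand(r, g):
        kr = k * r
        sinc = 1.0 if kr == 0.0 else math.sin(kr) / kr
        return r * r * (g - 1.0) * sinc
    total = 0.0
    for n in range(len(r_values) - 1):
        f0 = integrand(r_values[n], g_values[n])
        f1 = integrand(r_values[n + 1], g_values[n + 1])
        total += 0.5 * (f0 + f1) * (r_values[n + 1] - r_values[n])
    return 1.0 + 4.0 * math.pi * rho * math.sqrt(x_i * x_j) * total

# Assumed toy input: step-function g(r) for a one-component system
sigma = 1.0
rho = 0.1
dr = 0.001
r_values = [n * dr for n in range(5001)]
g_values = [0.0 if r < sigma else 1.0 for r in r_values]

k = 2.0
s_numeric = structure_factor(k, r_values, g_values, rho)
# Closed form for the step function
s_exact = 1.0 - 4.0 * math.pi * rho * (
    math.sin(k * sigma) - k * sigma * math.cos(k * sigma)) / k ** 3
```

For a one-component system the *k* → 0 limit of this function coincides with the reduced compressibility of Eq.(13), so long-wavelength features of *S*(*k*) probe the same large-scale fluctuations.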

#### **2.2 The difference in disorder between a simple and a complex liquid**

Now, we have to figure out how to distinguish between a disordered Lennard-Jonesium liquid and a disordered alcohol by using only *g*(1, 2) in each case. In order to do that efficiently, we are going to consider on the one hand the methanol molecule as described by an interaction of the type of Eq.(11) with OPLS force field parameters (Jorgensen, 1986), and on the other hand the same model stripped of all the partial charges that account for the H-bond interactions and are at the origin of the complexity of this liquid. We call this second model the bare model. Both liquids are simulated in ambient conditions, and the density is taken to be that of pure methanol. In Fig.2 below, we have plotted typical site-site correlation functions for each of the two liquids; the corresponding structure factors are shown in Fig.3.

Both models show very similar M-M correlations between the methyl sites, because these sites are uncharged in both models. Such correlations are typical of dense liquids. This is also seen from the corresponding structure factors shown in Fig.3.

However, the correlations between the oxygen atoms show an entirely different packing structure in the two models. The bare model shows weak O-O correlations, contrary to the OPLS model. For the latter, there is a very sharp first-neighbour peak, which is due to the strong hydrogen bonding interaction, but the correlations further out look nearly featureless, with very weak oscillations that indicate a weak packing structure, in contrast to those of the bare model. In fact, these weak correlations are due to the fact that methanol molecules form chain-like clusters held together by O-H···O hydrogen bonds: the O-O correlations exist only along such chains and are spatially weaker than the packing correlations, although they are very strong along the chains. The periodicity of these oscillations is modulated by the average size of the clusters, say *σ<sub>C</sub>* > *σ*. Therefore, the main peak of the O-O structure factor will be centered at a *smaller* wave number *k<sub>max</sub>* ≈ 2*π*/*σ<sub>C</sub>* < *k<sub>M</sub>*.

Fig. 2. Site-site correlation functions for OPLS methanol (red) and the bare model (blue). (Top) Oxygen-oxygen correlations, (bottom) methyl-methyl correlations.

Fig. 3. Structure factors corresponding to the figure above. The arrows in the top panel show the two-peak structure discussed in the text.

Although both liquids are disordered, methanol exhibits considerable local order, which can be tracked through specific features of some site-site correlations. In fact, methanol has *emerged meta-molecules*, the clusters, *within* its disordered structure, and the pre-peak of the structure factor indicates the presence of such objects, just as the main peak of the structure factor of the simple liquid indicates the size of the core of the particles. These newly emerged objects are to be compared with the micelles that we mentioned above. We see that the order in complex liquids is not just near-neighbour ordering, such as dipole-dipole alignment, but is all about *the emergence of new larger "particles"*. This is the main message of this whole chapter, which we will develop with other examples. For neat liquids, we will refer to this peculiar local structure as *micro-structure*, while for mixtures we will use the term *micro-heterogeneity*. Indeed, mixtures of alcohols and simple liquids, such as the methanol-acetone mixture, exhibit local segregation, which is due to the fact that methanol molecules tend to self-associate through H-bonds. Once this difference between ordinary liquids and complex liquids is admitted, we can ask other questions that are central to liquids, namely how density fluctuations affect the local order and the stability of the liquid state.

#### **2.3 Local order, correlation length and stability**

The notion of disorder is intimately related to the concept of stability. In fact, when local order develops too much, the system can undergo a mechanical breakdown and lose its stability. This is what happens when a gas-liquid phase transition occurs, for example. If we cool a gas, or increase its density, molecules start to cluster in larger groups, and the local order increases while the gas becomes metastable. This increase of order can be tracked through the intrinsic correlation length *ξ*, which should be distinguished from the molecular size *σ<sub>M</sub>* and the emergent cluster size *σ<sub>C</sub>*. The clusters of the metastable gas phase are characterised by the correlation length *ξ*, which is a measure of the range of the correlations. This correlation length is about *σ<sub>M</sub>* in the stable phase, and increases rapidly in the metastable region. One way to define this length is to look at the decay of the center-of-mass correlations, which behave as:

$$\lim_{r \to \infty} g_{cc}(r) = 1 + \frac{A}{r} \exp\left[-\frac{r}{\xi}\right] \tag{16}$$

This is an exact relation that is always true in disordered systems. It can be derived through the Ornstein-Zernike equation, which is central to the theory of liquids (Hansen & McDonald, 2006). One sees that, if the correlation length diverges, the correlation functions develop an algebraic 1/*r* tail. This occurs exactly at the spinodal and signals the loss of the mechanical stability of the system.
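Eq.(16) suggests a simple way of estimating *ξ* from a tabulated correlation function: on the tail, ln(*r*|*g*(*r*) − 1|) is linear in *r* with slope −1/*ξ*. The sketch below performs this least-squares fit. It is only an illustration under the assumption that the tail decays monotonically as in Eq.(16); in a dense liquid the tail usually oscillates and the fit window `r_min` (a hypothetical parameter) has to be chosen with care.

```python
import math

def correlation_length(r, g, r_min):
    """Fit ln(r |g - 1|) = ln|A| - r/xi on the tail r >= r_min.

    Returns (xi, |A|). Assumes the monotonic decay of Eq.(16).
    """
    xs, ys = [], []
    for r_n, g_n in zip(r, g):
        h = abs(g_n - 1.0)
        if r_n >= r_min and h > 0.0:
            xs.append(r_n)
            ys.append(math.log(r_n * h))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares straight line through the points (xs, ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, math.exp(intercept)
```

A fitted *ξ* that keeps growing as the state point approaches the spinodal is the signature of the loss of stability discussed above.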

Associated complex liquids, such as methanol, also undergo gas-liquid phase separation through the same mechanism described above. It is interesting to ask how the intrinsic clusters affect the universal gas-liquid phase separation. In fact, this problem of stability is more interesting in the case of mixtures, which can undergo a liquid-liquid phase separation controlled by the same physical phenomenon: the loss of mechanical stability due to an increase of the correlation length *ξ*. This is a serious source of problems for mixtures when they are studied by computer simulations. How can one distinguish between micro-phase separation and true phase separation within a computer simulation? Micro-phase separation is akin to micelle formation, when the alcohol molecules self-segregate in domains separated from water. If one simulates a system too small to accommodate a single micelle, then it is quite probable that one would witness a full phase separation, which is a misleading picture. The only way to answer this question properly is to increase the size of the system, which may not be feasible if the number of molecules to handle becomes too large.

#### **2.4 Kirkwood-Buff integrals and computer simulations**

The integrals of the correlation functions can be related to the mechanical stability. For each pair of sites (*a<sub>i</sub>*, *b<sub>j</sub>*), with *a<sub>i</sub>* on a molecule of species *i* and *b<sub>j</sub>* on a molecule of species *j*, we denote the corresponding site-site correlation function by *g<sub>ij</sub>*(*r*) and define the running Kirkwood-Buff integrals (RKBI) *G<sub>ij</sub>*(*r*) as:

$$G_{ij}(r) = 4\pi \int_0^r dt \, t^2 \left[g_{ij}(t) - 1\right] \tag{17}$$
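The running integral of Eq.(17) is straightforward to evaluate from a tabulated correlation function. The following sketch uses a cumulative trapezoidal rule and assumes that the radial grid starts at, or very near, *r* = 0, since the first value of the running integral is set to zero. The function name `running_kbi` is hypothetical.

```python
import math

def running_kbi(r, g):
    """Return the list of G(r_n) of Eq.(17) by cumulative trapezoidal integration.

    r : radial grid, assumed to start at (or very near) zero
    g : site-site correlation function tabulated on r
    """
    integrand = [4.0 * math.pi * r_n**2 * (g_n - 1.0) for r_n, g_n in zip(r, g)]
    running = [0.0]
    for n in range(1, len(r)):
        step = 0.5 * (integrand[n] + integrand[n - 1]) * (r[n] - r[n - 1])
        running.append(running[-1] + step)
    return running
```

Plotting the returned list against `r` gives the RKBI curve; the KBI is read from its plateau at large distance, provided such a plateau exists.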

In 1951, Kirkwood and Buff (Kirkwood & Buff, 1951) showed how the integrals of the correlation functions *G<sub>ij</sub>* = *G<sub>ij</sub>*(*r* → ∞) are related to physical quantities such as the isothermal compressibility *κ<sub>T</sub>*, the total volume *V* and the partial molar volumes *V̄<sub>a</sub>* of each species *a*, as well as the density derivatives of the chemical potentials (*∂μ<sub>i</sub>*/*∂ρ<sub>j</sub>*) taken at constant *T* and *ρ<sub>k</sub>*. These relations can be inverted to express the Kirkwood-Buff integrals (KBI) *G<sub>ij</sub>* themselves in terms of the physical quantities. For a binary mixture, these relations can be condensed into the form:

$$G_{ij} = G_{12}\,\delta_{ij} + \left(\beta \kappa_T - \frac{\bar{V}_1 \bar{V}_2}{D}\right)(1 - \delta_{ij}) - \frac{1}{x_i}\left(\frac{\bar{V}_j}{D} - V\right)\delta_{ij} \tag{18}$$

where *x<sub>i</sub>* is the mole fraction of species *i*, *δ<sub>ij</sub>* is a Kronecker symbol, and

$$D = \frac{\rho_1}{\rho_2} \left(\partial \beta \mu_1 / \partial \rho_1\right)_T = \frac{\rho_2}{\rho_1} \left(\partial \beta \mu_2 / \partial \rho_2\right)_T \tag{19}$$

where *ρ<sub>i</sub>* = *x<sub>i</sub>ρ* is the partial density of species *i*, and the second equality holds because of the Gibbs-Duhem equality *ρ*<sub>1</sub>*δμ*<sub>1</sub> + *ρ*<sub>2</sub>*δμ*<sub>2</sub> = 0.

This inversion operation was initially thought to provide some insight into the structure of aqueous mixtures (Ben-Naim, 1977). In a landmark paper (Matteoli & Lepori, 1984), Matteoli and Lepori provided the behaviour of the experimental KBI for a variety of binary mixtures. The reproduction of these data by computer simulations, and their interpretation, is still a challenging open problem.

In 1961, Lebowitz and Percus showed (Lebowitz & Percus, 1961) that the asymptote of *g<sub>ij</sub>*(1, 2) as obtained in a finite-size system, even when periodically extended, is not 1, as would be expected from Eq.(16), but rather

$$\lim_{r \to \infty} g_{ij}(1, 2) = 1 - \frac{1}{N \rho \sqrt{x_i x_j}} \left(\frac{\partial \rho_i}{\partial \beta \mu_j}\right) \tag{20}$$

This relation has been largely ignored, mainly because the exact asymptote of *g<sub>ij</sub>* is never required in practice, and most thermodynamic properties are computed directly in simulations instead of through relations such as Eqs.(13,14). We have checked (Zoranic et al., 2007) for many systems that the incorrect asymptote in Eq.(20) is not a serious problem for many quantities. This problem has resurfaced only recently, because the computation of the KBI through Eq.(17) requires the proper asymptote. In the absence of any estimate of the correction, we have devised an empirical way of correcting the asymptote, which allows a very good estimate of the KBI in some systems (Mijakovic et al., 2011; Perera et al., 2007). It consists in shifting the incorrect asymptote value *a<sub>ij</sub>* to 1 with the help of a switch function *S<sub>ij</sub>*(*r*):

$$g_{ij}^{(corrected)}(r) = g_{ij}(r)\left[1 + (1 - a_{ij})\, S_{ij}(r)\right] \tag{21}$$

with *S<sub>ij</sub>*(*r*) = 0.5(1 + tanh((*r* − *R<sub>ij</sub>*)/*κ<sub>ij</sub>*)), where we take the switch distance *R<sub>ij</sub>* = *σ<sub>i</sub>* + *σ<sub>j</sub>* and the switch smoothness *κ<sub>ij</sub>* = 1 Å. These values guarantee that the RKBI defined in Eq.(17) is unaltered for the first few neighbours. An illustration of the need for such a correction is given in Fig.4 below for the case of liquid methanol (the OPLS model was used). We look at the O-O and M-M site-site correlation functions (top panel; O = oxygen atom and M = CH<sub>3</sub> methyl group considered as a pseudo-atom), which appear to go to 1 at long range, as expected. A closer look (middle panel), however, shows that they both go to a slightly lower asymptote (the cyan line), as predicted by Eq.(20). We therefore use a switch function (shown in green) that shifts the asymptote while leaving the first neighbours unshifted. The bottom panel shows the RKBI for the shifted and unshifted functions. Only the functions corrected through Eq.(21) have the proper horizontal asymptote, which is in very good agreement with the experimental value (Perera et al., 2007) shown as a magenta line. The incorrect asymptote leads to a typically curved RKBI, as can be seen in several works that are unaware of this problem.
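The correction of Eq.(21) can be sketched in a few lines. The text does not specify how the asymptote value *a<sub>ij</sub>* is measured, so the sketch below makes the assumption that it is estimated as the average of *g<sub>ij</sub>*(*r*) over the last part of the tabulated range; the parameter `tail_fraction` and the function names are hypothetical. Distances are assumed to be in Å, so that the default smoothness matches *κ<sub>ij</sub>* = 1 Å.

```python
import math

def switch(r, r_switch, kappa=1.0):
    """Switch function S(r) = 0.5 (1 + tanh((r - R)/kappa)): 0 at short range, 1 at long range."""
    return 0.5 * (1.0 + math.tanh((r - r_switch) / kappa))

def correct_asymptote(r, g, r_switch, kappa=1.0, tail_fraction=0.2):
    """Apply Eq.(21) to a tabulated g(r); return (corrected g, estimated asymptote a).

    Assumption: the asymptote a is the mean of g over the last
    `tail_fraction` of the grid, which is not prescribed by the text.
    """
    n_tail = max(1, int(tail_fraction * len(g)))
    a = sum(g[-n_tail:]) / n_tail
    corrected = [g_n * (1.0 + (1.0 - a) * switch(r_n, r_switch, kappa))
                 for r_n, g_n in zip(r, g)]
    return corrected, a
```

Where the switch function has reached 1, a value *g* = *a* is mapped to *a*(2 − *a*) = 1 − (1 − *a*)<sup>2</sup>, so the remaining deviation from 1 is of second order in the small quantity 1 − *a*.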

#### **2.5 Illustration of the failure of computer simulations: the acetone-water mixtures**

When we tackled the acetone-water mixture back in 1998 (Perera & Sokolic, 2004), we were not fully aware of the problems we would meet. Indeed, the credo of computer simulations is that near-exact results are obtained if "proper" conditions are met. The acetone-water system challenges this credo! The first problem was with force fields. Three force fields were available for neat acetone. The OPLS (Jorgensen et al., 1990) and the FHMK (Ferrario et al., 1990) force fields were both produced around 1990 and checked to give reasonably accurate thermophysical properties of liquid acetone. The more recently introduced WS force field (Weerasinghe & Smith, 2003) was adjusted to give the proper magnitude of the KBI of the acetone-water mixtures and was supposed to avoid the demixing problems encountered with the older force fields. This model is interesting because it shows that altering the force field, instead of properly handling the second scale of the system, leads to unphysical behaviour of the KBIs. Fig.5 below shows the KBI as obtained from various experiments and from computer simulations using several combinations of force fields for acetone (OPLS, FHMK, WS) and for water (SPC/E, TIP4P).

There are several striking features in this figure. Perhaps the most apparent is that the KBIs obtained by most simulations are 7-fold larger than the experimental values. The physical reason for such large values is the demixing behaviour of these simulated mixtures: they tend to overestimate the correlation length *ξ* in Eq.(16), hence leading to large values of the KBIs. The second feature is that the WS force field, which has been adjusted in order to lower the values of the KBIs, does indeed lead to the proper magnitude of the latter, but also to a very different physical behaviour: the water-water KBI increases monotonically with acetone mole fraction instead of going through a maximum around acetone mole fraction 0.6. This latter feature can be interpreted in two different ways: on one hand, it could be considered a small price to pay to avoid demixing and keep the KBI on a reasonable scale; on the other hand, it could be interpreted as an incorrect description of the structure of the micro-heterogeneity of these systems. This latter interpretation rests for now on a rather thin theoretical argument, since we do not know how to differentiate between two different topologies of the micro-structure within computer simulations. This is illustrated in Fig.6 below by 3 snapshots showing the very different scale of the MH in the three different simulations.

Fig. 4. Illustration of the asymptote correction for neat liquid methanol (see Eq.(21) and the text for explanations).

It can be seen that the water and acetone micro-domains are not of the same size in each of the snapshots. Snapshot 1 looks like a detail of snapshot 3. However, the same simulation performed in a larger system shows a clear demixing (middle snapshot). On the other hand, the model of snapshot 3 shows incorrect KBI behaviour, without a maximum, while the two others have the proper KBI behaviour, but with values 7 times too large. We believe that the proper KBI will be obtained when the size of the micro-domains is intermediate between the two extremes shown above. It is particularly difficult to figure out which parts of the acetone force field should be altered in order to arrive at such a result. Since the force fields for both neat systems are relatively good, one may ask whether it is the use of the LB rules that creates the difficulties encountered here. It turns out that the Coulomb interaction energies are in general 10 times larger than the corresponding LJ ones (Zoranic et al., 2007). Hence, it is unlikely that a small interaction alone could be responsible for these variations. It seems more appropriate to think that a particular combination of changes in both energies would be needed to reproduce the correct MH. Of course, such a change should not alter the neat systems in any undesirable way.

Fig. 5. KBI for acetone-water mixtures. (Lines) experimental data from (Perera et al., 2006), (green triangles) WS+SPC/E MD data from (Weerasinghe & Smith, 2003), (squares) FHMK+SPC/E, (open circles) OPLS+SPC/E, (red circles) OPLS+TIP4P (the last 3 data sets are shown divided by 7).

Fig. 6. Snapshots of acetone-water mixtures. (Right) OPLS/SPCE N=864, (middle) OPLS/SPCE N=2048, (left) WS/SPCE N=2048.

Finding a proper force field for this particular mixture is still an open problem.

#### **3. Micro-structure of complex neat liquids**

As stated in the Introduction, we find it instructive to differentiate between the local order in neat liquids and in mixtures. We will describe the latter in the next section. The local order in simple liquids, such as nitrogen, is simply the dense packing structure found in hard spheres. When considering small polar molecules, such as benzene for example, the preferred orientations of the dipoles tend to create some local order. For moderate values of the dipole moments, this local order does not significantly alter the hard-sphere liquid packing structure, for example to the point of creating specific clusters. Chain-like structures appear only for very large values of the dipole moments. The H-bond interaction is very strong, of the order of 200 kJ/mol at the contact distance of 3 Å for the SPC/E water model (Zoranic et al., 2007). As a consequence, it creates a strong local order in the form of specific clusters, which can be detected by various experimental techniques (Guo et al., 2003; Ludwig, 2005).

Since clusters are meta-objects, one can reconsider the system as a fluid of clusters floating in a sea of monomers. This way, one can measure the cluster-cluster interactions and their correlation functions. However, there is the problem of defining the cluster as a well-defined entity. In order to achieve this, one needs the cluster distribution function in terms of the monomers. It turns out that clusters in complex liquids of small molecules are not defined as sharply as micelles, and therefore cannot easily be considered as meta-molecules in the same way as the constituent molecules themselves. One can rather consider such a clustered liquid as a "plasma", in the sense that the meta-particles are "broken" into their constituents. The picture that emerges from this description is that complex liquids are akin to some primordial state of matter. This point of view opens interesting new theoretical perspectives for the statistical mechanics of liquids.

For now, let us explore the details of self-clustering in some H-bonded liquids. We define the cluster distribution probability in terms of sites. This has the advantage of distinguishing the topology of how the molecules associate into clusters. This probability is defined as:

$$P(n) = \frac{\sum_{k} s(n, k)}{\sum_{m} \sum_{k} s(m, k)} \tag{22}$$

where *s*(*n*, *k*) represents the number of clusters of size *n* in the configuration *k* (Perera et al., 2007; Zoranic et al., 2007).
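Eq.(22) can be sketched as follows. The criterion used to decide which sites belong to the same cluster is not given in the text; the sketch assumes a simple distance criterion (two sites are connected if they are closer than a cutoff, with the minimum-image convention in a cubic periodic box) and groups connected sites with a union-find structure. The function names and the `cutoff` parameter are hypothetical.

```python
from collections import Counter

def cluster_sizes(positions, box_length, cutoff):
    """Sizes of the clusters in one configuration (assumed distance criterion).

    positions  : list of (x, y, z) coordinates of the chosen sites
    box_length : edge of the cubic periodic simulation box
    cutoff     : two sites closer than this distance are in the same cluster
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Root of the tree containing site i, with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dist2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                d = a - b
                d -= box_length * round(d / box_length)  # minimum-image convention
                dist2 += d * d
            if dist2 < cutoff * cutoff:
                parent[find(j)] = find(i)  # merge the two clusters
    return sorted(Counter(find(i) for i in range(n)).values())

def cluster_probability(sizes_per_configuration):
    """P(n) of Eq.(22): clusters of size n counted over all configurations k,
    divided by the total number of clusters of any size."""
    counts = Counter()
    for sizes in sizes_per_configuration:
        counts.update(sizes)  # adds s(n, k) for configuration k
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}
```

Applying `cluster_sizes` to the oxygen sites and to the methyl sites separately gives the two site-resolved distributions discussed in the text. The double loop scales as the square of the number of sites, which is acceptable for a sketch but would normally be replaced by a cell list for N=2048 molecules and many configurations.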

As shown in the Theory section, the presence of specific clusters is detected through the pre-peak of the site-site structure factors that are related to the cluster-forming interaction, namely the H-bonding interaction. So, we have two independent tools to detect clusters in neat liquids.

#### **3.1 Alcohols**

The probability distributions for some of the first alcohols in the nomenclature, namely methanol, ethanol and tert-butanol (TBA), are shown below in Fig.7. Both bonding oxygen O sites and non-bonding methyl M sites are shown. These are obtained by computer simulations of the OPLS models (Jorgensen, 1986), with N=2048 molecules, under ambient conditions.

One can see that all three alcohols have a specific cluster peak around *n* = 5. The monomer probability indicates the amount of clustering: a monomer probability higher than that of the specific cluster indicates that most molecules are free monomers. Hence, methanol appears as the least clustered of the three alcohols, while TBA is almost entirely clustered. There may be a topological reason for this, related to the geometry of the molecules. Indeed, since the strength of the H-bonding interactions is the same for all models (the underlying electrostatic forces are the same), the difference in clustering can only come from the molecular shapes. Since methanol molecules form chains through O-O contacts, the methyl groups are disordered, and the compactness of the liquid could not be achieved if all molecules were forming chains. Conversely, TBA molecules are pyramid-like, hence they can form small micelles with the H-bond interactions grouped at the center while the three methyl groups stay outside. This structure can be compactly reproduced through the whole liquid, with defects that do not destroy the compactness of the liquid. These considerations can be confirmed through the analysis of the pair site-site distribution functions shown in Fig.8, together with the corresponding structure factors.

Fig. 7. Cluster distribution probabilities. (Left) methanol, (center) ethanol, (right) tert-butanol. Typical cluster shapes are shown in the insets.

Fig. 8. Site-site distribution functions O-O (left) and M-M (right) for methanol, ethanol and tert-butanol.

The O-O correlation functions between the oxygen sites have the same features previously observed for pure methanol in Fig.2: a sharp first peak due to H-bonding, followed by weak and damped oscillatory features. In contrast, the M-M correlations look more structured, indicating the packing effects expected in a dense liquid. As stated in Section 2.2, the weak oscillations produce a pre-peak in the O-O structure factor shown in Fig.9 below, which signals the presence of newly emerged entities, the clusters. It is seen that the pre-peak for TBA is of larger amplitude, which confirms that the corresponding clustering is very strong, as also indicated by the larger specific peak in *P*(*n*) for TBA.

Fig. 9. Structure factors corresponding to the site-site correlation functions displayed in Fig. 8.

#### **3.2 Amides**

Amides are an interesting class of small molecules that are of biological and biochemical interest. Some amides, such as formamide, are fully H-bonding, while others, such as N-methyl-formamide (NMF), are weakly H-bonding, and dimethyl-formamide (DMF) is not H-bonding. We have studied formamide, NMF and DMF. For the amides, we have used the Cordeiro force field (Cordeiro et al., 2006), with N=2048 and ambient conditions, as described in Refs.(Zoranic et al., 2007; 2009). The cluster distributions for the three amides are shown in Fig.10 below. It is seen that none of them shows a specific peak such as that seen for alcohols. In all three cases, the *n*-mer probability is always higher than that of the (*n*+1)-mer, indicating a distribution of clusters similar to that found for Lennard-Jonesium, that is, a straight exponential decay.

Fig. 10. Cluster distribution probabilities. (Left) formamide, (center) NMF, (right) DMF
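The distinction between a distribution with a specific cluster peak (alcohols) and a monotonically decaying one (amides) can be tested directly on a tabulated *P*(*n*). The sketch below simply reports the cluster sizes at which *P*(*n*) is larger than at both neighbouring sizes. It is an illustration only: the function name is hypothetical, and with noisy simulation data some smoothing or a significance threshold would be needed to avoid spurious peaks.

```python
def specific_cluster_peaks(p):
    """Return the sizes n at which P(n) exceeds both neighbouring tabulated values.

    p : dict mapping cluster size n to the probability P(n)
    An empty list means that P(n) has no interior maximum, as for a
    monotonic decay.
    """
    sizes = sorted(p)
    peaks = []
    for prev, n, nxt in zip(sizes, sizes[1:], sizes[2:]):
        if p[n] > p[prev] and p[n] > p[nxt]:
            peaks.append(n)
    return peaks
```

For a purely exponential *P*(*n*) the function returns an empty list, while a distribution with a bump around *n* = 5 returns that size.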

The O-N (oxygen-nitrogen) site-site distributions for all three neat amides are shown in Fig.11 (left panel). The sharp first peak for formamide and NMF indicates their respective H-bonding tendencies, and this peak is obviously absent for DMF. The structure factors of formamide and NMF (right panel) show a weak pre-peak feature somewhat similar to that of methanol (Fig.9).

Fig. 9. Structure factors corresponding to site-site correlation functions displayed in 8

Fig. 10. Cluster distribution probabilities. (Left) formamide, (center) NMF, (right) DMF

The O-N site-site distributions (oxygen-nitrogen) for all three neat amides are shown in Fig.11(left panel). The sharp first peak for formamide and NMF indicate their respective H-bonding tendencies, and this peak is obviously absent for DMF. The structure factor of formamide and NMF (right panel) show a weak-prepeak feature somewhat similar to

Amides are an interesting class of small molecules, that are of biological and biochemical interest. Some amides such as formamide are fully H-bonding while others such as N-methyl-formamide (NMF) are weakly H-bonding, and dimethyl-formamide(DMF) is not bonding. We have studied formamide, nmethyl-formamide(NMF) and DMF. For the amides, we have used the Cordeiro force field(Cordeiro et al., 2006), with N=2048 and ambiant conditions as described in Refs.(Zoranic et al., 2007; 2009). The cluster distributions for the three amides are shown in Fig.10 below. It is seen that none of them shows a specific peak as that seen for alcohols. In all three cases, the n-mer probability is always higher than that of the (n+1)-mer, indicating a distribution of cluster similar to that found for Lennard-Jonesium,


Fig. 11. (Left) O-N (oxygen-nitrogen) site-site distribution functions for FA (blue), NMF (red) and DMF (green); (right) structure factors corresponding to the correlation functions in the left panel.

Perhaps the most intriguing feature found here is the absence of specific clustering in formamide, which is an H-bonding liquid, in sharp contrast with alcohols. This may be due to the geometry of the molecule, which does not allow dense packing when clusters exist. This argument indicates that a strong directional interaction is not enough, and that entropic effects due to packing can frustrate the local ordering. Packing constraints and local ordering must therefore be tuned in particular ways to give rise to specific clustering. Another typical example of such features is liquid water.

#### **3.3 Water: a very peculiar micro-structure**

We have studied clustering in SPC/E (Berendsen et al., 1987) and TIP4P (Jorgensen & Madura, 1985) water, two very similar models that are both widely used. Fig. 12 shows, for SPC/E, that there is no specific peak in the cluster distribution, hence no specific clusters in water. This is consistent with similar observations by other authors (Dougan et al., 2004).

Fig. 12. Cluster distribution probabilities for SPC/E water

A look at the O-O site-site function in Fig. 13 does not reveal any apparent peculiarity in the correlations, aside from the strong first peak, which reflects the H-bonding tendency.

Fig. 13. SPC/E water O-O (oxygen-oxygen) correlations (left) and corresponding structure factor (right). The arrows show the two-peak feature discussed in the text.

The O-O structure factor shows a weak shoulder that is not much of a pre-peak. Hence, pair correlation information is consistent with direct cluster calculations: no specific clustering. This is really surprising, since water is often thought to be the best example of a clustering liquid. In fact, a closer look at the long-range part of *gww*(*r*) reveals a surprising *absence* of correlations, a unique feature that has only recently been pointed out by one of us (Perera, 2011). It turns out that water has a very strong and very peculiar local order based on H-bonding, but that this order does not take the trivial cluster form. It is a much more abstract form of order, based on correlations rather than on direct interactions and clusters such as those shown in the inset of Fig. 7, for example. This peculiar form of order is seen from the fact that the various site-site correlation functions decay abruptly to 1 beyond *Rc* ≈ 10 Å, a very intriguing feature observed in all water models (Perera, 2011). It suggests that every water molecule is the center of specific correlations within the range *Rc*, which vanish abruptly beyond it. We have suggested naming the corresponding entity a "correlon" (Perera, 2011). Since every water molecule is the center of a correlon, correlons behave ideally with respect to one another: they do not interact with each other. This is a particularly strange and novel concept that requires further investigation, notably of how it could explain the various anomalous properties of water. It is noteworthy that a simple model with two repulsive cores, which has neither electrostatic nor H-bonding interactions, is able to reproduce the peculiar structure seen for real water (Perera et al., 2009). This emphasises that the correlon feature may be essentially the consequence of entropic effects related to locally competing packing constraints.
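One simple way to read a range such as *Rc* off a tabulated correlation function is to look for the largest distance at which the function still deviates from 1 by more than a chosen tolerance. The sketch below is only an illustration of that idea: the tolerance `tol`, the function name and the synthetic test function are assumptions, and with simulation data the tolerance has to sit above the statistical noise.

```python
import math

def correlation_range(r, g, tol=0.01):
    """Largest r at which |g(r) - 1| still exceeds tol (None if it never does)."""
    last = None
    for ri, gi in zip(r, g):
        if abs(gi - 1.0) > tol:
            last = ri
    return last

# Synthetic example (not water data): g(r) = 1 + 0.5 exp(-r),
# which drops below the 0.01 tolerance at r = ln(50).
r_grid = [0.01 * i for i in range(1, 2001)]
g_model = [1.0 + 0.5 * math.exp(-ri) for ri in r_grid]
r_c = correlation_range(r_grid, g_model, tol=0.01)
```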

#### **4. Micro-heterogeneity: the aqueous-alcohol mixtures**

Alcohol molecules are perfect amphiphiles, hence water-alcohol mixtures are akin to micro-emulsions (ME). However, these mixtures are never considered as such, probably because the rich manifestations of ME (micelles of various shapes) are absent from these simpler mixtures. Nevertheless, the basic ingredient of micro-emulsions, namely micro-immiscibility, is common to both systems. It is only a matter of the size of the oil and water domains: in aqueous alcohol mixtures these are of the size of the nanometer, while in ME they are more of the size of the micro-meter. We call the former mixtures **molecular emulsions**, in order to emphasize the analogy while keeping the scale in mind. We will use this analogy here to build a theoretical background for aqueous alcohol mixtures. The major problem is that, for many force fields, the border between demixing and micro-heterogeneous domain formation is an intrinsic difficulty of the simulations. We saw an example of this for the acetone-water mixtures in Section 2.5. The only cure for this problem seems to be the use of much larger system sizes, which is often too expensive for routine use. A suitable theoretical procedure that would circumvent this problem would be most welcome.

#### **4.1 The Teubner-Strey theoretical description of micro-emulsions**

In a landmark article (Teubner & Strey, 1987), Teubner and Strey derived the generic form of the structure factor for micro-emulsions, starting from a field-theoretic Landau-de Gennes free energy. Such a free energy includes local variations of the order parameter, which is in fact the local density of the system. Since molecular sizes are of the order of the Angstrom, while oil-water domains are of the size of the micro-meter, the difference of about four orders of magnitude is a good justification for considering density variations as mesoscopic, with molecular details entirely omitted. This description led them to the following structure factor

$$S(k) = \frac{A}{c\_0 + c\_2k^2 + a\_2k^4} \tag{23}$$

Based on the signs of the three coefficients, they were able to distinguish between 3 regimes. The first regime is governed by pure concentration fluctuations, where the structure factor has the well known Ornstein-Zernike form:

$$S(k) = \frac{A}{c\_0 + c\_2 k^2} \tag{24}$$

where a single correlation length emerges. It is noteworthy that this structure factor is entirely consistent with the asymptotic decay in Eq.(16). This structure factor can only develop a peak at k=0, when the correlation length increases due to the growth of fluctuations, as when nearing a phase transition. The second regime corresponds to well-defined oil and water domains, which introduces a second length *d*, the size of the domains. In this regime, the structure factor can have a well-defined pre-peak positioned at *kP* ≈ 2*π*/*d*. The k=0 value of S(k) is generally smaller than the height of the pre-peak, which indicates that concentration fluctuations do not alter the domains. The last regime is the transition between the previous two, when the domain size is not well distinguished from the size of the concentration fluctuations. In this regime, called the Lifshitz regime (Ciach and Godz, 2001), fluctuations form and destroy the domains. In a sense this is akin to a critical point, but one where fluctuations can never diverge because the domains stabilize them. This seems curiously similar to what we have described in the case of the acetone-water mixture. The main problem is that domains in micro-emulsions are 3 orders of magnitude larger than those in our aqueous mixtures. Moreover, the molecular nature of the domains cannot be ignored at such scales. Hence, we cannot use the Teubner-Strey (TS) formulation in any straightforward manner by starting from field-theoretic free-energy considerations. Nevertheless, one feels that a proper statistical theory of liquids should be able to encompass both micro- and molecular emulsions.
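The way the sign of the quadratic coefficient decides between a maximum at k=0 and a pre-peak at finite k can be made concrete with a few lines of code. The sketch below uses the notation of Eq.(23), S(k) = A/(c0 + c2 k² + a2 k⁴), and only elementary calculus: for positive c0 and a2, setting dS/dk = 0 gives a peak at k² = −c2/(2 a2) when c2 is negative. The function names and the numerical values are illustrative assumptions, not fitted parameters.

```python
import math

def ts_structure_factor(k, A, c0, c2, a2):
    """Teubner-Strey form in the notation of Eq.(23)."""
    return A / (c0 + c2 * k**2 + a2 * k**4)

def ts_peak(A, c0, c2, a2):
    """Return (k_peak, S(k_peak)); k_peak is 0 when there is no pre-peak."""
    if c0 <= 0.0 or a2 <= 0.0:
        raise ValueError("this sketch assumes c0 > 0 and a2 > 0")
    if c2 >= 0.0:
        return 0.0, A / c0  # monotonic decay from k = 0 (Ornstein-Zernike-like)
    if c2 * c2 >= 4.0 * a2 * c0:
        raise ValueError("denominator vanishes at finite k: S(k) diverges")
    k_peak = math.sqrt(-c2 / (2.0 * a2))
    return k_peak, ts_structure_factor(k_peak, A, c0, c2, a2)

# Illustrative coefficients only.
no_prepeak = ts_peak(A=1.0, c0=1.0, c2=1.0, a2=1.0)
with_prepeak = ts_peak(A=1.0, c0=1.0, c2=-1.0, a2=1.0)
```

In the second case the peak height exceeds the k=0 value by the factor c0/(c0 − c2²/(4 a2)), which is how a pre-peak taller than S(0) arises in this functional form.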

#### **4.2 A statistical theory of molecular emulsions**

In a recent work on the aqueous-TBA mixture (Kezic & Perera, 2011), two of us (Kežić and Perera) have used an extension of the molecular Ornstein-Zernike (MOZ) equation to arrive at an expression exactly similar to the TS expression. To do so, one starts from the MOZ equation and writes its small-k expansion. The MOZ equation can be written in matrix form as (Fries & Patey, 1985)

$$
\tilde{H}\_{\chi}(k) = \tilde{C}\_{\chi}(k)[I - (-)^{\chi}\rho \tilde{C}\_{\chi}(k)]^{-1} \tag{25}
$$

where *ρ* is the density, and the matrix indexed by *χ* contains as elements the projections, on a basis of rotational invariants, of the pair and direct correlation functions in Fourier-Hankel space: $A\_{mn} = \{\tilde{A}\_{\chi}(k)\}\_{mn} = \tilde{a}\_{mn\chi}(k)$, where $A = H, C$ and $\tilde{a}\_{mn\chi}(k)$ is the Fourier transform of the correlation function $a\_{mn\chi}(r)$ with $a = h, c$, the pair and direct correlation functions, respectively. The details of how the projections $a\_{mn\chi}$ are related to the full correlation function $a(1, 2)$ have been explained in several articles (Fries & Patey, 1985) and are now textbook knowledge (Hansen & McDonald, 2006). In particular, it is important to recall that the $\tilde{a}\_{mn\chi}(k)$ are in fact what is called a *χ*-transform, involving projections $\tilde{a}\_{mnl}(k)$, each of which is a Fourier-Hankel transform of order *l* of the function $a\_{mnl}(r)$.

This matrix equation also holds exactly as above for a mixture, with the appropriate indexing of the correlation functions of the mixture (Kusalik & Patey, 1988). The mathematics of the MOZ equation relevant to the present development is simple. In short, we wish to examine the small-q behaviour of any $\tilde{h}\_{ij;mn\chi}(k)$. After some algebraic manipulations (Kezic & Perera, 2011) one arrives at the following expression:

$$\tilde{h}\_{ij;mn\chi}(k) = \frac{\tilde{t}\_{ij;mn\chi}(k)}{1 - \tilde{\gamma}\_{\chi}(k)}\tag{26}$$

where the functions in the numerator and the denominator are related to the correlation functions appearing in Eq.(25). The details of these calculations are not relevant to the argument, since the general form above is exactly equivalent to MOZ. Since the function *γχ*(*r*) involves only direct correlation functions, which are short-ranged, one can expand its transform around q=0, and one has

$$\tilde{h}\_{ij;mn\chi}(q \to 0) \approx \frac{\tilde{t}\_{ij;mn\chi}(0)}{1 - \tilde{\gamma}\_{\chi;0} - q^2 \tilde{\gamma}\_{\chi;2} - q^4 \tilde{\gamma}\_{\chi;4} - q^6 \tilde{\gamma}\_{\chi;6} - \dots} \tag{27}$$

where only even powers of q are retained because of the general symmetry of all correlation functions, *a*(−*r*) = *a*(*r*). The small-q expansion involves expanding Fourier-Hankel transforms, and it is not so simple to express each of the $\tilde{\gamma}\_{\chi;n}$ in terms of integrals of the *γχ*(*r*) function as it would have been for a simple Fourier transform. But this is a minor detail, irrelevant to the generality of the discussion here. The equation above, retained to order *q*<sup>2</sup> alone, leads to the well-known discussion in terms of the correlation length (Fisher, 1964). We briefly recall this argument here:

$$\tilde{h}\_{ij;mn\chi}(q\to 0) \approx \frac{\tilde{t}\_{ij;mn\chi}(0)}{1 - \tilde{\gamma}\_{\chi;0} - q^2 \tilde{\gamma}\_{\chi;2}} = \frac{A\_{ij;mn\chi}}{\xi^{-2} + q^2} \tag{28}$$


with *ξ* the correlation length, given by $\xi^2 = -\tilde{\gamma}\_{\chi;2}/(1 - \tilde{\gamma}\_{\chi;0})$. This function is a Lorentzian, and its inverse Fourier transform is a Yukawa function:

$$\lim\_{r \to \infty} h\_{ij;mn\chi}(r) \approx A\_{ij;mn\chi} \frac{\exp(-r/\xi)}{r} \tag{29}$$

It is important to note that the correlation length *ξ* is the same for all projections and it is uniquely defined for the system.
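The step from the truncated expansion to the Lorentzian of Eq.(28) can be spelled out as follows. This is only a sketch, assuming $\tilde{\gamma}\_{\chi;2} < 0$ and $1 - \tilde{\gamma}\_{\chi;0} > 0$ so that the correlation length is real. Factoring $-\tilde{\gamma}\_{\chi;2}$ out of the denominator gives

$$1 - \tilde{\gamma}\_{\chi;0} - q^2 \tilde{\gamma}\_{\chi;2} = -\tilde{\gamma}\_{\chi;2}\left[\frac{1 - \tilde{\gamma}\_{\chi;0}}{-\tilde{\gamma}\_{\chi;2}} + q^2\right], \qquad \xi^{-2} = \frac{1 - \tilde{\gamma}\_{\chi;0}}{-\tilde{\gamma}\_{\chi;2}}, \qquad A\_{ij;mn\chi} = \frac{\tilde{t}\_{ij;mn\chi}(0)}{-\tilde{\gamma}\_{\chi;2}}$$

so that the correlation length grows without bound as $1 - \tilde{\gamma}\_{\chi;0} \to 0$. For a scalar (order-zero) transform, the Yukawa decay of Eq.(29) then follows from the standard three-dimensional result

$$\frac{1}{(2\pi)^3}\int d\vec{q}\, \frac{\exp(i\vec{q}\cdot\vec{r})}{\xi^{-2} + q^2} = \frac{\exp(-r/\xi)}{4\pi r}$$

where the constant prefactor, which depends on the Fourier convention, is understood to be absorbed into the amplitude of Eq.(29).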

The correlation length is sensitive to density fluctuations in a pure liquid or to concentration fluctuations in a mixture. In particular, it is a very useful probe of the approach to any global phase transition, since fluctuations are enhanced in its vicinity: the correlation length tends to increase, and it diverges at the limit of stability of the phase, the so-called spinodal. But what happens when the system does not phase separate globally and only micro-segregates, as in all alcohol-water mixtures? In order to study such a phenomenon, one should retain one more order in the q-expansion, which amounts to exploring distances shorter than the domains of critical fluctuations, which extend over several tens of Angstroms. In principle, there is no reason to stop the expansion at *q*<sup>4</sup> or *q*<sup>6</sup>. However, since the expansion to order *q*<sup>4</sup> has been successfully used from micro-emulsions down to aqueous mixtures of relatively short-chain alcohol molecules, such as diols, triols and others, there are good reasons to try the TS approximation first.

The molecular level TS equivalent of the MOZ equation reads

$$\tilde{h}\_{ij;mn\chi}^{(TS)}(q) = \frac{\tilde{t}\_{ij;mn\chi}(0)}{1 - \tilde{\gamma}\_{\chi;0} - q^2 \tilde{\gamma}\_{\chi;2} - q^4 \tilde{\gamma}\_{\chi;4}} = \frac{\tilde{t}\_{ij;mn\chi}(0)}{a\_2 + q^2 c\_1 + q^4 c\_2} \tag{30}$$

where we have adopted the original TS notation (Teubner & Strey, 1987) in the denominator of the second equality, and where the new coefficients can be redefined in terms of the domain size *d* and the correlation length *ξ* as follows (Kezic & Perera, 2011):

$$a\_2 = 1 - \tilde{\gamma}\_{\chi;0} = (\bar{d}^2 + \xi^2)^2 \tag{31}$$

$$c\_1 = -\tilde{\gamma}\_{\chi;2} = 2(\bar{d}\xi)^2 (\bar{d}^2 - \xi^2) \tag{32}$$

$$c\_2 = -\tilde{\gamma}\_{\chi;4} = (\bar{d}\xi)^4 \tag{33}$$

where $\bar{d} = d/2\pi$. The definitions in the second equalities have been introduced (Kezic & Perera, 2011) such that the inverse Fourier transform of the TS function reads exactly:

$$h\_{ij;mn\chi}^{(TS)}(r) = \int d\vec{q} \, \exp(i\vec{q}\cdot\vec{r}) \, \tilde{h}\_{ij;mn\chi}^{(TS)}(q) = \frac{\tilde{t}\_{ij;mn\chi}(0)}{\pi^2 (\bar{d}\xi)^3} \, \frac{\exp(-r/\xi)}{r} \sin(r/\bar{d}) \tag{34}$$

One sees that the relations between the three TS coefficients ($a\_2$, $c\_1$, $c\_2$) and the combinations of length parameters ($\bar{d}$, *ξ*) are stringent, since the moments of the *γχ*(*r*) functions need to have specific signs to match the algebraic forms and the positivity conditions. Since these moments cannot be obtained unless one calculates the expansion coefficients of the direct correlation function, this information is unavailable to us. Obtaining such coefficients requires the full arsenal of liquid state theory, and reliable output from such theory for realistic molecular systems is not well developed at present. Therefore, in our approach to strongly micro-heterogeneous systems, we will fit the TS functional form to correlation functions determined by computer simulations.
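The correspondence between the length parameters and the TS coefficients can be checked numerically. The sketch below assumes that Eqs.(31)-(33) are written in terms of the reduced domain size d̄ = d/2π, inverts them with the usual Teubner-Strey expressions, and verifies by direct quadrature that the three-dimensional Fourier transform of exp(−r/ξ) sin(r/d̄)/r has the quartic denominator of Eq.(30). The overall constant, 8π(d̄ξ)³ for a forward transform without normalisation, depends on the Fourier convention and is not meant to reproduce the prefactor of Eq.(34). All function names are hypothetical.

```python
import math

def ts_coefficients(d_bar, xi):
    """(a2, c1, c2) from the reduced domain size d_bar and correlation length xi."""
    a2 = (d_bar**2 + xi**2) ** 2
    c1 = 2.0 * (d_bar * xi) ** 2 * (d_bar**2 - xi**2)
    c2 = (d_bar * xi) ** 4
    return a2, c1, c2

def ts_lengths(a2, c1, c2):
    """Invert ts_coefficients: return (d_bar, xi)."""
    root = 0.5 * math.sqrt(a2 / c2)
    xi = (root + c1 / (4.0 * c2)) ** -0.5
    d_bar = (root - c1 / (4.0 * c2)) ** -0.5
    return d_bar, xi

def fourier_of_ts_tail(q, d_bar, xi, r_max=200.0, n=20000):
    """3D Fourier transform of exp(-r/xi) sin(r/d_bar)/r, computed as
    (4*pi/q) * int_0^r_max sin(q r) exp(-r/xi) sin(r/d_bar) dr (trapezoid rule)."""
    h = r_max / n
    total = 0.0
    for i in range(1, n):  # the integrand vanishes at r = 0 and is negligible at r_max
        r = i * h
        total += math.sin(q * r) * math.exp(-r / xi) * math.sin(r / d_bar)
    return 4.0 * math.pi / q * total * h

# Values quoted later in the text for the water-TBA fit (in Angstrom).
d_bar, xi = 6.7, 8.0
a2, c1, c2 = ts_coefficients(d_bar, xi)
q = 0.15
numerical = fourier_of_ts_tail(q, d_bar, xi)
analytical = 8.0 * math.pi * (d_bar * xi) ** 3 / (a2 + c1 * q**2 + c2 * q**4)
```

With these values c1 is negative, since d̄ is smaller than ξ, which is the situation in which the TS form develops a peak at finite q.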

#### **4.3 Aqueous-tbutanol mixtures: an illustration of the molecular emulsion concept**

tert-Butanol (TBA) is a nearly spherical molecule, but it is in fact very asymmetrical, since it has a bulky group of three methyl groups and a strongly polar OH head: it is a small amphiphile. Therefore, just like methanol or ethanol, it can be expected to form a molecular emulsion, in analogy with the micro-emulsions of larger alcohols such as 2-butoxyethanol, for example (Koga, 2007). However, in computer simulations, aqueous methanol and aqueous ethanol do not pose the same problems as aqueous tert-butanol. Indeed, the OPLS models of the two former alcohols can be reasonably simulated with N=2048 system sizes (Kezic et al., 2011; Mijakovic et al., 2011; Perera et al., 2007), and special force fields designed to account for micro-heterogeneity (Weerasinghe & Smith, 2005) do not alter the results significantly. By contrast, aqueous tert-butanol mixtures with OPLS tert-butanol tend to produce very high KBI, which makes this system a good candidate for force field alterations (Lee & van der Vegt, 2005). From the experimental point of view, it has also been noticed that tert-butanol-water mixtures appear to be very micro-heterogeneous (Bowron et al., 1998).

Fig. 14 below illustrates our considerations on the applicability of the TS structure factor to molecular emulsions, for the case of the water-tert-butanol mixture at tert-butanol mole fraction *x* = 0.2. This corresponds to the maximum of the experimental Kirkwood-Buff integrals (Perera et al., 2006) and therefore to a region of high concentration fluctuations, where the micro-heterogeneity should be quite large. Indeed, Fig. 15 shows a snapshot from our simulations of the N=2048 particle mixture of the SPC/E and OPLS models. The micro-segregation is quite evident, inducing a clear partitioning of both species.

Fig. 14. Water-water correlations of the SPC/E water-OPLS TBA mixture: (left) TS treatment, (middle) corresponding RKBI, and (right) structure factor.

The left panel of Fig. 14 shows the uncorrected calculated water-water distribution function (in green), together with the version corrected for the TS effect (blue). The fitting function is shown in magenta. Note that a rescaling of the asymptote is required to set the proper asymptotic value to 1, which we do by applying the procedure in Eq.(21). The correlation length corresponding to the TS fit is found to be *ξ* = 8 Å, while the domain size is $\bar{d}$ = 6.7 Å. Both quantities are of the same order of magnitude, which corresponds to strongly fluctuating domains, as shown in the snapshot of Fig. 15. The TS fit allows one to extend the correlations far beyond the initial box size of 50 Å, as seen in the middle panel. The middle panel shows the RKBI calculated from the various correlation functions shown in the left panel. It is obvious that, if the initial form had been retained, it would have led to a value of the KBI of the order of 2500, which corresponds to the peak of the RKBI (as often used in various publications) and is too large compared with the experimental estimates of (Matteoli & Lepori, 1984), shown as a brown horizontal line, and with our own experimental value (Perera et al., 2006), shown as a cyan line. The TS extension (blue curve) brings the wrong initial tendency to a value very close to our experimental estimate. The green curve shows the catastrophe that results from not shifting the incorrect asymptote to 1. Finally, the right panel shows the water-water structure factor. It is obvious that the pre-peak at *k* ≈ 1.2 Å<sup>−1</sup> is due to the oscillating feature brought by the TS fit. In the inset we plot the structure factor of pure SPC/E water for comparison (red). The major change is just the pre-peak feature. In the absence of the TS fit, the value of S(k) at *k* = 0 would be around 90, a factor of 4 higher than expected from experiments.
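To see why extending the tail matters for the running Kirkwood-Buff integral, G(R) = 4π ∫₀ᴿ [g(r) − 1] r² dr, one can integrate a pure TS tail and compare the value reached at half a 50 Å box with the converged limit. The sketch below is a toy model only: it uses the fitted lengths quoted above (ξ = 8 Å, d̄ = 6.7 Å) with an arbitrary amplitude of 1 and applies the TS form at all distances, whereas in practice the tail would be matched onto the simulated g(r) beyond some distance. The function names are hypothetical.

```python
import math

def ts_tail(r, amplitude, xi, d_bar):
    """TS asymptotic form of h(r) = g(r) - 1."""
    return amplitude * math.exp(-r / xi) * math.sin(r / d_bar) / r

def running_kbi(h_of_r, r_max, dr=0.01):
    """Return (radii, values) with values[i] = 4*pi * int_0^radii[i] h(r) r^2 dr
    (midpoint rule)."""
    radii, values = [], []
    total = 0.0
    for i in range(int(round(r_max / dr))):
        r_mid = (i + 0.5) * dr
        total += 4.0 * math.pi * h_of_r(r_mid) * r_mid * r_mid * dr
        radii.append((i + 1) * dr)
        values.append(total)
    return radii, values

def kbi_limit(amplitude, xi, d_bar):
    """Closed form of 4*pi*A * int_0^inf r exp(-r/xi) sin(r/d_bar) dr."""
    kappa, k0 = 1.0 / xi, 1.0 / d_bar
    return 4.0 * math.pi * amplitude * 2.0 * kappa * k0 / (kappa**2 + k0**2) ** 2

xi, d_bar, dr = 8.0, 6.7, 0.01
radii, values = running_kbi(lambda r: ts_tail(r, 1.0, xi, d_bar), r_max=200.0, dr=dr)
limit = kbi_limit(1.0, xi, d_bar)
g_half_box = values[int(round(25.0 / dr)) - 1]  # running integral at R = 25 Angstrom
g_extended = values[-1]                         # running integral at R = 200 Angstrom
```

In this toy case the running integral is still far from its limit at R = 25 Å, while it has converged at R = 200 Å, which is the qualitative behaviour that motivates the TS extension.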

Fig. 15. Snapshot of the TBA/water mixture at TBA mole fraction x=0.2

The fitting form proposed in Eq.(34) therefore seems to predict the asymptotic form of the correlations despite the small size of the simulations. This theory predicts the proper effects of fluctuations and domain formation even at molecular scales. The next step is to produce such effects from theoretical considerations on the direct correlation function itself. For this, one needs to develop integral equation techniques beyond the current level of accuracy. Work along these lines has been started (Kezic and Perera, 2011) and is in progress.

### **5. Conclusion and perspectives**

In their well-known textbook on liquids and mixtures (Rowlinson & Swinton, 1982), Rowlinson and Swinton open the paragraph on aqueous mixtures of non-electrolytes with these words: "No one has yet proposed a quantitative theory of aqueous solutions of non-electrolytes, and such solutions will probably be the last to be understood fully". Three decades later, can we say that we have reached a better understanding of these mixtures? The various examples of this chapter indicate that many dark spots remain in these systems, even when viewed from traditional points of view. We still do not know how to clearly explain the various kinks in the properties shown in Fig. 1. We have seen that computer simulations are very often unable to give an unambiguous interpretation of the outcome of the statistical analyses that are conducted. The statistical theories of liquids are still in their infancy when facing some of the challenges concerning the nature of the correlations in these systems.

In view of such problems, we have proposed to regard these various difficulties as arising from a single source: the existence of micro-heterogeneity as an intrinsic property of these mixtures. From such a viewpoint one can understand this property from two different directions: the fact that it arises from an underlying interacting molecular background, and the way macroscopic properties themselves reflect the existence of this feature.

The two most important messages of this chapter are, first, that complex liquids are the seat of a new phenomenon of micro-structure or micro-heterogeneity, and second, that computer simulations do not appear to be very reliable statistical tools for studying such systems, for reasons intrinsic to the physical phenomenon itself. As shown throughout the chapter, micro-heterogeneity is a physical phenomenon that originates as a spatial and temporal manifestation of simple specific interactions, but that occurs at a much larger scale and calls for a novel point of view. New "entities" emerge in the system, which are the seat of density fluctuations (neat liquids) or concentration fluctuations (mixtures). Instead of undergoing a full phase transition, the system becomes a mixture of these "entities" or "meta-particles", with the caveat that fluctuations and domains may be interwoven in a way that requires further progress to be understood. This is precisely why micelle formation is not a true phase transition: fluctuations are not entirely responsible for the appearance of the micelles, as they would be in a true phase transition. Micellar systems are in fact simpler than the present problem, since each cluster looks like a small spherical particle, while micro-heterogeneity, in general, has no particular shape, as seen from various snapshots. This duality of fluctuation versus cluster is not properly handled by simulations, which often smooth out the fluctuations by undergoing a full phase transition for reasons intrinsic to the methodology itself: finite sizes and the difficulty of handling the stabilizing effects of the fluctuations, to name the two most important.

The issue of the competition between fluctuations and intrinsic local organisation is in fact more general than the topic of aqueous mixtures alone: it is about how "particles" can emerge from a discretised background through the effect of direct interactions and of statistical correlations and fluctuations. Our study shows how seemingly methodological problems in fact hide fundamental issues about the statistical description of liquids. Beyond this issue, it is the structure of matter itself that could be at stake. As stated in the Introduction, the emergence of "meta-particles" in an assembly of smaller particles is a new way of looking at an old problem: how to distinguish an "object" from its background. We find it very remarkable that one of the most common forms of matter, namely liquids, should be the seat of such fundamental issues.



## **Application of Molecular Dynamics Simulations to Plasma Etch Damage in Advanced Metal-Oxide-Semiconductor Field-Effect Transistors**

Koji Eriguchi *Graduate School of Engineering, Kyoto University Japan* 

### **1. Introduction**


According to the International Technology Roadmap for Semiconductors (ITRS) (SIA, 2009), the shrinkage of the silicon-based metal–oxide–semiconductor field-effect transistor (MOSFET), an elemental device (unit) in ultra-large-scale integrated (ULSI) circuits, has been accelerating due to expanding demands for higher performance and lower-power operation. The characteristic dimensions of current MOSFETs in mass production are around 30 to 50 nm. Figure 1 shows the scaling trend of the key feature sizes in ULSI circuits predicted by the Semiconductor Industry Association, USA. Various types of MOSFETs are designed for specific purposes, i.e., low standby power (LSP), low operation power (LOP), and high performance (HP) operation, and are built into ULSI circuits such as dynamic random access memory (DRAM) and micro-processing units (MPU). Newly structured MOSFETs such as fully-depleted (FD) and metal-gate (MG) devices have recently been proposed. Since the physical gate length (*L*g) and the source / drain extension depth (Ext) are the key feature sizes determining MOSFET performance (Sze & Ng, 2007), the shrinkage of *L*g and Ext is a primary focus in the development of MOSFETs. These sizes have reached a few nanometers, comparable to the scale of an atomistic simulation domain.

To meet requirements such as fabricating fine patterns with anisotropic features, plasma etching is widely used in the mass production of MOSFETs. At present, pattern transfer by plasma etching needs to be controlled to within a variation of a few nanometers (SIA, 2009). In such regimes, the size of the region where plasma-etch reactions occur is no longer negligible with respect to the scale of the MOSFET. Thus, precise control of the surface reaction between plasma and device is strongly required. In a plasma etch process, radicals (atoms or molecules in an excited state) react with the surface material with the help of the energy of incident ions accelerated in the "sheath" between the plasma and the device surface. This reaction mechanism is commonly referred to as "reactive ion etching (RIE)" (Lieberman & Lichtenberg, 2005). In some plasma etch processes, the ion energy exceeds 1 keV in order to obtain a high etch rate. In such schemes, an unexpected reaction may occur. This unexpected ("unwanted") reaction mechanism is usually called "plasma process-induced damage" (Eriguchi & Ono, 2008; Lieberman & Lichtenberg, 2005; Oehrlein, 1989), and it gives rise to many key problems in the development of MOSFETs.

Fig. 1. Scaling trend of feature sizes in a metal–oxide–semiconductor field-effect transistor (MOSFET) in an ultra-large-scale integrated circuit.

Plasma process-induced damage (PID) is one of the serious issues causing degradation of MOSFET performance and reliability. Figure 2 illustrates an example of PID during a typical plasma etch process. A Si wafer is placed on a wafer stage in a plasma chamber, and a reactive plasma is generated by a power supply. The figure corresponds to an inductively coupled plasma (ICP) system (Lieberman & Lichtenberg, 2005), where power at frequencies *f*1 and *f*2 is supplied to the plasma source (*f*1) and the wafer stage (*f*2), respectively. During plasma etching, the MOSFET is exposed to the plasma and energetic ions impact the surface. This energetic ion bombardment creates defects (e.g., displaced Si atoms) in the Si surface region of the MOSFET. This mechanism is one example of PID (Eriguchi & Ono, 2008). For more than two decades, PID has been studied extensively with various approaches in order to understand the mechanisms and to solve practical problems. To obtain statistical data in mass production and to clarify the source of PID, the use of specifically designed devices called "test elementary group (TEG)" (SIA, 2009) has been a major approach. In addition to the use of TEG, physical and electrical analyses have been conducted on bare Si wafers to gain a fundamental understanding of PID. To realize future high-performance MOSFETs, understanding and controlling (minimizing) PID is crucial because the critical dimensions of the reaction-layer thickness and the device feature size will come into conflict with the plasma-damaged-layer thickness, which is governed by plasma parameters. In other words, the damaged-layer thickness does not scale with the device-shrinkage trends shown in Fig. 1.

Several simulation techniques have been proposed so far for plasma etch processes. There are two major schemes: (1) plasma-etch feature-profile simulations employing a small cell (Jin et al., 2002; Tsuda et al., 2011) and (2) surface-reaction simulations based on molecular dynamics (MD) (Graves & Humbird, 2002; Ohta & Hamaguchi, 2001a, 2001b; Sankaran & Kushner, 2004).

Fig. 1 plots feature size (nm; axis ticks at 0.1, 1, 10, and 100) against year of production (2005 to 2025) for *L*g (HP, LOP, LSP), Ext (LOP, FD, MG), and Si loss (DRAM, MPU).

Fig. 2. Illustrations of plasma process reactor and plasma etch damage.

Since the number of particles to be simulated may be quite large (>10<sup>10</sup> cm<sup>-3</sup>) during plasma etching, classical MD simulations based on Newton's equation of motion (Graves & Humbird, 2002; Ohta & Hamaguchi, 2001a, 2001b) are now more widely employed than those using quantum mechanical calculations (Mazzarolo et al., 2001; Pelaz et al., 2009). Recently, MD simulations have been used to understand the formation of the surface "damaged" layer and the displacement of Si atoms, i.e., PID (Graves & Humbird, 2002; Pelaz et al., 2009). However, the primary focus of these conventional MD simulations has been placed on the surface-reaction chemistry among ions, radicals, and the surface material. Since plasma etch processes are used for MOSFET fabrication, not only the surface reaction mechanism but also the effects of PID on MOSFET performance degradation should be carefully taken into account. Thus, MD simulations of PID should be combined with the prediction of the electrical characteristics of MOSFETs.

Two major challenges remain in the development of future MOSFETs and plasma etch processes: (1) a systematic and quantitative understanding of PID, i.e., the thickness of the damaged layer and the density of displaced Si atoms (defects), and (2) a comprehensive design framework for future plasma processes that considers the effect of PID on the electrical characteristics of MOSFETs. With these issues in mind, this chapter discusses PID mechanisms by means of a classical MD simulation. We compare the simulation results with experimental data obtained by various analysis techniques. Future key issues concerning the effects of PID on MOSFET performance are also provided. This chapter is organized as follows: In Sec. 2, we review the PID mechanism (the Si loss mechanism, i.e., Si recess structure formation, as described later). In Sec. 3, the MD simulation employed in this study is briefly described. In Sec. 4, the simulation results are presented. In Sec. 5, experimental results are compared to the simulation results. Concluding remarks are in Sec. 6.

### **2. Ion-bombardment damage to MOSFET during plasma processing**

Figure 3 illustrates a PID mechanism induced by ion bombardment of the Si surface during offset-spacer etching (SIA, 2009), one of the manufacturing steps for MOSFETs. During plasma etching, an energetic ion impinges on the Si surface with an energy *E*ion, loses its energy through a series of collisions, and creates defect sites under the exposed surface. This mechanism forms the damaged layer. In general, defect sites in a Si substrate include displaced Si atoms, vacancies, and interstitials. As seen on the right in Fig. 3, the damaged structure consists of a surface layer (amorphous) and an interfacial layer (a mixture of amorphous and crystalline structure). Underneath these layers, there exist (latent) localized defect sites. (In this study, we denote these sites as "(local) defect sites".) The surface and interfacial layers can usually be monitored in production lines by an optical technique such as spectroscopic ellipsometry (SE). The profile of the defect sites and the thickness of the damaged layer are determined by *E*ion as well as by the interatomic potential between Si and the incident ion.

Fig. 3. Mechanism of plasma-etch damage to Si substrate and "Si loss" formation. Energetic ion bombardment creates a damaged layer underneath the Si surface. As shown on the right, localized defect sites are formed. A portion of the damaged layer with defect sites is stripped off during a subsequent wet-etch process, resulting in Si loss. (For details, see the text.)

In conventional MOSFET fabrication processes, a wet-etch step follows the plasma etch process to remove the contaminated layer, including the damaged layer. Since the damaged layer oxidizes on exposure to air after the plasma etch, this portion is stripped off by the wet etch. The removed layer results in Si loss, whose structure is observed as a recessed Si surface called "Si recess" (Ohchi et al., 2008; Petit-Etienne et al., 2010; Vitale & Smith, 2003). Si recess is formed in the source / drain extension (SDE) region of a MOSFET.

It has been reported (Eriguchi et al., 2009a; Eriguchi et al., 2008a) that the Si recess structure formed by PID degrades MOSFET performance, i.e., it induces a shift of the threshold voltage (*V*th) for MOSFET operation. Since *V*th (Sze, 1981; Sze & Ng, 2007) plays an important role in determining the performance, the Si recess structure has become a primary problem in present-day MOSFET development (SIA, 2009). To understand the formation of the Si recess structure (PID), the damage creation mechanism should be clarified from both theoretical and experimental viewpoints. Moreover, to predict the effects of PID on the MOSFET electrical characteristics, the defect structures should be identified quantitatively with high accuracy. In this study, we performed a classical molecular dynamics simulation as well as quantitative analyses of the local defect site density. We then discuss the effects of PID on the electrical characteristics of MOSFETs.

### **3. Molecular dynamics simulation for plasma etching**

Regarding classical MD simulations of plasma etch processes, many papers have focused on the surface reactions in order to understand the details of silicon and silicon dioxide etch characteristics under energetic halogen (Abrams & Graves, 2000; Hanson et al., 1997; Humbird & Graves, 2004; Nagaoka et al., 2009; Ohta & Hamaguchi, 2001b) and fluorocarbon (Abrams & Graves, 1999) ions. The primary focus has been on estimating the etch yield by incident ions and the selectivity of the RIE system. Regarding ion bombardment damage, Graves and Humbird (Graves & Humbird, 2002) have reported in detail the formation of a damaged layer in crystalline Si structures by Ar ion impacts. They estimated the stopping range of ions as well as the thickness of the amorphous (amorphized) layer formed near the surface. However, the detailed mechanism of local defect site formation was not discussed.

As mentioned in Sec. 1, an RIE system involves physical and chemical reactions triggered by high-energy ion impacts of 10 to 10<sup>3</sup> eV. Although *ab initio* MD simulations are now available, they cannot be applied in practice. This is because more than 10<sup>3</sup> atoms are necessary to construct a solid surface and the total number of incident ions is more than 10<sup>10</sup> cm<sup>-2</sup>, resulting in more than 10<sup>3</sup> impacts on a surface with an area of 10 nm<sup>2</sup> (roughly the commonly simulated size). Therefore, at present, the only possible candidate for atomistic RIE simulations is classical MD, in particular with pre-constructed interatomic potential functions.

One of the commonly used interatomic potential functions for Si-containing systems is the one proposed by Stillinger and Weber (SW) (Stillinger & Weber, 1985), in which the total potential energy consists of two- and three-body functions. The SW potential function was originally designed for Si/F systems. Afterwards, various potential sets for Si/Cl (Ohta & Hamaguchi, 2001b), Si/O (Watanabe et al., 1999), Si/O/F (Ohta & Hamaguchi, 2001a), Si/O/Cl (Ohta & Hamaguchi, 2001a), and Si/O/C/F (Ohta & Hamaguchi, 2004; Smirnov et al., 2007) systems were provided. The other widely used function was proposed by Tersoff (Tersoff, 1988a, 1988b), with bond-order parameters that include multi-body interactions. This potential can be applied effectively to C-containing systems to describe the complicated behaviour arising from the strengths of double and triple bonds. Parameter sets were proposed for systems including Si, C, Si/C, C/H, C/F, and Si/C/F (Abrams & Graves, 1998; Tanaka et al., 2000). In addition to the SW and Tersoff potential functions, other potential functions (Biswas & Hamann, 1987; Dodson, 1987; Hanson et al., 1997) have also been proposed. Although there have been many discussions on the validity of these functions (Balamane et al., 1992), all of them can effectively reproduce some structural and thermodynamic characteristics of the materials and the relevant structural chemistry of some selected molecules. In this study, to eliminate the complicated surface reactions that usually occur in halogen-containing plasmas, we focus on an Ar-Si-O system for studying PID. We used the SW potential function for the Si-Si and Si-O systems.

#### **3.1 Interatomic potential functions used in this study**

The Stillinger-Weber potential function (Stillinger & Weber, 1985) utilizes both two-body and three-body interaction terms to stabilize the diamond cubic structure of crystalline silicon. The potential function is given by

$$\Phi = \sum_{i<j} V_2(i,j) + \sum_{i<j<k} V_3(i,j,k), \tag{1}$$

where *V*2(*i*, *j*) is the two-body interaction term between *i*-th and *j*-th atoms expressed as

$$V_2(i,j) = A_{ij} \, g_{ij} \left(B_{ij} \, r_{ij}^{-p_{ij}} - r_{ij}^{-q_{ij}}\right) \exp\left(\frac{C_{ij}}{r_{ij} - a_{ij}}\right), \tag{2}$$

if *rij* < *aij*, where *aij* is the cut-off distance, and *V*2(*i*, *j*) = 0 otherwise. Here *rij* is the interatomic distance between the *i*-th and *j*-th atoms in the SW length unit (0.20951 nm). The parameters *Aij*, *Bij*, *Cij*, *pij*, *qij*, and *aij* depend only on the species of the *i*-th and *j*-th atoms. The factor *gij* (< 1) is the bond-softening function introduced by Watanabe et al. (Watanabe et al., 1999), which adjusts the contribution of the two-body term to reproduce the cohesive energies of Si-O bonds. The three-body interaction term has the following form:

$$V_3(i, j, k) = h(r_{ij}, r_{ik}, \theta_{jik}) + h(r_{ji}, r_{jk}, \theta_{ijk}) + h(r_{ki}, r_{kj}, \theta_{ikj}), \tag{3}$$

where *θijk* is the angle between the two bonds of lengths *rij* and *rjk* (the angle at atom *j*), etc. Given that *rij* and *rik* are less than the cut-off distance, the function *h* is

$$h(r_{ij}, r_{ik}, \theta_{jik}) = \lambda_{jik} \exp\left(\frac{\gamma_{jik}^{ij}}{r_{ij} - a_{jik}^{ij}} + \frac{\gamma_{jik}^{ik}}{r_{ik} - a_{jik}^{ik}}\right) \times \left(\cos\theta_{jik} - \cos\theta_{jik}^{0}\right)^2, \tag{4}$$

otherwise *h* = 0. Here $\lambda_{jik}$, $\gamma_{jik}^{ij}$, $\gamma_{jik}^{ik}$, $a_{jik}^{ij}$, $a_{jik}^{ik}$, and $\theta_{jik}^{0}$ are parameters for the *j-i-k* triplet, and $a_{jik}^{ij}$ and $a_{jik}^{ik}$ are the cut-off distances in the three-body configuration. For the "ideal" tetrahedral angle,

$$\cos \theta_{jik}^{0} = -\frac{1}{3} \tag{5}$$

holds.
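As an illustration of Eqs. (2) and (4), the following minimal Python sketch evaluates the two-body and three-body terms for a pure Si-Si-Si triplet in reduced SW units. The numerical parameters are the values commonly quoted for the original SW silicon parametrization and are an assumption here, since this chapter does not list them. The function names `v2` and `h` are illustrative and are not taken from the MD code used in this study.

```python
import math

# Assumed Si-Si parameters in reduced SW units (length unit 0.20951 nm).
# These values are NOT listed in the chapter; they are the commonly quoted
# Stillinger-Weber silicon parameters and serve only as an illustration.
A, B, C = 7.049556277, 0.6022245584, 1.0
P, Q = 4, 0
CUTOFF = 1.80              # a_ij, cut-off distance
LAMBDA, GAMMA = 21.0, 1.20
COS_THETA0 = -1.0 / 3.0    # Eq. (5), ideal tetrahedral angle


def v2(r, g=1.0):
    """Two-body term of Eq. (2); g is the bond-softening factor (1 for Si-Si)."""
    if r >= CUTOFF:
        return 0.0
    return A * g * (B * r ** (-P) - r ** (-Q)) * math.exp(C / (r - CUTOFF))


def h(r_ij, r_ik, cos_theta):
    """Three-body function of Eq. (4) for one triplet centred on atom i."""
    if r_ij >= CUTOFF or r_ik >= CUTOFF:
        return 0.0
    damping = math.exp(GAMMA / (r_ij - CUTOFF) + GAMMA / (r_ik - CUTOFF))
    return LAMBDA * damping * (cos_theta - COS_THETA0) ** 2


r_min = 2.0 ** (1.0 / 6.0)            # pair-term minimum for these parameters
print(v2(r_min))                      # close to -1 in reduced energy units
print(h(r_min, r_min, COS_THETA0))    # vanishes at the tetrahedral angle
print(h(r_min, r_min, 0.0))           # positive penalty for a 90-degree angle
```

With these assumed parameters the pair term has its minimum near the reduced distance 2^(1/6), and the three-body term penalizes any deviation from the tetrahedral angle, which is how the potential stabilizes the diamond cubic structure.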

Regarding the two-body system with Ar, the Moliere-type repulsive pair potential function (Moliere, 1947; Torrens, 1972; Wilson et al., 1977) was employed. The potential function *V*2(*i*,*j*) includes a screening function *f*s(*rij*) combined with a Coulomb potential. The function is expressed as

$$V_2(i,j) = \frac{Z_i Z_j e^2}{4\pi\varepsilon_0 r_{ij}} f_s(r_{ij}), \tag{6}$$

where *e* is the elementary charge, *ε*0 is the permittivity of vacuum, and *Zi* and *Zj* are the atomic numbers of the projectile (Ar) and target (Si, O, Ar) atoms, respectively. The expression for the screening function has been studied by many researchers (Torrens, 1972), and the repulsive interatomic potential between Ar and other elements has since been modified (Wilson et al., 1977). In the Moliere potential function, *f*s(*rij*) is described as

$$f_s(r_{ij}) = \sum_{m=1}^{3} C_m \exp\left(-\frac{b_m r_{ij}}{a'}\right) = 0.35 \exp\left(-\frac{0.3\, r_{ij}}{a'}\right) + 0.55 \exp\left(-\frac{1.2\, r_{ij}}{a'}\right) + 0.10 \exp\left(-\frac{6.0\, r_{ij}}{a'}\right), \tag{7}$$

where *a*' is the Firsov screening length (Firsov, 1957). The parameters proposed by Moliere (Moliere, 1947; Wilson et al., 1977) were used in this study.
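Equations (6) and (7) can be sketched in a few lines of Python. The chapter does not give an expression for the Firsov screening length, so the sketch assumes the standard form *a*' = 0.8853 *a*B / (*Zi*^(1/2) + *Zj*^(1/2))^(2/3), with *a*B the Bohr radius. The function names and the choice of units (eV and nm) are likewise illustrative assumptions, not details of the code used in this study.

```python
import math

COULOMB_EV_NM = 1.439964   # e^2 / (4 pi eps0) expressed in eV nm
BOHR_NM = 0.0529177        # Bohr radius in nm


def firsov_length(z_i, z_j):
    """Firsov screening length a' in nm (assumed standard form, see lead-in)."""
    return 0.8853 * BOHR_NM / (math.sqrt(z_i) + math.sqrt(z_j)) ** (2.0 / 3.0)


def moliere_screening(r, a):
    """Screening function f_s of Eq. (7) with the Moliere coefficients."""
    x = r / a
    return (0.35 * math.exp(-0.3 * x)
            + 0.55 * math.exp(-1.2 * x)
            + 0.10 * math.exp(-6.0 * x))


def moliere_potential(r, z_i, z_j):
    """Screened Coulomb repulsion of Eq. (6) in eV, with r in nm."""
    a = firsov_length(z_i, z_j)
    return z_i * z_j * COULOMB_EV_NM / r * moliere_screening(r, a)


# Ar (Z = 18) projectile on a Si (Z = 14) target atom
for r_nm in (0.05, 0.10, 0.20):
    print(r_nm, moliere_potential(r_nm, 18, 14))
```

The screening function equals 1 at zero separation, so the bare Coulomb repulsion is recovered for very close collisions, and it decays quickly on the scale of *a*', which makes the interaction purely repulsive and short-ranged.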

#### **3.2 Simulation procedure**


The MD code used in this study was originally developed by Ohta and Hamaguchi (Ohta & Hamaguchi, 2001b). We prepared a crystalline Si structure with a square Si (100) surface whose side length was 3.258 nm (six times the lattice constant, i.e., a square of 6 × 6 unit cells). Each atomic layer contained 72 atoms (= 1 monolayer (ML)). The bottom layer of the simulation cell (72 atoms) was rigidly fixed throughout the simulations. The initial depth of the simulation domain was 9 unit cells (nine times the lattice constant, or 36 MLs). Periodic boundary conditions were employed along the horizontal directions.

Since a typical MOSFET structure has an SiO2 layer on the Si substrate, the initial Si structure (6 × 6 × 9 cell) was "oxidized" before the Ar ion bombardment. The oxidation was done by 500 consecutive impacts of O atoms at 50 eV, followed by a cooling step with a set-point temperature of 400 K (see below). This step creates a surface oxidized layer corresponding to the SiO2 film formed on a source/extension region of a MOSFET. Using the obtained SiO2/Si structure, we injected Ar atoms with various incident energies. Ar atoms were injected at normal incidence from randomly selected horizontal locations above the surface of the cell.

In this study, we injected Ar ions with a constant *E*ion, although in practical plasma etching processes the ion energy obeys an ion energy distribution function (IEDF) that depends on the frequency of the applied bias power (Lieberman & Lichtenberg, 2005), as illustrated in Fig. 2. A recent study (Eriguchi et al., 2010) showed, however, that at low applied bias voltages (< 500 eV) the average ion energy can be used as a primary measure of the damaged-layer thickness. Therefore, the present MD simulation employed mono-energetic ion impacts.

Note that, in plasma etching, ions play an important role in the reactions, whereas conventional MD simulations use charge-neutral atoms as incident particles. This is based on the assumption that incident ions are neutralized near the target surface by a resonance or Auger process. Thus, in this study, we employed the above potential functions for charge-neutral Ar atoms.

In present-day RIE, plasma densities of the order of 10<sup>9</sup>–10<sup>11</sup> cm<sup>-3</sup> are widely used. These densities lead to a (Bohm) flux of incident ions of 10<sup>13</sup>–10<sup>16</sup> cm<sup>-2</sup>s<sup>-1</sup>, depending on the plasma density and the electron temperature (Lieberman & Lichtenberg, 2005). For the present simulation domain (~ 10 nm<sup>2</sup>), the interval between two successive ion impacts is much longer than the simulated time range. Therefore, each ion impact is treated as an independent single event. To simulate such single events, for the first 0.7 ps after an energetic particle hits the surface, the motion of all particles in the domain, except those in the rigidly fixed bottom layer, is solved numerically by classical mechanics. Then, we applied an "artificial" cooling step to all the particles for 0.3 ps using Berendsen's heat removal scheme (Berendsen et al., 1984) with a cooling constant of 2.0 × 10<sup>-14</sup> s<sup>-1</sup>. The set-point temperature of the simulation cell was 400 K, to reproduce the practical surface-temperature range during plasma etching. After the end of the cooling step, a new energetic particle was directed to the surface, and the whole simulation cycle was repeated. Details of the present MD simulation procedure are published elsewhere (Ohta & Hamaguchi, 2001b).
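The independence of successive impacts can be checked with a short estimate of the mean interval between impacts on the simulation cell. The sketch below uses the 3.258 nm side length and the flux range quoted in the text; the helper name `mean_impact_interval` is hypothetical, and the 1 ps cycle length is simply the sum of the 0.7 ps and 0.3 ps steps.

```python
def mean_impact_interval(flux_cm2_s: float, side_nm: float) -> float:
    """Mean time (s) between ion impacts on a square cell of the given side length."""
    side_cm = side_nm * 1.0e-7            # 1 nm = 1e-7 cm
    area_cm2 = side_cm ** 2
    impacts_per_second = flux_cm2_s * area_cm2
    return 1.0 / impacts_per_second


SIDE_NM = 3.258      # cell side length from the text
CYCLE_S = 1.0e-12    # 0.7 ps free dynamics + 0.3 ps cooling

for flux in (1.0e13, 1.0e16):   # flux range quoted in the text (cm^-2 s^-1)
    dt = mean_impact_interval(flux, SIDE_NM)
    print(f"flux = {flux:.0e} cm^-2 s^-1 -> interval = {dt:.2e} s "
          f"({dt / CYCLE_S:.1e} simulation cycles)")
```

Even at the upper end of the flux range, the estimated interval is of the order of a millisecond, roughly nine orders of magnitude longer than one 1 ps simulation cycle.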

To evaluate the damage creation mechanisms, the defect structures formed by the ion impacts should be identified. Since a comprehensive discussion of the defect structures is beyond the scope of this article, we focus on the Si atoms displaced from their initial lattice sites and on the representative defect structures obtained by the simulations. To assign the displaced atoms, the Lindemann radius (*r*L) was used as a measure of the displacement threshold (Hensel & Urbassek, 1998; Nordlund et al., 1998). The Lindemann radius is defined as the vibration amplitude of Si atoms at their melting point. For the SW potential function for Si, the radius is *r*L = 0.45 Å. After the MD simulations, we inspected the displacements of all the Si atoms and identified as defects all atoms that were outside the cubic cell (with an edge of 2 *r*L) centered at their original lattice site. The number of displaced atoms was counted to investigate the overall trend of PID in the course of successive ion impacts.
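The displacement criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, each atom is assumed to be paired with its own original lattice site, periodic images are ignored, and the coordinates are made-up values rather than simulation output.

```python
R_L_NM = 0.045  # Lindemann radius for SW silicon from the text (0.45 angstrom)


def is_displaced(position, lattice_site, r_l=R_L_NM):
    """True if the atom lies outside the cube of edge 2*r_l centred on its site."""
    return any(abs(p - s) > r_l for p, s in zip(position, lattice_site))


def count_displaced(positions, lattice_sites, r_l=R_L_NM):
    """Count atoms displaced from their own original lattice sites."""
    return sum(is_displaced(p, s, r_l) for p, s in zip(positions, lattice_sites))


# Illustrative coordinates in nm (made-up values, not simulation output).
sites = [(0.0, 0.0, 0.0), (0.13575, 0.13575, 0.13575), (0.2715, 0.2715, 0.0)]
atoms = [(0.01, -0.02, 0.03), (0.13575, 0.20, 0.13575), (0.2715, 0.2715, 0.044)]
print(count_displaced(atoms, sites))  # only the second atom leaves its cube
```

Because the test region is a cube rather than a sphere, the check is applied to each Cartesian component separately.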

### **3.3 Simulation results and discussion**

Figure 4 displays typical damaged-layer/Si structures after 500 impacts by Ar ions with various energies. In this figure, the Ar atoms are omitted from the structures to show the damaged layer and the Si substrate clearly. From other simulation results, we found that Ar atoms are usually present at tetrahedral interstitial sites underneath the interfacial layer. As seen in Fig. 4, the damaged-layer thickness increases with *E*ion. A heavily damaged surface layer and a mixing layer with a rough interface (interfacial layer) are observed, in particular for higher *E*ion (≥ 200 eV). The surface layer has usually been identified as "an amorphous layer" in simulations, as discussed in the literature (Graves & Humbird, 2002; Oehrlein, 1989). However, when samples (wafers) are transferred to the next manufacturing step, the surface is oxidized by air exposure. As mentioned in the next section, the partially oxidized layer is detected by an optical technique as an SiO2 layer, and the residual damaged layer is evaluated as a mixing layer. In the following discussion, based on the experimental data, we define the surface amorphous layer and the rough interfacial region (including local defect sites) as the surface and interfacial layers, respectively. One should also pay careful attention to the local defect sites underneath the rough interface, as highlighted in Fig. 4.

Regarding the local defect sites observed in Fig. 4, many structures have been proposed so far (Baraff et al., 1980; Batra et al., 1987; Cheng & Corbett, 1974; Estreicher et al., 1997; Hastings et al., 1997; Leung et al., 1999; Schober, 1989; Tang et al., 1997). In addition to vacancies, various interstitials have been studied extensively by *ab initio* calculations using clustered Si atoms. Tetrahedral and hexagonal interstitials as well as the "dumbbell" and bond-centered interstitials are commonly proposed structures (Batra et al., 1987; Leung et al., 1999). Figure 5 shows the profile of the increase in the number of Si atoms located in each atomic plane along the [100] depth direction (the spacing between the planes is 0.13575 nm). For the present cell structure (6 × 6), 72 Si atoms are originally located in each plane. In this figure, only Si atoms were inspected and counted, and the differences among the interstitial structures mentioned above were not considered. A positive value presumably implies the presence of interstitials, and a negative value that of vacancies. This means that the local defect sites consist of vacancies as well as Si interstitials. (Note that integrating the data from the bottom gives a net positive cumulative value, which implies that Si interstitials are more probable than vacancies in those cases.) The interstitial Si atoms were formed by a knock-on process.

Fig. 4. SiO2/Si structures after 500 impacts of Ar with different incident energies. The injected Ar ions are omitted from the cells for easy comparison. Local defect sites are highlighted.

Fig. 5. Profiles of an increase in Si atoms in each atomic plane along [100].
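The per-plane counting behind a profile such as Fig. 5 can be sketched as follows. The binning rule (assigning each atom to the nearest atomic plane) is an assumption made for illustration, since the text does not state how atoms between planes were assigned; `plane_excess` is a hypothetical helper and the depths are made-up values.

```python
from collections import Counter

PLANE_SPACING_NM = 0.13575   # spacing between (100) atomic planes, from the text
ATOMS_PER_PLANE = 72         # Si atoms per plane in the pristine 6 x 6 cell


def plane_excess(depths_nm, n_planes):
    """Excess Si atoms per atomic plane relative to the pristine crystal.

    depths_nm: depth of each Si atom below the topmost pristine plane (nm).
    Positive entries suggest interstitials, negative entries vacancies.
    """
    counts = Counter(round(z / PLANE_SPACING_NM) for z in depths_nm)
    return [counts.get(k, 0) - ATOMS_PER_PLANE for k in range(n_planes)]


# Made-up example: plane 1 has lost one atom, plane 2 has gained two.
depths = [0.0] * 72 + [0.13575] * 71 + [0.2715] * 73 + [0.28]
excess = plane_excess(depths, 3)
print(excess)        # per-plane excess, top plane first
print(sum(excess))   # net excess; a positive value means more interstitials than vacancies
```

Summing the per-plane excess corresponds to the cumulative integration mentioned in the text.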

Keeping in mind the results in Fig. 5, we investigated in detail some of the typical defects in Fig. 4 to clarify the structures of the defects formed by PID. Figure 6 presents three representative structures. On the left are the views of the defects on the (100) plane (along [100]). In the middle, bird's-eye views of the respective defects (Si atoms) are shown. Interstitial Si atoms are highlighted in the views. From Fig. 4, we roughly categorize the defect structures (the interstitial atoms) as Type-A, Type-B, and Type-C. Type-A resembles a tetrahedral interstitial, and Type-B a hexagonal interstitial (Batra et al., 1987; Leung et al., 1999; Schober, 1989). In both cases, the neighbouring Si atoms are not displaced considerably. Type-C resembles a "dumbbell" structure, in which a bonded Si atom was displaced from its lattice site by an interstitial Si atom arriving from another region. These structures are widely believed to be probable and have been found to be stable in terms of formation energies calculated by quantum mechanical schemes (Colombo, 2002; Leung et al., 1999; Tang et al., 1997). Typical structures reported so far are also illustrated on the right of Fig. 6. For details, including other structures, see the literature (Batra et al., 1987; Cheng & Corbett, 1974; Leung et al., 1999).

Regarding the effect of these structures on the electrical characteristics of crystalline Si, it is also believed that these defects create additional energy levels in the band gap ("band-gap states") (Estreicher et al., 1997; Hastings et al., 1997; Schultz, 2006). Therefore, one can speculate that the local defects created by PID degrade MOSFET performance, because a band-gap state acts as a carrier trap site, inducing an increase in leakage current due to hopping conduction (Koyama et al., 1997) and/or an increase in channel series resistance (Eriguchi et al., 2009b) due to Coulomb scattering by the trapped carriers. In general, it is difficult to identify these local defects, and in particular to quantify their density, by conventional analysis techniques such as TEM observation. In the next section, we use novel analysis techniques to quantify the density of the defects and discuss their electrical characteristics.

Fig. 6. Typical structures of Si-interstitial defects created by energetic ion bombardment (Type-A, Type-B, and Type-C). On the left are the views along [100], and the bird's-eye views are shown in the middle. On the right, typical structures assigned by previous reports (tetrahedral, hexagonal, and dumbbell) are shown. (See also Fig. 4.)

Since the observed defect structures are derived from our MD simulations, we have to pay careful attention to the effects of the simulation procedure on our findings. In the present case, the number of defect sites (~ the density) might depend on both the simulation cell size and the number of Ar impacts. Conventional MD simulations for plasma etch processes employ (1) periodic boundary conditions and (2) rigidly fixed Si atoms at the bottom plane of the cell. Both technical restrictions may make the defect generation mechanism dependent on the cell size. It was reported (Abrams & Graves, 2000; Graves & Humbird, 2002) that the formation of the amorphous layer is time-dependent, i.e., the number of ion impacts determines whether the damaged-layer formation process is in a growth phase or a saturation phase. Previous reports showed that the thickness of the amorphous layer formed by Ar ion impacts reached a steady state after 1.5 monolayers (MLs) of impacts (Graves & Humbird, 2002), and that the thickness of the reaction layer formed by F ion impacts did so after 10 MLs (Abrams & Graves, 2000).

Figure 7 first shows the cell-size dependence of the Si ion penetration depth. Si was self-implanted repeatedly with an energy of 150 eV into initial simulation cells of various sizes. Due to the stopping process (Lindhard et al., 1963; Wilson et al., 1977), the injected Si atom loses its energy through a series of collisions and finally comes to rest in the Si substrate. After 1000 impacts of Si atoms, the profiles of the penetration depth ("projected range") were determined and compared for various cell sizes. As shown, there is only a small difference among the peak positions, i.e., the ion penetration depth is almost independent of the cell size. This result suggests that a smaller cell can be used in plasma-etch MD simulations to reduce the "simulation cost" of estimating the ion penetration depth. However, as mentioned later, when the density of local defect sites is investigated, a smaller cell statistically contains a smaller number of local defect sites for the same number of ion impacts. Hence, although the ion penetration depth can be estimated with a smaller cell, a larger number of impacts is needed to investigate the overall features of the local defect site structures. In Sec. 4, we compare the number of defect sites in Fig. 4 with that obtained by the experiments.

Fig. 7. Depth profiles of incident Si atoms into the cells with various sizes.

Figure 8 shows the time evolution of the number of displaced Si atoms (*n*Si), determined by the algorithm based on the Lindemann radius, for the present cell size of 6 × 6 × 9. As seen, the counted *n*Si initially increases with fluence up to ~ 100 impacts and then saturates after approximately 500 impacts for both *E*ion cases. For the present (100) surface (6 × 6), 500 impacts correspond to ~ 7 MLs of injected atoms. Compared with previous reports (Abrams & Graves, 2000; Graves & Humbird, 2002), this saturation value is in a reasonable range. Thus, it can be concluded that, in view of the saturation phenomenon, the number of impacts should be properly optimized in advance when performing plasma-etch MD simulations. (In Fig. 4, we have shown the results after 500 impacts for the discussion of the defect sites.)

Fig. 8. Time evolutions of the number of displaced Si atoms (Ar → Si (100), *r*L = 0.045 nm; *E*ion = 50 and 100 eV).
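The conversion between the number of impacts, monolayers, and fluence used in this discussion is simple arithmetic; a small sketch is given here for reference. The helper names are hypothetical, and the numbers are the 72 atoms per ML and the 3.258 nm side length given in Sec. 3.2.

```python
ATOMS_PER_ML = 72     # atoms in one monolayer of the 6 x 6 (100) surface
SIDE_NM = 3.258       # side length of the square surface


def impacts_to_ml(n_impacts: int) -> float:
    """Number of impacts expressed in monolayers of injected atoms."""
    return n_impacts / ATOMS_PER_ML


def impacts_to_fluence_cm2(n_impacts: int, side_nm: float = SIDE_NM) -> float:
    """Fluence (cm^-2) corresponding to n_impacts on the square cell surface."""
    area_cm2 = (side_nm * 1.0e-7) ** 2
    return n_impacts / area_cm2


print(f"{impacts_to_ml(500):.1f} ML")               # about 6.9 ML
print(f"{impacts_to_fluence_cm2(500):.1e} cm^-2")   # about 4.7e15 cm^-2
```

The second value is consistent with the fluence of ~ 5 × 10<sup>15</sup> cm<sup>-2</sup> quoted for 500 impacts in Sec. 4.2.2.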

### **4. Comparison with experimental data**


### **4.1 Experimental setup, test sample structure, and evaluation techniques**

To verify the above MD simulation results, we carried out plasma treatments of Si wafers using two plasma reactors. N-type (100) Si wafers with a resistivity of 0.02 Ω cm were exposed to plasma in an inductively coupled plasma (ICP) reactor (see Fig. 2) and in a DC plasma reactor. In the ICP reactor (Eriguchi et al., 2008a), the plasma was generated by a source power supply unit with an applied power of 300 W. The plasma density can be controlled by changing this source power. On the other hand, the bias power applied to the wafer stage determines the energy of the ions incident on the wafer surface. In the present study, the bias power was varied from 50 to 400 W. The frequency of both the source and the bias power was 13.56 MHz. Ar was used as the process gas at a pressure of 20 mTorr. Unless otherwise specified, the process time was 30 s. Using an oscilloscope, we determined the self dc bias (*V*dc < 0) and the plasma potential (*V*p), which gave an average ion energy *E*ion (eV) (= *e*(*V*p – *V*dc)) ranging from 56 to 411 eV. From a Langmuir probe measurement, the electron density and the electron temperature were estimated to be 3.3 × 10<sup>11</sup> cm<sup>-3</sup> and 2.6 eV, respectively. Since Ar plasma is electropositive, the ion density is approximately equal to the electron density, 3.3 × 10<sup>11</sup> cm<sup>-3</sup>, giving an ion flux (*Γ*ion) of 1.2 × 10<sup>17</sup> cm<sup>-2</sup>s<sup>-1</sup>. Note that this ICP configuration results in a constant *Γ*ion to the wafer for all bias power conditions in the present experiments.

To evaluate the damaged structure, we conducted several analyses: spectroscopic ellipsometry (SE) (Herman, 1996), high-resolution transmission electron microscope (HR-TEM) observation, high-resolution Rutherford backscattering spectroscopy (HR-RBS), photoreflectance spectroscopy (PRS) (Herman, 1996), and capacitance-voltage (C-V) measurement. In particular, to quantify the local defect site density, we employed PRS and C-V measurement.

In SE analysis, the damaged layer is assumed to consist of two regions, i.e., the surface and interfacial layers (abbreviated as SL and IL, respectively). The SL is composed of SiO2, as a result of the oxidation of heavily damaged regions by exposure to air as well as the presence of knock-on oxygen from the surface oxide layer. (This is confirmed by the MD simulation results.) The IL is partially oxidized or disordered Si, which is identified by SE as a mixed layer consisting of crystalline Si and SiO2 phases. Thus, an optical model assuming four layers (ambient, surface SiO2 layer, interfacial layer, and Si substrate) was employed (Eriguchi et al., 2008a; Matsuda et al., 2010). In this analysis, the thicknesses of the surface (SL: *d*SL) and interfacial [IL (SiO2 + crystalline-Si): *d*IL] layers and the composition (*f*Si: component of crystalline-Si) were determined.

PRS is one of the modulation spectroscopic techniques (Aspnes, 1973; Pollak & Shen, 1990), in which the surface of the sample is modulated by a laser beam and the resulting change in reflectance is monitored with a probe beam. In the present PRS, the amplitude of the reflectance change (Δ*R*/*R*) for the damaged sample was measured. A decrease in Δ*R*/*R* indicates defect generation in the surface region of the Si substrate (< 10 nm) (Murtagh et al., 1997; Wada et al., 2000). Details of the basics of PRS analysis and the experimental procedures are described elsewhere (Eriguchi & Ono, 2008). From the laser-power dependence of the spectra, the areal defect site density (*N*dam (cm<sup>-2</sup>)) can be estimated (Eriguchi & Ono, 2008; Nakakubo et al., 2010a).

For capacitance-voltage (C-V) analysis (Sze, 1981), we used a mercury probe system. The bias frequency was 1 MHz. To estimate the local defect site density, we investigated the 1/*C*<sup>2</sup>-*V* curves of the damaged samples (Eriguchi et al., 2008a). Details are given later.

#### **4.2 Experimental results and discussion**

#### **4.2.1 Thickness of the damaged-layer**

Figure 9 shows the surface and interfacial layer thicknesses (*d*SL and *d*IL) identified by SE with an optimized four-layer model. The samples were treated in the ICP system with various applied bias powers. The SE analysis indicates that *d*IL increases with the average ion energy *E*ion, while *d*SL does not exhibit a clear increase. This may be due to the surface sputtering mechanism. Rutherford backscattering spectrometry (not shown here) identifies the presence of a stoichiometric SiO2 region in the SL, and an IL consisting of SiO2 and *c*-Si phases (Eriguchi et al., 2008a). Figure 10 shows TEM observation results for various bias powers. Although it is difficult to assign the details of the damaged structures for higher *E*ion, one can observe the presence of the interfacial layer below the surface layer. The estimated thickness was found to be consistent with the SE data (Eriguchi et al., 2008a; Matsuda et al., 2010). The roughness of the interfacial layer is confirmed to increase, in particular for the 400-W case. This increase in roughness appears consistent with the MD simulation results in Fig. 4. Therefore, it is concluded that the presence of both the surface and interfacial layers should be taken into account when evaluating the damaged-layer thickness by SE and/or TEM. Note that the local defect sites assigned by the MD simulations cannot be seen in these TEM observation results. To identify these local defect sites, other novel techniques are required. The results obtained by these techniques are provided in subsection 4.2.3.

Fig. 9. Thicknesses of surface and interfacial layers obtained by spectroscopic ellipsometry with a four-layer optical model. Total optical thickness is also shown (closed circles).


Fig. 10. TEM observation results for the Si wafer surface damaged by various bias powers.

### **4.2.2 Time-dependent damaged layer formation**

As indicated in Fig. 8, the MD simulation predicts growth of the damaged layer with ion fluence. Some experimental results regarding this phenomenon are shown in Fig. 11, which gives *d*SL and *d*IL as a function of process time (plasma exposure time *t*pro). A DC plasma reactor was used to create PID under two conditions, *V*dc = -300 and -350 V. As *t*pro increases, *d*SL is almost constant (due to oxidation of the surface layer by air exposure), while *d*IL increases in both cases. Moreover, *d*IL tends to saturate after a certain time: 5 s for -300 V and 10 s for -350 V. From a Langmuir probe measurement, the electron temperature and the electron density were estimated to be 3.0 eV and ~ 10<sup>9</sup> cm<sup>-3</sup>, respectively, giving a *Γ*ion of ~ 2 × 10<sup>14</sup> cm<sup>-2</sup>s<sup>-1</sup> in these experiments. Thus, the total dosage (fluence) is approximately 2 × 10<sup>15</sup> cm<sup>-2</sup> for 10 s. According to the MD simulation results in Fig. 8, 500 impacts in the present cell size correspond to a fluence of ~ 5 × 10<sup>15</sup> cm<sup>-2</sup>. Although *E*ion in the DC plasma processes was not measured precisely (it was only deduced from the applied DC bias voltages), we can speculate that the saturating behaviour of *d*IL in Fig. 11 corresponds to the results in Fig. 8. In other words, once *Γ*ion is properly determined by plasma diagnostics, the MD simulation can predict the time evolution of damaged-layer formation in practical plasma etching processes.
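As a rough cross-check of the quoted DC-plasma flux, the sketch below evaluates the textbook Bohm flux, Γ ≈ 0.61 *n* (k*T*e/*M*)<sup>1/2</sup>. The text does not state which expression was actually used, so the formula, the sheath-edge factor of 0.61, and the function name `bohm_flux_cm2_s` are assumptions made here for illustration only.

```python
import math

E_CHARGE = 1.602176634e-19   # C (equivalently J per eV)
AMU = 1.66053906660e-27      # kg
AR_MASS_AMU = 39.948         # argon atomic mass


def bohm_flux_cm2_s(density_cm3: float, te_ev: float,
                    mass_amu: float = AR_MASS_AMU,
                    sheath_edge_factor: float = 0.61) -> float:
    """Ion flux (cm^-2 s^-1) from Gamma = factor * n * sqrt(k*Te / M)."""
    bohm_velocity_m_s = math.sqrt(te_ev * E_CHARGE / (mass_amu * AMU))
    return sheath_edge_factor * density_cm3 * bohm_velocity_m_s * 100.0  # m/s -> cm/s


flux = bohm_flux_cm2_s(1.0e9, 3.0)   # DC plasma values from the text
fluence_10s = 2.0e14 * 10.0          # quoted flux x 10 s exposure
print(f"Bohm flux estimate: {flux:.1e} cm^-2 s^-1")
print(f"Fluence after 10 s at the quoted flux: {fluence_10s:.0e} cm^-2")
```

Under these assumptions the estimate falls in the same range as the ~ 2 × 10<sup>14</sup> cm<sup>-2</sup>s<sup>-1</sup> quoted above, and the 10 s fluence reproduces the 2 × 10<sup>15</sup> cm<sup>-2</sup> given in the text.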

In general, the number of ion impacts combined with the cell size is a more practical and useful measure than MLs for MD simulations, because the density of local defect sites is a key parameter determining the effects of PID on MOSFET performance degradation. Figure 12 shows the relationship among the number of ion impacts, the ion flux, and *t*<sub>pro</sub>, and thus indicates how many ion impacts are necessary in MD simulations to correctly predict local defect site generation during practical plasma etching. For a given ion flux, the number of impacts in an MD simulation should be increased as *t*<sub>pro</sub> increases. For example, for an ion flux of 10<sup>15</sup> cm<sup>−2</sup> s<sup>−1</sup> and *t*<sub>pro</sub> = 10<sup>2</sup> s, approximately 10000 impacts are needed in the MD simulation. Note that this value depends on the cell size. Judging only from the time evolution of damaged-layer formation as in Fig. 8, one might expect that a smaller cell is better (more efficient). However, as discussed in the next section, in terms of local defect site density the cell must be large enough to capture an accurate picture and the overall features of the defect structures for a precise prediction of MOSFET performance degradation.

Fig. 11. Thicknesses of surface and interfacial layers as a function of process time. A DC plasma reactor was used.

Fig. 12. The number of impacts required for MD simulation to predict practical plasma etch damage for various ion fluxes. The cell size dependence is shown by the respective lines.
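The relationship summarised in Fig. 12 amounts to multiplying the ion flux by the process time and by the lateral area of the simulation cell. A minimal sketch follows, assuming the cell area of about 10 nm<sup>2</sup> quoted for the 6 × 6 cell in this chapter; the function names are hypothetical.

```python
NM2_TO_CM2 = 1e-14   # 1 nm^2 expressed in cm^2

def impacts_required(flux_cm2s, t_pro_s, cell_area_nm2=10.0):
    """Number of ion impacts that reproduces a given flux and process time."""
    return flux_cm2s * t_pro_s * cell_area_nm2 * NM2_TO_CM2

def fluence_from_impacts(n_impacts, cell_area_nm2=10.0):
    """Fluence in cm^-2 represented by n_impacts on one simulation cell."""
    return n_impacts / (cell_area_nm2 * NM2_TO_CM2)

# Example from the text: flux 1e15 cm^-2 s^-1 and t_pro = 100 s
print(round(impacts_required(1e15, 1e2)))
# 500 impacts on the ~10 nm^2 cell, as in the comparison with Fig. 8
print(f"{fluence_from_impacts(500):.1e} cm^-2")
```

Halving the cell edge length quarters the area and hence the number of impacts needed, which is the efficiency argument made above for smaller cells.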

#### **4.2.3 Density of defect sites**

So far, PID has been evaluated by a wide variety of techniques (Awadelkarim et al., 1994; Egashira et al., 1998; Kokura et al., 2005; Mu et al., 1986; Oehrlein et al., 1988; Yabumoto et al., 1981). However, few reports have quantified the density of local defects created by PID. In this study, we employed two different quantification techniques, photoreflectance spectroscopy (PRS) and capacitance-voltage (C-V) measurement. Details are presented elsewhere (Eriguchi et al., 2008a; Eriguchi & Ono, 2008).

Figure 13 shows the local defect site density estimated from photoreflectance spectra of the test structures damaged by the ICP system (closed squares, left axis). Using a modified PRS model for estimating the density of defect sites (Eriguchi et al., 2008b; Eriguchi & Ono, 2008), one can determine the areal defect site density (*N*<sub>dam</sub>) as a function of *E*<sub>ion</sub>. The figure shows *N*<sub>dam</sub> of the order of 10<sup>12</sup> cm<sup>−2</sup> for the present plasma conditions.

Figure 14 shows examples of 1/*C*<sup>2</sup>-*V* analysis results. The right-hand panel illustrates the basics of this C-V technique, where *w* and *q* are the depletion layer width and the elementary charge, respectively. Using a mercury probe system, the bias voltage applied to the Si substrate was swept and the capacitance was measured. We performed this C-V measurement for both the control (without plasma exposure) and the damaged samples. When 1/*C*<sup>2</sup> is plotted against the bias voltage (*V*<sub>b</sub>) for a fresh Si substrate (the control), the slope of the 1/*C*<sup>2</sup>-*V*<sub>b</sub> plot is constant, since the slope is determined by the impurity (dopant) concentration *n*<sub>D</sub> (Goodman, 1963; Sze & Ng, 2007). For the plasma-damaged sample, the 1/*C*<sup>2</sup>-*V*<sub>b</sub> curve is distorted in the inversion region, as seen in the left-hand panel. In this experiment we used an n-type Si substrate, so a negative bias voltage (*V*<sub>b</sub> < 0) corresponds to the inversion layer formation region. The presence of local defect sites (carrier trapping sites) causes doping loss (a decrease in *n*<sub>D</sub>) by intermixing with subsequently implanted ions (Kokura et al., 2005). Therefore, the distortion (a decrease in the slope of the 1/*C*<sup>2</sup>-*V*<sub>b</sub> plot) indicates defect site creation in the Si substrate. The right-hand panel of Fig. 14 also shows a schematic view of these defect sites in the inversion scheme. From the change in slope, we can estimate the volume density of defect sites (*n*<sub>dam</sub>, in cm<sup>−3</sup>) by assuming that the effective impurity concentration equals (*n*<sub>D</sub> + *n*<sub>dam</sub>) (Eriguchi et al., 2008a; Nakakubo et al., 2010b). The calculated *n*<sub>dam</sub> is plotted on the right axis in Fig. 13 (open circles) and is of the order of 10<sup>18</sup>-10<sup>19</sup> cm<sup>−3</sup>. Since the thickness of *d*<sub>IL</sub> containing *n*<sub>dam</sub> is estimated to be a few nanometres, as seen in Fig. 9, the areal density calculated from the *E*<sub>ion</sub> dependence of *n*<sub>dam</sub> is consistent with *N*<sub>dam</sub> obtained by PRS. (Strictly speaking, the depth profile of *n*<sub>dam</sub> should also be taken into account.) Another important finding in Figs. 13 and 14 is that the identified local defect sites are electrically active and may induce MOSFET performance degradation. The defect site structures observed in Fig. 6 are fatal for devices and should be investigated with careful attention.
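The conversion from a 1/*C*<sup>2</sup>-*V*<sub>b</sub> slope to a concentration, and from a volume density to an areal density, can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it uses the textbook depletion-capacitance relation |d(1/*C*<sup>2</sup>)/d*V*| = 2/(*q* ε<sub>Si</sub> *n*) for capacitance per unit area, a relative permittivity of 11.7 for Si, and a uniform defect profile over a layer 3 nm thick (standing in for "a few nanometres"). All function names are hypothetical.

```python
Q = 1.602176634e-19                 # elementary charge, C
EPS_SI = 11.7 * 8.8541878128e-14    # permittivity of Si, F/cm (11.7 assumed)

def concentration_from_slope(slope):
    """Carrier/impurity concentration (cm^-3) from d(1/C^2)/dV.

    slope is in (F/cm^2)^-2 per volt; only its magnitude is used.
    """
    return 2.0 / (Q * EPS_SI * abs(slope))

def n_dam_from_slopes(slope_ref, slope_dam):
    """Defect density, assuming the damaged sample behaves as n_D + n_dam."""
    return concentration_from_slope(slope_dam) - concentration_from_slope(slope_ref)

def areal_density(n_dam_cm3, thickness_nm):
    """Areal density (cm^-2) for a uniform profile of the given thickness."""
    return n_dam_cm3 * thickness_nm * 1e-7   # nm -> cm

for n_dam in (1e18, 1e19):
    print(f"{n_dam:.0e} cm^-3 over 3 nm -> {areal_density(n_dam, 3.0):.1e} cm^-2")
```

With these inputs the areal density falls between roughly 10<sup>11</sup> and 10<sup>13</sup> cm<sup>−2</sup>, which brackets the order of magnitude obtained by PRS.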

The estimated range of *n*<sub>dam</sub> or *N*<sub>dam</sub> gives an important interpretation of the MD simulations for PID, as follows. An areal defect site density ranging from 10<sup>12</sup> to 10<sup>13</sup> cm<sup>−2</sup> in practical samples is equivalent to 0.1 to 1 defect sites in the present 6 × 6 simulation cell (~10 nm<sup>2</sup>), although this depends on *t*<sub>pro</sub> and the plasma parameters. In Fig. 4, the observed local defect sites are indeed few in number. This number corresponds to approximately 10<sup>13</sup> cm<sup>−2</sup> in practical samples, which is consistent with the experimental results in Fig. 13. If one uses a smaller simulation cell, such as a 3 × 3 cell, to reduce the simulation cost, one is statistically unlikely to find any defect sites. In other words, a simulation scheme using a smaller cell (to reduce the calculation time) may lead to an erroneous conclusion (no observation of local defect sites). For MD studies of PID, optimizing the cell size in accordance with plasma etching parameters such as the ion flux is therefore quite important.
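The cell-size argument can be made quantitative with a simple counting estimate. The sketch below assumes that defect sites are distributed independently, so that the number found in a cell is Poisson distributed; this statistical model is an assumption of the example, not a result of the chapter. The 3 × 3 cell is taken to have one quarter of the ~10 nm<sup>2</sup> area of the 6 × 6 cell.

```python
import math

NM2_TO_CM2 = 1e-14   # 1 nm^2 expressed in cm^2

def expected_defects(areal_density_cm2, cell_area_nm2):
    """Mean number of defect sites inside one simulation cell."""
    return areal_density_cm2 * cell_area_nm2 * NM2_TO_CM2

def prob_no_defect(areal_density_cm2, cell_area_nm2):
    """Probability of seeing no defect site at all (Poisson assumption)."""
    return math.exp(-expected_defects(areal_density_cm2, cell_area_nm2))

for label, area in (("6 x 6", 10.0), ("3 x 3", 2.5)):
    for density in (1e12, 1e13):
        mean = expected_defects(density, area)
        p0 = prob_no_defect(density, area)
        print(f"{label} cell, {density:.0e} cm^-2: mean {mean:.3f}, P(none) {p0:.2f}")
```

For the smaller cell the mean count stays well below one over the whole experimental range, so most runs would show no defect site at all.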

Fig. 13. Estimated defect site densities as a function of (*V*<sub>p</sub> – *V*<sub>dc</sub>) obtained by two different analysis techniques, PRS and C-V. PRS gives the areal density, while C-V gives the volume density.

Fig. 14. Illustrations of an example of 1/*C*<sup>2</sup>-*V* plots and the energy band diagram during the C-V measurement.

#### **4.2.4 Prediction framework for MOSFET performance degradation by MD**

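Finally, we discuss the effects of PID observed in the MD simulations and experimental data on MOSFET performance degradation. From the findings above, we can summarize the following key issues:

1. The damaged layer is found to consist of two layers, the surface and the interfacial layers. By considering the oxidation process, the thickness of Si loss (Si recess depth *d*<sub>R</sub>) can be predicted by MD simulations.
2. Underneath the interfacial layer, local defect sites are present. The local defect structures can be assigned from MD simulations.
3. The local defect sites (observed in the MD simulation) are confirmed to be electrically active from the experimental data.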

As proposed by Eriguchi et al. (Eriguchi et al., 2009a; Eriguchi et al., 2009b), the Si recess depth *d*<sub>R</sub> and the local defect sites degrade MOSFET operation parameters such as *V*<sub>th</sub> and *I*<sub>d</sub>, where *I*<sub>d</sub> is the drain current, which determines the operation speed (Sze, 1981). These mechanisms are expressed analytically as

$$
\Delta V_{\rm th} \propto -\frac{A}{L_{\rm g}}\, d_{\rm R}, \tag{8}
$$

$$
I_{\rm d} = I_{\rm d}^{0} \cdot (1 - B \cdot n_{\rm dam}), \tag{9}
$$

where Δ*V*<sub>th</sub> and *I*<sub>d</sub><sup>0</sup> are the shift of the threshold voltage and the drain current of a fresh device, respectively, and *L*<sub>g</sub> is the gate length. *A* and *B* are process- and device-structure-dependent parameters which can be determined from Technology Computer-Aided Design (TCAD) simulations (Eriguchi et al., 2008a). A prediction of MOSFET performance from process parameters such as *E*<sub>ion</sub> was demonstrated by using the experimental relationship between *E*<sub>ion</sub> and *d*<sub>R</sub> (or *n*<sub>dam</sub>). Moreover, the present study suggests that *d*<sub>R</sub> and *n*<sub>dam</sub> can be determined from MD simulations. At present, the statistical distribution functions of the local defect structures created by PID have not yet been clarified. Future work should clarify the defect generation mechanisms and these statistical distribution functions with the help of MD simulations. Since MOSFET dimensions are shrinking aggressively, a prediction framework for future device design can be built on atomistic techniques such as MD simulations.
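Equations (8) and (9) translate directly into two small functions. The sketch below is illustrative only: it treats the proportionality in Eq. (8) as an equality, with any constant absorbed into *A*, and the parameter values in the example are hypothetical placeholders rather than TCAD results.

```python
def delta_vth(d_r_nm, l_g_nm, a_volt):
    """Eq. (8), with the proportionality taken as an equality (assumption).

    a_volt is the fitted parameter A, expressed here in volts so that the
    ratio d_R / L_g stays dimensionless.
    """
    return -a_volt * d_r_nm / l_g_nm

def drain_current(i_d0, n_dam_cm3, b_cm3):
    """Eq. (9): I_d = I_d0 * (1 - B * n_dam), with B in cm^3."""
    return i_d0 * (1.0 - b_cm3 * n_dam_cm3)

# Hypothetical inputs: 1 nm recess, 50 nm gate length, A = 1 V, B = 1e-20 cm^3
print(delta_vth(d_r_nm=1.0, l_g_nm=50.0, a_volt=1.0))
print(drain_current(i_d0=1.0, n_dam_cm3=1e18, b_cm3=1e-20))
```

In such a workflow, *d*<sub>R</sub> and *n*<sub>dam</sub> would come from MD simulations or experiments, while *A* and *B* would be fitted once per device structure.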

### **5. Conclusion**


We applied classical MD simulations to investigate plasma-etch damage mechanisms. The simulated structures were found to consist of two layers, the surface and the interfacial layers. Owing to oxidation on air exposure, these two layers were identified in practice as SiO<sub>2</sub> and a mixed layer consisting of crystalline Si and SiO<sub>2</sub> phases, respectively. Underneath the interfacial layer, we assigned local defect sites, and the structures were confirmed to include typical Si interstitials. From the experiments, these sites were found to be electrically active. Combined with plasma diagnostics and quantitative analysis of the local defect site density, the number of required ion impacts and the cell size for MD simulations were optimized for PID. Finally, a prediction framework for MOSFET design was discussed. An optimized MD simulation is a promising candidate for predicting MOSFET performance degradation by PID.

### **6. Acknowledgment**

The author greatly thanks Dr. H. Ohta for his great support of the MD simulation in this work. Acknowledgements are also given to Mr. Y. Nakakubo, Mr. A. Matsuda, Dr. Y. Takao, and Prof. K. Ono at Kyoto University, and Drs. M. Yoshimaru, H. Hayashi, S. Hayashi, H. Kokura, T. Tatsumi, and S. Kuwabara at STARC (Semiconductor Technology Academic Research Center) for their helpful discussions. This work was financially supported in part by STARC and a Grant-in-Aid for Scientific Research (B) from the Japan Society for the Promotion of Science.

#### **7. References**


Abrams, C., & Graves, D. (1998). Energetic ion bombardment of SiO2 surfaces: Molecular dynamics simulations, *J. Vac. Sci. Technol. A*, Vol.16, No.5, (1998), pp. 3006-3019

Abrams, C. F., & Graves, D. B. (1999). Molecular dynamics simulations of Si etching by energetic CF3+, *J. Appl. Phys.*, Vol.86, No.11, (1999), pp. 5938-5948

Abrams, C. F., & Graves, D. B. (2000). Molecular dynamics simulations of Si etching with energetic F+: Sensitivity of results to the interatomic potential, *J. Appl. Phys.*, Vol.88, No.6, (2000), pp. 3734-3738

Aspnes, D. E. (1973). Third-derivative modulation spectroscopy with low-field electroreflectance, *Surf. Sci.*, Vol.37, (1973), pp. 418-442

Awadelkarim, O. O., Mikulan, P. I., Gu, T., Reinhardt, K. A., & Chan, Y. D. (1994). Electrical properties of contact etched p-Si: A comparison between magnetically enhanced and conventional reactive ion etching, *J. Appl. Phys.*, Vol.76, No.4, (1994), pp. 2270-2278

Balamane, H., Halicioglu, T., & Tiller, W. A. (1992). Comparative study of silicon empirical interatomic potentials, *Phys. Rev. B*, Vol.46, No.4, (1992), pp. 2250-2279

Baraff, G. A., Kane, E. O., & Schluter, M. (1980). Theory of the silicon vacancy: An Anderson negative-U system, *Phys. Rev. B*, Vol.21, No.12, (1980), pp. 5662-5986

Batra, I. P., Abraham, F. F., & Ciraci, S. (1987). Molecular-dynamics study of self-interstitials in silicon, *Phys. Rev. B*, Vol.35, No.18, (1987), pp. 9552-9558

Berendsen, H. J. C., Postma, J. P. M., Gunsteren, W. F. v., DiNola, A., & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath, *J. Chem. Phys.*, Vol.81, No.8, (1984), pp. 3684-3690

Biswas, R., & Hamann, D. R. (1987). New classical models for silicon structural energies, *Phys. Rev. B*, Vol.36, No.12, (1987), pp. 6434-6445

Cheng, L.-J., & Corbett, J. W. (1974). Defect creation in electronic materials, *Proceedings of the IEEE*, Vol.62, No.9, (1974), pp. 1208-1214

Colombo, L. (2002). Tight-Binding Theory of Native Point Defects in Silicon, *Annual Review of Materials Research*, Vol.32, No.1, (2002), pp. 271-295

Dodson, B. W. (1987). Development of a many-body Tersoff-type potential for silicon, *Phys. Rev. B*, Vol.35, No.6, (1987), pp. 2795-2798

Egashira, K., Eriguchi, K., & Hashimoto, S. A new evaluation method of plasma process-induced Si substrate damage by the voltage shift under constant current injection at metal/Si interface, *IEDM Tech. Dig.*, pp. 563-566, San Francisco, CA, USA, Dec 06-09, 1998

Eriguchi, K., Matsuda, A., Nakakubo, Y., Kamei, M., Ohta, H., & Ono, K. (2009a). Effects of Plasma-Induced Si Recess Structure on n-MOSFET Performance Degradation, *IEEE Electron Device Lett.*, Vol.30, No.7, (2009), pp. 712-714

Eriguchi, K., Nakakubo, Y., Matsuda, A., Kamei, M., Ohta, H., Nakagawa, H., S. Hayashi, Noda, S., Ishikawa, K., Yoshimaru, M., & Ono, K. A New Framework for Performance Prediction of Advanced MOSFETs with Plasma-Induced Recess Structure and Latent Defect Site, *IEDM Tech. Dig.*, pp. 443-446, San Francisco, CA, USA, Dec 15-17, 2008

Eriguchi, K., Nakakubo, Y., Matsuda, A., Takao, Y., & Ono, K. (2009b). Plasma-Induced Defect-Site Generation in Si Substrate and Its Impact on Performance Degradation in Scaled MOSFETs, *IEEE Electron Device Lett.*, Vol.30, No.12, (2009), pp. 1275-1277

Matsuda, A., Nakakubo, Y., Takao, Y., Eriguchi, K., & Ono, K. (2010). Modeling of ion-bombardment damage on Si surfaces for in-line analysis, *Thin Solid Films*, Vol.518, No.13, (2010), pp. 3481-3486

Mazzarolo, M., Colombo, L., Lulli, G., & Albertazzi, E. (2001). Low-energy recoils in crystalline silicon: Quantum simulations, *Phys. Rev. B*, Vol.63, No.19, (2001), pp. 195207

Moliere, G. (1947). Theorie der Streuung schneller geladener Teilchen I, *Z. Naturforschung*, Vol.A2, (1947), pp. 133-145

Mu, X. C., Fonash, S. J., Rohatgi, A., & Rieger, J. (1986). Comparison of the damage and contamination produced by CF4 and CF4/H2 reactive ion etching: The role of hydrogen, *Appl. Phys. Lett.*, Vol.48, No.17, (1986), pp. 1147-1149

Murtagh, M., S M Lynch, Kelly, P. V., Hildebrandt, S., Herbert, P. A. F., Jeynes, C., & Crean, G. M. (1997). Photoreflectance characterization of Ar+ ion etched and SiCl4 reactive ion etched silicon (100), *Mat. Sci. Technol.*, Vol.13 (1997), pp. 961-964

Nagaoka, T., Eriguchi, K., Ono, K., & Ohta, H. (2009). Classical interatomic potential model for Si/H/Br systems and its application to atomistic Si etching simulation by HBr+, *J. Appl. Phys.*, Vol.105, No.2, (2009), pp. 023302-023306

Nakakubo, Y., Matsuda, A., Kamei, M., Ohta, H., Eriguchi, K., & Ono, K. (2010a). Analysis of Si Substrate Damage Induced by Inductively Coupled Plasma Reactor with Various Superposed Bias Frequencies. In A. Amara & M. Belleville & T. Ea (Eds.), *Emerging Technologies and Circuits* (Vol. 66, pp. 107-120), London, (Springer)

Nakakubo, Y., Matsuda, A., Takao, Y., Eriguchi, K., & Ono, K. Study of Wet-Etch Rate of Plasma-Damaged Surface and Interface Layers and Residual Defect Sites, *Proc. Symp. Dry Process*, pp. 173-174, Tokyo, Japan, Nov 11-12, 2010

Nordlund, K., Ghaly, M., Averback, R. S., Caturla, M., Diaz de la Rubia, T., & Tarus, J. (1998). Defect production in collision cascades in elemental semiconductors and fcc metals, *Phys. Rev. B*, Vol.57, No.13, (1998), pp. 7556-7570

Oehrlein, G. S. (1989). Dry etching damage of silicon: A review, *Materials Sci. Eng. B*, Vol.4, No.1-4, (1989), pp. 441-450

Oehrlein, G. S., Bright, A. A., & Robey, S. W. (1988). X-Ray Photoemission Spectroscopy Characterization of Si Surfaces after CF4/H2 Magnetron Ion Etching - Comparisons to Reactive Ion Etching, *J. Vac. Sci. Technol. A*, Vol.6, No.3, (1988), pp. 1989-1993

Ohchi, T., Kobayashi, S., Fukasawa, M., Kugimiya, K., Kinoshita, T., Takizawa, T., Hamaguchi, S., Kamide, Y., & Tatsumi, T. (2008). Reducing Damage to Si Substrates during Gate Etching Processes, *Jpn. J. Appl. Phys.*, Vol.47, No.7, (2008), pp. 5324-5326

Ohta, H., & Hamaguchi, S. (2001a). Classical interatomic potentials for Si-O-F and Si-O-Cl systems, *J. Chem. Phys.*, Vol.115, No.14, (2001), pp. 6679-6690

Ohta, H., & Hamaguchi, S. (2001b). Molecular dynamics simulation of silicon and silicon dioxide etching by energetic halogen beams, *J. Vac. Sci. & Technol. A*, Vol.19, No.5, (2001), pp. 2373-2381

Ohta, H., & Hamaguchi, S. (2004). Effects of Van der Waals Interactions on SiO2 Etching by CFx Plasmas, *J. Plasma Fusion Res*, Vol.6, (2004), pp. 399-401

Pelaz, L., Marques, L. A., Aboy, M., Lopez, P., & Santos, I. (2009). Front-end process modeling in silicon, *The European Physical Journal B*, Vol.72, (2009), pp. 323-359

Wada, H., Agata, M., Eriguchi, K., Fujimoto, A., Kanashima, T., & Okuyama, M. (2000). Photoreflectance characterization of the plasma-induced damage in Si substrate, *J. Appl. Phys.*, Vol.88, No.5, (2000), pp. 2336-2341

Watanabe, T., Fujiwara, H., Noguchi, H., Hoshino, T., & Ohdomari, I. (1999). Novel Interatomic Potential Energy Function for Si, O Mixed Systems, *Jpn. J. Appl. Phys.*, Vol.38, No.4A, (1999), pp. L366-L369

Wilson, W. D., Haggmark, L. G., & Biersack, J. P. (1977). Calculations of nuclear stopping, ranges, and straggling in the low-energy region, *Phys. Rev. B*, Vol.15, No.5, (1977), pp. 2458-2468

Yabumoto, N., Oshima, M., Michikami, O., & Yoshii, S. (1981). Surface Damage on Si Substrates Caused by Reactive Sputter Etching, *Jpn. J. Appl. Phys.*, Vol.20, No.5, (1981), pp. 893-900

## **Molecular Dynamics Simulations of Complex (Dusty) Plasmas**

Céline Durniak and Dmitry Samsonov *Department of Electrical Engineering and Electronics, University of Liverpool, United Kingdom*

#### **1. Introduction**

Complex or dusty plasmas are multi-component plasmas which, in addition to the usual plasma components (*i.e.* ions and electrons), contain micron-sized particles, also called grains or dust. These particles acquire high electric charges and interact collectively over long distances. Like colloids, complex plasmas form solid- and liquid-like structures with long-range correlations and exhibit phase transitions. Unlike colloids, they exhibit a range of dynamic phenomena such as particle-mediated linear and nonlinear waves, shocks, wakes, instabilities, etc. Complex plasma crystallisation was predicted theoretically (Ikezi, 1986) and subsequently discovered in the early 1990s (Chu & I, 1994; Hayashi & Tachibana, 1994; Melzer et al., 1994; Thomas et al., 1994), which stimulated interest in the field across the physics community.

Particles in complex plasmas can be illuminated by laser light and easily observed with a video camera, yielding their full kinetic information, *i.e.* positions and velocities. This makes them useful as model systems for studying various phenomena in solids and liquids at the microscopic level (Thomas & Morfill, 1996). Since observing the dynamics of real solids and liquids at the kinetic level is very expensive where it is possible at all, model systems are used to study the fundamental properties of phase transitions, diffusion, viscosity and elasticity. These model systems include colloids, granular media and complex plasmas, collectively known as soft matter. Phenomena in complex plasmas have analogues and applications in many different fields of science and technology such as plasma physics, fusion, solid state physics, fluid dynamics, acoustics, optics, material science, nanoscience, nanotechnology, environment protection, space exploration and astrophysics (Fortov et al., 2005a; Merlino & Goree, 2004; Morfill & Ivlev, 2009).

Arguably the most common technique to simulate complex plasmas is molecular dynamics (MD). It solves numerically the equations of motion for each individual particle comprising the system under investigation. It is also applied to simulate biomolecules, polymers, solids, colloids, granular media, atomic nuclei, galaxies, and stellar systems. MD simulations of complex plasmas are used in this chapter to determine their structural and dynamic properties as well as to identify the underlying physical mechanisms of various phenomena.

This chapter is organised as follows. Section 2 describes applications, natural occurrence, scientific significance, and multidisciplinary character of complex plasmas. MD methods and their experimental verification techniques will be detailed in Sections 3 and 4 respectively. Finally, numerical results and their comparison with experiments will be discussed in Section 5 and concluding remarks will be presented in Section 6.

### **2. Overview of complex (dusty) plasmas**

The term "dusty plasmas" originated in astrophysics, where polydisperse (various size and shape) dust grains are exposed to various charged particles, ionised gases, and ionising radiation. Laboratory studies often involve high quality monodisperse (same size) microspheres added to gas discharges to study various complex and collective phenomena that do not occur in natural dusty plasmas. Thus the term "complex plasmas" (Samsonov et al., 2000) is often used instead, by analogy with "complex fluids", where similar complex phenomena are observed in multicomponent fluids.

#### **2.1 Charging and forces**

Mixed with a plasma, grains or microparticles collect ions and electrons and typically charge negatively due to the higher mobility of electrons (Bronold et al., 2009; Goree, 1994; Melzer et al., 1994). However, in the presence of ionising radiation, such as UV light, or thermionic emission, the particle charge may become positive. Typical charges are of the order of ∼ 10<sup>4</sup> electrons for ∼ 10 *μ*m diameter particles. The charge value is proportional to the electron temperature and to the particle radius. The time it takes a particle to reach an equilibrium charge, the charging time, is inversely proportional to the particle size and plasma density (Goree, 1994). Its typical value is ∼ 100 ns for ∼ 10 *μ*m grains in typical laboratory conditions. The particle charge is not constant and it fluctuates around an equilibrium. This may cause instabilities, and if the charge value changes its sign, it may even result in particle coagulation.
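The stated scaling of the charge with electron temperature and particle radius can be illustrated with a simple capacitor picture. The sketch below is an assumption-laden estimate rather than a charging model from this chapter: the grain is treated as an isolated sphere of capacitance 4πε<sub>0</sub>*a* whose surface floats at a potential of −α *kT*<sub>e</sub>/*e*. The coefficient α (set to 2.5 here), the electron temperature of 3 eV, and the function name are all hypothetical choices made for illustration.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C

def grain_charge_electrons(radius_m, te_ev, alpha=2.5):
    """Number of excess electrons on a spherical grain (capacitor estimate).

    alpha is the floating potential in units of kTe/e; its value is an
    assumption of this sketch and depends on the plasma conditions.
    """
    surface_potential_v = alpha * te_ev            # magnitude, in volts
    charge_c = 4.0 * math.pi * EPS0 * radius_m * surface_potential_v
    return charge_c / E_CHARGE

# 10 micrometre diameter grain, Te = 3 eV (assumed)
print(f"{grain_charge_electrons(5e-6, 3.0):.2e} electrons")
```

The estimate is linear in both the radius and the electron temperature, and for these inputs it lands in the 10<sup>4</sup> range quoted above.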

Highly charged particles are affected by electric fields in the discharge and interact with each other electrostatically. Their interaction potential is usually assumed to be of a Yukawa (Debye-Hückel or screened Coulomb) type if the background plasma is isotropic (Kennedy & Allen, 2003). This approximation has also been shown to be valid for particles levitating in a plasma sheath at the same height (Konopka et al., 2000), as in monolayer complex plasmas. Flowing plasma makes the particle-particle interaction anisotropic, with regions of negative and positive potential (ion wake) (Melandsø & Goree, 1995; Vladimirov et al., 2003), and it also affects the charge of downstream particles.
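For two grains of equal charge *Q* separated by a distance *r*, the Yukawa interaction energy is *U*(*r*) = *Q*<sup>2</sup> exp(−*r*/*λ*<sub>D</sub>)/(4πε<sub>0</sub>*r*), and the repulsive force follows by differentiation. A minimal sketch in SI units is given below; the function names and the example values (10<sup>4</sup> electrons, 1 mm spacing, 1 mm screening length) are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C

def yukawa_energy(r, q, lambda_d):
    """Pair interaction energy (J) of two grains with charge q (C)."""
    return q * q / (4.0 * math.pi * EPS0 * r) * math.exp(-r / lambda_d)

def yukawa_force(r, q, lambda_d):
    """Magnitude of the repulsive force (N), i.e. -dU/dr."""
    return yukawa_energy(r, q, lambda_d) * (1.0 / r + 1.0 / lambda_d)

q = 1e4 * E_CHARGE
print(f"U = {yukawa_energy(1e-3, q, 1e-3):.2e} J")
print(f"F = {yukawa_force(1e-3, q, 1e-3):.2e} N")
```

In the limit of a very large screening length the expressions reduce to the unscreened Coulomb energy and force.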

Apart from the electrostatic force, grains are affected by other forces. The gravitational force becomes dominant for particles with a diameter ≳ 1 *μ*m in the bulk of the discharge. Large particles are pushed by gravity down into the plasma sheath, where the electric field is strong enough to levitate them. This effect makes it necessary to use microgravity conditions in order to produce large three-dimensional (3D) structures. The neutral drag force results from collisions with the gas molecules. It is equivalent to friction and damps particle motion. Streaming ions affect grains via an ion drag force, which is responsible for void formation (Goedheer et al., 2009; Samsonov & Goree, 1999). The thermophoretic force arises from a temperature gradient and can be used for particle levitation (Rothermel et al., 2002). Intense light sources create a light pressure force, which is utilised for grain manipulation (Liu et al., 2003).

Complex plasmas can be characterised by two parameters. The coupling parameter Γ = *U*/*T* is the average ratio of the electrostatic potential energy to the kinetic energy of particles. The screening parameter *κ* = *a*/*λ*<sub>D</sub> is the ratio of the interparticle distance to the screening (Debye) length. These parameters determine whether the complex plasma is in a crystalline or a liquid state (Hamaguchi et al., 1997; Ikezi, 1986; Vaulina et al., 2002). Crystalline plasmas, which have long-range correlations, are characterised by large values of the coupling parameter, Γ ≳ 170 (Ikezi, 1986). The liquid state has smaller values, 1 ≲ Γ ≲ 170, and short-range correlations. Solids and liquids are strongly coupled states. The gaseous state is weakly coupled (Γ < 1), with uncorrelated particle positions.
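The two parameters are straightforward to evaluate for a given set of conditions. The sketch below assumes that *U* is the unscreened Coulomb energy of two grains at the mean interparticle distance and that *T* is the kinetic temperature of the grains expressed as *k*<sub>B</sub>*T*; other conventions exist, so the definition and the function names are assumptions of this example. The classification uses only the Γ thresholds quoted in the text and ignores their dependence on *κ*.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K

def coupling_parameter(q, a, temp_k):
    """Gamma = Q^2 / (4 pi eps0 a k_B T), with unscreened Coulomb energy."""
    return q * q / (4.0 * math.pi * EPS0 * a * K_B * temp_k)

def screening_parameter(a, lambda_d):
    """kappa = a / lambda_D."""
    return a / lambda_d

def phase(gamma):
    """Rough state from the Gamma ranges quoted in the text."""
    if gamma >= 170.0:
        return "crystalline"
    if gamma >= 1.0:
        return "liquid"
    return "gaseous"

# Illustrative values: 1e4 electrons, 1 mm spacing, grains at 300 K
gamma = coupling_parameter(1e4 * E_CHARGE, 1e-3, 300.0)
print(f"Gamma = {gamma:.0f}, kappa = {screening_parameter(1e-3, 1e-3):.1f}, {phase(gamma)}")
```

Heating the grains lowers Γ in inverse proportion to *T*, which is how melting of a plasma crystal appears in this picture.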

#### **2.2 Natural occurrence and significance**


Dust is abundant in space, where it is found in planetary rings, comet tails, interstellar clouds, and planetary nebulae. Sources of ionisation are also present, such as charged particles from cosmic rays and stellar winds, gas ionised by stellar radiation, stellar radiation itself, and various radioactive elements in the dust, which emit charged particles and ionising gamma rays. Dusty plasma effects are believed to be involved in the formation of dark spokes in Saturn's rings (Hartquist et al., 2003). Charge fluctuations are known to enhance coagulation of particles (Konopka et al., 2005). This effect may influence models of planet formation. Spacecraft and satellites are often charged by the solar wind (Whipple, 1981). This can cause them to malfunction due to electrical breakdown. It also increases the drag force due to enhanced collisions with ions. Lunar and Martian dust can be charged by solar radiation and levitate above the surface (Sternovsky et al., 2002). It poses a threat to machinery and spacesuits because of its abrasive properties. Charged ionospheric aerosols affect radio wave propagation (Cho et al., 1996) and often disrupt communications. Ultrafine charged particles in the Earth's atmosphere influence cloud formation by providing centres of condensation and thus affect the Earth's radiative budget (Boulon et al., 2010), with implications for climate models. Aerosol charging modifies atmospheric chemistry as well as the formation and transport of pollutants (Aikin & Pesnell, 1998).

#### **2.3 Industrial applications**

Particles with designed properties are grown in a plasma environment for various technological applications (Boufendi et al., 2011). These include production of fine powders for ceramics and catalysts, phase-separated materials, coatings for solar cells, and nanocoatings for optics. Undesirable dust growth has been observed in ultra-clean etching reactors. As the size of semiconductor device features has approached 22 nanometres, a single dust particle of a similar size can destroy a whole device, significantly reducing the yield of the manufacturing process. This requires strict measures to prevent dust formation. Fusion devices were found to produce metal dust in significant quantities by evaporation or sputtering of their walls. This dust is flammable and radioactive and poses a safety hazard. It can contaminate and quench the plasma, reducing the energy yield.

#### **2.4 Complex plasmas as model systems**

One of the most interesting laboratory uses of complex plasmas is to study properties of solids and liquids at the microscopic or kinetic level. Complex plasmas possess a unique combination of properties and share many of them with other model systems such as colloids and granular media. Grains in complex plasmas can be easily observed. They have sizes of a few microns and separation distances of almost a millimetre. Thus complex plasmas remain optically thin over many interparticle distances. Illuminated with a laser, particles can be imaged with a video camera producing images with high contrast. Typical timescales of particle motion are of the order of 10 ms, meaning that a moderately high-speed camera is adequate. The damping rate due to the background gas can be below 1 s<sup>−1</sup>. This makes it possible to observe grain-mediated wave motion. The particle-particle interaction potential is continuous and long range, qualitatively similar to that in real solids and liquids. Complex plasmas exhibit a multitude of dynamic phenomena such as waves (Nunomura et al., 2002; Zhdanov et al., 2003), solitons (Durniak et al., 2009; Samsonov et al., 2002), shock waves (Luo et al., 1999; Samsonov & Morfill, 2008), melting and crystallisation (Knapek et al., 2007), Mach cones (Nosenko et al., 2003; Samsonov et al., 1999), diffusion (Nunomura et al., 2006), heat transport (Nosenko et al., 2008; Nunomura et al., 2005a), and shear flows (Hartmann et al., 2011). Here we will focus on MD simulations of these phenomena in strongly coupled complex plasmas.

#### **3. Numerical models for complex plasmas**

Computer simulations are used to describe and predict the behaviour of systems whose complexity makes analytical treatments impossible or very difficult. Numerical models advance our understanding of which basic processes are responsible for different observable phenomena. They can be rerun using exactly the same initial conditions (Durniak & Samsonov, 2010) with altered physical processes (*e.g.* forces) or parameters (*e.g.* damping rate). Examples of numerical simulations of complex plasmas include dynamics of bilayers (Hartmann et al., 2009), self-diffusion in two-dimensional (2D) liquids (Hou, Piel & Shukla, 2009), phase transition between solid and liquid (Farouki & Hamaguchi, 1992), phonons in a linear chain (Liu & Goree, 2005a), diffusion in 2D liquids (Ott & Bonitz, 2009a), defect dynamics (Durniak & Samsonov, 2010), shear flows (Sanbonmatsu & Murillo, 2001), and nonlinear wave propagation (Durniak et al., 2009). There are two basic types of numerical simulations: stochastic and deterministic. The Monte Carlo method (Sheridan, 2009b) belongs to the first type, while molecular dynamics (Allen & Tildesley, 1987) and the fluid model (Goedheer et al., 2009) belong to the second. MD simulations often incorporate stochastic elements in order to simulate Brownian motion and the effects of finite temperature. All these techniques are used to simulate complex plasmas; however, here we will only consider the MD method.

#### **3.1 Numerical methods**

MD simulations numerically solve the equations of motion for every particle in the system. Newtonian equations of motion are used in classical simulations (Allen & Tildesley, 1987). The most common integration techniques employed in complex plasmas include Verlet (Liu et al., 2006), velocity-Verlet (Klumov et al., 2010), Swope (Ott & Bonitz, 2009a), leapfrog (Ma & Bhattacharjee, 2002), predictor-corrector (Farouki & Hamaguchi, 1994), Gear-like (Hou, Piel & Shukla, 2009), Beeman-like (Couëdel et al., 2011; Donkó et al., 2010), Runge-Kutta (Jefferson et al., 2010; Zhdanov et al., 2003), and Runge-Kutta with variable step (Cash-Karp) (Durniak et al., 2010). The performance of several integration algorithms for complex plasma problems in the presence of a magnetic field has been investigated (Hou, Mišković, Piel & Shukla, 2009).
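As an illustration of one of the integrators listed above, the following minimal sketch implements a velocity-Verlet step for a single coordinate. The function name `velocity_verlet`, the callback `accel`, and the harmonic-oscillator test problem are assumptions made for this example; they are not taken from any of the cited codes.

```python
import math

def velocity_verlet(x, v, accel, dt, n_steps):
    """Advance one coordinate x and velocity v by n_steps steps of size dt.

    accel(x) is a caller-supplied function returning the acceleration at x.
    """
    a = accel(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a   # half-step velocity update
        x = x + dt * v_half         # full-step position update
        a = accel(x)                # acceleration at the new position
        v = v_half + 0.5 * dt * a   # second half-step velocity update
    return x, v

# Test problem (an assumption): unit-mass harmonic oscillator, a = -x
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, dt=0.01, n_steps=1000)
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2   # exact energy is 0.5
x_exact = math.cos(10.0)                           # exact position at t = 10
```

For this test problem the total energy stays close to its initial value over many steps, which is the property that makes Verlet-type schemes attractive for long runs.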

In order to simulate thermodynamic processes correctly, canonical or microcanonical ensembles are used. This requires the algorithm to conserve certain thermodynamic quantities (volume, particle number, temperature, energy, etc.). Numerical errors tend to accumulate and increase the total energy of the system. Energy-conserving integrators, such as Verlet and Beeman, are often utilised in order to keep the total energy of the system constant. Since a relatively small number of particles is involved in the simulation, their kinetic energy and thus temperature tend to fluctuate significantly. Constant temperature is maintained by deterministic techniques such as the Nosé-Hoover thermostat (Liu et al., 2006) and velocity rescaling (Ohta & Hamaguchi, 2000; Ott & Bonitz, 2009a), or by stochastic methods such as the Andersen thermostat (Nelissen et al., 2007) or Langevin dynamics (Schveĭgert et al., 2000). Langevin dynamics is an extension of the MD method in which a random (Langevin) force and a damping term are added to the equations of motion in order to simulate the effects of a liquid or gaseous background on micron-sized particles. This accounts for random kicks by fast-moving molecules as well as for the friction force caused by the liquid or gas drag.
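Velocity rescaling, the simplest of the deterministic thermostats mentioned above, can be sketched as follows. This is a minimal 2D illustration under stated assumptions: the function names are hypothetical, the temperature is obtained from equipartition, and any centre-of-mass drift is not removed before rescaling.

```python
import math

def kinetic_temperature_2d(velocities, mass, k_b):
    """Temperature from equipartition in 2D: sum(m v^2 / 2) = N k_B T."""
    kinetic = 0.5 * mass * sum(vx * vx + vy * vy for vx, vy in velocities)
    return kinetic / (len(velocities) * k_b)

def rescale_velocities(velocities, mass, k_b, t_target):
    """Scale all velocities by one factor so the temperature equals t_target."""
    t_now = kinetic_temperature_2d(velocities, mass, k_b)
    scale = math.sqrt(t_target / t_now)
    return [(vx * scale, vy * scale) for vx, vy in velocities]

# Illustrative values in arbitrary units
vel = [(1.0, 0.0), (-0.5, 0.5), (0.0, -2.0)]
vel_new = rescale_velocities(vel, mass=1.0, k_b=1.0, t_target=0.3)
t_new = kinetic_temperature_2d(vel_new, mass=1.0, k_b=1.0)
```

Rescaling fixes the instantaneous kinetic energy exactly, so it suppresses the natural temperature fluctuations that stochastic thermostats retain.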

Computing pair interactions for all possible pairs of particles in an ensemble is very costly when the number of particles is large. Since the interaction force decreases with distance, it is possible to simplify the calculations. Short-range interactions allow the introduction of a cut-off distance (Vaulina & Dranzhevski, 2006) beyond which the forces are neglected. For efficiency, this method maintains a list of neighbours to keep track of the interacting grains. Long-range forces cannot be easily truncated; however, they can be averaged to reduce the computational cost. Several methods are used, such as the Ewald summation (Ott et al., 2011), the fast multipole method, the tree code, and particle-mesh-based techniques (Frenkel & Smit, 2002; Rapaport, 1995), an example of the latter being the particle-particle particle-mesh method (Donkó et al., 2008). The number of particles in the simulation can be reduced using periodic boundary conditions: a particle exiting the simulation box from one side re-enters on the opposite side. Particles interact not only with other particles in the simulation box but also with particles in image boxes (Donkó et al., 2010; Hamaguchi, 1999). Free boundaries are also often used in complex plasma simulations.
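The cut-off and periodic-boundary ideas above can be illustrated with a short sketch. It assumes a square 2D box and a cut-off no larger than half the box side, so that only the nearest image of each partner matters; the function names are hypothetical, and the pair search is brute force rather than a true neighbour list.

```python
import math

def wrap(x, box):
    """Map a coordinate back into the periodic box [0, box)."""
    return x % box

def minimum_image(dx, box):
    """Shortest separation along one axis between a particle and the
    nearest periodic image of its partner."""
    return dx - box * round(dx / box)

def pairs_within_cutoff(positions, box, r_cut):
    """Return index pairs (i, j), i < j, closer than r_cut in a periodic
    square box of side `box`. Brute force; a neighbour list would avoid
    re-testing distant pairs at every step."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = minimum_image(positions[i][0] - positions[j][0], box)
            dy = minimum_image(positions[i][1] - positions[j][1], box)
            if math.hypot(dx, dy) < r_cut:
                pairs.append((i, j))
    return pairs

# Particles 0 and 1 sit on opposite sides of the box but are close
# neighbours through the periodic boundary.
pts = [(0.5, 5.0), (9.5, 5.0), (5.0, 5.0)]
close = pairs_within_cutoff(pts, box=10.0, r_cut=2.0)
```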

#### **3.2 Interaction and confinement**

Since plasmas contain a mixture of species moving on very different time scales, it is impossible to simulate a meaningful number of them as individual particles even using a supercomputer. Thus complex plasmas are typically simulated as microparticles interacting via an effective potential. The ion-electron plasma is not explicitly included and enters only as a screening parameter in the interaction potential. The neutral gaseous background is simulated as a friction force. This approach is valid in most cases, as comparisons with experiments show. However there were some notable attempts to include all plasma components (Ikkurthi et al., 2009; Joyce et al., 2001). It is also frequently assumed that the particles have a constant charge, which is often a very good approximation for large and highly charged grains.

The most frequently used particle-particle interaction potential in complex plasma simulations is Yukawa; however, Coulomb (Lai & I, 1999) as well as repulsive-attractive potentials have been used (Chen, Yu & Luo, 2005). Other forces can be added, *e.g.* due to the influence of magnetic fields (Hou, Shukla, Piel & Mišković, 2009), ion drag (Ikkurthi et al., 2009), or external excitations (Durniak et al., 2009). External excitations applied to an equilibrated structure make it possible to investigate non-equilibrium dynamics (Donkó et al., 2010) and various dynamic phenomena (Jefferson et al., 2010).

Since the Yukawa potential is purely repulsive, an additional confinement is needed if free boundary conditions are used. The most common confining potential is parabolic (Durniak et al., 2009; Ma & Bhattacharjee, 2002; Nelissen et al., 2007), but other shapes have been used such as 4th order (Lai & I, 1999), 10th order (Durniak et al., 2010), soft-wall (Ott et al., 2008), and hard-wall (Klumov et al., 2009). Special shapes of confinement are used in order to obtain particular particle arrangements, *e.g.* a 2D annular potential to obtain rings (Sheridan, 2009a) and anisotropic potentials for elliptical clusters (Cândido et al., 1998), linear chains (Liu & Goree, 2005a), or flat disks (Durniak et al., 2010).

The Langevin dynamics method adds a stochastic force and a dynamic damping force to the equations of motion. The stochastic force does not depend on the particle momentum. It has a zero mean value and a Gaussian probability distribution with the correlation function:

$$
\langle L\_i(t) \rangle = 0, \quad \langle L\_i(t)L\_j(t') \rangle = 2\nu mk\_B T \delta(t - t')\delta\_{ij} \tag{1}
$$

where *ν* is the damping coefficient, *kB* is the Boltzmann constant, *T* the temperature of the system and the indices *i* and *j* are linked to particles. The damping force is often chosen to be equal to the gas drag force, which depends on the gas pressure, the kind of gas and the momentum exchange between the gas molecules and the grain surface (Epstein, 1924).
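In a numerical scheme with a fixed time step, the delta-correlated force of Eq. (1) is commonly approximated by drawing an independent Gaussian number for each force component at each step, with the delta function replaced by the inverse of the step length. The sketch below illustrates this under that assumption; the function name and parameter values are hypothetical, and an adaptive-step integrator may require a more careful treatment.

```python
import math
import random

def langevin_force(nu, mass, k_b, temp, dt, rng):
    """One Cartesian component of the random force for one particle and
    one time step of length dt (delta function replaced by 1/dt)."""
    sigma = math.sqrt(2.0 * nu * mass * k_b * temp / dt)
    return rng.gauss(0.0, sigma)

rng = random.Random(0)                      # fixed seed for reproducibility
params = dict(nu=1.0, mass=1.0, k_b=1.0, temp=0.5, dt=0.01)
samples = [langevin_force(rng=rng, **params) for _ in range(20000)]

mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
# Variance implied by Eq. (1) for these parameters: 2 nu m k_B T / dt
expected_variance = 2.0 * 1.0 * 1.0 * 1.0 * 0.5 / 0.01
```

The sample mean should be close to zero and the sample variance close to `expected_variance`, in line with the two conditions of Eq. (1).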

#### **3.3 Our simulation code**

The code used to simulate the complex plasmas presented in this chapter is based on object-oriented multi-threaded programming. It is assumed that the microparticles comprising a complex plasma interact with each other via a Yukawa potential, that their motion is damped by a neutral drag, and that they are confined in an external potential well with free boundaries. A Langevin force is used to study the effects of finite temperature. The ions and electrons are not explicitly included in the model. We take into account the interaction of every microparticle with every other one (particle-particle code), thus there is no cut-off for the potential. The equations of grain motion are solved using a fifth-order Runge-Kutta method with the Cash-Karp adaptive step size control (Press et al., 1992). This makes the code precise, simple and stable at the expense of computational efficiency. It is used to simulate a wide range of dynamic phenomena in a system of several thousand particles.

The model is based on the Newtonian equations of motion written for each microparticle:

$$\begin{aligned} m\ddot{\mathbf{r}}\_s &= \mathbf{f}\_s^{int} + \mathbf{f}\_s^{fr} + \mathbf{f}\_s^{conf} + \mathbf{L}\_s(t) + \mathbf{f}\_s^{ext}, \quad \mathbf{f}\_s^{fr} = -m\nu \dot{\mathbf{r}}\_s, \\ \mathbf{f}\_s^{int} &= -\nabla \sum\_{j \neq s} U\_0, \quad U\_0(r\_{sj}) = Q^2 (4\pi \epsilon\_0 r\_{sj})^{-1} \exp(-r\_{sj}/\lambda\_D), \\ \mathbf{f}\_s^{conf} &= -m[\Omega\_h^2(\mathbf{x}\_s + \mathbf{y}\_s) + \Omega\_v^2 \mathbf{z}\_s] \end{aligned} \tag{2}$$


where *m* is the particle mass, **r** = **x** + **y** + **z** is the particle coordinate (**z** = 0 in 2D) with the subscripts *s* and *j* denoting different particles, **f**<sub>*s*</sub><sup>*fr*</sup> is the friction force due to collisions with neutrals, **f**<sub>*s*</sub><sup>*int*</sup> is the grain-grain interaction force, **f**<sub>*s*</sub><sup>*ext*</sup> is the external excitation force, **L**<sub>*s*</sub>(*t*) is the Langevin force (1), *ν* is the damping rate, Ω<sub>*h*</sub> and Ω<sub>*v*</sub> are respectively the horizontal and vertical confinement parameters of the well (Ω<sub>*v*</sub> = ∞ in the 2D case), *U*<sub>0</sub> is the Yukawa interaction potential, *λ*<sub>*D*</sub> is the Debye screening length, *Q* is the particle charge and *r*<sub>*sj*</sub> = |**r**<sub>*s*</sub> − **r**<sub>*j*</sub>| is the intergrain distance. The overdots denote time derivatives. *Q* and *λ*<sub>*D*</sub> are kept constant during the simulation. Dimensionless units are used: the lengths are normalised in terms of the screening length *λ*<sub>*D*</sub> and the time in terms of *t* = √(4*πε*<sub>0</sub>*mλ*<sub>*D*</sub><sup>3</sup>/*Q*<sup>2</sup>). The results are converted back into dimensional units after the simulation is completed, to facilitate comparison with the experiments. The code records the position, velocity, and potential energy of each particle at specified time steps.
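The deterministic part of Eq. (2) can be sketched in dimensionless form as follows. This is a minimal 2D illustration, not the authors' code: it assumes lengths in units of the screening length and a time unit chosen so that the mass and the prefactor of the Yukawa potential both equal one, it omits the Langevin and external forces, and the function name `accelerations` is hypothetical.

```python
import math

def accelerations(pos, vel, nu, omega_h):
    """Dimensionless 2D accelerations: Yukawa repulsion from every other
    grain, neutral-gas friction, and parabolic horizontal confinement.
    Lengths are in units of lambda_D, so U_0(r) = exp(-r) / r."""
    acc = []
    for s, (xs, ys) in enumerate(pos):
        # friction and confinement terms
        ax = -nu * vel[s][0] - omega_h ** 2 * xs
        ay = -nu * vel[s][1] - omega_h ** 2 * ys
        for j, (xj, yj) in enumerate(pos):
            if j == s:
                continue
            dx, dy = xs - xj, ys - yj
            r = math.hypot(dx, dy)
            # -dU_0/dr = exp(-r) * (1/r + 1/r^2), directed away from grain j
            f = math.exp(-r) * (1.0 / r + 1.0 / r ** 2)
            ax += f * dx / r
            ay += f * dy / r
        acc.append((ax, ay))
    return acc

# Two grains one screening length apart, at rest, without confinement
pos = [(-0.5, 0.0), (0.5, 0.0)]
vel = [(0.0, 0.0), (0.0, 0.0)]
acc = accelerations(pos, vel, nu=0.0, omega_h=0.0)
```

Summing over all pairs without a cut-off, as here, costs of the order of *N*<sup>2</sup> operations per force evaluation, which is the price of the particle-particle approach.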

After seeding the grains randomly, the code is run with the external and Langevin forces switched off until the equilibrium is reached and a monolayer crystal lattice or a solid 3D cluster is formed. The structural properties of the resulting crystals are characterised before they are utilised as inputs for simulations of dynamic phenomena. These simulations are performed by applying various excitation forces. A random (Langevin) force is used to simulate the thermal Brownian motions of particles and to obtain phonon spectra. Pulsed excitations are applied to investigate nonlinear waves and structural properties of complex plasmas.

We use the following parameters in all our simulation runs: *m* = 5 × 10<sup>−13</sup> kg, *λ*<sub>*D*</sub> = 1 mm, and *Q* = −16000*e* (where *e* is the elementary charge). Other parameters are listed in Table 1.
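For orientation, the normalisation units implied by these parameters can be evaluated directly. The sketch below takes the time unit as the square root of 4*πε*<sub>0</sub>*mλ*<sub>*D*</sub><sup>3</sup>/*Q*<sup>2</sup>, which is what dimensional consistency requires; the variable and function names are assumptions made for this example.

```python
import math

EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C

m = 5e-13                     # grain mass, kg
lambda_d = 1e-3               # screening length, m
q = 16000 * E_CHARGE          # magnitude of the grain charge, C

# Time unit: square root of 4*pi*eps0*m*lambda_D^3 / Q^2
t_unit = math.sqrt(4.0 * math.pi * EPS0 * m * lambda_d ** 3 / q ** 2)
# Matching force unit: Q^2 / (4*pi*eps0*lambda_D^2)
f_unit = q ** 2 / (4.0 * math.pi * EPS0 * lambda_d ** 2)

def to_dimensionless_rate(rate):
    """Convert a rate in s^-1 (e.g. the damping rate) to simulation units."""
    return rate * t_unit
```

With these values the time unit comes out at roughly 0.09 s, and the force unit equals *mλ*<sub>*D*</sub> divided by the square of the time unit, as it must for the dimensionless equations of motion to carry unit mass.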

#### **3.4 Data and structural analysis**

The results of complex plasma simulations are analysed in order to determine their basic microscopic and macroscopic parameters, structural and dynamical properties using standard methods, which are also used for analysing the experimentally obtained particle tracks.

The local orientation of 2D crystalline cells is characterised using the local bond orientational order parameter for each lattice cell:

$$
\psi\_6 = \frac{1}{N} \sum\_{j=1}^{N} e^{6i\theta\_j} = |\psi\_6| e^{i\theta\_6}, \quad \theta\_6 = \arctan[\mathrm{Im}(\psi\_6)/\mathrm{Re}(\psi\_6)],
$$

where *N* is the number of nearest neighbours and *θ*<sub>*j*</sub> is the angle between the *x*-axis and the bond connecting the central particle with its neighbour *j*. The average bond orientation angle *θ*<sub>6</sub> is used to highlight crystal grains separated by strings of defects as well as lattice deformations. The value of |*ψ*<sub>6</sub>| gives the local order parameter, which is equal to one for an ideal crystal.
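The order parameter of a single cell can be computed in a few lines once the nearest neighbours are known. The sketch below is illustrative only: the function name `psi6` is hypothetical, the neighbour list is assumed to be given, and the angle *θ*<sub>6</sub> is evaluated as the complex phase, which resolves the quadrant that a plain arctangent of the ratio would leave ambiguous.

```python
import cmath
import math

def psi6(center, neighbours):
    """Bond-orientational order parameter of one cell.
    Returns (|psi_6|, theta_6) with theta_6 taken as the complex phase."""
    total = 0j
    for nx, ny in neighbours:
        theta_j = math.atan2(ny - center[1], nx - center[0])  # bond angle
        total += cmath.exp(6j * theta_j)
    psi = total / len(neighbours)
    return abs(psi), cmath.phase(psi)

# Ideal hexagonal cell rotated by 10 degrees about the central particle
rot = math.radians(10.0)
hexagon = [(math.cos(rot + k * math.pi / 3), math.sin(rot + k * math.pi / 3))
           for k in range(6)]
magnitude, theta6 = psi6((0.0, 0.0), hexagon)
```

For the ideal cell the magnitude equals one regardless of the rotation, while the phase equals six times the rotation angle (modulo 2π), so it tracks the orientation of the cell.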

Delaunay triangulation of the lattice is used in order to find the nearest neighbours of each particle and determine their numbers. An ideal hexagonal lattice would have particles with 6 nearest neighbours. Lattice defects are defined as lattice cells that have other numbers of neighbours, such as 5 or 7. Dislocations are pairs of 5- and 7-fold cells. They are also called penta-hepta defects and are characterised by their Burgers vectors, perpendicular to the axis formed by the two defective cells. Defects and dislocations play an important role in melting and plastic deformations (Durniak & Samsonov, 2011; Knapek et al., 2007).


| Case | Effect | *D* | *N* | Ω<sub>*h*</sub> (Hz) | *ν* (Hz) | *κ* | Initial crystal diameter (mm) | *F*<sub>ex0</sub> (arb.u.) | *x*<sub>0</sub> (mm) | *w* (mm) | *τ* (s) | Excitation in space | Excitation in time (*t*) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | phonon spectrum | 2D | 3000 | 2 | 1 | 0.725 | 50.2 | 0.01 | | | | stochastic | stochastic |
| 2 | soliton collision | 2D | 3000 | 0.5 | 1 | 1.325 | 93 | 3 | 17 | 2 | 0.185 | 2× Gaussian | parabola |
| 3 | tsunami | 2D | 3000 | 2 | 1 | 0.725 | 50.2 | 1 | 12 | 3 | 0.185 | half-Gaussian | parabola |
| 4 | shock | 2D | 3000 | 2 | 1 | 0.725 | 50.2 | 4.5 | 12 | 2 | 0.185 | half-Gaussian | parabola |
| 5 | defect dynamics | 2D | 3000 | 2 | 1 | 0.575 | 50.2 | 1 | 12 | 2 | 0.185 | half-Gaussian | parabola |
| 6 | plasticity | 2D | 3000 | 2 | 1 | 0.725 | 50.2 | 0.75 | 17 | 2 | 9.2 | 2× half-Gaussian | half-parabola |
| 7 | clusters | 2D | 3-150 | 2 | 2 | 0.725 | 50.2 | | | | | | |
| 8 | clusters | 3D | 150 | 2 | 2 | 0.725 | 50.2 | | | | | | |
| 9 | Mach cone | 2D | 3000 | 2 | 1 | 0.725 | 50.2 | 0.005 | 12 | 0.5 × 1 | | moving anisotropic Gaussian | *v* = 50 mm/s |
| 10 | melting | 2D | 3000 | 3 | 2 | 0.575 | 41.5 | 8 | 17 | 2 | 0.185 | half-Gaussian | parabola |

Table 1. Parameter values used in our simulations. *D* is the dimensionality of the simulation (2D or 3D), *N* is the number of particles, Ω<sub>*h*</sub> is the horizontal confinement parameter [Ω<sub>*z*</sub> = Ω<sub>*h*</sub> for the 3D simulation (case 8)], *ν* is the damping rate, *κ* is the screening parameter, the diameter column gives the initial crystal diameter, *F*<sub>ex0</sub> is the amplitude of the excitation force *F*<sub>ex</sub>, *x*<sub>0</sub> is the force offset with respect to the centre of the lattice, *w* is the width and *τ* is the duration of the force. A continuous, randomly changing excitation force is used to simulate phonon spectra (case 1). A pulsed excitation force is applied on one (cases 3-5,10) or both (cases 2,6) sides of the lattice. Its spatial profile is either Gaussian, *F*<sub>ex</sub> ∝ exp[−(*x* + *x*<sub>0</sub>)<sup>2</sup>/*w*<sup>2</sup>], or half-Gaussian, that is Gaussian for *x* ≤ |*x*<sub>0</sub>| and *F*<sub>ex</sub> = *const* for *x* > |*x*<sub>0</sub>|. The temporal profile (cases 2-5,10) is a parabola (inverted and truncated at negative values), *F*<sub>ex</sub> ∝ 1 − (1 − *t*/*τ*)<sup>2</sup> for *t* ≤ 2*τ* and *F*<sub>ex</sub> = 0 otherwise. It is modified in case 6 to keep a constant value after the maximum is reached (*F*<sub>ex</sub> = *const* for *t* ≥ *τ*). For cases 1-8 and 10, the excitation force is independent of *y*. The Mach cones are simulated using an anisotropic Gaussian excitation force *F*<sub>ex</sub> ∝ exp[−(*x* − *x*<sub>0</sub> − *vt*)<sup>2</sup>/*w*<sub>*x*</sub><sup>2</sup> − *y*<sup>2</sup>/*w*<sub>*y*</sub><sup>2</sup>], with *w*<sub>*x*</sub> = 0.5 mm and *w*<sub>*y*</sub> = 1 mm, moving with a speed *v* = 50 mm/s. The amplitude of the excitation force *F*<sub>ex0</sub> is expressed in terms of the parameter 4*πε*<sub>0</sub>*λ*<sub>*D*</sub><sup>2</sup>/*Q*<sup>2</sup>.
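The pulsed excitation profiles described in the caption of Table 1 can be written as two small functions. This is a sketch of the profiles as read from the caption, normalised to a peak value of one; the function names and the `hold_after_peak` switch are assumptions introduced for illustration.

```python
import math

def temporal_profile(t, tau, hold_after_peak=False):
    """Inverted parabola 1 - (1 - t/tau)^2, set to zero where it would be
    negative (t < 0 or t > 2*tau). With hold_after_peak=True the peak value
    is kept for t >= tau, as described for case 6."""
    if t < 0.0:
        return 0.0
    if hold_after_peak and t >= tau:
        return 1.0
    if t > 2.0 * tau:
        return 0.0
    return 1.0 - (1.0 - t / tau) ** 2

def gaussian_profile(x, x0, w):
    """Spatial factor exp[-(x + x0)^2 / w^2]."""
    return math.exp(-((x + x0) ** 2) / w ** 2)

tau = 0.185                          # pulse duration used in most cases, s
peak = temporal_profile(tau, tau)    # the maximum is reached at t = tau
```

The excitation force at a given point and time is then the amplitude multiplied by the product of the spatial and temporal factors.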

The number density of the lattice is computed using the Voronoi analysis (Voronoi, 1908). A Voronoi cell is defined as a set of points for which a given particle is the nearest. The local number density is proportional to the inverse area in 2D (volume in 3D) of a Voronoi cell. The compression factor of a lattice is calculated as the ratio of the number density *n* of a strained lattice to its unperturbed number density *n*0.
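Once the vertices of a Voronoi cell are known, the local number density and the compression factor follow from the cell area. The sketch below assumes the cell vertices are already available in order around the cell (the Voronoi construction itself is not shown); the function names are hypothetical.

```python
import math

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    area = 0.0
    for k, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(k + 1) % len(vertices)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def compression_factor(cell_vertices, n0):
    """Local number density (one particle per cell) divided by n0."""
    return (1.0 / polygon_area(cell_vertices)) / n0

def hexagonal_cell(a):
    """Voronoi cell of an ideal hexagonal lattice with spacing a:
    a regular hexagon with circumradius a / sqrt(3)."""
    radius = a / math.sqrt(3.0)
    return [(radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3))
            for k in range(6)]

n0 = 1.0 / polygon_area(hexagonal_cell(1.0))          # unperturbed lattice
factor = compression_factor(hexagonal_cell(0.8), n0)  # spacing reduced to 0.8
```

Reducing the lattice spacing to 0.8 of its original value raises the 2D number density by a factor of 1/0.8<sup>2</sup>, which is what the function returns for the ideal cell.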

The kinetic temperature of a complex plasma is determined from the velocities of individual particles. The lattice is split into bins and the average bin velocity ⟨**v**⟩ is calculated. It is then subtracted from the velocities of all particles in the bin. The average kinetic energy *E* in the bin is determined using the mean square random velocity, *E* = (*m*/2)⟨(**v** − ⟨**v**⟩)<sup>2</sup>⟩, where *m* is the particle mass. The kinetic temperature *T* is found from the relation *E* = (*d*/2)*k*<sub>*B*</sub>*T*, where *k*<sub>*B*</sub> is the Boltzmann constant and *d* is the number of degrees of freedom.
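The procedure for one bin can be sketched as follows. The function name is hypothetical, and the number of degrees of freedom is taken to be the number of velocity components supplied per particle.

```python
def kinetic_temperature(velocities, mass, k_b):
    """Kinetic temperature of one bin from E = (m/2) <(v - <v>)^2> = (d/2) k_B T."""
    n = len(velocities)
    d = len(velocities[0])                       # degrees of freedom per particle
    mean = [sum(v[c] for v in velocities) / n for c in range(d)]
    mean_sq = sum(sum((v[c] - mean[c]) ** 2 for c in range(d))
                  for v in velocities) / n       # <(v - <v>)^2>
    energy = 0.5 * mass * mean_sq
    return 2.0 * energy / (d * k_b)

# Four particles drifting at (5, 5) with unit random speeds (arbitrary units)
vels = [(6.0, 5.0), (4.0, 5.0), (5.0, 6.0), (5.0, 4.0)]
temperature = kinetic_temperature(vels, mass=2.0, k_b=1.0)
```

Because the mean bin velocity is removed first, a uniform drift of the whole bin does not contribute to the temperature.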

A correlation analysis is performed in order to assess the structure of complex plasmas. The pair correlation function *g*(*r*) gives the probability of finding a specific distance between two particles in the system relative to the probability of finding that distance in a completely random particle distribution of the same density (Crocker & Grier, 1996; Quinn et al., 1996). It measures the translational order of the lattice. For a perfect crystal at zero temperature *g*(*r*) is a series of *δ*-functions. At non-zero temperature peaks have finite widths and decaying amplitude. The position of the first peak of the pair correlation function is determined by the interparticle spacing and is often used to measure the average interparticle distance.
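A simple histogram estimate of *g*(*r*) for a finite 2D sample is sketched below. It is an illustration under stated assumptions: the function name is hypothetical, the mean density is supplied by the caller, and no edge correction is applied, so the values are underestimated for particles near the boundary.

```python
import math

def pair_correlation_2d(positions, density, r_max, dr):
    """Histogram estimate of g(r) for a finite 2D sample (no edge
    correction, so values are underestimated near the boundary)."""
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.hypot(positions[i][0] - positions[j][0],
                           positions[i][1] - positions[j][1])
            k = int(r / dr)
            if k < n_bins:
                counts[k] += 2            # count the pair for both particles
    centres = [(k + 0.5) * dr for k in range(n_bins)]
    # Normalise by the pair count expected for a random distribution
    g = [counts[k] / (n * density * 2.0 * math.pi * centres[k] * dr)
         for k in range(n_bins)]
    return centres, g

# Toy input (an assumption): 12 x 12 square lattice with unit spacing
lattice = [(float(ix), float(iy)) for ix in range(12) for iy in range(12)]
r_vals, g_vals = pair_correlation_2d(lattice, density=1.0, r_max=3.0, dr=0.1)
first_peak = r_vals[g_vals.index(max(g_vals))]
```

For this toy lattice the highest peak falls in the bin that contains the lattice spacing, which illustrates how the first peak is used to measure the average interparticle distance.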

#### **4. Experimental verification**

Like any other simulations, numerical models of complex plasmas have to be verified by comparison with the experiments. The code should reproduce the same phenomena as observed in the experiment with similar quantitative characteristics.

#### **4.1 Laboratory complex plasmas**

The most frequently used method to obtain high quality complex plasmas in the laboratory is to add premanufactured monodisperse plastic microspheres to the plasma (Thomas et al., 1994). Since the crystalline state is most often the subject of interest, relatively large (∼10 *μ*m in diameter) particles are used. The reason for that is the lower boundary of the coupling parameter (Γ ∝ *Q*<sup>2</sup>/*T* ≳ 170) required for crystallisation (Sect. 2.1). The kinetic temperature cannot be lower than the temperature of the neutral gas, which is close to room temperature. Thus the coupling parameter can only be increased by increasing the potential energy of the grains or their charge *Q*, which is a function of the grain size. The dependence of the charge on the particle size also makes it necessary to use monodisperse particles in order to make their charges, potential energies, and levitation heights identical.

Most ground-based experiments utilise a capacitively coupled radio-frequency (rf) gas discharge (Donkó et al., 2010; Durniak & Samsonov, 2011; Käding et al., 2008). The particles are suspended in the plasma sheath, where a strong electric field counteracts gravity. They typically form between one (Nosenko & Zhdanov, 2009) and a few layers (Quinn & Goree, 2001), creating 2D and quasi-2D structures. In order to obtain 3D structures, gravity has to be compensated (Sect. 2.1). This can be done in a strong electric field such as in striations of a direct current glow discharge (Fortov et al., 2005b), using a thermophoretic force (Arp et al., 2004), or under microgravity conditions on parabolic flights or on board the International Space Station (Konopka et al., 2005; Seurig et al., 2007). A Q-machine (Luo et al., 1999) has also been used to create weakly coupled complex plasmas using polydisperse particles.

Fig. 1. Experimental setup for ground-based complex plasma experiments. (a) Side view. A discharge is formed in a vacuum chamber. Micron-sized plastic spheres levitate in the plasma sheath above the rf electrode. (b) Oblique view. The particles are illuminated by a laser sheet and imaged with a top view video camera. They are excited electrostatically using the wires stretched across the electrode. (c) Experimental rig. The chamber (shown in the centre) has optical access from 5 sides. The optical system for illumination is shown on the left and the video camera on the top. (d) Close-up view of the experiment. The wires are positioned on both sides of the monolayer particle cloud. This setup has been used to verify complex plasma simulations.

#### **4.2 The experimental setup**

Our experiments are performed in a capacitively coupled rf discharge vacuum chamber as shown in Fig. 1. An argon flow (a few sccm) maintains a constant working gas pressure (1-2 Pa) in the chamber. The rf power is applied to the lower disc electrode, which is 20 cm in diameter. The chamber itself is the other, grounded electrode. Owing to the different areas of the electrodes and the different mobilities of ions and electrons, the powered electrode has a negative self-bias voltage, which helps to suspend the particles in the plasma sheath against gravity. The particles used are monodisperse plastic microspheres 8.9 or 9.19 *μ*m in diameter. They are injected into the plasma through a particle dispenser, levitated in the plasma sheath, and confined radially by a rim on the outer edge of the electrode, forming a monolayer hexagonal lattice approximately 6 cm in diameter. The particles are illuminated by a thin (0.2-0.3 mm) horizontal sheet of laser light and imaged by a top-view digital camera. Two parallel horizontal tungsten wires, both 0.1 mm in diameter, are placed below the particles. Negative pulses applied to one or both wires excite compressional disturbances and deformations.

The experimental results are analysed by identifying the particle positions in all video frames using the intensity weighted moment method (Feng et al., 2007; Ivanov & Melzer, 2007). The particle velocities are calculated by tracking their positions from one frame to the next.
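The two analysis steps described above can be illustrated with a short sketch. It assumes that a small image patch has already been isolated around a single particle, that a background `threshold` is subtracted before weighting, and that the time between frames (`frame_interval`) is known. The function names and the sample patches are illustrative and are not taken from the cited implementations.

```python
def weighted_centroid(image, threshold=0.0):
    """Intensity-weighted centre (x, y) of the pixels brighter than threshold."""
    total = sum_x = sum_y = 0.0
    for row_index, row in enumerate(image):
        for col_index, value in enumerate(row):
            weight = value - threshold
            if weight > 0.0:
                total += weight
                sum_x += weight * col_index
                sum_y += weight * row_index
    if total == 0.0:
        raise ValueError("no pixel above threshold")
    return sum_x / total, sum_y / total


def velocity(pos_prev, pos_next, frame_interval):
    """Finite-difference velocity from positions in two consecutive frames."""
    return tuple((b - a) / frame_interval for a, b in zip(pos_prev, pos_next))


# Hypothetical 3x3 patches around one particle in two consecutive frames
frame0 = [[0, 1, 0],
          [1, 4, 1],
          [0, 1, 0]]
frame1 = [[0, 0, 0],
          [0, 2, 2],
          [0, 2, 2]]

p0 = weighted_centroid(frame0)
p1 = weighted_centroid(frame1)
v = velocity(p0, p1, frame_interval=0.01)  # pixels per second
print(p0, p1, v)
```

Positions obtained this way are in pixel units and have sub-pixel resolution; converting them to millimetres requires the camera calibration, which is not specified here.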

### **5. Results and discussion**

#### **5.1 Structure of complex plasmas**

Complex plasmas are formed into different shapes by the confinement potential (see Sect. 3.2). Parameters of the confinement should be selected to ensure that stable structures are formed. The structures of interest include linear 1D chains, monolayer 2D lattices, and 3D balls. Different parameter values result in formation of zigzag lines instead of chains (Sheridan, 2009a), or multiple layers instead of monolayers. The only structure stable against shear perturbations in crystalline monolayers is hexagonal (Durniak et al., 2010). Several different structures exist in bilayer lattices (Donkó & Kalman, 2001) and transition between them is controlled by the layer separation. As the separation increases, the structure changes from hexagonal to staggered square and then to staggered rhombic.

Fig. 2. Crystalline structures commonly observed in 3D complex plasmas: (a) hexagonal close packed (hcp), (b) body centred cubic (bcc), and (c) face centred cubic (fcc).

More crystal structures are observed in multilayer and 3D strongly coupled systems. Face centred cubic (fcc), body centred cubic (bcc), and hexagonal close packed (hcp) structures, illustrated in Fig. 2, have been identified in (Hamaguchi et al., 1997; Klumov et al., 2009). Phase diagrams covering the transitions between the liquid, bcc and fcc phases have been simulated by (Hamaguchi, 1999).

#### **5.2 Linear and nonlinear waves**

Complex plasmas sustain waves, which are analogous to those in ordinary solids and liquids. Small amplitude linear waves as well as nonlinear waves, solitons and shocks are observed. Most wave experiments in crystalline complex plasmas are performed in monolayers, since they are easy to obtain at low gas pressure. The low pressure results in very low damping, so underdamped wave motion can be studied in great detail.

#### **5.2.1 Small amplitude waves**

Complex plasmas can be in the solid or liquid state (Sect. 2.1). Crystalline monolayer complex plasmas sustain acoustic compressional and shear wave modes, as shown experimentally and numerically in (Donkó et al., 2008; Zhdanov et al., 2003). The phonon spectra are isotropic for long wavelength phonons, but strongly anisotropic in the short wavelength case. The compressional mode has a higher frequency than the shear mode. As the complex plasma crystal melts, the shear mode disappears, while a dust thermal mode appears (Nunomura et al., 2005). Brownian simulation has been used to calculate the cut-off wavenumber for the shear mode in a liquid state and to compare the wave spectra in 2D complex plasma solids and liquids with those from different analytical theories (Hou, Mišković, Piel & Murillo, 2009). It confirmed the existence of the dust thermal mode first observed by (Nunomura et al., 2005). Lattices formed of particles with two different sizes exhibit optical compressional and shear modes. Simulation of bilayers revealed the existence of the optical modes and verified the existence of the *k* = 0 energy (frequency) gap (Hartmann et al., 2009).

The method to compute phonon spectra from either experimental or simulated data relies on the Fourier transform of particle velocities **v**(*x*, *t*) both in time *t* and in the *x*-direction (Nunomura et al., 2002; Zhdanov et al., 2003):

$$\mathbf{V}(k,\omega) = \frac{1}{L\_o T\_o} \sum\_{m=0}^{M} \sum\_{n=0}^{N} \mathbf{v}(\mathbf{x}\_{mn}, t\_{mn}) \exp[i(\mathbf{k}\mathbf{x}\_{mn} + \omega t\_{mn})]$$

where *k* and *ω* are the wave number and the frequency, *N* and *M* are the numbers of data points in space and time respectively, *Lo* is the length of the field of view and *To* is the recording period. The longitudinal and transverse modes are resolved by using the *vx* and *vy* components of the velocity respectively. The resulting phonon spectra are shown in Fig. 3. They are obtained using an MD simulation (Table 1, case 1) with a stochastic force. The theoretical dispersion relations (Donkó et al., 2008; Hou, Mišković, Piel & Murillo, 2009) are calculated for a perfect hexagonal lattice by solving the eigenvalue problem for the dynamical matrix *Dμν*:

$$\left\|\omega^2(\mathbf{k},\varphi) - D\_{\mu\nu}(\mathbf{k})\right\| = 0, \quad D\_{\mu\nu}(\mathbf{k}) = \frac{1}{m} \sum\_{j} \frac{\partial^2 U\_0(r\_j)}{\partial \mu\, \partial \nu} \left[1 - \cos(\mathbf{k} \cdot \mathbf{r}\_j)\right],$$

where the summation is performed over all particles *j* with coordinates **r***<sup>j</sup>* and mass *m*; *U*0(*rj*) is the Yukawa potential (Eq. 2), and coordinates *μ*, *ν* take the values {*x*, *y*}. The spectra depend on the propagation angle or the direction of **k**. The longitudinal *L* and transverse *T* modes are given by:

$$
\omega\_{L,T}^2(k,\varphi) = \frac{1}{2} \left[ D\_{xx} + D\_{yy} \pm \sqrt{(D\_{xx} - D\_{yy})^2 + 4D\_{xy}^2} \right]. \tag{3}
$$
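The eigenvalue problem can also be solved numerically by summing the dynamical matrix over the lattice sites. The following is a minimal sketch under stated assumptions: the Yukawa potential is taken in the dimensionless form *U*(*r*) = exp(−*κr*)/*r*, lengths are measured in units of the lattice constant, the particle mass is set to 1, and the lattice sum is truncated at a cutoff radius. The function names and the chosen values of *κ*, the cutoff and the wave numbers are illustrative. The two eigenvalues are obtained from the trace and determinant of the 2×2 matrix.

```python
import math


def hexagonal_sites(cutoff):
    """Sites (x, y, r) of a hexagonal lattice with unit spacing, origin excluded."""
    sites = []
    n_max = int(2 * cutoff) + 2
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            x = n1 + 0.5 * n2
            y = 0.5 * math.sqrt(3.0) * n2
            r = math.hypot(x, y)
            if 0.0 < r <= cutoff:
                sites.append((x, y, r))
    return sites


def dynamical_matrix(kx, ky, kappa, sites):
    """Return (D_xx, D_yy, D_xy) for U(r) = exp(-kappa * r) / r and m = 1."""
    dxx = dyy = dxy = 0.0
    for x, y, r in sites:
        e = math.exp(-kappa * r)
        u1_over_r = -e * (kappa * r + 1.0) / r**3                  # U'(r) / r
        u2 = e * (kappa**2 * r**2 + 2.0 * kappa * r + 2.0) / r**3  # U''(r)
        radial = (u2 - u1_over_r) / r**2
        phase = 1.0 - math.cos(kx * x + ky * y)
        # Second derivatives of a central potential in Cartesian coordinates
        dxx += (radial * x * x + u1_over_r) * phase
        dyy += (radial * y * y + u1_over_r) * phase
        dxy += radial * x * y * phase
    return dxx, dyy, dxy


def mode_frequencies(kx, ky, kappa, sites):
    """Longitudinal and transverse frequencies from the eigenvalues of D."""
    dxx, dyy, dxy = dynamical_matrix(kx, ky, kappa, sites)
    half_trace = 0.5 * (dxx + dyy)
    det = dxx * dyy - dxy * dxy
    root = math.sqrt(max(half_trace**2 - det, 0.0))
    omega_l = math.sqrt(max(half_trace + root, 0.0))
    omega_t = math.sqrt(max(half_trace - root, 0.0))
    return omega_l, omega_t


sites = hexagonal_sites(cutoff=12.0)
for k in (0.01, 0.02):
    omega_l, omega_t = mode_frequencies(k, 0.0, kappa=1.0, sites=sites)
    # At small k the ratios omega / k approximate the wave speeds
    print(k, omega_l / k, omega_t / k)
```

Because of the exponential screening, the truncated sum converges quickly for *κ* of order one; for weaker screening a larger cutoff would be needed.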

In the limit of long wavelengths the dispersion relations (Eq. 3) become *ωL*,*<sup>T</sup>* = *cL*,*Tk*, where *cL*,*<sup>T</sup>* are the longitudinal and transverse wave speeds, which depend on the particle charge and the screening parameter. This can be used to determine the values of *Q* and *κ* in experiments.
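The Fourier-transform method for computing phonon spectra, described at the start of this subsection, can be sketched as a direct evaluation of the double sum for **V**(*k*, *ω*). The example below is a hedged illustration rather than the procedure used to produce Fig. 3: it uses a single scalar velocity component, a synthetic wave sampled on a regular grid, and hypothetical values for the grid spacing, frame interval and wave parameters. For real data, *vx* would be used for the longitudinal spectrum and *vy* for the transverse one.

```python
import cmath
import math


def spectral_amplitude(samples, k, omega, length, period):
    """Evaluate V(k, omega) for samples given as (x, t, v) tuples."""
    total = 0j
    for x, t, v in samples:
        total += v * cmath.exp(1j * (k * x + omega * t))
    return total / (length * period)


# Synthetic data on a regular grid (all values are hypothetical)
n_space, n_time = 16, 32
dx, dt = 0.5, 0.01                  # grid spacing (mm) and frame interval (s)
length, period = n_space * dx, n_time * dt
k0 = 2.0 * math.pi * 3 / length     # wave number of the test wave
w0 = 2.0 * math.pi * 5 / period     # angular frequency of the test wave

# With the sign convention exp[i(kx + wt)], a wave with phase k0*x + w0*t
# produces a peak at (k0, w0).
samples = [(n * dx, m * dt, math.cos(k0 * n * dx + w0 * m * dt))
           for m in range(n_time) for n in range(n_space)]

on_peak = abs(spectral_amplitude(samples, k0, w0, length, period))
off_peak = abs(spectral_amplitude(samples, 2.0 * math.pi * 4 / length, w0,
                                  length, period))
print(on_peak, off_peak)
```

Scanning `k` and `omega` over a grid and plotting the modulus gives a spectral power map of the kind shown in Fig. 3. For regularly sampled data, a fast Fourier transform would be the more efficient choice.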

Fig. 3. Phonon spectra in a hexagonal crystalline complex plasma monolayer. (a) Longitudinal and (b) transverse wave modes. The colour scale indicates the normalised spectral power ||*V*(*k*, *ω*)||. The solid lines correspond to the theoretical dispersion relations of longitudinal (a) and transverse (b) modes for different propagation directions varying from 0° to 30° in increments of 5°. The dashed lines show complementary transverse (a) and longitudinal (b) modes.

#### **5.2.2 Nonlinear pulses and solitons**

Pulsed excitation applied to a lattice with a laser or a biased wire results in a localised propagating disturbance. This disturbance can be compressional (Nosenko et al., 2002; Samsonov et al., 2002) or shear (Nunomura et al., 2000). If it obeys the Korteweg-de Vries (KdV) equation, it is called a KdV soliton. The properties of these solitons include conservation of the soliton parameter (*AL*<sup>2</sup> = *const*, where *A* is the soliton's amplitude and *L* is its width) and a relation between the propagation speed and the amplitude (faster solitons have larger amplitudes). This has been observed in a 2D experiment and a linear chain simulation (Samsonov et al., 2002). Interactions between nonlinear waves are another subject of interest. Experimental and numerical investigations of collisions of counter-propagating solitons in complex plasma monolayers find that solitons with larger amplitude experience larger delays and that the amplitude at the collision point is different from the sum of the initial soliton amplitudes (Harvey et al., 2010). Figure 4 shows a head-on collision of two solitons simulated with the parameters of Table 1, case 2. The amplitudes of the pulses decrease slightly due to neutral damping as they propagate. The amplitude of the overlapping solitons is lower than the sum of the initial amplitudes.

Fig. 4. Collision of counter-propagating solitons. (a) Number density of interacting solitons *vs* distance and time. The collision point corresponds to the peak of number density. (b) Number density versus distance at different times t=0.5 s (before the collision), t=0.7 s (at the collision point), and t=0.9 s (after the collision).

Waves can gain amplitude due to nonlinear effects even in the presence of damping. A soliton propagating in a lattice with decreasing number density gains amplitude as observed in simulation (Table 1, case 3) and experiment (Fig. 5). It is found that the measured amplitude gain is higher than that predicted by the KdV equation with damping included (Durniak et al., 2009).

Fig. 5. "Tsunami" effect observed in an inhomogeneous complex plasma lattice. The particle velocity *vx* along the wave propagation direction is shown as a function of time and distance in (a) simulation and (b) experiment. The amplitude of the nonlinear wave increases as the number density decreases at later times even in the presence of a damping force. The wave trajectories curve upward as their speeds decrease.

#### **5.3 Shock waves**

Shock waves are propagating discontinuities arising from large amplitude perturbations, so they cannot be treated as small amplitude waves. Shocks can cause phase transitions and present a challenge for simulations, since they are likely to cause numerical instabilities. Experimental observations of shock waves in complex plasmas were reported in (Fortov et al., 2005b; Samsonov et al., 2003; 2004). The structure of a simulated shock (Table 1, case 4) is shown in Figure 6. The shock front has a thickness of a few interparticle distances and an oscillatory structure. It propagates from left to right at a velocity decreasing from 57 to 46 mm/s, which corresponds to a Mach number decreasing from 1.9 to 1.6. The lattice, initially in the solid state, is melted after the shock has passed. There is a discontinuous jump at the shock front in compression factor, number density, kinetic temperature, and defect fraction (Durniak et al., 2010; Samsonov & Morfill, 2008).

Fig. 6. Simulated shock wave in a monolayer complex plasma visualised as (a) particle positions, (b) Voronoi map with the 5-fold and 7-fold defects marked by separate symbols, and (c) velocity vector map. The oblique propagation is due to the alignment of the crystal lattice. The discontinuity at the shock front is illustrated by (d) compression factor, (e) flow velocity, and (f) kinetic temperature plots computed for a cropped lattice (−12 ≤ *x* ≤ 25 mm and |*y*| ≤ 2.5 mm) at time *t* = 0.54 s. The oscillatory shock structure is clearly visible.

#### **5.4 Defect dynamics and plastic deformation**

Lattice defects and dislocations determine mechanical properties of crystals and may be responsible for material fatigue and catastrophic failure. The effect of temperature on defects in 2D Coulomb clusters has been studied numerically by (Nelissen et al., 2007). It was found that the defect mobility strongly depends on the neighbouring defects, that the geometrical defects have different dynamics than the topological defects, and that a fast cooling rate favours formation of a non-equilibrium glass-like state with many defects. Dislocations have been observed to propagate in crystalline complex plasmas supersonically (Nosenko et al., 2007). They interact with nonlinear waves (Durniak & Samsonov, 2010) and they are generated during shear slips in plastic deformations (Durniak & Samsonov, 2011).

Fig. 7. Interaction of defects with large amplitude waves. (a) Simulated trajectories of isolated dislocations, which were placed with their initial Burgers vectors **b** pointing left (in the −**e***<sup>x</sup>* direction). The direction of the wave excitation force **F***ex* (and thus the wave propagation direction) is shown with the same colour as the corresponding wave trajectories. The dislocations move either in small increments due to elastic deformations of the lattice, or in large jumps due to lattice structural changes. One can see that the defect jumps occur in directions almost parallel to the Burgers vector regardless of the wave propagation direction. The crystal lattice (b) is initially characterised by a screening parameter *κ* = 0.725. The 7-fold and 5-fold defects are marked by separate symbols and the Burgers vector by the pink arrow.

The interaction of a dislocation with a wave is simulated using parameters listed in Table 1, case 5. Figure 7a shows the trajectories of an isolated dislocation as the wave passes by from different directions (runs 1-4). The excitation wave has an amplitude between 4.8 and 6.1 mm/s in all runs and propagates at about 38 mm/s. The dislocation either stays at the same lattice site, or jumps to a neighbouring pair of particles. In the former case it displaces roughly in the direction of the wave, while in the latter case it moves almost parallel to its Burgers vector. This result agrees with the experiment (Durniak & Samsonov, 2010).

Plastic deformation under a uniaxial compression is numerically modelled using the conditions of Table 1, case 6. As the strain increases and exceeds the elastic limit, shear slips occur, causing stress relaxation. This happens because a uniaxial compression is a superposition of a uniform compression and shear. Complex plasmas are very compressible and do not change their structure under a uniform compression. However, their shear strength is not very high, so their structural failure results in a shear slip. Shear slips are initiated by the generation of a pair of dislocations, which move in opposite directions (Durniak & Samsonov, 2011).

Fig. 8. Plastic deformation of a crystal lattice under a slow uniaxial compression. (a) Time evolution of the crystal width and of the fraction of defective lattice cells (those with other than a 6-fold symmetry). (b) Time evolution of the particle trajectories shows that the local stress is relaxed by a shear slip (trajectories twist at time 2 s). The crystal defects are marked in blue (7-fold defect) and green (5-fold defect). Only the particles along the slip line are shown. Voronoi maps (c-e), with the 5-fold and 7-fold defects marked by separate symbols, visualise the lattice structure at (c) t=1.5 s, (d) t=2.04 s, and (e) t=2.3 s. The colour scale shows the bond orientation angle *θ*6. A pair of dislocations is generated at t=2.04 s; they separate and move in opposite directions as the slip progresses. Velocity vector maps show the particle velocities at (f) t=1.5 s, (g) t=2.04 s, and (h) t=2.3 s. The colour scale indicates the angle between the velocity vector and the horizontal axis. The slip is shown as particle rows moving in opposite directions.

#### **5.5 Coulomb (Yukawa) clusters**

Coulomb (or Yukawa) clusters are systems of up to a few hundred charged particles confined in a 2D or 3D well and interacting via a Coulomb (or Yukawa) potential. They occur in systems of trapped electrons, ions, colloids, and complex plasmas. Since they comprise a small number of particles, they are easy to model numerically (Nelissen et al., 2007). Properties of clusters are different from those of the bulk material and they depend on the cluster size (Yurtsever et al., 2005) and on the interaction and confinement potentials. The emergence of bulk behaviour in a strongly coupled Yukawa cluster has been studied in (Sheridan, 2007). Clusters are often simulated using the Monte Carlo method; however, we focus here on the MD technique.

Figure 9 shows the structure of 2D and 3D clusters simulated using the parameters of Table 1, cases 7 (2D) and 8 (3D). The particles in these clusters interact via a Yukawa potential and are confined in an isotropic parabolic potential. The structure of the clusters results from the interplay between the particle-particle interaction and the global confinement. The interaction potential favours hexagonal order whereas the confinement induces a circular symmetry. Thus large clusters have a hexagonal inner core and circular outer shells, while small ones tend to contain only shells (Lai & I, 1999). The same effect is observed in 3D clusters (Arp et al., 2004; Totsuji et al., 2002). Clusters assume particularly stable configurations at certain "magic" numbers of particles (Tsuruta & Ichimaru, 1993). The effect of the screening length on the structure of Coulomb balls has been studied in (Bonitz et al., 2006; Käding et al., 2008): particles moved from the outer shells to the inner ones as the screening length increased.
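The interplay between the pair repulsion and the parabolic confinement can be illustrated with a very small cluster. The sketch below is not the MD scheme used for Fig. 9: it swaps in a plain steepest-descent relaxation, which only finds a nearby energy minimum. It assumes dimensionless units in which the confinement energy of a particle is *r*<sup>2</sup>/2 and the pair energy is exp(−*κr*)/*r*; the function names, step size, iteration count and starting positions are illustrative choices.

```python
import math


def energy_gradient(positions, kappa):
    """Gradient of E = sum_i r_i^2 / 2 + sum_{i<j} exp(-kappa r_ij) / r_ij."""
    grads = []
    for i, (xi, yi) in enumerate(positions):
        gx, gy = xi, yi  # contribution of the parabolic confinement
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r = math.hypot(dx, dy)
            # dU/dr of the Yukawa pair potential (negative, i.e. repulsive)
            du_dr = -math.exp(-kappa * r) * (kappa * r + 1.0) / r**2
            gx += du_dr * dx / r
            gy += du_dr * dy / r
        grads.append((gx, gy))
    return grads


def relax(positions, kappa, step=0.05, iterations=2000):
    """Steepest-descent relaxation towards a local energy minimum."""
    positions = list(positions)
    for _ in range(iterations):
        grads = energy_gradient(positions, kappa)
        positions = [(x - step * gx, y - step * gy)
                     for (x, y), (gx, gy) in zip(positions, grads)]
    return positions


# Three particles with an unscreened (kappa = 0) Coulomb interaction
start = [(1.0, 0.0), (-0.5, 0.8), (-0.4, -0.9)]
final = relax(start, kappa=0.0)
radii = [math.hypot(x, y) for x, y in final]
print(radii)
```

For three particles and *κ* = 0 the minimum is an equilateral triangle centred on the trap, and balancing the confinement against the Coulomb repulsion gives a radius of 3<sup>−1/6</sup> in these units, which the relaxed radii should reproduce. Larger clusters have many metastable states, so a descent of this kind does not in general reach the ground state.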

Metastable states of 3D Yukawa clusters can occur with a significantly higher probability than the ground state. The results strongly depend on the screening parameter and the damping coefficient. Slow cooling favours the ground state over the metastable ones (Kählert et al., 2008). The effects of anisotropic confinement and interaction potentials have been studied by (Killer et al., 2011). The structure of spherical clusters was found to be unaffected by a weak ion focus unlike the structure of elongated clusters.

Since clusters have a finite size, they have a finite number of normal modes of oscillation. Knowing the normal modes of a system allows one to determine its response to external excitations, *i.e.* its dynamics. Clusters of *N* particles in a harmonic potential well have 2*N* normal modes. Oscillations of clusters are not purely compressional or shear; however, they can be described as compression-like or shear-like (Melzer, 2003). It was shown that in asymmetric potentials both the rotational and breathing modes of elliptical clusters were robust (Sheridan et al., 2007). Melting and defect excitation in Coulomb clusters have also been investigated (Lai & I, 2001; Nelissen et al., 2007).

Fig. 9. Simulated Coulomb clusters in (a) 2D and (b,c) 3D. The red lines correspond to the particle bonds calculated by a Delaunay triangulation, which reveals the static structure of the clusters. The numbers of particles *N* in the 2D clusters are shown in (a). The 3D cluster (b) has been generated using 150 particles. Its shell structure is visualised in (c) by plotting the particle positions in cylindrical coordinates (*r* = √(*x*<sup>2</sup> + *y*<sup>2</sup>), *z*). In (b) the colour scale corresponds to the particle's vertical position *z*.

#### **5.6 Mach cones**

Mach cones are propagating V-shaped disturbances. Their existence in the dusty plasma of Saturn's rings was first predicted by (Havnes et al., 1995). They were then observed in a monolayer complex plasma (Samsonov et al., 1999), where they were generated by fast particles moving parallel to the main layer. They had a multiple V-shaped structure formed by compressional waves. The properties of Mach cones were confirmed using laser excitation (Melzer et al., 2000). The opening angle of a Mach cone obeys the relation *μ* = sin<sup>−1</sup>(1/*M*), *M* = *v*/*c* being the Mach number of an object moving at speed *v* through a medium with an acoustic speed *c*. Measurements of this angle can therefore be used as a diagnostic tool for complex plasmas to detect inhomogeneity (Zhdanov et al., 2004). Since there are two acoustic wave modes in plasma crystals, compressional and shear (see Section 5.2.1), two types of Mach cones exist. They are distinguished by the particle motion in the cone front. The particles move perpendicular to the cone front in compressional Mach cones and parallel to the front in shear Mach cones (Ma & Bhattacharjee, 2002; Nosenko et al., 2003).
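The Mach cone relation can be used in both directions: to predict the angle from known speeds, or to infer the acoustic speed from a measured angle. The short sketch below shows both; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def mach_angle(object_speed, sound_speed):
    """Mach cone angle mu = asin(1 / M) in radians, defined for M > 1."""
    mach = object_speed / sound_speed
    if mach <= 1.0:
        raise ValueError("a Mach cone forms only for supersonic motion (M > 1)")
    return math.asin(1.0 / mach)


def sound_speed_from_angle(object_speed, angle):
    """Invert the Mach relation: c = v * sin(mu)."""
    return object_speed * math.sin(angle)


# Hypothetical values: an object moving at 60 mm/s, acoustic speed 30 mm/s
mu = mach_angle(60.0, 30.0)
print(math.degrees(mu))                  # cone angle in degrees
print(sound_speed_from_angle(60.0, mu))  # recovers the acoustic speed
```

Since the compressional and shear waves have different speeds, the same moving object produces two cones with different angles, the shear cone being the narrower one.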

Fig. 10. Simulated Mach cone (wake). The velocity vector map (a) shows particle positions and speeds at *t* = 0.55 s. The Mach cone consists of a shear and a compressional cone. The shear cone is generated by the shear lattice wave and the compressional cone by the longitudinal wave. Their structures are revealed by (b) the vorticity ∇ × **v** and (c) the divergence ∇ · **v** maps respectively.

Figure 10 shows a numerically generated Mach cone with the parameters listed in Table 1, case 9. The excitation force has been chosen to be similar to that reported by (Ma & Bhattacharjee, 2002; Nosenko et al., 2003). We visualise the cones by plotting the velocity vector map (Fig. 10a) and also the vorticity and divergence maps of the simulated particle velocities, which highlight the shear (Fig. 10b) and compressional (Fig. 10c) cones respectively. The compressional cone has a multiple structure while the shear cone is single. This is because the shear wave is slower and can therefore propagate a shorter distance than the compressional wave before being damped. The angles of the inner and outer compressional cones might be slightly different due to nonlinear effects (Samsonov et al., 2000).
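The separation of the two cones relies on the fact that the divergence of the velocity field responds to compression while the vorticity responds to shear. A minimal sketch of how such maps can be computed is given below. It assumes that the particle velocities have already been interpolated onto a regular grid, a step that is not shown, and it uses central differences at the interior grid points. The function name and the two test fields are illustrative.

```python
def divergence_and_curl(vx, vy, spacing):
    """Central-difference div(v) and (curl v)_z at the interior grid points.

    vx[i][j] and vy[i][j] hold the velocity components at x = j * spacing
    and y = i * spacing.
    """
    rows, cols = len(vx), len(vx[0])
    div = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    curl = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dvx_dx = (vx[i][j + 1] - vx[i][j - 1]) / (2.0 * spacing)
            dvy_dy = (vy[i + 1][j] - vy[i - 1][j]) / (2.0 * spacing)
            dvy_dx = (vy[i][j + 1] - vy[i][j - 1]) / (2.0 * spacing)
            dvx_dy = (vx[i + 1][j] - vx[i - 1][j]) / (2.0 * spacing)
            div[i - 1][j - 1] = dvx_dx + dvy_dy
            curl[i - 1][j - 1] = dvy_dx - dvx_dy
    return div, curl


spacing = 0.5
grid = range(5)

# Pure shear-like motion: rigid rotation v = (-y, x)
rot_vx = [[-i * spacing for j in grid] for i in grid]
rot_vy = [[j * spacing for j in grid] for i in grid]
rot_div, rot_curl = divergence_and_curl(rot_vx, rot_vy, spacing)

# Pure compression-like motion: uniform expansion v = (x, y)
exp_vx = [[j * spacing for j in grid] for i in grid]
exp_vy = [[i * spacing for j in grid] for i in grid]
exp_div, exp_curl = divergence_and_curl(exp_vx, exp_vy, spacing)

print(rot_div[0][0], rot_curl[0][0], exp_div[0][0], exp_curl[0][0])
```

The rotation field has zero divergence and uniform vorticity, whereas the expansion field has uniform divergence and zero vorticity, which is why the two maps isolate the shear and compressional cones respectively.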

#### **5.7 Phase transitions**

MD simulations of charged grains in plasmas were used to study phase transitions before this became possible experimentally (Farouki & Hamaguchi, 1992). Solid and liquid phases were predicted, as well as a hysteresis at the transition between them. This hysteresis corresponds to a superheated solid and a supercooled liquid. Solid superheating was later observed experimentally (Feng et al., 2008). Phase diagrams of Yukawa systems have been computed in (Robbins et al., 1988), predicting liquid as well as solid fcc and bcc structures. Simulations show that strongly screened Yukawa systems have a triple point, i.e. a point on the phase diagram where the liquid, bcc, and fcc phases coexist, whereas weakly screened Yukawa systems do not (Hamaguchi et al., 1997). In experiments, phase transitions can be induced by stochastic laser heating, shear flows (Nosenko & Zhdanov, 2009), shock waves (Knapek et al., 2007) or by changing the discharge power (Rubin-Zuzic et al., 2006). In the latter case a propagating crystallisation front has been observed. It was shown that this process is fundamentally 3D (Klumov et al., 2006). Complex plasma recrystallisation has been simulated by (Hartmann et al., 2010). It was found that the sizes of crystal domains have a power-law time dependence.

Figure 11 shows melting and recrystallisation of a complex plasma lattice excited by a shock wave. The simulation parameters are given in Table 1 case 10. As the shock propagates, the kinetic temperature increases and defects are generated (Fig. 11a). The pair correlation function calculated at different times (Fig. 11b) indicates that the order in the system decreases during melting and then increases, almost reaching the initial level at 6 s as the lattice recrystallises.
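The pair correlation function used above as an order indicator can be estimated from particle positions alone. The following is a minimal sketch, assuming a periodic rectangular 2D box and positions stored as a NumPy array; the function name `pair_correlation_2d` and its parameters are illustrative and are not taken from the simulation code described in this chapter.

```python
import numpy as np

def pair_correlation_2d(pos, box, r_max, n_bins=100):
    """Estimate g(r) for N points in a periodic 2D box of size (Lx, Ly).

    pos   : array of shape (N, 2) with particle positions
    box   : (Lx, Ly); r_max should not exceed half the shorter side
    Returns bin centres and g(r); g = 1 for an uncorrelated (ideal gas) system.
    """
    pos = np.asarray(pos, dtype=float)
    box = np.asarray(box, dtype=float)
    n = len(pos)
    # Pair separations using the minimum-image convention
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)
    r = np.sqrt((d ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    centers = 0.5 * (edges[1:] + edges[:-1])
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    density = n / (box[0] * box[1])
    # Each pair is counted once, hence the factor 2 in the per-particle average
    g = 2.0 * counts / (n * density * shell_area)
    return centers, g
```

In a crystalline lattice g(r) shows sharp peaks at the neighbour distances, while in the molten state the peaks broaden and decay quickly towards 1, which is the qualitative change described for Fig. 11b.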

#### **5.8 Transport phenomena**

Transport phenomena are irreversible statistical processes which are responsible for transfer of mass, momentum, or energy in matter. These processes use similar mathematical formalisms and are described by similar equations. The three most commonly considered transport phenomena are diffusion (mass transfer), heat conduction (energy transfer), and viscosity (momentum transfer). Their fundamental nature makes them very important for understanding basic properties of matter. Complex plasmas offer a possibility to study these processes at the level of individual particles and compare experiments directly to MD simulations.

Fig. 11. Crystal melting and recrystallisation caused by a shock wave applied to the lattice. (a) Time evolution of the kinetic temperature and defect fraction. As the lattice recrystallises, their values return to the initial ones. (b) Pair correlation function at different times. The lattice structure changes from solid (t=0 s), to liquid (t=0.5 and 2 s), and back to solid (t=6 s). Voronoi maps visualise the lattice structure at (c) t=0 s, (d) t=0.5 s, (e) t=2 s, and (f) t=6 s. The lattice defects are marked by (7-fold), (5-fold), and ∗ (other).

Two simulation techniques are used to study transport phenomena: the equilibrium and nonequilibrium methods (Donkó & Hartmann, 2008). The first method calculates the particle trajectories of a system in a state of statistical equilibrium (Vaulina et al., 2008). The second method applies a perturbation to an equilibrated system and measures the changes it causes (Sanbonmatsu & Murillo, 2001). If the investigated system exhibits only small deviations from statistical equilibrium, which is the case for equilibrium simulations, the transport coefficients are found from the Green-Kubo relations (Donkó et al., 2009). Nonequilibrium simulations calculate the transport coefficients directly, by computing the diffusion coefficient from the mean square displacement (Ohta & Hamaguchi, 2000), the viscosity coefficient from the velocity profile (Sanbonmatsu & Murillo, 2001), or the thermal conductivity from the temperature gradient (Hou & Piel, 2009). Both equilibrium and nonequilibrium methods can produce artifacts due to the finite size of the system or insufficient recording time, so care should be taken. Another possible problem is whether the transport coefficients exist at all in a particular system. It was found that in 2D Yukawa liquids the diffusion coefficient exists at high temperature and the viscosity coefficient at low temperature, but not in the opposite limits (Donkó et al., 2009). The thermal conductivity did not appear to exist at high temperature and could not be evaluated at low temperature due to computational limitations.
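As an illustration of the equilibrium route, the Green-Kubo relation for self-diffusion integrates the velocity autocorrelation function, *D* = (1/*d*) ∫ ⟨**v**(0) · **v**(*t*)⟩ d*t*, with *d* the dimensionality. The sketch below assumes velocities stored as an array of shape (frames, particles, dimensions) sampled at a fixed interval `dt`; the function names and the `max_lag` cut-off are hypothetical choices made for this example, not part of the codes cited above. The cut-off matters in practice: if the integrand has not decayed within `max_lag`, the coefficient may not be well defined, which is the existence problem mentioned in the text.

```python
import numpy as np

def velocity_autocorrelation(vel, max_lag):
    """<v(0).v(t)> averaged over particles and time origins.

    vel : array of shape (n_frames, n_particles, dim)
    """
    n_frames = vel.shape[0]
    vacf = np.empty(max_lag)
    for lag in range(max_lag):
        # Products of velocities separated by `lag` frames
        prod = vel[: n_frames - lag] * vel[lag:]
        vacf[lag] = np.mean(np.sum(prod, axis=-1))
    return vacf

def green_kubo_diffusion(vel, dt, max_lag):
    """Self-diffusion coefficient from the time integral of the VACF."""
    dim = vel.shape[-1]
    vacf = velocity_autocorrelation(vel, max_lag)
    # Trapezoidal rule on a uniform time grid
    integral = dt * (vacf.sum() - 0.5 * (vacf[0] + vacf[-1]))
    return integral / dim
```

The same pattern applies to viscosity and thermal conductivity, with the velocity replaced by the off-diagonal stress or the heat flux respectively.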

Diffusion is governed by Fick's law, **J** = −*D*∇*C*, where **J** is the diffusion flux, *D* is the diffusion coefficient, and *C* is the concentration. The diffusion coefficient determines the time dependence of the mean square displacement of the particles, ⟨|**r**(*t*) − **r**(0)|<sup>2</sup>⟩ = 4*Dt*<sup>*α*</sup> (the prefactor 4 applies in two dimensions). If *α* = 1 the diffusion is normal, otherwise (*α* ≠ 1) it is anomalous: *α* > 1 corresponds to superdiffusion and *α* < 1 to subdiffusion. The temperature dependence of the diffusion coefficient for 3D Yukawa systems has been studied by Ohta & Hamaguchi (2000). It was found to be independent of the screening parameter. Different methods of computing the diffusion coefficient at short observation times have been compared by Vaulina et al. (2008), and the required minimum observation time has been estimated. Anomalous diffusion has been reported in 2D Yukawa systems in some simulations (Hou, Piel & Shukla, 2009; Liu & Goree, 2007) as well as experiments (Juan & I, 1998), which contradicts some other numerical (Vaulina & Dranzhevski, 2006) and experimental results (Nunomura et al., 2006). As shown by Ott & Bonitz (2009b), the diffusion exponent depends critically on the neutral gas friction, which makes anomalous diffusion a transient effect in simulations. This certainly does not rule out superdiffusion in experiments; however, limited fields of view and insufficiently long observation times might obscure anomalous effects and the role of nonequilibrium states in the experimental dissipative-driven systems. The effect of coherent transport, *e.g.* waves and flows, on the diffusion exponent is also difficult to rule out.
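The diffusion exponent discussed above is commonly extracted by fitting the mean square displacement on a log-log scale. The following is a minimal sketch, assuming unwrapped 2D positions stored as an array of shape (frames, particles, 2) and the 2D convention MSD = 4*Dt*<sup>*α*</sup> used in the text; the function names are illustrative. As the text notes, the fitted *α* depends on the observation window, so a value obtained from a short trajectory should be treated as a transient estimate.

```python
import numpy as np

def mean_square_displacement(pos, max_lag):
    """MSD for lags 1..max_lag, averaged over particles and time origins.

    pos : array of shape (n_frames, n_particles, dim), unwrapped coordinates
    """
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = pos[lag:] - pos[:-lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd

def fit_diffusion_exponent(msd, dt):
    """Fit log(MSD) = log(4 D) + alpha * log(t) by linear least squares.

    Returns (D, alpha); D carries units of length^2 / time^alpha.
    """
    t = dt * np.arange(1, len(msd) + 1)
    alpha, intercept = np.polyfit(np.log(t), np.log(msd), 1)
    return np.exp(intercept) / 4.0, alpha
```

For a plain random walk the fit should return *α* close to 1; values persistently above or below 1 over a long window would indicate superdiffusion or subdiffusion.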

Shear viscosity in strongly coupled 3D Yukawa systems has been studied by Sanbonmatsu & Murillo (2001). It was found that the viscosity coefficient has a minimum and that it is nonlocal, with a scale length consistent with the correlation length. The decay of the velocity profile deviates from the one predicted by the Navier-Stokes equation. The minimum of the viscosity coefficient is also observed in 2D Yukawa liquids (Liu & Goree, 2005b). The wavenumber-dependent viscosity, which characterises the viscous effects at different length scales, has been computed by Feng et al. (2011). They have also verified the accuracy of the Green-Kubo relation for static viscosity in the presence of damping. Dynamic shear viscosity has been studied experimentally and numerically by Hartmann et al. (2011) and shown to exhibit a strong frequency dependence. A shear-thinning effect has been demonstrated under static shear.

Heat conductivity in a 2D strongly coupled system has been simulated using nonequilibrium methods by (Hou & Piel, 2009). The results show that the heat conductivity coefficient depends on the damping rate and indicate that it might be temperature-dependent. The experimental results, however, show no temperature dependence within the experimental uncertainty (Nosenko et al., 2008; Nunomura et al., 2005a).

Since Newton's law of viscosity, Fourier's law of heat transfer, and Fick's law of molecular diffusion are very similar, relations between the transport coefficients can be found. The Stokes-Einstein formula relates the diffusion and viscosity coefficients. It has been thoroughly tested in 3D liquids, even at the molecular level. In 3D strongly coupled complex plasmas the Stokes-Einstein relation has been verified for a wide range of temperatures down to the solidification point (Donkó & Hartmann, 2008). However, this relation is violated in 2D complex plasmas near the disordering transition, while remaining valid at higher temperatures (Liu et al., 2006).
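For reference, the three linear transport laws named above and the textbook 3D form of the Stokes-Einstein formula can be written side by side. The symbols *λ* (thermal conductivity), *τ<sub>xy</sub>* (shear momentum flux), *η* (shear viscosity), *a* (particle radius), and the constant *c* are notation introduced here for illustration and do not appear elsewhere in this chapter; the sign of the viscous term follows the momentum-flux convention.

```latex
\begin{align}
  \mathbf{J} &= -D\,\nabla C
      && \text{(Fick: mass flux)} \\
  \mathbf{q} &= -\lambda\,\nabla T
      && \text{(Fourier: heat flux)} \\
  \tau_{xy}  &= -\eta\,\frac{\partial v_x}{\partial y}
      && \text{(Newton: momentum flux)} \\
  D &= \frac{k_B T}{c\,\pi\,\eta\,a}
      && \text{(Stokes-Einstein, 3D; } c = 6 \text{ for stick, } c = 4 \text{ for slip boundary conditions)}
\end{align}
```

Each flux is proportional to the negative gradient of the transported quantity, which is the similarity the text refers to. A practical test of the Stokes-Einstein relation in simulations is to check whether the product *Dη*/(*k<sub>B</sub>T*) stays constant as the temperature is varied.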

### **6. Conclusion**

Methods of molecular dynamics simulations of complex plasmas and their results have been reviewed in this chapter and illustrated by examples of such simulations. Complex plasmas belong to the soft matter class of materials; they are formed by mesoscopic and highly charged particles immersed in a plasma. They are underdamped due to their gaseous background and thus ideally suited for studying dynamic effects at the particle level. Complex plasmas resemble colloids, and some effects such as phase transitions, diffusion, and shear flows can be observed in both. Some phenomena such as waves, shocks, kinetic temperature, and phonon heat transfer exist only in complex plasmas. The numerical results have been compared with experiments in order to make sure that simulated effects are observable in physical systems. Unfortunately, due to space constraints, some effects not exclusive to complex plasmas were left out (such as lane formation), as well as some of those not extensively simulated (such as coupling of vertical and horizontal modes in monolayers). The authors hope that their choice of dynamic effects well illustrates the beauty and complexity of complex plasmas.

### **7. References**

Aikin, A. C. & Pesnell, W. D. (1998). Uptake coefficient of charged aerosols - implications for atmospheric chemistry, *Geophys. Res. Lett.* 25(9): 1309–1312.

Allen, M. P. & Tildesley, D. J. (eds) (1987). *Computer simulations of liquids*, Oxford University Press, New York.

Arp, O., Block, D. & Piel, A. (2004). Dust Coulomb balls: Three-dimensional plasma crystals, *Phys. Rev. Lett.* 93(16): 165004.

Bonitz, M., Block, D., Arp, O., Golubnychiy, V., Baumgartner, H., Ludwig, P., Piel, A. & Filinov, A. (2006). Structural properties of screened Coulomb balls, *Phys. Rev. Lett.* 96(7): 075001.

Boufendi, L., Jouanny, M. C., Kovacevic, E., Berndt, J. & Mikikian, M. (2011). Dusty plasma for nanotechnology, *J. Phys. D-Appl. Phys.* 44: 174035.

Boulon, J., Sellegri, K., Venzac, H., Picard, D., Weingartner, E., Wehrle, G., Collaud Coen, M., Bütikofer, R., Flückiger, E., Baltensperger, U. & Laj, P. (2010). New particle formation and ultrafine charged aerosol climatology at a high altitude site in the Alps (Jungfraujoch, 3580 m a.s.l., Switzerland), *Atmos. Chem. Phys.* 10(19): 9333–9349.

Bronold, F. X., Fehske, H., Kersten, H. & Deutsch, H. (2009). Towards a microscopic theory of particle charging, *Contrib. Plasma Phys.* 49(4-5): 303–315.

Cândido, L., Rino, J.-P., Studart, N. & Peeters, F. M. (1998). The structure and spectrum of the anisotropically confined two-dimensional Yukawa system, *J. Phys.: Condens. Matter* 10: 11627–11644.

Chen, Z., Yu, M. Y. & Luo, H. (2005). Molecular dynamics simulation of dust clusters in plasmas, *Physica Scripta* 71: 638–643.

Cho, J. Y. N., Alcala, C. M., Kelley, M. C. & Swartz, W. E. (1996). Further effects of charged aerosols on summer mesospheric radar scatter, *J. Atmospher. Terrestr. Phys.* 58(6): 661–672.

Chu, J. H. & I, L. (1994). Direct observation of Coulomb crystals and liquids in strongly coupled rf dusty plasmas, *Phys. Rev. Lett.* 72(25): 4009–4012.

Couëdel, L., Zhdanov, S. K., Ivlev, A. V., Nosenko, V., Thomas, H. M. & Morfill, G. E. (2011). Wave mode coupling due to plasma wakes in two-dimensional plasma crystals: in-depth view, *Physics of Plasmas* 18(8): 083707.

Crocker, J. C. & Grier, D. G. (1996). Methods of digital video microscopy for colloidal studies, *J. Colloid Interface Sci.* 179: 298–310.

Donkó, Z., Goree, J. & Hartmann, P. (2010). Viscoelastic response of Yukawa liquids, *Phys. Rev. E* 81(5): 056404.


Hartmann, P., Douglass, A., Reyes, J. C., Matthews, L. S., Hyde, T. W., Kovács, A. & Donkó, Z. (2010). Crystallization dynamics of a single layer complex plasma, *Phys. Rev. Lett.* 105(11): 115004.

Hartmann, P., Sándor, M. C., Kovács, A. & Donkó, Z. (2011). Static and dynamic shear viscosity of a single-layer complex plasma, *Phys. Rev. E* 84(1): 016404.

Hartquist, T. W., Havnes, O. & Morfill, G. E. (2003). The effects of charged dust on Saturn's rings, *Astron. Geophys.* 44(5): 26–30.

Harvey, P., Durniak, C., Samsonov, D. & Morfill, G. (2010). Soliton interaction in a complex plasma, *Phys. Rev. E* 81(5): 057401.

Havnes, O., Aslaksen, T., Hartquist, T. W., Li, F., Melandsø, F., Morfill, G. E. & Nitter, T. (1995). Probing the properties of planetary ring dust by the observation of Mach cones, *Journal of Geophysical Research* 100(A2): 1731–1734.

Hayashi, Y. & Tachibana, K. (1994). Observation of Coulomb-crystal formation from carbon particles grown in a methane plasma, *Jpn. J. Appl. Phys.* 33(6A): 804–806.

Hou, L.-J., Shukla, P. K., Piel, A. & Mišković, Z. L. (2009). Wave spectra of two-dimensional Yukawa solids and liquids in the presence of a magnetic field, *Physics of Plasmas* 16(7): 073704.

Hou, L.-J., Mišković, Z. L., Piel, A. & Murillo, S. (2009). Wave spectra of two-dimensional dusty plasma solids and liquids, *Phys. Rev. E* 79(4): 046412.

Hou, L.-J., Mišković, Z., Piel, A. & Shukla, P. (2009). Brownian dynamics of charged particles in a constant magnetic field, *Physics of Plasmas* 16(5): 053705.

Hou, L.-J. & Piel, A. (2009). Heat conduction in 2D strongly coupled dusty plasmas, *J. Phys. A: Math. Theor.* 42(21): 214025.

Hou, L.-J., Piel, A. & Shukla, P. K. (2009). Self-diffusion in 2D dusty-plasma liquids: numerical-simulation results, *Phys. Rev. Lett.* 102(8): 085002.

Ikezi, H. (1986). Coulomb solid of small particles in plasmas, *Physics of Fluids* 29(6): 1764–1766.

Ikkurthi, V. R., Matyash, K., Melzer, A. & Schneider, R. (2009). Computation of ion drag force on a static spherical dust grain immersed in rf discharges, *Phys. Plasmas* 16(4): 043703.

Ivanov, Y. & Melzer, A. (2007). Particle positioning techniques for dusty plasma experiments, *Review of Scientific Instruments* 78(3): 033506.

Jefferson, R. A., Cianciosa, M. & Thomas Jr., E. (2010). Simulations of one- and two-dimensional complex plasmas using a modular, object-oriented code, *Physics of Plasmas* 17(11): 113704.

Joyce, G., Lampe, M. & Ganguli, G. (2001). Particle simulation of dust structures in plasmas, *IEEE Transactions on Plasma Science* 29(2): 238–246.

Juan, W.-T. & I, L. (1998). Anomalous diffusion in strongly coupled quasi-2D dusty plasmas, *Phys. Rev. Lett.* 80(14): 3073–3076.

Käding, S., Block, D., Melzer, A., Piel, A., Kählert, H., Ludwig, P. & Bonitz, M. (2008). Shell transitions between metastable states of Yukawa balls, *Physics of Plasmas* 15(7): 073710.

Kählert, H., Ludwig, P., Baumgartner, H., Bonitz, M., Block, D., Käding, S., Melzer, A. & Piel, A. (2008). Probability of metastable configurations in spherical three-dimensional Yukawa crystals, *Phys. Rev. E* 78(3): 036408.

Kennedy, R. V. & Allen, J. E. (2003). The floating potential of spherical probes and dust grains. II: Orbital motion theory, *J. Plasma Phys.* 69(6): 485–506.


Merlino, R. L. & Goree, J. A. (2004). Dusty plasmas in the laboratory, industry, and space, *Physics Today* pp. 1–7.

Morfill, G. E. & Ivlev, A. V. (2009). Complex plasmas: an interdisciplinary research field, *Rev. Mod. Phys.* 81(4): 1353–1404.

Nelissen, K., Partoens, B. & Peeters, F. M. (2007). Dynamics of topological defects and the effects of the cooling rate on finite-size two-dimensional screened Coulomb clusters, *Europhysics Letters* 79: 66001.

Nosenko, V., Goree, J., Ma, Z. W., Dubin, D. H. E. & Piel, A. (2003). Compressional and shear wakes in a two-dimensional dusty plasma crystal, *Phys. Rev. E* 68(5): 056409.

Nosenko, V., Nunomura, S. & Goree, J. (2002). Nonlinear compressional pulses in a 2D crystallized dusty plasma, *Phys. Rev. Lett.* 88(21): 215002.

Nosenko, V., Zhdanov, S., Ivlev, A. V., Morfill, G., Goree, J. & Piel, A. (2008). Heat transport in a two-dimensional complex (dusty) plasma at melting conditions, *Phys. Rev. Lett.* 100(2): 025003.

Nosenko, V. & Zhdanov, S. K. (2009). Dynamics of dislocations in a 2D plasma crystal, *Contrib. Plasma Phys.* 49(4-5): 191–198.

Nosenko, V., Zhdanov, S. & Morfill, G. (2007). Supersonic dislocations observed in a plasma crystal, *Phys. Rev. Lett.* 99(2): 025002.

Nunomura, S., Goree, J., Hu, S., Wang, X., Bhattacharjee, A. & Avinash, K. (2002). Phonon spectrum in a plasma crystal, *Phys. Rev. Lett.* 89(3): 035001.

Nunomura, S., Samsonov, D. & Goree, J. (2000). Transverse waves in a two-dimensional screened-Coulomb crystal (dusty plasma), *Phys. Rev. Lett.* 84(22): 5141–5144.

Nunomura, S., Samsonov, D., Zhdanov, S. & Morfill, G. (2006). Self-diffusion in a liquid complex plasma, *Phys. Rev. Lett.* 96(1): 015003.

Nunomura, S., Samsonov, D., Zhdanov, S. & Morfill, G. E. (2005a). Heat transfer in a two-dimensional crystalline complex (dusty) plasma, *Phys. Rev. Lett.* 95(2): 025003.

Nunomura, S., Zhdanov, S., Samsonov, D. & Morfill, G. E. (2005b). Wave spectra in solid and liquid complex (dusty) plasmas, *Phys. Rev. Lett.* 94(4): 045001.

Ohta, H. & Hamaguchi, S. (2000). Molecular dynamics evaluation of self-diffusion in Yukawa systems, *Physics of Plasmas* 7(11): 4506–4514.

Ott, T. & Bonitz, M. (2009a). Anomalous and Fickian diffusion in two-dimensional dusty plasmas, *Contrib. Plasma Phys.* 49(10): 760–764.

Ott, T. & Bonitz, M. (2009b). Is diffusion anomalous in two-dimensional Yukawa liquids?, *Phys. Rev. Lett.* 103(19): 195001.

Ott, T., Bonitz, M., Donkó, Z. & Hartmann, P. (2008). Superdiffusion in quasi-two-dimensional Yukawa liquids, *Phys. Rev. E* 78(2): 026409.

Ott, T., Stanley, M. & Bonitz, M. (2011). Non-invasive determination of the parameters of strongly coupled 2D Yukawa liquids, *Physics of Plasmas* 18(6): 063701.

Press, W., Teukolsky, S., Vetterling, W. & Flannery, B. (1992). *Numerical recipes in C: the art of scientific computing*, second edn, Cambridge University Press.

Quinn, R. A., Cui, C., Goree, J., Pieper, J. B., Thomas, H. & Morfill, G. E. (1996). Structural analysis of a Coulomb lattice in a dusty plasma, *Phys. Rev. E* 53(3): R2049.

Quinn, R. A. & Goree, J. (2001). Experimental test of two-dimensional melting through disclination unbinding, *Phys. Rev. E* 64(5): 051404.

Rapaport, D. C. (1995). *The art of molecular dynamics simulation*, Cambridge University Press.


Thomas, H. M., Morfill, G. E., Demmel, V., Goree, J., Feuerbacher, B. & Möhlmann, D. (1994). Plasma crystal: Coulomb crystallization in a dusty plasma, *Phys. Rev. Lett.* 73(5): 652–655.

Totsuji, H., Kishimoto, T., Totsuji, C. & Tsuruta, K. (2002). Competition between two forms of ordering in finite Coulomb clusters, *Phys. Rev. Lett.* 88(12): 125002.

Tsuruta, K. & Ichimaru, S. (1993). Binding energy, microstructure, and shell model of Coulomb clusters, *Phys. Rev. A* 48(2): 1339–1344.

Vaulina, O., Khrapak, S. & Morfill, G. (2002). Universal scaling in complex (dusty) plasmas, *Phys. Rev. E* 66(1): 016404.

Vaulina, O. S., Adamovich, X. G., Petrov, O. F. & Fortov, V. E. (2008). Evolution of the mass-transfer processes in nonideal dissipative systems. I. numerical simulation, *Phys. Rev. E* 77(6): 066403.

Vaulina, O. S. & Dranzhevski, I. E. (2006). Transport of macroparticles in dissipative two-dimensional Yukawa systems, *Phys. Scr.* 73: 577–586.

Vladimirov, S. V., Maiorov, S. A. & Ishihara, O. (2003). Molecular dynamics simulation of plasma flow around two stationary dust grains, *Physics of Plasmas* 10(10): 3867–3873.

Voronoi, G. (1908). Nouvelles applications des paramètres continus à la théorie des formes quadratiques, *J. Reine Angew. Math.* 134: 198–287.

Whipple, E. C. (1981). Potentials of surfaces in space, *Rep. Prog. Phys.* 44: 1197–1250.

Yurtsever, E., Calvo, F. & Wales, D. J. (2005). Finite-size effects in the dynamics and thermodynamics of two-dimensional Coulomb clusters, *Phys. Rev. E* 72(2): 026110.

Zhdanov, S. K., Morfill, G. E., Samsonov, D., Zuzic, M. & Havnes, O. (2004). Origin of the curved nature of Mach cone wings in complex plasmas, *Phys. Rev. E* 69(2): 026407.

Zhdanov, S., Nunomura, S., Samsonov, D. & Morfill, G. (2003). Polarization of wave modes in a two-dimensional hexagonal lattice using a complex (dusty) plasma, *Phys. Rev. E* 68(3): 035401(R).

**Part 4**

**Dynamics at the Interface**

## **Studies of Cardio Toxin Protein Adsorption on Mixed Self-Assembled Monolayers Using Molecular Dynamics Simulations**

Shih-Wei Hung<sup>1</sup>, Pai-Yi Hsiao<sup>1</sup> and Ching-Chang Chieng<sup>2</sup>

*<sup>1</sup>Department of Engineering and System Science, National Tsing Hua University, Hsinchu, Taiwan*

*<sup>2</sup>Department of Mechanical and Biomedical Engineering, City University of Hong Kong, Kowloon, Hong Kong*

### **1. Introduction**

Understanding protein adsorption on surfaces is important in bio-related technologies and applications such as biomaterials, implant biocompatibility, and biosensors (Gray 2004). Self-assembled monolayers (SAMs) are excellent model surfaces for biology and biochemistry because they are stable, highly ordered, easy to prepare, and provide a wide range of organic functionality (Love et al. 2005). Previous experimental studies (Ostuni et al. 2003; Prime & Whitesides 1991) have indicated that hydrophobic interactions are the major mechanism for protein adsorption on surfaces, and that the dehydration of both the protein and the hydrophobic SAM provides an entropic driving force for protein adsorption (Ostuni et al. 2003). Nonetheless, many questions remain open and are difficult to investigate experimentally. Molecular dynamics (MD) simulation is a powerful tool for investigating the atomic details of a molecular system. This paper summarizes our investigations of protein adsorption on alkanethiol SAMs by means of MD simulations, which extend the limited information that experiments can provide (Hung et al. 2006; Hung et al. 2010; Hung et al. 2011).

MD simulation has been applied to study the mechanisms of protein adsorption on various SAMs and has provided valuable information. For example, Tobias *et al.* (1996) studied cytochrome-c (Cyt-c) covalently tethered to hydrophobic (methyl-terminated) and hydrophilic (thiol-terminated) SAMs. In their model, water molecules were not modeled explicitly. They found that Cyt-c was completely excluded from the hydrophobic SAM surface but partially dissolved on the hydrophilic SAM surface. In a follow-up study using an explicit water model, Nordgren *et al.* (2002) reported that the larger perturbation of the Cyt-c structure occurred on hydrophilic rather than hydrophobic SAM surfaces. They found that the protein molecule was surrounded by water molecules, which reduced the interaction between the protein and the surface. Zhou *et al.* (2004) investigated the orientation and conformation of Cyt-c on a carboxyl-terminated SAM using a combined method of Monte Carlo (MC) and MD simulations. Their results showed that a strongly charged surface can produce the preferred orientation of an adsorbed protein, but the protein may lose its bioactivity due to the large conformational change. To understand surface resistance to protein adsorption, Jiang and coworkers (Hower et al. 2006; Zheng et al. 2004; Zheng et al. 2005) studied lysozyme adsorption on various SAM surfaces with terminal methyl, hydroxyl, oligo(ethylene glycol) (OEG), mannitol, and sorbitol groups by hybrid MC and MD simulation. They concluded that the resistance of a surface to protein adsorption was due to the tightly bound, structured water layer directly above the surface. Agashe *et al.* (2005) used MD simulations to investigate the adsorption of the γ-chain fragment of fibrinogen on SAM surfaces with five different terminal groups: methyl, hydroxyl, carboxyl, amino, and OEG. The fibrinogen fragment did not show conformational rearrangements; rather, it underwent rotational and translational motions until low-energy orientations were achieved. The above studies demonstrate that MD simulation is a useful and important tool for studying protein adsorption on various SAM surfaces.

Not only the static properties, such as orientation and potential energy, but also the dynamic information about the process of cardiotoxin (CTX) binding to the membrane is of interest. The dynamic biological process and the corresponding information, such as the structural change, adsorption force, interaction energy, and potential of mean force (PMF), can be investigated by steered molecular dynamics (SMD) simulation (Isralewitz et al. 2001). The PMF is the equilibrium free energy difference along the reaction coordinate, which is an important thermodynamic quantity characterizing the dynamic process (Kirkwood 1935). Calculation methods for the PMF have been reviewed and classified as either equilibrium or nonequilibrium approaches (Ytreberg, Swendsen, and Zuckerman 2006). The equilibrium approaches, for example the umbrella sampling method (Torrie & Valleau 1977), need large computer resources because they rely on fully sampled equilibrium simulations performed at each stage of the PMF calculation. The nonequilibrium approaches, for example applying Jarzynski's equality (Jarzynski 1997) to SMD simulations, have the potential to provide very rapid estimates of the PMF. However, the use of Jarzynski's equality suffers from significant bias and error when the pulling velocity is too high or the number of trajectories sampled is insufficient (Gore, Ritort, and Bustamante 2003).
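Jarzynski's equality relates the free energy difference to an exponential average of the work *W* done along repeated nonequilibrium pulls, Δ*F* = −*k<sub>B</sub>T* ln⟨exp(−*W*/*k<sub>B</sub>T*)⟩. The sketch below is an illustrative estimator only, assuming a list of work values from independent SMD trajectories at one value of the reaction coordinate; the function names are hypothetical and the second-order cumulant expression is included as a commonly used comparison, not as the method of the studies summarized here.

```python
import numpy as np

def jarzynski_free_energy(work, kT):
    """Exponential-average estimate: dF = -kT * ln < exp(-W / kT) >.

    work : work values from independent pulling trajectories
    kT   : thermal energy in the same units as `work`
    """
    w = np.asarray(work, dtype=float) / kT
    w_min = w.min()
    # Shift by the smallest work value to avoid overflow in the exponential
    return kT * (w_min - np.log(np.mean(np.exp(-(w - w_min)))))

def cumulant_free_energy(work, kT):
    """Second-order cumulant approximation: dF ~ <W> - var(W) / (2 kT)."""
    w = np.asarray(work, dtype=float)
    return w.mean() - w.var(ddof=1) / (2.0 * kT)
```

Because the exponential average is dominated by rare low-work trajectories, the estimate is biased upward when few trajectories are available or when fast pulling broadens the work distribution, which is the limitation noted in the text.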

CTX is a cytotoxic, basic, β-sheet polypeptide that is known to cause membrane leakage in many cells, including human erythrocytes, and in phospholipid membrane vesicles (Dufton and Hider 1988). The three-dimensional structures of various CTX homologues in both aqueous and micellar environments are available (Sue et al. 2001; Dauplais et al. 1995). The interactions between CTX and lipid membranes have been widely studied by various methods, such as Fourier transform infrared spectroscopy (Huang et al. 2003; Forouhar et al. 2003), nuclear magnetic resonance (NMR) spectroscopy (Dubovskii et al. 2005), and computer simulation (Levtsova et al. 2009; Efremov et al. 2004). CTXs do not easily undergo large conformational changes because of the four disulfide bonds in their chemical structure. Therefore, they are good candidates for experimental studies of protein adsorption on SAM surfaces. MD simulations can further help in understanding the interactions in the CTX-SAM system.

In our earlier study (Hung et al. 2006), the binding energy of protein molecules to SAM surfaces with different mixing compositions of alkanethiol chains was investigated. We found that the binding energy was enhanced as the hydrophobic area on the SAM surface increased. In that study, we focused on the enthalpic contribution of the hydrophobic interaction between a CTX molecule and SAM surfaces, and thus a solvent model with a distance-dependent dielectric function (Ramstein & Lavery 1988) was used instead of explicit water molecules. However, the hydrophobic interaction is primarily driven by entropy (Chandler 2005), and it has been shown by many groups (Hower et al. 2006; Zheng et al. 2004; Zheng et al. 2005; Ostuni et al. 2003) that the water molecules at the protein-SAM interface play an important role in the protein adsorption mechanism. In order to take into account both the enthalpic and entropic components of the hydrophobic interaction, an explicit solvent model is used to investigate and identify the complete physical mechanism.

To summarize our previous studies: MD simulations were performed to study the physical mechanism of CTX protein adsorption on alkanethiol SAMs with different chain lengths. Dynamic information, such as structural changes and adsorption forces during desorption of the CTX protein from the SAM surface, was investigated by means of SMD simulations. The dependence of this dynamic information on the pulling velocity was illustrated. The PMFs were calculated by the umbrella sampling method for better interpretation of the desorption process.

## **2. Model system and methodology**

### **2.1 Simulation model**

276 Molecular Dynamics – Studies of Synthetic and Biological Macromolecules

hydrophobic SAM surfaces using explicit water molecule model. They found that the protein molecule was surrounded by the water molecules, resulting in the reduction of the interaction between the protein and the surface. Zhou *et al.* (Zhou et al. 2004) investigated the orientation and conformation of Cyt-c on carboxyl-terminated SAM using a combined method of Monte Carlo (MC) and MD simulations. Their results showed that a preferred orientation of an adsorbed protein can be obtained on a strongly charged surface, but the protein may lose its bioactivity due to the large conformational change. To understand surface resistance to protein adsorption, Jiang and coworkers (Hower et al. 2006; Zheng et al. 2004; Zheng et al. 2005) studied lysozyme adsorption on various SAM surfaces with terminal methyl, hydroxyl, oligo (ethylene glycol) (OEG), mannitol, and sorbitol groups by a hybrid MC and MD simulation. They concluded that the resistance of a surface to protein adsorption was due to the tightly bound, structured water layer directly above the surface. Agashe *et al.* (Agashe et al. 2005) utilized MD simulations to investigate the adsorption of the γ-chain fragment of fibrinogen on SAM surfaces with five different terminal groups: methyl, hydroxyl, carboxyl, amino and OEG. The fibrinogen fragment did not show conformational rearrangements; rather it underwent rotational and translational motions until low-energy orientations were achieved. The above studies demonstrated that MD simulation is a useful and important tool to study protein adsorption on various SAM surfaces.

Not only the static properties, such as orientation and potential energy, but also the dynamic information about the process of CTX binding to the membrane are of interest. The dynamic biological process and the corresponding information, such as the structural change, adsorption force, interaction energy, and potential of mean force (PMF), can be investigated by steered molecular dynamics (SMD) simulation. (Isralewitz et al. 2001) The PMF is the equilibrium free energy difference along the reaction coordinate, which is an important thermodynamic quantity characterizing the dynamic process. (Kirkwood 1935) Calculation methods for the PMF have been reviewed and classified as either equilibrium or non-equilibrium approaches. (Ytreberg, Swendsen, and Zuckerman 2006) The equilibrium approaches, for example the umbrella sampling method (Torrie & Valleau 1977), need large computer resources because they rely on fully sampled equilibrium simulations performed at each stage of the PMF calculation. The non-equilibrium approaches, for example using Jarzynski's equality (Jarzynski 1997) with SMD simulations, have the potential to provide very rapid estimates of the PMF. However, the use of Jarzynski's equality suffers from significant bias and error when the pulling velocity is too high or the number of trajectories sampled is insufficient. (Gore, Ritort, and Bustamante 2003)

CTX is a cytotoxic β-sheet basic polypeptide which is known to cause membrane leakage in many cells including human erythrocytes and phospholipid membrane vesicles. (Dufton and Hider 1988) The three-dimensional structures of various CTX homologues in both aqueous and micellar environments are available. (S C Sue et al. 2001; Dauplais et al. 1995) The interactions between CTX and lipid membranes have been widely studied by various methods, such as Fourier transform infrared spectroscopy (Huang et al. 2003; Forouhar et al. 2003), nuclear magnetic resonance (NMR) spectroscopy (Dubovskii et al. 2005), and computer simulation. (Levtsova et al. 2009; R G Efremov et al. 2004) CTXs do not easily adopt large conformational changes due to the existence of four

In order to study the adsorption of a CTX protein on surfaces composed of mixed *alkanethiol SAMs* grown on an Au (111) substrate, two types of alkanethiol chains, S(CH2)5CH3 and S(CH2)9CH3, were used and are denoted by C5 and C9, respectively. The structure of the SAM is a (√3 × √3)*R*30° lattice on the *x-y* plane with a lattice constant of 0.499 nm. Each lattice point is occupied by an alkanethiol chain of either C5 or C9. Five mixing ratios were studied: χC9 = 0, 0.25, 0.5, 0.75 and 1, where χC9 is defined as NC9/(NC5+NC9), with NC5 and NC9 representing the numbers of C5 and C9 chains in the SAM, respectively. In our studies, there were 12 chains along each side of the simulation box. Periodic boundary conditions were applied in the *x* and *y* directions to simulate an infinitely large SAM surface. The Au (111) substrate was modeled by a single layer of gold atoms.
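As a rough check of this geometry, the sketch below lays out 12 × 12 sites of a hexagonal lattice with a nearest-neighbour spacing of 0.499 nm and reports the resulting periodic box lengths. The function name and the row-staggered layout are illustrative assumptions, not the construction script used in the study.

```python
import math

LATTICE_CONSTANT_NM = 0.499   # nearest-neighbour spacing quoted in the text
CHAINS_PER_SIDE = 12          # chains along each box edge, as in the text


def hexagonal_sites(n=CHAINS_PER_SIDE, a=LATTICE_CONSTANT_NM):
    """Return (sites, box_x, box_y) for an n x n hexagonal lattice.

    Rows are spaced by a*sqrt(3)/2 and every other row is shifted by a/2.
    This is an assumed way of fitting the lattice into a rectangular
    periodic box (n must be even for the rows to match across the boundary).
    """
    row_spacing = a * math.sqrt(3.0) / 2.0
    sites = []
    for j in range(n):
        shift = 0.5 * a if j % 2 else 0.0
        for i in range(n):
            sites.append((i * a + shift, j * row_spacing))
    return sites, n * a, n * row_spacing


sites, box_x, box_y = hexagonal_sites()
print(len(sites), round(box_x, 2), round(box_y, 2))
```

Under these assumptions the lateral box lengths agree with the 5.99 × 5.19 nm cross-section given in Section 2.4.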

The CTX protein was modeled by the nuclear magnetic resonance (NMR) structure of CTX A3 (PDB 1I02) (Sue et al. 2001), which comprises 60 amino acid residues. The interactions between CTX and lipid membranes have been well studied by Wu's group (Huang et al. 2003; Forouhar et al. 2003), who found that the characteristic topology of three hydrophobic fingers (Fig. 1) plays a key role in binding to lipid membranes. Hence, the CTX protein was chosen as the model system to study the protein-SAM interaction, to identify the mechanism of protein adsorption on a SAM surface, and to understand the role of protein hydrophobicity in the adsorption.

Fig. 1. Illustration of CTX adsorbed on the SAM/Au (111) surface where χC9 = 0.5 (C5 is plotted in cyan, C9 in pink and gold atoms in yellow). Water molecules are plotted in red and the counter ions are shown as orange spheres. C and N indicate the C-terminus and N-terminus of the CTX protein, respectively. The three major loops of CTX, denoted by L1, L2 and L3, are colored in red. (Hung et al. 2010)

### **2.2 Potential energy function**

The potential energy function used in the MD simulations included several terms describing both internal (bonded) and external (non-bonded) interactions, which ultimately determine both the structure and the dynamics of a molecular system. The atomic interactions inside a CTX molecule were described by the GROMOS-96 (43a2) force field. (Schuler & van Gunsteren 2000) This force field consists of bonded interactions, including bond, angle, dihedral and improper dihedral angle terms, and non-bonded interactions, including van der Waals (vdW) and Coulomb interactions. The complete form of this force field reads

$$\begin{split} U &= \sum\_{\text{bonds}} \frac{1}{2} k\_b (r - r\_0)^2 + \sum\_{\text{angles}} \frac{1}{2} k\_\theta (\theta - \theta\_0)^2 + \sum\_{\text{dihedrals}} k\_\phi \left[ 1 + \cos \left( n\phi - \phi\_0 \right) \right] + \sum\_{\text{impropers}} k\_\xi \left( \xi - \xi\_0 \right)^2 \\ &\quad + \sum\_{\text{vdW}} 4 \varepsilon\_{ij} \left[ \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{12} - \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^6 \right] + \sum\_{\text{Coulomb}} \frac{q\_i q\_j}{4 \pi \varepsilon\_0 r\_{ij}} \end{split} \tag{1}$$

Here *kb*, *kθ*, *kφ*, and *k<sup>ξ</sup>* are the bond, angle, dihedral, and improper dihedral force constants, respectively, for the bonded interactions, and *r*0, *θ*0, *φ*0, and *ξ*0 are the equilibrium values of the bond length, bond angle, dihedral angle and improper dihedral angle, respectively. For the non-bonded interactions, *ε* is the Lennard-Jones well depth, *σ* is the distance at which the vdW interaction is zero, *q* is the charge, and *rij* is the distance between atoms *i* and *j*. The vdW parameters for the cross interactions between atoms *i* and *j* were obtained by the geometric combination rule. The alkanethiol chains in the SAMs were modeled by the united-atom model of Hautman and Klein (Hautman & Klein 1989), given as

$$U = \sum\_{\text{angles}} \frac{1}{2} k\_{\theta} (\theta - \theta\_0)^2 + \sum\_{\text{dihedrals}} \sum\_{n=0}^{5} C\_{n} \left[ \cos \left( \phi - 180^{\circ} \right) \right]^{n} + \sum\_{\text{vdW}} 4 \varepsilon\_{ij} \left[ \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{12} - \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{6} \right] \tag{2}$$

In this model, the dihedral potential took the Ryckaert-Bellemans form (Ryckaert & Bellemans 1978), in which the potential is expanded as a power series in the cosine of the dihedral angle φ, with *Cn* = 9.28, 12.16, -13.12, -3.06, 26.24, and -31.5 kJ/mol for *n* = 0, ..., 5, respectively. The interaction between the SAM molecules and the gold surface was described by the Lennard-Jones 12-3 potential (Tupper & Brenner 1994):

$$U = \sum\_{\text{SAM-Au}} 2.117 \varepsilon\_{ij} \left[ \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{12} - \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{3} \right] \tag{3}$$

The gold atoms were fixed at the lattice points of the substrate. Water molecules were treated explicitly and were modeled by the extended simple point charge (SPC/E) model. (Berendsen et al. 1987)

The hydrophobic interaction was the main interaction between CTX and SAM. Because the water molecules were modeled explicitly in our studies, both the enthalpic and the entropic components of the hydrophobic interaction were accounted for. Since the SAM molecules in the current study carry no charge, the potential energy between the SAM and the other molecules consisted only of the vdW interaction, with parameters determined by the geometric combination rule.
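The non-bonded and dihedral terms of Eqs. (1)-(3) can be evaluated with a few lines of code. The following is a minimal sketch, not the force-field implementation used in the simulations: the function names are illustrative, and applying the geometric mean to both *ε* and *σ* is an assumed reading of the geometric combination rule.

```python
import math

# Ryckaert-Bellemans coefficients C_0..C_5 in kJ/mol, as quoted in the text
RB_COEFFS = (9.28, 12.16, -13.12, -3.06, 26.24, -31.5)


def rb_dihedral(phi_deg, coeffs=RB_COEFFS):
    """Dihedral term of Eq. (2): sum over n of C_n * cos(phi - 180 deg)**n."""
    psi = math.radians(phi_deg - 180.0)
    return sum(c * math.cos(psi) ** n for n, c in enumerate(coeffs))


def geometric_mix(eps_i, eps_j, sig_i, sig_j):
    """Cross parameters from geometric means (assumed for both eps and sigma)."""
    return math.sqrt(eps_i * eps_j), math.sqrt(sig_i * sig_j)


def lj_12_6(r, eps, sigma):
    """Lennard-Jones 12-6 pair energy, the vdW term of Eqs. (1) and (2)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_12_3(r, eps, sigma):
    """Lennard-Jones 12-3 pair energy for SAM-Au pairs, Eq. (3)."""
    sr3 = (sigma / r) ** 3
    return 2.117 * eps * (sr3 ** 4 - sr3)


# The trans conformation (phi = 180 deg) is the zero of the dihedral energy.
print(rb_dihedral(180.0), rb_dihedral(0.0))
```

With the prefactor 2.117, the 12-3 potential has a minimum close to -*ε*, located at *r* = 4<sup>1/9</sup>*σ*, so *ε* keeps its meaning as a well depth in Eq. (3).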

### **2.3 Initial configuration**

It has been reported that phase separation can take place when a SAM is composed of components with different terminal groups, such as 3-mercaptopropanol and *n*-tetradecanethiol, *n*-undecanethiol and 11-mercaptoundecanoic acid, or 3-mercapto-*N*-nonylpropionamide and *n*-decanethiol. (Smith et al. 2004) However, for SAMs composed of components with similar terminal groups, phase separation generally does not occur. For example, Whitesides and coworkers carried out a series of experimental studies on methyl-terminated mixed SAM systems (Laibinis et al. 1992; Folkers et al. 1992; Bain & Whitesides 1989) and showed that a chain length difference in the SAM did not lead to the formation of macroscopic islands. Using atomic force microscope imaging, another group (Tamada et al. 1997) also reported no phase separation in mixed SAMs composed of S(CH2)3CH3 and S(CH2)17CH3. A recent review likewise concluded that molecules of similar composition do not phase separate in a SAM formed from solution at room temperature. (Smith et al. 2004) According to Folkers *et al.* (Folkers et al. 1992), the mixed SAMs were not macroscopically phase separated. Based on this information, a homogeneous mixture was assumed for the simulated SAM systems, and the SAM surfaces were generated by randomly placing C5 and C9 molecules on the gold substrate in the present simulations.

The initial configuration of the CTX protein was prepared with the three CTX finger loops facing the SAM surface. Experiments from Wu's group identified this as the most favorable orientation of CTX proteins binding to membranes. (Forouhar et al. 2003; Huang et al. 2003) Efremov *et al.* (Efremov et al. 2004) and Lomize *et al.* (Lomize et al. 2006) obtained the same result by using the MC method and the transfer energy minimization method, respectively. Therefore, the initial configuration was set with the three finger loops of the CTX facing down, so that the system could more readily reach the favorable adsorption orientation in the present simulations.
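The random placement of C5 and C9 chains described in this section can be sketched as follows. This is a minimal illustration, assuming 144 lattice sites (12 chains along each box edge); the function name and the use of a seeded shuffle are hypothetical choices, not the authors' procedure.

```python
import random


def assign_chains(n_sites, chi_c9, seed=None):
    """Randomly label lattice sites as 'C5' or 'C9' at mixing ratio chi_c9.

    chi_c9 = N_C9 / (N_C5 + N_C9), so exactly round(chi_c9 * n_sites)
    sites receive a C9 chain; the shuffle makes the mixture homogeneous
    on average, with no imposed domains.
    """
    n_c9 = round(chi_c9 * n_sites)
    labels = ["C9"] * n_c9 + ["C5"] * (n_sites - n_c9)
    random.Random(seed).shuffle(labels)
    return labels


for chi in (0.0, 0.25, 0.5, 0.75, 1.0):
    labels = assign_chains(144, chi, seed=1)
    print(chi, labels.count("C5"), labels.count("C9"))
```

Fixing the number of C9 chains, rather than drawing each site independently, keeps the realized mixing ratio equal to the nominal χC9 for a finite surface.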

### **2.4 Simulation procedure**

After generating the initial configurations, the systems were solvated in a bath of water molecules with a density of 1 g/cm<sup>3</sup>. One Na+ ion and ten Cl- ions were added to the simulation box to maintain the electro-neutrality of the systems. The simulation box was a rectangular parallelepiped of 5.99 × 5.19 × 10.00 nm<sup>3</sup> with periodic boundary conditions applied in the *x* and *y* directions. The *z* direction was bounded by a wall (cf. Fig. 1). Initial velocities were assigned to each atom from a Maxwell-Boltzmann distribution at 50 K. The system was then gradually heated to 300 K over a period of 400 ps to relax the water molecules around the protein and the SAM surface.

The annealing process (Kirkpatrick et al. 1983) was then performed to overcome local minima and search for the global minimum of the energy landscape of the interaction between CTX and SAM. The process started with an initial heating stage in which the system was heated from 300 to 350 K. The temperature was then maintained at 350 K for 400 ps, followed by slow cooling to 300 K at a rate of 0.1 K/ps. The temperature was then held at 300 K for 4 ns, until the interaction energy between CTX and SAM reached a constant value. The simulations were performed in the canonical ensemble with an integration time step of 1.0 fs. The temperature was controlled by the Berendsen thermostat (Berendsen et al. 1984) with a time constant of 0.1 ps during the annealing process, and by a Nosé-Hoover thermostat thereafter. (Hoover 1985; Nose 1984) Because bond vibration is very fast, all the covalent bonds were constrained by the LINCS algorithm. (Hess et al. 1997) For the non-bonded interactions, the cutoff distance was chosen to be 1.5 nm for the vdW interaction. The Coulomb interaction was cut off at 2.5 nm by a force-shifting function. It has been reported (Steinbach & Brooks 1994) that the cutoff method for the electrostatic interaction can adequately model the dynamics of biomolecules in solution. The twin-range approach to neighbor searching (van Gunsteren & Berendsen 1990) was used, with a short-range distance of 1.5 nm and a long-range distance of 2.5 nm. Data were saved every 1.0 ps for analysis. All the simulations were performed using the program GROMACS. (Van Der Spoel et al. 2005) The simulation snapshots were plotted using the software PyMOL. (DeLano 2002)
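The annealing protocol is a piecewise-linear temperature schedule, which the sketch below writes out as a function of time. The duration of the 300 to 350 K heating stage is not stated in the text, so it is left as a required argument (`heat_ps`, a hypothetical name); the hold time, cooling rate and temperatures are the values quoted above.

```python
def annealing_temperature(t_ps, heat_ps, t_low=300.0, t_high=350.0,
                          hold_ps=400.0, cool_rate=0.1):
    """Target temperature (K) at time t_ps (ps) of the annealing protocol.

    heat_ps   -- duration of the heating stage (assumed; not given in the text)
    hold_ps   -- time spent at t_high (400 ps in the text)
    cool_rate -- cooling rate in K/ps (0.1 K/ps in the text)
    """
    cool_ps = (t_high - t_low) / cool_rate   # 500 ps for 350 -> 300 K
    if t_ps < heat_ps:
        return t_low + (t_high - t_low) * t_ps / heat_ps
    if t_ps < heat_ps + hold_ps:
        return t_high
    if t_ps < heat_ps + hold_ps + cool_ps:
        return t_high - cool_rate * (t_ps - heat_ps - hold_ps)
    return t_low   # final equilibration at 300 K


# Example with a hypothetical 100 ps heating stage
for t in (0.0, 50.0, 300.0, 750.0, 2000.0):
    print(t, annealing_temperature(t, heat_ps=100.0))
```

At 0.1 K/ps the cooling stage alone takes 500 ps, so the hold and cooling stages together last 900 ps before the 4 ns equilibration at 300 K.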

### **2.5 Steered molecular dynamic simulations**

In our study (Hung et al. 2011), SMD was used to simulate the pulling of CTX adsorbed on the SAM surface by applying an external force to the CTX molecule, and to monitor the adsorption force and structural change of the CTX molecule during the desorption process. Before the pulling process started, the model system was in equilibrium, and the CTX molecule was adsorbed onto the SAM surface. The equilibrium state was kept at 300 K for an additional 3 ns. The starting configurations of the model system for the SMD simulations were generated by extracting configurations from the last nanosecond of the equilibrium stage.

In the pulling process during the SMD simulation, external forces were applied to the CTX through the center of mass to pull the CTX off of the SAM surface with a constant velocity perpendicular to the SAM surface (i.e., *z* direction in Fig. 1). The adsorption force *f* at time *t* can be represented by the following equation

$$f(t) = k[(Z\_{\rm CTX,0} + vt) - Z\_{\rm CTX}(t)]\tag{4}$$

where *k* is the force constant, *v* is the pulling velocity, and *ZCTX*(*t*) and *ZCTX,0* are the *z* coordinates of the CTX at time *t* and at the initial time 0, respectively. In our study, we defined *Z* as the distance from the gold substrate, i.e., *Z* = 0 is the position of the gold.

The stiff-spring approximation (Park & Schulten 2004; Park et al. 2003) with a large force constant was applied to minimize the fluctuation of the reaction coordinate among different trajectories. The approximation is valid only if the force constant is sufficiently larger than the curvature of the energy landscape at its minimum, which is approximately the maximum of the second derivative of the PMF profile. (Hummer & Szabo 2010) However, the fluctuations of the applied force scale with *k* as (*kBTk*)<sup>1/2</sup>, where *kB* is Boltzmann's constant and *T* is the absolute temperature, and thus the force constant cannot be arbitrarily large. (Balsera et al. 1997) In our study, the force constant was chosen to be 1209 *kBT*/nm<sup>2</sup>, which was four times as large as the maximum of the second derivative of the PMF profile calculated using the umbrella sampling method, to ensure the validity of the stiff-spring approximation. The total traveling distance was 2 nm, from *ZCTX* = 2.0 nm, the equilibrium position of the CTX protein adsorbed on the SAM surface, to *ZCTX* = 4.0 nm, where the separation was large enough that there was no interaction between the CTX protein and the SAM surface.
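To make the quantities in this section concrete, the sketch below evaluates the spring force of Eq. (4) and converts the force constant of 1209 *kBT*/nm<sup>2</sup> at 300 K into kJ mol<sup>-1</sup> nm<sup>-2</sup>, together with the corresponding force fluctuation (*kBTk*)<sup>1/2</sup>. The function names are illustrative, and the pulling velocity and positions in the example call are hypothetical values, not those of the study.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def smd_force(k, v, t, z0, z_t):
    """Eq. (4): f(t) = k * [(Z_CTX,0 + v*t) - Z_CTX(t)]."""
    return k * ((z0 + v * t) - z_t)


def force_constant_kj(k_in_kbt, temperature):
    """Convert a force constant from kB*T/nm^2 to kJ/(mol nm^2)."""
    return k_in_kbt * K_B * temperature


def force_fluctuation(k, temperature):
    """Thermal fluctuation of the spring force, (kB*T*k)**0.5, in kJ/(mol nm)."""
    return math.sqrt(K_B * temperature * k)


k = force_constant_kj(1209.0, 300.0)
print(round(k, 1), round(force_fluctuation(k, 300.0), 1))

# Hypothetical pulling state: v = 0.01 nm/ps, t = 10 ps, CTX lagging at 2.05 nm
print(smd_force(k, v=0.01, t=10.0, z0=2.0, z_t=2.05))
```

The conversion gives a force constant of roughly 3.0 × 10<sup>3</sup> kJ mol<sup>-1</sup> nm<sup>-2</sup>, which shows the trade-off described above: a stiffer spring tracks the reaction coordinate more tightly but increases the noise in the measured force as the square root of *k*.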

### **2.6 PMF calculation**


The PMF is a potential along a reaction coordinate, the gradient of which yields the negative of the average force acting on the targeted molecule over all the configurations at a given position. Physically, the PMF represents the free energy, *F*, along the reaction coordinate. In statistical physics, the difference Δ*F* between two coordinates *z* and *z0* can be calculated from the ratio of the two configurational integrals as follows:

$$\Delta F = F(z) - F(z\_0) = -\frac{1}{\beta} \ln \frac{\int d^{3N-1}\mathbf{R}\, \exp [-\beta U(z, \mathbf{R})]}{\int d^{3N-1}\mathbf{R}\, \exp [-\beta U(z\_0, \mathbf{R})]} \tag{5}$$

where *z* is the reaction coordinate, *z0* is the reference position, **R** denotes the remaining 3*N*-1 coordinates, *U*(*z*,**R**) is the potential energy of the system, and β = 1/*kBT*, with *kB* being Boltzmann's constant and *T* the absolute temperature.

In equilibrium MD simulations without external potentials, the regions with large PMF are not easily explored because the large free energy difference hinders the access of the target molecule to these regions. It is hence difficult to calculate the configurational integral over such regions numerically with good accuracy. To overcome this problem, calculation methods for the PMF have been developed and classified as either equilibrium or non-equilibrium approaches. (Ytreberg et al. 2006) An example of the equilibrium approaches is the umbrella sampling method (Torrie & Valleau 1977), and an example of the non-equilibrium approaches is the use of Jarzynski's equality (Jarzynski 1997) with SMD simulations. Further comparisons and details of these two approaches are given in our previous study. (Hung et al. 2011)
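For the non-equilibrium route, Jarzynski's equality relates the free energy difference to an exponential average of the work *W* over repeated pulling trajectories, Δ*F* = -(1/β) ln⟨exp(-β*W*)⟩. The following is a minimal sketch of that estimator; the function name is illustrative, and the work values in the example are hypothetical placeholders rather than simulation results.

```python
import math


def jarzynski_free_energy(work_values, beta):
    """Estimate dF = -(1/beta) * ln< exp(-beta * W) > over trajectories.

    The smallest work value is factored out before exponentiating
    (a log-sum-exp shift) so that large beta*W does not underflow.
    """
    n = len(work_values)
    if n == 0:
        raise ValueError("at least one work value is required")
    w_min = min(work_values)
    total = sum(math.exp(-beta * (w - w_min)) for w in work_values)
    return w_min - math.log(total / n) / beta


# Hypothetical work values in units of kB*T (beta = 1)
works = [1.0, 3.0]
print(jarzynski_free_energy(works, beta=1.0), sum(works) / len(works))
```

Because the exponential average is dominated by the rare low-work trajectories, the estimate lies below the mean work and converges slowly when the work distribution is broad, which is the bias at high pulling velocity or small sample size noted in the text.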

## **3. Results and discussions**

The results and discussions are summarized in two parts. In the first part, the influence of different solvent models on the systems was investigated. The results demonstrated that including water molecules explicitly was crucial in the study of protein adsorption. Therefore, the explicit water model was used to study the protein adsorption on SAMs of different mixing ratios (χC9). The binding energies were obtained from MD simulations by calculating the non-bonded interactions of the protein with the SAM surface. The CTX/SAM contact area and the structure of the SAM molecules were examined to investigate the binding enhancement of the CTX protein adsorbing onto a SAM surface. Other physical mechanisms were discussed in our previous study. (Hung et al. 2010) In the second part, the dynamic information, including the structural changes and adsorption forces of a CTX protein desorbing from the pure C5 SAM surface, was monitored in the SMD simulation. The thermodynamic information, the PMF, was calculated using the umbrella sampling method for better interpretation of the CTX desorption process.

### **3.1 Static properties of CTX adsorption**

#### **3.1.1 Comparison of results using different solvent models**

Many alternative solvent models have been proposed to reduce the computer resources required for simulating water. (Smith & Pettitt 1994) For example, the distance-dependent dielectric function is a model widely used in many studies. To demonstrate the importance of the role of water in the adsorption of CTX on SAM, we performed simulations using the distance-dependent dielectric model to represent the water environment. In this model, solvent molecules were not treated explicitly but were considered implicitly as a dielectric continuum. The dielectric constant was set to 1 within a cut-off distance (2.5 nm) and to 78 beyond the cut-off distance. The binding energy of CTX on SAM calculated from the distance-dependent dielectric model showed a trend similar to that from the explicit water model, but with larger error bars (Fig. 2). However, the CTX protein in the distance-dependent dielectric model displayed a different conformation from that in the explicit water model. Fig. 3 shows snapshots of an equilibrium conformation of the CTX protein on the χC9 = 0.5 SAM surface obtained from the two models. We observed that the CTX protein was lying almost flat on the SAM surface in the distance-dependent dielectric model. The orientation was different from that in the explicit water model. Thus, a higher binding energy was obtained for the former model due to the larger contact area. In the distance-dependent dielectric model, there were no water molecules at the CTX-SAM interface to prevent direct contact between CTX and SAM, and hence the interaction between them was so strong that the conformation of the CTX protein was largely deformed and the orientation was not in agreement with experimental observations. With explicit water molecules, the entropic component of the hydrophobic effect was included, and the hydrophobic effect induced some ordering in the surrounding water. More precisely, the hydrophilic residues of CTX were solvated by the water molecules via hydrogen bonding. As a consequence, the CTX molecule was adsorbed on the SAM surface with a specific orientation. The results demonstrated that including water molecules explicitly in the model was crucial in the study of protein adsorption. The binding mechanisms and protein conformation could be correctly identified only when explicit water was considered.

Fig. 2. Binding energy of CTX to SAM surfaces of different mixing ratios in the distance-dependent dielectric model and in the explicit solvent model. (Hung et al. 2010)

Fig. 3. Snapshots of CTX on χC9 = 0.5 SAM surface using different solvent models: (a) distance-dependent dielectric model; (b) explicit solvent model. (Hung et al. 2010)
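The implicit-solvent electrostatics compared in this section can be illustrated with a short function. The sketch below takes the description literally (a dielectric constant of 1 inside the 2.5 nm cut-off and 78 beyond it); the function names are illustrative, and how the model was actually configured in the simulation package is not specified in the text.

```python
COULOMB_CONST = 138.935  # 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (approximate)


def step_dielectric(r, cutoff=2.5, eps_in=1.0, eps_out=78.0):
    """Dielectric constant at separation r (nm): eps_in inside the cut-off."""
    return eps_in if r < cutoff else eps_out


def coulomb_energy(q_i, q_j, r, cutoff=2.5):
    """Coulomb pair energy (kJ/mol) screened by the distance-dependent dielectric."""
    return COULOMB_CONST * q_i * q_j / (step_dielectric(r, cutoff) * r)


# Two unit charges of opposite sign, inside and outside the cut-off
print(coulomb_energy(1.0, -1.0, 1.0), coulomb_energy(1.0, -1.0, 3.0))
```

In this picture, charged groups closer than the cut-off interact at full vacuum strength with no intervening water, which is consistent with the strong CTX-SAM contact and flattened conformation observed for this model.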

#### **3.1.2 Binding energy of a CTX protein on SAM surface**

The affinity of a CTX protein for the SAM surface was of interest. This affinity can be quantified by calculating the binding energy between CTX and SAM, which is the sum of all the non-bonded interactions between them. Fig. 4 shows that the binding energy reached a maximum at intermediate mixing ratios. The binding energies of the CTX protein to the pure C5 and pure C9 SAM surfaces were similar to each other but significantly smaller than that to a mixed SAM surface. The enhancement of the binding energy on a mixed SAM surface can be as large as 34% when χC9 = 0.5.

#### **3.1.3 Physical mechanisms of CTX adsorption**

The enhancement of the binding energy can be explained by the equilibrium configurations of CTX landing on SAM surfaces of different mixing ratios, as shown in Fig. 5. In the pure C5 and C9 SAM systems, the surface roughness was small. The CTX protein landed stably on the flat surface and the conformations of the CTX were similar to each other. On the other hand, the surface roughness was increased in the mixed SAM systems. The three-finger loops of the CTX can penetrate into the region between C5 and C9 molecules, especially when χC9 = 0.25 and 0.5, resulting in an increase of the CTX-SAM contact area.

Fig. 4. Binding energy (left axis) of CTX adsorbed on SAM surfaces of different mixing ratios, χC9. The normalized binding energy (right axis) is calculated by dividing the binding energy by the value on the C5 SAM surface. (Hung et al. 2010)

The CTX-SAM contact area and the structure of SAMs were described in the following sections.

Fig. 5. Snapshots of CTX on (a) χC9 = 0, (b) χC9 = 0.25, (c) χC9 = 0.5, (d) χC9 = 0.75, and (e) χC9 = 1 SAM surface. Water molecules are not shown in the figures to provide clear illustrations of the protein configurations. (Hung et al. 2010)

It is commonly accepted that the free-energy change of a protein transferring from water to a membrane is proportional to the change in the protein surface area in contact with the surrounding water. (White & Wimley 1994; Reynolds et al. 1974; Eisenberg & McLachlan 1986) We extended this idea to our system and tested whether the binding energy of CTX on SAM also follows a linear relation with the CTX-SAM contact area. The area was calculated by the double cubic lattice method (Eisenhaber et al. 1995) with a probe radius of 0.14 nm. The binding energy versus the contact area is plotted in Fig. 6. It shows that the binding energy is linear in the surface area of CTX in contact with the SAM surface, with a slope of 36.05 kJ/(mol nm<sup>2</sup>). The binding energy was therefore higher on the mixed SAM surfaces than on the pure SAM surfaces, because the mixed SAM surfaces were rougher, which increased the contact area.


Fig. 6. Binding energy as a function of CTX-SAM contact area. (Hung et al. 2010)

Since the binding energy was strongly related to the CTX-SAM contact area, the structure of the SAM molecules can be another important factor in determining protein adsorption. Mixed SAMs can be divided into two layer regions, as observed by Bain & Whitesides (1989) and illustrated in Fig. 7(a). The first region was the inner layer adjacent to the gold substrate, and the second was the outer layer in contact with the solution. The mobility of the SAM molecules can be quantified by the root mean square deviation (RMSD) of the alkanethiol chains in the SAM over a time interval of 1 ps: the larger the RMSD, the higher the mobility. Fig. 7(b) shows that the RMSD of the C5 surface was higher than that of the C9 surface, indicating better ordering for the longer chains.
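The RMSD-based mobility measure described above can be sketched as follows. This is an illustrative reading of the definition, assuming a trajectory stored as an array of atom positions sampled every 1 ps; the function name `mean_rmsd`, the array layout, and the averaging over time origins are assumptions rather than details given in the text.

```python
import numpy as np

def mean_rmsd(positions_nm, lag_frames=1):
    """Mobility estimate from frame-to-frame displacements.

    positions_nm: array of shape (n_frames, n_atoms, 3), in nm.
    lag_frames:   frames separating the compared configurations
                  (1 frame corresponds to 1 ps if frames are saved every 1 ps).
    Returns the RMSD over atoms, averaged over all available frame pairs.
    """
    pos = np.asarray(positions_nm, dtype=float)
    displacement = pos[lag_frames:] - pos[:-lag_frames]
    squared = np.sum(displacement ** 2, axis=2)     # (n_pairs, n_atoms)
    rmsd_per_pair = np.sqrt(squared.mean(axis=1))   # RMSD over atoms
    return rmsd_per_pair.mean()

# Hypothetical demo: two atoms drifting 0.1 nm along x in every frame.
demo = np.zeros((3, 2, 3))
demo[:, :, 0] = 0.1 * np.arange(3)[:, None]
demo_rmsd = mean_rmsd(demo)
```

Applying the same function separately to the inner-layer and outer-layer atoms would give the two curves compared in Fig. 7(b).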

Fig. 7. (a) Illustration of the inner and outer layers of mixed SAM surfaces; (b) root mean square deviations of the inner and outer layers versus mixing ratio χC9 of the SAM surface. (Hung et al. 2010)

The chain length dependence has been studied experimentally (Fenter et al. 1997; Porter et al. 1987). These experiments showed distinct structural differences between long-chained and short-chained SAMs: the long-chained SAMs formed a densely packed, crystalline-like structure, while the short-chained SAMs became increasingly disordered. The present results were consistent with these experiments. The large number of methylene groups in the long-chained SAM provided a strong vdW interaction that sustained an ordered structure. For the mixed SAM surfaces, on the other hand, the RMSD was larger in the outer layer than in the inner one. This agreed with the study of Bain & Whitesides (1989), which showed that the inner layer packed better than the outer one.

The mixture of alkanethiols of different chain lengths in the SAM provided an additional dimension of reaction area, especially for the hydrophobic interaction, on the limited surface. As a result, the CTX affinity was enhanced on the mixed SAM surface. To investigate the relation between CTX binding and the SAM surface area, the SAS area of the mixed SAM surfaces was calculated and plotted in Fig. 8. The CTX-SAM contact area was highly correlated with the SAM surface area. This result suggested that the three-dimensional nanostructured morphology, arising from the chain length difference of the alkanethiol chains in the SAM, promoted the contact of the CTX protein with the SAM surface.

Fig. 8. SAM surface area and CTX-SAM contact area versus mixing ratio χC9 of SAM surface. (Hung et al. 2010)

#### **3.2 Dynamic information of CTX desorption process**

#### **3.2.1 Structural changes of CTX during the desorption process**

CTX is a highly stable protein owing to its four disulfide bonds and its core of hydrophobic residues (Sivaraman et al. 1998). As a result, the CTX protein does not unfold during desorption. The SMD simulations provided the structural history of the CTX protein during the pulling process; the key CTX conformations are shown in Figs. 9(a)-9(d) for a pulling velocity of 0.25 nm/ns. The structural change of CTX during desorption was governed by its three loops. Starting from the equilibrium orientation, with the three-finger loops facing and attached to the SAM surface (Fig. 9(a)), CTX was pulled from the surface at a constant velocity: the tip of Loop I detached from the surface first (Fig. 9(b)), and Loops II and III detached later at about the same time (Fig. 9(c)), before the entire protein left the surface (Fig. 9(d)). The correlation between the trajectories of the three loop tips (*ZLoop,i*) and that of the center of mass of the protein (*ZCTX*), both measured from the gold substrate, quantitatively indicates the structural changes of CTX during desorption, as shown in Fig. 9(e). The centers of mass of three amino acid residues, Val7, Ala28, and Leu47, were chosen to represent the tip positions of the three loops, *ZLoop I*, *ZLoop II*, and *ZLoop III*, respectively. The average position of the methyl groups of the SAMs was about 1.0 nm from the gold substrate, and the vdW radius of a methyl group was ~0.2 nm (Li & Nussinov 1998). Thus, all three *ZLoop,i* were about 1.3-1.4 nm from the gold substrate when the loops contacted the SAM surface. It was therefore reasonable to define the *i*th loop tip as detached from the SAM surface when *ZLoop,i* exceeded 1.6 nm. In general, the desorption process can be described in three stages. In the first stage, CTX contacted the SAM surface with all three loops attached (*ZLoop,i*<1.6 nm for *i*=1, 2, and 3) until Loop I detached from the SAM surface at *ZCTX*~2.3-2.4 nm. In the second stage, Loop I rose with *ZCTX* while Loops II and III remained in contact with the SAM surface (*ZLoop,i*<1.6 nm for *i*=2 and 3 only). Loops II and III detached from the SAM surface at almost the same time, at *ZCTX*~3.3 nm. In the third stage, all three loops were far from the SAM surface.
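The 1.6 nm detachment criterion and the three-stage description can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `desorption_stage` and the handling of configurations that match none of the three stages are hypothetical; only the 1.6 nm threshold comes from the text.

```python
DETACH_Z_NM = 1.6  # a loop tip counts as detached when Z_loop exceeds this (from the text)

def desorption_stage(z_loops_nm):
    """Classify a configuration from the heights of the three loop tips.

    z_loops_nm: (Z_loop_I, Z_loop_II, Z_loop_III) above the gold substrate, in nm.
    Returns 1, 2, or 3 for the stages described in the text, or None when the
    pattern matches none of them (e.g. Loop II detached while Loop I is attached).
    """
    attached = [z < DETACH_Z_NM for z in z_loops_nm]
    if all(attached):
        return 1                      # all three loops on the surface
    if not attached[0] and attached[1] and attached[2]:
        return 2                      # Loop I lifted, Loops II and III attached
    if not any(attached):
        return 3                      # protein fully detached
    return None

# Hypothetical loop heights, for illustration only.
stage_a = desorption_stage((1.3, 1.4, 1.35))
stage_b = desorption_stage((2.0, 1.4, 1.3))
stage_c = desorption_stage((3.0, 2.5, 2.6))
```

Running such a classifier along a trajectory gives the *ZCTX* values at which the stage changes, which can then be compared with the transitions visible in Fig. 9(e).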

Fig. 9. (a)-(d) CTX conformations at *ZCTX*=2.0, 2.5, 3.0, and 3.8 nm, respectively. (e) *ZLoop*, positions of the three loop tips of CTX (black solid line: Loop I; red dashed line: Loop II; blue dashed-dotted line: Loop III) and (f) adsorption force vs. *ZCTX*, center of mass of CTX, when pulling with velocity *v*=0.25 nm/ns. Red triangles mark the positions of the snapshots in (a)-(d). (Hung et al. 2011)

#### **3.2.2 Adsorption force during the desorption process**

The adsorption force and the structural changes of the CTX protein during desorption from the SAM surface were the main results of the SMD simulation. Fig. 9(f) shows the adsorption force on the CTX protein as it was pulled at a constant velocity of 0.25 nm/ns. Three stages can be distinguished in the force curve. In the first stage, the force rose monotonically until it reached a peak, which defines the rupture force in the present study, at *ZCTX*~2.3-2.4 nm. The force then decreased to a plateau in the second stage (2.5 nm<*ZCTX*<3.2 nm) and diminished to nearly zero in the third stage (*ZCTX*>3.2 nm). These stages correlated closely with whether the loops were attached to or detached from the surface.

Summarizing the history of the adsorption force and positions of the three loop tips, *ZLoop,i*, the desorption process can be described in three stages. In the first stage, the adsorption force increased with *ZCTX* until it was large enough to break Loop I from the SAM surface. In the second stage, the CTX-SAM was in a quasi-equilibrium state with Loops II and III contacting the SAM surface, and thus, the force remained constant in this stage. In the third stage, CTX was located far enough from the SAM surface that there was no interaction force between the CTX and SAM surface. The detachment of Loop I from the SAM surface gave the rupture force, which suggested that Loop I played an important role in the CTX desorption process from the SAM surface. This observation agreed with two-dimensional NMR spectroscopy experiments, (Sivaraman et al. 2000) which had shown that Loops II and III were more stable than Loop I. Hence, Loop I possessed the highest flexibility of the three loops, which was related to its biological activity. As it had the highest flexibility, Loop I should be the first loop to pull off from the surface, which was consistent with the SMD simulation results.

Different pulling velocities have been observed to lead to different hysteresis effects (Liphardt et al. 2001) and different rupture forces (Liphardt et al. 2002) in previous studies. In the present simulations, the desorption processes at pulling velocities of *v*=1.0 nm/ns and *v*=0.25 nm/ns both showed, qualitatively, the three distinct stages. However, three major differences were observed at *v*=1.0 nm/ns: the adsorption force was much larger than at the lower pulling velocity, the quasi-equilibrium state was indeterminate, and the adsorption force did not vanish when CTX was far from the SAM surface. These distinctions resulted from pronounced non-equilibrium phenomena, such as friction and dissipation, at high pulling velocity. Furthermore, the orientation of CTX in the bulk solvent at the higher pulling velocity (Fig. 10(d)) was similar to that of CTX just departing from the SAM surface (Fig. 10(c)), whereas CTX showed no specific orientation in the bulk solvent at the lower pulling velocity (Figs. 9(c) and 9(d)). This implies that CTX had not relaxed with respect to the surrounding molecules at the high pulling velocity of *v*=1.0 nm/ns.

Fig. 10. (a)-(d) CTX conformations at *ZCTX*=2.0, 2.5, 3.0, and 3.8 nm, respectively. (e) *ZLoop*, positions of the three loop tips of CTX (black solid line: Loop I; red dashed line: Loop II; blue dashed-dotted line: Loop III) and (f) adsorption force vs. *ZCTX*, center of mass of CTX, when pulling with velocity *v*=1.0 nm/ns. Red triangles mark the positions of the snapshots in (a)-(d). (Hung et al. 2011)

To depict the dependence of the adsorption force on pulling velocity, the averaged force curves at four different pulling velocities are shown in Fig. 11(a). The force curves agreed qualitatively in shape, but the magnitude of the peak adsorption force and the width of the plateau-like second stage differed between pulling velocities. Because the difference between the forces at *v*=0.125 nm/ns and at 0.25 nm/ns was very small, and the computation time for the *v*=0.125 nm/ns case was twice that of the *v*=0.25 nm/ns case, the major computations of the present study were conducted at *v*=0.25 nm/ns. Fig. 11(b) shows the average rupture force as a function of pulling velocity, together with linear (green dash-dotted line) and logarithmic (red dashed line) fits. Some SMD studies (Sotomayor & Schulten 2007; Gao et al. 2002) found that the rupture force increases logarithmically with the pulling velocity, while other studies (Heymann & Grubmüller 1999; Marrink et al. 1998) found a linear dependence at high pulling velocities. In our SMD simulations, the relation between rupture force and pulling velocity was described well by both linear and logarithmic relationships over the range from 0.125 nm/ns to 1.0 nm/ns. The near-linear correlation indicated that the desorption process was within the friction force-dominated regime for the present study.
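The comparison between linear and logarithmic velocity dependence can be sketched with two least-squares fits. The pulling velocities below are those named in the text; the force values are hypothetical (arbitrary units) and the function names are illustrative, so this is not a reproduction of Fig. 11(b).

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination for a fitted model."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_fits(velocity_nm_ns, rupture_force):
    """Fit F = a*v + b and F = a*ln(v) + b; report coefficients and R^2 of each."""
    v = np.asarray(velocity_nm_ns, dtype=float)
    f = np.asarray(rupture_force, dtype=float)
    lin_coef = np.polyfit(v, f, 1)
    log_coef = np.polyfit(np.log(v), f, 1)
    return {
        "linear_coef": lin_coef,
        "log_coef": log_coef,
        "linear_r2": r_squared(f, np.polyval(lin_coef, v)),
        "log_r2": r_squared(f, np.polyval(log_coef, np.log(v))),
    }

velocities = np.array([0.125, 0.25, 0.5, 1.0])   # nm/ns, as in the text
demo_forces = 100.0 + 200.0 * velocities          # hypothetical, exactly linear
result = compare_fits(velocities, demo_forces)
```

With only four velocities spanning less than one decade, both models can fit comparably well, which is consistent with the statement that the data were described by either relationship.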

#### **3.2.3 PMF calculated by the umbrella sampling method**

Because the pulling forces depended strongly on the pulling velocity, further information about the CTX desorption process can be obtained from the free energy landscape, i.e., the PMF. The PMF profile from the umbrella sampling (Fig. 12(a)) showed a sharp change of slope at *ZCTX*~2.5 nm, a gradual increase afterward (2.5 nm<*ZCTX*<3.2 nm), and a plateau beyond *ZCTX*~3.2 nm. The sharp change indicated a free energy barrier at *ZCTX*~2.5 nm.

Fig. 11. (a) Average adsorption force curves at various pulling velocities, i.e., *v*=0.125 (blue solid line), 0.25 (red dashed line), 0.5 (cyan dotted line), and 1 nm/ns (green dashed-dotted line). (b) Average rupture force vs. pulling velocity from SMD simulations (squares). The blue circle indicates the rupture force obtained from the derivative of the PMF calculated using umbrella sampling. The green dashed-dotted and red dashed lines represent the best linear and logarithmic fits to the average rupture forces, respectively. (Hung et al. 2011)

For comparison with the pulling force curves from SMD simulations, the mean force profile was obtained from the derivative of the PMF profile with respect to *ZCTX* (Fig. 12(b)). A remarkable agreement between the mean force (Fig. 12(b)) and SMD force-distance curves (Fig. 9(f) and Fig. 10(f)) was obtained.
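Obtaining a mean force profile from a tabulated PMF amounts to numerical differentiation. The sketch below assumes the PMF is available on a grid of *ZCTX* values; the function name, the demo profile, and the choice of finite differences via `numpy.gradient` are assumptions. Following the text, the force is taken as the derivative of the PMF with respect to *ZCTX*, i.e., the force needed to pull CTX away from the surface; the thermodynamic force exerted on the protein has the opposite sign.

```python
import numpy as np

# Unit conversion: 1 kJ/(mol nm) is approximately 1.66054 pN.
KJ_PER_MOL_NM_TO_PN = 1.66054

def mean_force_from_pmf(z_nm, pmf_kj_mol):
    """Finite-difference derivative dW/dz of a tabulated PMF.

    z_nm:        reaction-coordinate grid (nm), strictly increasing.
    pmf_kj_mol:  PMF values W(z) on that grid (kJ/mol).
    Returns dW/dz in kJ/(mol nm), one value per grid point.
    """
    z = np.asarray(z_nm, dtype=float)
    w = np.asarray(pmf_kj_mol, dtype=float)
    return np.gradient(w, z)

# Hypothetical PMF rising linearly at 50 kJ/(mol nm); not the profile of Fig. 12(a).
z_demo = np.linspace(2.0, 3.0, 11)
pmf_demo = 50.0 * (z_demo - 2.0)
force_demo = mean_force_from_pmf(z_demo, pmf_demo)
force_demo_pn = force_demo * KJ_PER_MOL_NM_TO_PN
```

Because numerical differentiation amplifies noise, a real PMF would typically be smoothed or finely binned before the derivative is taken.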

Fig. 12. (a) PMF profile calculated using umbrella sampling. Red triangles mark the positions of the snapshots in (c)-(f). (b) Mean force profile obtained from the derivative of the PMF with respect to *ZCTX*. (c)-(f) CTX conformations at *ZCTX* =2.0, 2.4, 2.7, and 3.4 nm, respectively. (Hung et al. 2011)

The adsorption force from the SMD simulation included the random force, the friction force, and the thermodynamic force, whereas the mean force from the umbrella sampling method represented only the thermodynamic force between the CTX protein and the SAM surface. The similar force-distance curves and similar structural changes implied that the two approaches cross-validated each other and captured the same key features of the process. Furthermore, the peak forces of both force-distance curves occurred at the same departing distance, *ZCTX*~2.5 nm, indicating that the departure of the first loop produced the major rupture force in the desorption process; it was also reasonable that the pulling force from SMD was higher than the mean force from the umbrella sampling. For the range of pulling velocities applied in the SMD simulations, the friction force contributed to the adsorption force, and thus the relationship between the adsorption force and the pulling velocity was near-linear at the higher pulling velocities. At lower pulling velocities, in the thermodynamic force-dominated regime, the relationship becomes logarithmic (Marrink et al. 1998). As a result, the mean force (blue circle in Fig. 11(b)) cannot be reached by extrapolating linearly to zero pulling velocity. Instead, the SMD data might approach the mean force through a logarithmic extrapolation to zero pulling velocity, as shown in Fig. 11(b).

## **4. Conclusion and prospects**


In the present study, the adsorption of CTX proteins on alkanethiol SAMs of different mixing ratios was analyzed by means of MD simulations. Different solvent models were examined, and the results demonstrated that explicit water molecules are necessary to account correctly for the enthalpic and entropic components of the hydrophobic effect when studying protein adsorption on SAMs. The binding energy was highest at χC9 = 0.5, in good agreement with the experimental data. Moreover, the binding energy between CTX and the SAM surface was proportional to the CTX-SAM contact area. The structure of the SAM molecules determined the CTX-SAM contact area, and hence the binding energy.

Dynamic information about the desorption of a single CTX protein from a SAM surface, such as structural change and adsorption force, was successfully obtained by means of SMD simulations. The simulation results indicated that CTX did not unfold during the pulling process and that Loop I was the first loop to depart from the SAM surface. This observation was in good agreement with the results of the NMR spectroscopy experiment. For the pulling velocities from 0.125 nm/ns to 1.0 nm/ns employed in the present study, the near-linear dependence of force on pulling velocity indicated that the friction force played a significant role in the force measured in the SMD simulations. A remarkable agreement was obtained between the force-distance curve from the umbrella sampling method and the pulling force-distance curve from SMD, which cross-validated the two techniques and showed that both captured the same key features of the process. Furthermore, the peak forces of both force-distance curves occurred at the same departing distance, *ZCTX*~2.5 nm, indicating that the departure of the first loop produced the major rupture force in the desorption process. The results provide valuable information at the atomic level toward a fundamental understanding of protein adsorption.

Future work will focus on calculating the PMFs of CTX adsorption on mixed SAMs of different mixing ratios. These calculations will provide key features for enhancing protein adsorption or protein resistance on a designed surface by manipulating the mixing ratio.

### **5. Acknowledgment**

The authors thank the National Center for High-Performance Computing, Taiwan for computing resources and the National Science Council, Taiwan for financial support under Grant (NSC99-2221-E007-028-MY2).

#### **6. References**



Agashe, M.; Raut, V.; Stuart, S. J. & Latour, R. A. (2005). Molecular Simulation to Characterize the Adsorption Behavior of a Fibrinogen Gamma-Chain Fragment. *Langmuir*, Vol.21, No.3, pp. 1103-1117

Bain, C. D. & Whitesides, G. M. (1989). Formation of Monolayers by the Coadsorption of Thiols on Gold: Variation in the Length of the Alkyl Chain. *Journal of the American Chemical Society*, Vol.111, No.18, pp. 7164-7175

Balsera, M.; Stepaniants, S.; Izrailev, S.; Oono, Y. & Schulten, K. (1997). Reconstructing Potential Energy Functions from Simulated Force-Induced Unbinding Processes. *Biophysical Journal*, Vol.73, No.3, pp. 1281-1287

Berendsen, H. J. C.; Grigera, J. R. & Straatsma, T. P. (1987). The Missing Term in Effective Pair Potentials. *The Journal of Physical Chemistry*, Vol.91, No.24, pp. 6269-6271

Berendsen, H. J. C.; Postma, J. P. M.; Gunsteren, W. F. van; DiNola, A. & Haak, J. R. (1984). Molecular Dynamics with Coupling to an External Bath. *The Journal of Chemical Physics*, Vol.81, No.8, pp. 3684-3690

Chandler, D. (2005). Interfaces and the Driving Force of Hydrophobic Assembly. *Nature*, Vol.437, No.7059, pp. 640-647

Dauplais, M.; Neumann, J. M.; Pinkasfeld, S.; Ménez, A. & Roumestand, C. (1995). An NMR Study of the Interaction of Cardiotoxin Gamma from Naja Nigricollis with Perdeuterated Dodecylphosphocholine Micelles. *European Journal of Biochemistry / FEBS*, Vol.230, No.1, pp. 213-220

DeLano, W. L. (2002). The PyMOL Molecular Graphics System. Available from http://www.pymol.org

Dubovskii, P. V.; Lesovoy, D. M.; Dubinnyi, M. A.; Konshina, A. G.; Utkin, Y. N.; Efremov, R. G. & Dubovskii, A. S. A. (2005). Interaction of Three-Finger Toxins with Phospholipid Membranes: Comparison of S- and P-Type Cytotoxins. *The Biochemical Journal*, Vol.387, No.Pt 3, pp. 807-815

Dufton, M. J. & Hider, R. C. (1988). Structure and Pharmacology of Elapid Cytotoxins. *Pharmacology & Therapeutics*, Vol.36, No.1, pp. 1-40

Efremov, R. G.; Nolde, D. E.; Konshina, A. G.; Syrtcev, N. P. & Arseniev, A. S. (2004). Peptides and Proteins in Membranes: What can We Learn via Computer Simulations? *Current Medicinal Chemistry*, Vol.11, No.18, pp. 2421-2442

Eisenberg, D. & McLachlan, A. D. (1986). Solvation Energy in Protein Folding and Binding. *Nature*, Vol.319, No.6050, pp. 199-203

Eisenhaber, F.; Lijnzaad, P.; Argos, P.; Sander, C. & Scharf, M. (1995). The Double Cubic Lattice Method: Efficient Approaches to Numerical Integration of Surface Area and Volume and to Dot Surface Contouring of Molecular Assemblies. *Journal of Computational Chemistry*, Vol.16, No.3, pp. 273-284


Hung, S.-W.; Hwang, J.-K.; Tseng, F.; Chang, J.-M.; Chen, C.-C. & Chieng, C.-C. (2006). Molecular Dynamics Simulation of the Enhancement of Cobra Cardiotoxin and E6 Protein Binding on Mixed Self-Assembled Monolayer Molecules. *Nanotechnology*, Vol.17, No.4, pp. S8-S13

Isralewitz, B.; Gao, M. & Schulten, K. (2001). Steered Molecular Dynamics and Mechanical Functions of Proteins. *Current Opinion in Structural Biology*, Vol.11, No.2, pp. 224-230

Jarzynski, C. (1997). Nonequilibrium Equality for Free Energy Differences. *Physical Review Letters*, Vol.78, No.14, pp. 2690-2693

Kirkpatrick, S.; Gelatt, C. D. & Vecchi, M. P. (1983). Optimization by Simulated Annealing. *Science*, Vol.220, No.4598, pp. 671-680

Kirkwood, J. G. (1935). Statistical Mechanics of Fluid Mixtures. *The Journal of Chemical Physics*, Vol.3, No.5, pp. 300-313

Laibinis, P. E.; Nuzzo, R. G. & Whitesides, G. M. (1992). Structure of Monolayers Formed by Coadsorption of Two n-alkanethiols of Different Chain Lengths on Gold and Its Relation to Wetting. *The Journal of Physical Chemistry*, Vol.96, No.12, pp. 5097-5105

Levtsova, O. V.; Antonov, M. Y.; Mordvintsev, D. Y.; Utkin, Y. N.; Shaitan, K. V. & Kirpichnikov, M. P. (2009). Steered Molecular Dynamics Simulations of Cobra Cytotoxin Interaction with Zwitterionic Lipid Bilayer: No Penetration of Loop Tips into Membranes. *Computational Biology and Chemistry*, Vol.33, No.1, pp. 29-32

Li, A. J. & Nussinov, R. (1998). A Set of van der Waals and Coulombic Radii of Protein Atoms for Molecular and Solvent-Accessible Surface Calculation, Packing Evaluation, and Docking. *Proteins*, Vol.32, No.1, pp. 111-127

Liphardt, J.; Onoa, B.; Smith, S. B.; Tinoco, I. & Bustamante, C. (2001). Reversible Unfolding of Single RNA Molecules by Mechanical Force. *Science*, Vol.292, No.5517, pp. 733-737

Liphardt, J.; Dumont, S.; Smith, S. B.; Tinoco, I. & Bustamante, C. (2002). Equilibrium Information from Nonequilibrium Measurements in an Experimental Test of Jarzynski's Equality. *Science*, Vol.296, No.5574, pp. 1832-1835

Lomize, M. A.; Lomize, A. L.; Pogozheva, I. D. & Mosberg, H. I. (2006). OPM: Orientations of Proteins in Membranes Database. *Bioinformatics*, Vol.22, No.5, pp. 623-625

Love, J. C.; Estroff, L. A.; Kriebel, J. K.; Nuzzo, R. G. & Whitesides, G. M. (2005). Self-Assembled Monolayers of Thiolates on Metals as a Form of Nanotechnology. *Chemical Reviews*, Vol.105, No.4, pp. 1103-1169

Marrink, S.; Berger, O.; Tieleman, P. & Jahnig, F. (1998). Adhesion Forces of Lipids in a Phospholipid Membrane Studied by Molecular Dynamics Simulations. *Biophysical Journal*, Vol.74, No.2, pp. 931-943

Nordgren, C. E.; Tobias, D. J.; Klein, M. L. & Blasie, J. K. (2002). Molecular Dynamics Simulations of a Hydrated Protein Vectorially Oriented on Polar and Nonpolar Soft Surfaces. *Biophysical Journal*, Vol.83, No.6, pp. 2906-2917

Nose, S. (1984). A Unified Formulation of the Constant Temperature Molecular Dynamics Methods. *The Journal of Chemical Physics*, Vol.81, No.1, pp. 511-519

Ostuni, E.; Grzybowski, B. A.; Mrksich, M.; Roberts, C. S. & Whitesides, G. M. (2003). Adsorption of Proteins to Hydrophobic Sites on Mixed Self-Assembled Monolayers. *Langmuir*, Vol.19, No.5, pp. 1861-1872


## **Simulations of Unusual Properties of Water Inside Carbon Nanotubes**

Yoshimichi Nakamura and Takahisa Ohno *National Institute for Materials Science; CREST-JST Japan* 

### **1. Introduction**


Water, which is vital for all living creatures and essential to our daily lives, is one of the most researched materials on earth. It is remarkable that, even after a long history of research, this substance consisting of triatomic molecules still yields new discoveries of unexpected properties. Among the hottest topics of the last decade is the anomalous behavior of water molecules inside confined nanospaces, which raises the question of how the tiny, polar molecules rearrange themselves when strongly confined. What is the hydrogen-bond network like? In this research field, molecular dynamics (MD) simulations have played a leading role and are expected to become even more important in predicting unexpected properties of water. A typical example of a well-defined, size-controlled, and easily obtainable nanospace is the interior of a carbon nanotube (CNT). Figure 1 shows the relation of a single-walled CNT to the graphene honeycomb lattice. The figure shows an example in which the graphene lattice vector *m***a1** + *m***a2** corresponds to the circumference of the CNT. This type of CNT is the one typically referred to in this chapter, and the pair of integers (*m*, *m*) is often quoted as a useful index of the CNT diameter.
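To make the link between the index (*m*, *m*) and the diameter concrete, the following minimal sketch computes the nucleus-to-nucleus diameter from the circumference |*m***a1** + *m***a2**|. It assumes the usual convention of graphene lattice vectors at 60° to each other and the carbon-carbon bond length of 1.4 Å used later in this chapter; the function name is only illustrative.

```python
import math

def armchair_diameter_nm(m: int, bond_length_angstrom: float = 1.4) -> float:
    """Diameter of an (m, m) nanotube from its circumference |m*a1 + m*a2|."""
    # Graphene lattice constant: a = sqrt(3) * d_CC.
    # For lattice vectors at 60 degrees, |m*a1 + m*a2| = a * sqrt(3) * m = 3 * d_CC * m.
    circumference = 3.0 * bond_length_angstrom * m
    return circumference / math.pi / 10.0  # convert angstrom to nm

for m in (6, 8, 9, 10, 20):
    print(f"({m}, {m}): {armchair_diameter_nm(m):.2f} nm")
```

With these assumptions the (6, 6) and (20, 20) tubes come out at roughly 0.8 nm and 2.7 nm, and the (8, 8) and (9, 9) tubes at about 1.1 and 1.2 nm, in line with the diameters quoted in Section 3.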

When exploring a new ordered phase of water molecules, it is natural to focus on low-temperature/high-pressure conditions, and MD simulation is a powerful tool for such exploration. In the CNT diameter range of about 1-2 nm, the water molecules were reported to freeze into a variety of 'ice nanotube' forms inside the CNT (Bai et al., 2006; Koga et al., 2001; Luo et al., 2008; Mikami et al., 2009). For example, at *P* = 500 bar and *T* ≤ 240 K, 4-, 5-, and 6-gonal ice nanotubes were observed for (14, 14), (15, 15) and (16, 16) CNTs, respectively (Koga et al., 2001). All those simulations of ice nanotubes were performed with infinite or capped CNTs, which means that the confined water density can be controlled directly. In one such MD simulation study, a double-layered helix form was demonstrated to be possible inside the infinite (10, 10) CNT at 298 K for a confined water density of '1 g/cc' (Liu et al., 2005). Here, the water density was estimated as the number of confined molecules divided by the geometrical volume of the CNT, not by its effective inner volume, which is at most about 60% of the geometrical volume for the (10, 10) CNT due to the hydrophobic nature of the CNT wall. The value of '1 g/cc' therefore corresponds to a very high density condition.
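The last point can be checked with simple arithmetic. The sketch below rescales a density quoted per geometrical volume to the accessible volume; the function name and the use of exactly 60% are illustrative assumptions based on the "at most about 60%" figure above.

```python
def effective_density(nominal_density: float, accessible_fraction: float) -> float:
    """Density referred to the accessible volume instead of the geometrical one."""
    if not 0.0 < accessible_fraction <= 1.0:
        raise ValueError("accessible_fraction must lie in (0, 1]")
    return nominal_density / accessible_fraction

# Nominal 1 g/cc, with about 60% of the geometrical volume accessible.
print(f"{effective_density(1.0, 0.6):.2f} g/cc")
```

Because 60% is an upper bound on the accessible fraction, the resulting value of roughly 1.7 g/cc is a lower bound on the effective density under this reading.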

Fig. 1. Illustration of an (*m, m*) carbon nanotube and a graphene sheet.

Remarkably, at ambient conditions (300 K, 1 bar) another ordered phase similar to the 6-gonal ice nanotube was found inside uncapped (9, 9) CNTs (Mashl et al., 2003). The new phase exhibits ice-like mobility with an amount of hydrogen bonding similar to that in bulk liquid water. Unlike in the simulations using infinite or capped CNTs, water molecules were free to enter the CNTs from, and leave them for, the surrounding bulk water. The interior of the uncapped CNTs is therefore filled with water molecules naturally. The uncapped CNTs were embedded in wafers of neutral atoms mimicking the hydrophobic interior of a phospholipid membrane. Using the same water model as in that study, i.e., the SPC/E model (Berendsen et al., 1987), we have confirmed that the anomalously immobilized water at ambient conditions is also observed inside uncapped CNTs that are not embedded in a membrane but directly immersed in a water reservoir (Fig. 2).

It should be noted that a more recent MD study has reported a brand new water phase called 'ferroelectric mobile water (FMW)' (Nakamura & Ohno, 2011, 2012a, 2012b). Though the FMW at first sight appears similar to the immobilized water mentioned above, they are distinct from each other both in molecular structure and in dynamics. The FMW is produced inside (8, 8) and (9, 9) CNTs immersed in a water reservoir at ambient conditions based on the TIP5P-E model (Rick, 2004). The details of the FMW will be explained later.

The H2O/CNT simulation studies have thus revealed a variety of unusual properties of water molecules and improved our general knowledge of water and ice. Knowledge of influential factors such as the CNT diameter, temperature, and pressure has also been increasing. For the spontaneous filling of CNTs with liquid water, many simulations employ a similar set of these conditions. We see, however, that they do not always yield similar results. This implies that the choice of water model is potentially another important factor to be considered. In this chapter, therefore, we explore the unusual behaviour of confined water while paying attention to the effect of the water model.

Fig. 2. Left: Illustration of an uncapped CNT in a water reservoir. Right: Snapshots of anomalously immobilized water at ambient conditions. Simulations are performed for the (9, 9) CNTs immersed in a water reservoir based on the SPC/E water model. Red for oxygen, blue for hydrogen.

### **2. Overview of water models**


Figure 3 shows a series of the transferable interaction potentials for water molecules, the TIP*n*P models (Jorgensen et al., 1983; Mahoney & Jorgensen, 2000), and illustrates how water models are classified by the number of interaction sites (denoted by *n*). The models for *n* = 3, 4, and 5 are represented in the figure. All models use the same rigid molecular structure. The van der Waals (vdW) interaction is commonly described by the 6-12 Lennard-Jones potential between the oxygen atom sites (closed black circles). The significant differences among the models lie in the description of the electrostatic interaction sites (open white circles).

In the TIP3P model, the molecular charge distribution is modeled by point charges on each nucleus (Fig. 3, lower left). Because the vdW site coincides with the oxygen nucleus, the total number of interaction sites is three, as the model's name implies. In the TIP4P model, unlike in the TIP3P model, the negative charge is placed on an additional fictive site, resulting in four interaction sites overall (Fig. 3, lower middle). In the TIP5P model, the charge distribution is treated more realistically: the negative charges along the lone-pair directions are explicitly taken into account (Fig. 3, lower right).
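The site bookkeeping described above can be sketched as a small data structure. The class and field names are hypothetical, and the point charges are the commonly quoted literature values for the original TIP3P, TIP4P, and TIP5P parameterizations; they are not taken from this chapter and should be checked against the original papers before use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    label: str     # "O", "H", "M" (fictive site) or "L" (lone-pair site)
    count: int     # number of such sites per molecule
    charge: float  # point charge in units of e (assumed literature value)
    has_lj: bool   # True if the site carries the Lennard-Jones interaction

MODELS = {
    "TIP3P": [Site("O", 1, -0.834, True), Site("H", 2, 0.417, False)],
    "TIP4P": [Site("O", 1, 0.0, True), Site("H", 2, 0.52, False),
              Site("M", 1, -1.04, False)],
    "TIP5P": [Site("O", 1, 0.0, True), Site("H", 2, 0.241, False),
              Site("L", 2, -0.241, False)],
}

def n_sites(model: str) -> int:
    """Total number of interaction sites per molecule."""
    return sum(s.count for s in MODELS[model])

def net_charge(model: str) -> float:
    """Sum of all point charges; should vanish for a neutral molecule."""
    return sum(s.count * s.charge for s in MODELS[model])

for name in MODELS:
    print(name, n_sites(name), round(net_charge(name), 12))
```

In every model only the oxygen site carries the Lennard-Jones interaction, so moving from *n* = 3 to *n* = 5 changes only where the charges sit.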

At a higher computational cost than the 3- and 4-site models, the 5-site model has succeeded in reproducing more of the properties of water over a range of temperatures and pressures, including the density maximum near 4 °C (Mahoney & Jorgensen, 2000) and the melting point (Fernández et al., 2006; Vega et al., 2005). A modified version of TIP5P, called TIP5P-E (Rick, 2004), has also been proposed. It is constructed by modifying only the Lennard-Jones parameters so as to yield more accurate results when used with Ewald summation for the long-ranged electrostatic interactions.

Fig. 3. Illustration of TIP*n*P water models. Red for oxygen, blue for hydrogen.

Fig. 4. Examples of H2O/CNT simulation results based on 5-site water models (left figure is from Fig.1 of Bai et al., 2006. Copyright 2006 National Academy of Sciences.)

The TIP3P and TIP4P models have been the most commonly used water models, along with the other 3-site models SPC and SPC/E (Berendsen et al., 1981, 1987). This is also the case for H2O/CNT simulations. Although only a few H2O/CNT simulation studies are based on 5-site models, their findings, namely that water molecules can self-assemble into DNA-like double helices under high pressure (Bai et al., 2006) and that single-domain ferroelectric mobile water is possible at ambient conditions (Nakamura & Ohno, 2011, 2012a, 2012b), are all the more outstanding given the far more numerous studies based on the 3- and 4-site models.

### **3. Spontaneous filling of CNTs with liquid water**

Regarding the spontaneous filling of CNTs with liquid water, various 3-site water models have been tried for a wide range of CNT diameters (Alexiadis & Kassinos, 2008a, 2008b). It was shown that the choice of rigid or flexible models for the TIP3P, SPC, and SPC/E water molecules, along with the rigid or flexible treatment of the CNTs, causes no significant differences. We now perform systematic simulations based not only on 3-site water models (TIP3P and SPC/E) but also on 4-site (TIP4P) and 5-site (TIP5P-E) models (Nakamura & Ohno, 2012a).

A nonpolar, rigid (*m, m*) CNT of length *L* is solvated with *N* water molecules in a periodic box. *L* = 2.1 nm for *m* = 6, 7, 8, 9, 10, 12, 16, 20 (0.8-2.7 nm in diameter), and *L* = 4.0 nm for *m* = 8, 9. *N* ranges from 2074 to 6508, depending on the CNT size. The carbon-carbon bond length is 1.4 Å. MD simulations are performed using AMBER 9.0 (Case et al., 2006). The water molecules and the CNTs are assumed to interact through the 6-12 Lennard-Jones potential between the oxygen and the *sp*2 carbon atoms (AMBER force field). Based on the cross sections $\sigma_{\rm OO}$ (Table 1) and $\sigma_{\rm CC}$ (3.4 Å), and the potential well depths $\varepsilon_{\rm OO}$ (Table 1) and $\varepsilon_{\rm CC}$ (0.086 kcal/mol), $\varepsilon_{\rm CO}$ and $\sigma_{\rm CO}$ are derived from the Lorentz-Berthelot combining rules,

$$
\varepsilon_{\rm CO} = \sqrt{\varepsilon_{\rm CC} \times \varepsilon_{\rm OO}}, \quad \sigma_{\rm CO} = \frac{\sigma_{\rm CC} + \sigma_{\rm OO}}{2} \tag{1}
$$
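As a minimal sketch of how the combining rules are applied, the following code derives the carbon-oxygen parameters and evaluates a 6-12 Lennard-Jones potential. The carbon values are those quoted above. The oxygen values are commonly quoted TIP3P parameters used purely for illustration, since the entries of Table 1 are not reproduced here. The functional form 4ε[(σ/r)^12 − (σ/r)^6] is also an assumption about the convention.

```python
import math

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic mean for sigma, geometric mean for epsilon."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij

def lennard_jones(r, sigma, eps):
    """6-12 Lennard-Jones energy, assuming the 4*eps*[(s/r)^12 - (s/r)^6] form."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

SIGMA_CC, EPS_CC = 3.4, 0.086        # angstrom, kcal/mol (from the text)
SIGMA_OO, EPS_OO = 3.15061, 0.1521   # assumed TIP3P values, illustration only

sigma_co, eps_co = lorentz_berthelot(SIGMA_CC, EPS_CC, SIGMA_OO, EPS_OO)
r_min = 2.0 ** (1.0 / 6.0) * sigma_co  # position of the potential minimum
print(f"sigma_CO = {sigma_co:.4f} A, eps_CO = {eps_co:.4f} kcal/mol")
print(f"U(r_min) = {lennard_jones(r_min, sigma_co, eps_co):.4f} kcal/mol")
```

The same two-line rule applies to any of the water models once its oxygen parameters are substituted.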

Electrostatic interactions between the water molecules are calculated by the particle-mesh Ewald method (Darden, 1993). An MD time step of 2 fs is used. In the first runs (up to 0.5-0.6 ns), a combination of constant-volume and constant-pressure simulations is performed to ensure that the system is in equilibrium with the bulk water density at *T* K under 1 bar. In the subsequent production runs (10-120 ns), the NVT ensemble is used for statistical analysis. Snapshots are saved for analysis every 0.1 ps.
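For orientation, the run lengths above translate into the following step and snapshot counts. This is only bookkeeping on the numbers quoted in the text; the function and constant names are illustrative.

```python
TIME_STEP_FS = 2             # MD time step from the text
SNAPSHOT_INTERVAL_FS = 100   # 0.1 ps between saved snapshots

def run_budget(production_ns: int):
    """Return (number of MD steps, number of saved snapshots) for a run."""
    total_fs = production_ns * 1_000_000
    return total_fs // TIME_STEP_FS, total_fs // SNAPSHOT_INTERVAL_FS

for length_ns in (10, 120):
    steps, snapshots = run_budget(length_ns)
    print(f"{length_ns} ns: {steps} steps, {snapshots} snapshots")
```

A snapshot is thus written every 50 steps, and the longest runs produce on the order of a million frames.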


Table 1. Lennard-Jones potential parameters for each model.

Figure 5 shows the water density inside the CNTs, together with snapshots obtained with the TIP5P-E model. As already reported for simulations based on the 3-site models (Alexiadis & Kassinos, 2008a, 2008b), those based on the 4- and 5-site models also show three different filling modes as the diameter increases: the 'wire', 'layered', and 'bulk' modes.

In the wire mode, the water molecules form a water wire regardless of the water model. Figure 5a shows an example snapshot of the single-file formation of the TIP5P-E molecules inside the (6, 6) CNT. It has been reported that the TIP3P water molecules in the wire mode undergo a burst-like transmission through the CNT (Hummer et al., 2001). A similar concerted and rapid axial motion is also observed for the other water models. An example for the TIP5P-E model is shown in Fig. 6. In the wire mode, no significant differences among the water models are seen in either the structure or the dynamics of the confined water molecules.

Fig. 5. Upper: Water density inside (*m, m*) CNTs of length 2.1 nm at 280 K. Data are from Fig.1b of Nakamura & Ohno, 2012a. Lower: Snapshots from MD simulations based on the TIP5P-E model. (a) Single-file formation in 'wire' mode. (b), (c) Single-layered formation in 'layered' mode. (d) Example in 'bulk' mode.

In the 'layered' mode, the larger the CNT diameter, the more concentric layers of confined water molecules appear. At the same time, those water layers become more diffuse and the bulk water structure gradually recovers. An example snapshot practically corresponding to the 'bulk' mode is shown in Fig. 5d.

The overall trend of the confined water density is that the smaller the diameter, the lower the density. The density values for each model are practically the same except for the (8, 8) and (9, 9) CNTs (about 1.1-1.2 nm in CNT diameter). The TIP5P-E water density does not follow the overall trend for the (8, 8) and (9, 9) CNTs, and the SPC/E water density does not follow it for the (9, 9) CNT. Inside the (8, 8) and (9, 9) CNTs, the TIP5P-E water molecules form a single-layered water tube, as shown in Fig. 5b and c.


Though they at first sight resemble the 5-gonal and 6-gonal ice nanotubes (Koga et al., 2001; Luo et al., 2008; Mikami et al., 2009), they are quite distinct from those ice nanotubes as explained in the next section.

Fig. 6. Burst-like flow of the TIP5P-E water molecules along the CNT axis at 300 K.

### **4. Ferroelectric mobile water (FMW)**

We look into the structure of the single-layered TIP5P-E water tubes inside the (8, 8) and (9, 9) CNTs. They are practically one-atom-thick water tubes, as seen in the top views in Fig. 5b and c. To show the molecular structure in detail, we unfold the water tube as shown in the upper part of Fig. 7. All the water molecules are visited by following one single helix running through the CNT. To our surprise, the dipole moment of each molecule is oriented in practically the same direction, which results in a single-domain ferroelectric arrangement of the water molecules and consequently a large spontaneous polarization along the CNT axis (Fig. 7, right-hand side). This new water phase is termed 'ferroelectric mobile water (FMW).' Its 'mobile' character will be explained later.
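A net axial polarization expressed in units of a single molecular dipole, as used in the figures of this section, can be evaluated from atomic coordinates along the following lines. This is a sketch under stated assumptions, not the analysis code used for the figures: the CNT axis is taken along *z*, the molecule is rigid and symmetric so that its dipole lies along the H-O-H bisector, and all names are hypothetical.

```python
import math

def axial_dipole(o, h1, h2):
    """z-component of the unit dipole vector of one rigid water molecule."""
    # The bisector points from the oxygen to the midpoint of the two hydrogens,
    # i.e. from the negative toward the positive side of the molecule.
    bisector = [(h1[k] + h2[k]) / 2.0 - o[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in bisector))
    return bisector[2] / norm

def net_axial_polarization(molecules):
    """Sum of axial unit-dipole components (units: one molecular dipole)."""
    return sum(axial_dipole(o, h1, h2) for o, h1, h2 in molecules)

# Toy coordinates: two molecules pointing along +z and one along -z.
up = ((0.0, 0.0, 0.0), (0.8, 0.0, 0.6), (-0.8, 0.0, 0.6))
down = ((0.0, 0.0, 0.0), (0.8, 0.0, -0.6), (-0.8, 0.0, -0.6))
print(net_axial_polarization([up, up, down]))
```

For a single-domain arrangement every term has the same sign, so the sum grows with the number of confined molecules.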

In Fig. 7, the direction of the axial spontaneous polarization is downward for the (8, 8) CNT and upward for the (9, 9) CNT. Each direction remains unchanged for at least about 100 ns. Of course, there is no reason for the net axial polarization to prefer one direction over the other. For reference, we ran another series of simulations starting from a different random arrangement of the water molecules and obtained the opposite polarization direction for each CNT. We also find that the longer the CNT, the more stable the polarization direction. That is, employing longer CNTs has an effect similar to reducing the temperature.

Fig. 7. Left: Side views of single-layered, one-atom-thick water tubes inside CNTs. Middle: Unfolded views of TIP5P-E water molecules inside the (8, 8) and the (9, 9) CNTs at 280 K. Broken arrows are guides for the eye. Right: Arrangement of the dipole moment component along the CNT axis. The axial dipole moment of each water molecule is represented by a red (blue) arrow when its value is zero or greater (less than zero).

The unfolded-view snapshots in Fig. 7 reveal that a hydrogen bond network is completed, i.e., the 'ice rule' is obeyed (Bernal & Fowler, 1933). Moreover, the protons are perfectly ordered (Fig. 7, unfolded views in middle). Here, it is worth referring to proton-ordered ice (Pauling, 1935; Slater, 1941; Jackson & Whitworth, 1995; Jackson et al., 1997; Su et al., 1998; Iedema et al., 1998; Bramwell, 1999). In ordinary ice (i.e., ice Ih), the protons are disordered although the ice rule is obeyed for its hexagonal lattice of water molecules. The proton-ordered phase is rarely produced in pure ice because the disorder is practically frozen in before the ordered phase is reached, owing to the very slow kinetics of proton reorientation at very low temperatures. Though the ordering transition can be catalyzed by doping with hydroxides, the ordered phase known as ice XI (Matsuo et al., 1986) has only been partially obtained (Tajima et al., 1982, 1984; Kawada, 1972, 1989). Researchers in this area are beginning to direct their attention to the outer solar system, for example Pluto, where most water ice is considered to have existed in a crystalline phase for billions of years (Fukazawa et al., 2006). In marked contrast, we have shown that proton-ordered water, the FMW, can be produced at ambient conditions by utilizing the interior space of CNTs immersed in a water reservoir.

Fig. 8. Illustration of arrangement of the dipole moment component along the CNT axis for the 5- and 6-gonal ice nanotubes. Each dipole moment is represented by red (blue) arrows when its value is zero or greater (less than zero). Solid and broken lines are guide for eyes.

It is also worth referring to the previously reported *n*-gonal ice nanotubes. Those *n*-gonal ice nanotubes have been found inside infinite or capped CNTs under high-pressure/low-temperature conditions with non-zero net spontaneous polarization (Luo et al., 2008; Mikami et al., 2009). Figure 8 shows schematic illustrations of the arrangement of the axial dipole moment for the 5- and 6-gonal ice nanotubes. Though each of the *n* wires of water molecules along the CNT axis is ferroelectric, the inter-wire relation is antiferroelectric. They are, therefore, not single-domain ferroelectric ice. In addition, non-zero net polarization arises only for odd *n*. It should be noted that, unlike the ice nanotubes, the FMW yields non-zero net polarization not only for the 5-gonal but also for the 6-gonal arrangement.
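The odd/even argument can be illustrated with a toy count of wire directions. The sketch assumes that every axial wire contributes the same magnitude and that neighbouring wires alternate in direction around the ring; it ignores tilt and any other structural detail, and the function name is hypothetical.

```python
def net_wire_imbalance(n: int, single_domain: bool = False) -> int:
    """|#up wires - #down wires| for n axial wires arranged around a ring."""
    if single_domain:
        return n  # FMW-like case: every wire points the same way
    # Ice-nanotube-like case: directions alternate from wire to wire,
    # which leaves one unpaired wire when n is odd.
    directions = [+1 if k % 2 == 0 else -1 for k in range(n)]
    return abs(sum(directions))

for n in (4, 5, 6):
    print(n, net_wire_imbalance(n), net_wire_imbalance(n, single_domain=True))
```

In this picture the alternating arrangement leaves at most one uncompensated wire, whereas the single-domain arrangement gives a net value that scales with *n*.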

Fig. 9. Proton-ordered diffusion inside the (8, 8) CNT of length 4.0 nm at 280 K. (a) Unfolded-view snapshot. (b) Axial positions of five neighbouring molecules traced in corresponding colours. The net axial polarization of water inside the CNT is traced in black (units are the magnitude of a single dipole of TIP5P-E water). a and b from Fig.2 of Nakamura & Ohno, 2011. Reproduced by permission of the PCCP Owner Societies. (c) A close-up of the motions between 2.6 and 2.8 ns.

In general, a highly ordered structure such as the FMW suggests a rigid, immobile phase. Counter to our intuition, however, the FMW is dynamic. Figure 9a shows a side view (left) and the corresponding unfolded-view snapshot obtained from the simulation employing the (8, 8) CNT. Five neighbouring molecules along the helix are coloured, and their axial positions are traced in corresponding colours (Fig. 9b, c). The net axial polarization of the water inside the CNT is also shown in Fig. 9b (black line). The molecules diffuse along the CNT axis, keeping the proton-ordered network intact. They fluctuate and gradually shift away from their initial positions, as in one-dimensional Brownian motion. A nanometre-order shift over a nanosecond-order time is often seen. This concerted diffusion is termed 'proton-ordered diffusion.'
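As a rough order-of-magnitude reading of the "nanometre-order shift over nanosecond-order time", one may assume that the concerted axial displacement obeys the one-dimensional Einstein relation. This is an assumption made here for illustration; the chapter does not quote a diffusion coefficient, and a single trajectory only fixes the scale:

$$
\langle \Delta z^{2} \rangle = 2Dt \quad \Rightarrow \quad D \sim \frac{(1\ \text{nm})^{2}}{2 \times 1\ \text{ns}} = 5 \times 10^{-10}\ \text{m}^{2}/\text{s}
$$

Here $\Delta z$ is the axial displacement of the water tube, $t$ the elapsed time, and $D$ an effective one-dimensional diffusion coefficient.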

Figure 10 shows a series of elementary processes of the proton-ordered diffusion. Due to the single helical chain structure of the FMW, there are 'vacant pockets' in the interior space of the CNT near both of its ends. The vacant pocket near the CNT top is filled, molecule by molecule, with reservoir water molecules. The molecules filling the pocket move deeper into the CNT and become more stable. At the same time, another vacant pocket is created at the same place, which makes the filling process endless. A similar filling process also takes place near the bottom of the CNT. The proton-ordered diffusion is thus closely tied to the structure of the FMW.

Fig. 10. Mechanism of the proton-ordered diffusion inside the (8, 8) CNT of length 4.0 nm at 280 K. The oxygen (hydrogen) atoms of the water molecules entering from outside the CNT are colored red (blue).

The direction of the axial diffusion is practically dictated by the stochastic process near the CNT ends, and the probability of shifting in either direction is, on average, equal. From this viewpoint, the significant shift in only one direction seen in Fig. 9b might sound paradoxical. It can, however, be understood by analogy with 'leads in a prolonged fair coin-tossing game.' When the cumulative number of heads is larger than that of tails, heads is in the lead (and vice versa). The leads are equivalent to the position shifts in the proton-ordered diffusion. When a great many fair coin-tossing games are conducted independently, the most likely result is for one side to be in the lead during the entire course of the game. This tendency becomes more pronounced for games with a larger number of tosses. Contrary to popular notions, the least likely result is for the two sides to spend equal time in the lead (Feller, 1968). The diffusion trajectory shown in Fig. 9b therefore represents a quite likely result.
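The coin-tossing statement can be checked numerically. The sketch below plays many independent fair games and histograms the fraction of tosses during which heads is in the lead; the game length, number of games, bin count, and seed are arbitrary illustrative choices, and a tie is credited to the side that led just before it, following the convention used by Feller.

```python
import random

def fraction_of_time_in_lead(n_tosses: int, rng: random.Random) -> float:
    """Fraction of tosses during which heads is in the lead in one fair game."""
    position = 0   # cumulative (#heads - #tails)
    lead_time = 0
    for _ in range(n_tosses):
        previous = position
        position += 1 if rng.random() < 0.5 else -1
        # A tie counts for whichever side was leading just before it.
        if position > 0 or (position == 0 and previous > 0):
            lead_time += 1
    return lead_time / n_tosses

def lead_histogram(n_games=2000, n_tosses=200, n_bins=5, seed=0):
    """Histogram of the lead fraction over many independent games."""
    rng = random.Random(seed)
    bins = [0] * n_bins
    for _ in range(n_games):
        f = fraction_of_time_in_lead(n_tosses, rng)
        bins[min(int(f * n_bins), n_bins - 1)] += 1
    return bins

print(lead_histogram())
```

The outermost bins (one side leading almost all the time) collect the most games and the central bin (equal time in the lead) the fewest, which is the U-shaped distribution behind the argument above.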

Fig. 11. 'Go–stop–go' motion of water inside the (9, 9) CNT of length 4.0 nm. 290 K. (a) Axial positions of six coloured molecules in the unfolded-view snapshots b, c, and d are traced in corresponding colours. The net axial polarization of water inside the CNT is traced in black (units are the magnitude of a single dipole of TIP5P-E water). Unfolded-view snapshot at (b) 0.072 ns, (c) 1.214 ns, and (d) 1.998 ns. From Fig.3 of Nakamura & Ohno, 2011. Reproduced by permission of the PCCP Owner Societies.

The proton-ordered diffusion is also observed for the FMW inside the (9, 9) CNTs (Fig. 11). Looking at the axial positions of six marked molecules, we notice that they are temporarily pinned at a certain axial position (at about 2.0 nm, from 0.9 to 1.5 ns in Fig. 11a). Before the pinning takes place, the confined water has a single-helix structure (b) and undergoes the axial diffusion. When pinned, however, it is no longer helical: it transforms into axially stacked layers of water molecules, and consequently all the molecules become practically immobile (c). Once the pinning ends, the water recovers the FMW structure and resumes the proton-ordered diffusion (d). These changes in motion literally correspond to 'Go-Stop-Go.'

In Fig. 11a, the net axial polarization of the confined water is also plotted. It should be noted that the 'Go-Stop-Go' is accompanied by step-wise changes of the net polarization. As the series of unfolded snapshots (b, c, d) shows, each dipole axis tilts slightly away from the CNT axis when the pinning takes place, which results in a lower net polarization during the pinning than that of the FMW.

We have performed a long-time simulation with a shorter (9, 9) CNT of length 2.1 nm, in which we find a series of stepwise changes of the net polarization (Fig. 12a). Remarkably, every value falls on one of nine lines. Five independent phases, labeled '0' to '4', are suggested. Phase #4 corresponds to the FMW, the most stable phase (Nakamura & Ohno, 2011). Figure 12b shows unfolded-view snapshots from each phase, which help in understanding how the net polarization is digitized. The molecules are coloured red (blue) when their axial dipole moment *Pz* is zero or greater (less than zero). Each axial water wire is entirely red or entirely blue, i.e., ferroelectric. The ice rule is practically obeyed in all the snapshots. Under the ice rule, many other dipole arrangements are possible for each phase except #4.

Phase #0 is very similar to the reported 6-gonal ice nanotubes (Luo et al., 2008; Mikami et al., 2009) in that the inter-wire relation is antiferroelectric though the intra-wire relation is ferroelectric, resulting in zero net spontaneous polarization. In phase #1 (#2), the axial water wires in red outnumber those in blue by two (four). The excess water wires yield a non-zero net polarization. Phases #3 and #4 have no domain boundary along the CNT axis.
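To give a feel for how many wire-direction assignments are compatible with a given excess of red over blue wires in phases #0 to #2, the following sketch simply counts combinations. It assumes six axial wires, by analogy with the 6-gonal ice nanotube, and ignores the ice rule and any neighbour constraints, so the numbers are upper bounds rather than counts of physically allowed arrangements. The function name is hypothetical.

```python
from math import comb

def wire_assignments(n_wires: int, excess: int) -> int:
    """Ways to colour the wires so that red outnumbers blue by `excess`."""
    if not 0 <= excess <= n_wires or (n_wires + excess) % 2:
        return 0  # this excess cannot be realised with n_wires wires
    n_red = (n_wires + excess) // 2
    return comb(n_wires, n_red)

# Excess 0, 2, 4 corresponds to phases #0, #1, #2 under the stated assumptions.
for phase, excess in enumerate((0, 2, 4)):
    print(f"phase #{phase}: excess {excess}, "
          f"{wire_assignments(6, excess)} assignments")
```

Even this crude count shows that the number of possible assignments shrinks as the excess grows.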

We detail the transition from one phase to the other, e.g., from phase #0 to #1 (at around 18 ns, indicated by a yellow arrow in Fig. 12a). A sequence of the transient snapshots is shown in Fig. 12c. At t = 0, the inter-wire relation is antiferroelectric. Under local perturbation from the reservoir water, the dipole orientation of the encircled molecule is changed so as to introduce a domain boundary in the axial water wire (t = 3.5 ps), which corresponds to a hydrogen bonding defect (Dellago et al., 2003; Köfinger et al., 2008; Köfinger & Dellago, 2009). The ice rule is consequently breached at the domain boundary, and local instability is created. Fluctuating rapidly along the CNT axis, the position of the domain boundary shifts toward the opposite CNT edge accompanied by the dipole reorientations (t = 4.7 ps). Eventually the whole water wire becomes red (t = 8.5 ps). Triggered by the local perturbation, the phase transition is thus completed in less than 10 ps. It should be noted that the perturbation from the reservoir water only works on the dipole reorientation and has little effect on their diffusion. This is the case for phases #0 to #3. In marked contrast, for phase #4, the local perturbation has little effect on the dipole reorientation: it only works on the diffusion of the molecules, that is, the proton-ordered diffusion of the FMW.

Fig. 12. Sequence of spontaneous transitions with step-wise changes of the net polarization of water inside the (9, 9) CNT of length 2.1 nm at 280 K. (a) Net axial polarization of water traced in black (units are the magnitude of a single dipole of TIP5P-E water). Two red lines represent the ferroelectric mobile water. Seven blue lines represent the immobile water. Labels '0' to '4' on five lines represent five independent water phases. (b) Unfolded-view snapshots from each phase. Water molecules are colored red (blue) when their axial dipole moment *Pz* is zero or greater (less than zero). (c) Sequence of unfolded-view snapshots during the transition from phase #0 to #1, which is indicated by the yellow arrow in (a). Black circles and dotted rectangles are guides for the eye. From Fig. 4 of Nakamura & Ohno, 2011. Reproduced by permission of the PCCP Owner Societies.

Here, a question still remains: how is the mobile phase, the FMW, reached from the immobile phases? This process is explained in Fig. 13. A schematic, representative arrangement of the confined water molecules for phases #0 to #2, all of which have axial domain boundaries, is shown in (a). In these phases, the fluctuation in directions perpendicular to the CNT axis ('lateral fluctuation') is severely restricted due to the proton-proton repulsion, as shown in the yellow areas in (b). For phase #3, which has no axial domain boundary, there are two representative arrangements: domain-boundary-free (c) and lateral domain boundaries included (f).

Fig. 13. Proton ordering and thermal fluctuation explained with schematic unfolded views (for simplicity, a square lattice is used). Left: Without fluctuation. (a) Axial (vertical) domain boundaries included. (c) Domain-boundary-free. (f) Lateral domain boundaries included. Middle: With lateral fluctuation. (b), (d) and (g) correspond to distorted structure examples of (a), (c), and (f), respectively, under lateral fluctuation. Yellow represents repulsions induced by the lateral fluctuation. Arrows in (d) and (g) represent the rotational direction of each dipole. Right: Resultant arrangements. (e) Ferroelectric mobile water reached via (d). (h) Example of unstable arrangements reached via (g). Black dotted lines are guides for the eye.

In the domain-boundary-free case (c), the repulsion also arises under the lateral fluctuation (d). It can, however, be avoided by slightly tilting each dipole orientation toward the CNT axis (indicated by arrows) and giving the axial positions a slight downward slope along the lateral direction (e). The molecules thus align along a single helix running through the CNT without breaching the ice rule. This also explains why phase #4 yields a spontaneous polarization larger than that of phase #3. In the other case (f), however, avoiding the proton-proton repulsion induced by the lateral fluctuation only leads to instability elsewhere (g to h). The more dominant the domain-boundary-free arrangement (c) is, the more likely the transition from phase #3 to #4 becomes.

### **5. Effect of different models**

Based on the TIP5P-E water model, we have seen the quite unusual behaviour of water molecules inside the (8, 8) and the (9, 9) CNTs, i.e., the formation of single-domain ferroelectric water (FMW) and its concerted diffusion (proton-ordered diffusion). We have also seen in Fig. 5 that for these 'critical' CNT diameters (about 1.1-1.2 nm) there are nontrivial differences in water density among the water models, which strongly suggests that the confined water structure needs to be examined model by model under the same conditions as used for the TIP5P-E water simulations above.

Figure 14 shows the top-view snapshots and net axial polarization of water inside the (9, 9) CNT obtained for each water model, together with unfolded-view snapshot examples for the SPC/E model. The top-view snapshots for the TIP3P and the TIP4P reveal thicker water tubes compared to the SPC/E (and TIP5P-E, see Fig. 5c), clearly demonstrating model dependence. The time evolution of the net axial polarization *Pz* for the TIP3P and the TIP4P reveals considerable fluctuation (Fig. 14a and b, right-hand side). In contrast, a series of stepwise changes is seen for the SPC/E (Fig. 14c, right-hand side). The values fall on any of three lines, suggesting two independent phases.

We now examine the relation between the two phases of the SPC/E water and their corresponding molecular structure. As with phase #0 introduced in the TIP5P-E simulations (labeled '0' in Fig. 12b), the antiferroelectric relation between the SPC/E water wires along the CNT axis yields the phase with *Pz* = 0, which is therefore named phase #0 following the same naming scheme as used in the TIP5P-E water simulations. The transitions to the other magnitude of *Pz* (i.e., phase #1) take place by the flip of the water wire direction, similarly to those demonstrated in the TIP5P-E simulations (Fig. 12c). Phases #0 and #1 consist of axially stacked layers of anomalously immobile water molecules, which are essentially the same as the corresponding phases of the confined TIP5P-E water. It should be noted that for the TIP5P-E, there are five independent phases (#0 to #4). For the SPC/E, however, only phases #0 and #1 are observed; the other phases, including the FMW (i.e., phase #4), are not observed. The effect of the choice of model between the SPC/E and the TIP5P-E is thus clear.

Also for the (8, 8) CNT, the model dependence is significant (Fig. 15). The top-view snapshot for the TIP3P reveals a thick, less ordered tube structure (a), whereas the TIP4P and the SPC/E reveal a 4-gonal water tube (b and c). The net axial polarization *Pz* for the TIP3P water shows a considerable fluctuation (a, right-hand side). For the TIP4P and the SPC/E, the fluctuation is fairly suppressed and most of the values of *Pz* fall on any of five lines, suggesting three independent phases (b and c, right-hand side).


Fig. 14. Top-view snapshots and net axial polarization of water inside the (9, 9) CNT of 2.1 nm length for each water model. 280K. For SPC/E model, unfolded-view snapshot examples for phase #0 and #1 are also shown. Polarization and unfolded-view results are adapted from Fig.3 of Nakamura & Ohno, 2012a.

Fig. 15. Top-view snapshots and net axial polarization of water inside the (8, 8) CNT of 2.1 nm length for each water model. 280K. For TIP4P model, unfolded-view snapshot examples for phase #0 to #2 are also shown. Polarization and unfolded-view results are adapted from Fig.5 of Nakamura & Ohno, 2012a.


Examples of unfolded-view snapshots corresponding to the three independent phases (phases #0 to #2) for the TIP4P are shown on the right-hand side of Fig. 15b. The transition mechanism between the phases is essentially the same as that described earlier. Unlike the confined TIP5P-E, which forms the 5-gonal water tube (Fig. 5b), the TIP4P and the SPC/E form 4-gonal tubes (Fig. 15b and c, left-hand side). In addition, the transition to phase #4 (the FMW) is observed for neither the TIP4P nor the SPC/E. The single helix structure of the TIP5P-E water fills the interior space of the (8, 8) CNTs very efficiently compared to the water structures of the other water models, resulting in the highest water density. We note that for the TIP5P-E water, the shorter the (8, 8) CNT length, the more often a 4-gonal single helix water structure is observed. The 4-gonal FMW is, however, rarely observed for the (8, 8) CNTs of length about 2-4 nm compared to the 5-gonal single helix water structure.

We have seen that for both the (8, 8) and the (9, 9) CNTs, the TIP3P water results in a less ordered structure than the other water models. The main reason is possibly the overly fast dynamics of the TIP3P water, whose self-diffusion constant for bulk liquid water is more than double the experimental one. The reported values (25 °C, in units of 10⁻⁵ cm²/s) are 2.3 for experiment, 5.19 for TIP3P, 2.49 for SPC/E, 3.29 for TIP4P, and 2.80 for TIP5P-E (Mahoney & Jorgensen, 2001; Mark & Nilsson, 2001; Mills, 1973; Price et al., 1999). The structural ordering of the confined TIP3P water molecules is considered to be interrupted by this unrealistic fluctuation. Here, we briefly refer to a report on H2O/CNT simulations employing a 'modified TIP3P' model. Though this model's diffusion constant is as high as that of the original TIP3P, a twisted-column shape of water inside the (7, 7), (8, 8) and (9, 9) CNTs was reported at ordinary ambient conditions (Noon et al., 2002). In the modified TIP3P model, Lennard-Jones parameters on the hydrogen atoms were newly introduced to avoid singularities in integral equation calculations of activation free energies in complex molecular systems (Neria, 1996). The additional parameters, however, do not correct the original TIP3P's well-documented shortcomings, including the very high diffusion constant (Mark & Nilsson, 2001). The effect of the additional parameters is therefore considered to be so strong that the ordered structures become possible despite the high diffusion constant.
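The comparison of diffusion constants can be made explicit with a few lines of arithmetic. The sketch below only uses the values quoted in the paragraph above; the variable and function names are illustrative.

```python
# Bulk self-diffusion constants at 25 °C quoted in the text,
# in units of 1e-5 cm^2/s.
D_EXPERIMENT = 2.3
D_MODEL = {"TIP3P": 5.19, "SPC/E": 2.49, "TIP4P": 3.29, "TIP5P-E": 2.80}


def ratio_to_experiment(d_model: float, d_exp: float = D_EXPERIMENT) -> float:
    """Model diffusion constant relative to the experimental value."""
    return d_model / d_exp


ratios = {name: ratio_to_experiment(d) for name, d in D_MODEL.items()}

if __name__ == "__main__":
    # List the models from closest to experiment to furthest.
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:8s} D/D_exp = {r:.2f}")
```

Only TIP3P exceeds the experimental value by more than a factor of two; the other three models stay within about 45 % of it.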

For the rest of the water models (SPC/E, TIP4P, TIP5P-E), model dependence has also been observed. The SPC/E and the TIP4P water molecules form axially stacked layers of anomalously immobile water molecules. The resultant water phases, however, correspond to the metastable phases of the confined TIP5P-E water, whose stable phase, the FMW, has not been observed for the SPC/E and the TIP4P models. What makes this difference? We note that among those water models, the TIP5P-E most realistically describes the molecular charge distribution by explicitly taking into account the negative charges along the lone-pair directions of the water molecule. In the TIP4P model, the negative charge is placed on an additional fictive site rather than on the oxygen atom site. The number of electrostatic interaction sites, however, is three, as in the TIP3P model. Accuracy in the description of the molecular charge distribution is considered to be crucial in simulating the water inside the CNTs of the 'critical' diameters.

### **6. Conclusion**

We have reported recent progress of H2O/CNT simulations, focusing on the unusual behavior of the confined water molecules and the effect of the water models with different numbers of interaction sites (from three to five sites). All the models commonly show that the unusual behavior of the confined water changes with the CNT diameter. We have found a critical CNT diameter range (about 1.1-1.2 nm), for which significant anomalous behavior of water that differs from model to model is observed. Outside this range, significant differences between the models have not been found. With the 5-site water model, which describes the molecular charge distribution most realistically among the water models used, single-domain ferroelectric water is produced at ambient conditions. The ferroelectric water diffuses while keeping its proton-ordered network intact. The mobile/immobile water transitions accompanied by step-wise changes in the net polarization of water have also been found. This outcome is expected to make it possible to examine the ferroelectricity by detecting an abrupt change in the mobility of water molecules, in addition to calorimetry and dielectric experiments. First-principles simulations are also highly desirable to confirm the ferroelectric water. Such studies are now under way.

### **7. References**



Alexiadis, A. & Kassinos, S. (2008a). The density of water in carbon nanotubes. *Chemical Engineering Science,* 63, pp. 2047–2056.

Alexiadis, A. & Kassinos, S. (2008b). Molecular Simulation of Water in Carbon Nanotubes. *Chemical Reviews,* 108, pp. 5014–5034.

Bai, J.; Wang, J. & Zeng, X. C. (2006). Multiwalled ice helixes and ice nanotubes. *Proceedings of the National Academy of Sciences,* 103, pp. 19664–19667.

Bernal, J. D. & Fowler, R. H. (1933). A Theory of Water and Ionic Solution, with Particular Reference to Hydrogen and Hydroxyl Ions. *Journal of Chemical Physics,* 1, pp. 515–548.

Berendsen, H. J. C.; Grigera, J. R. & Straatsma, T. P. (1987). The missing term in effective pair potentials. *Journal of Physical Chemistry,* 91, pp. 6269–6271.

Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F. & Hermans, J. (1981). Interaction models for water in relation to protein hydration, In: *Intermolecular Forces,* B. Pullman, (Ed.), pp. 331–342, Reidel, Dordrecht.

Bramwell, S. T. (1999). Ferroelectric ice. *Nature,* 397, pp. 212–213.

Case, D. A.; Darden, T. A.; Cheatham, T. E. III; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Merz, K. M.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang, W.; Wang, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathews, D. H.; Schafmeister, C.; Ross, W. S. & Kollman, P. A. (2006). *AMBER 9*. University of California, San Francisco.

Darden, T.; York, D. & Pedersen, L. (1993). Particle mesh Ewald: An *N*-log(*N*) method for Ewald sums in large systems. *Journal of Chemical Physics,* 98, pp. 10089–10092.

Dellago, C.; Naor, M. M. & Hummer, G. (2003). Proton Transport through Water-Filled Carbon Nanotubes. *Physical Review Letters,* 90, 105902.

Feller, W. (1968). Fluctuations in Coin Tossing and Random Walks. In: *An Introduction to Probability Theory and Its Applications,* ch. 3, Wiley, USA.

Fernández, R. G.; Abascal, J. L. F. & Vega, C. (2006). The melting point of ice Ih for common water models calculated from direct coexistence of the solid-liquid interface. *Journal of Chemical Physics,* 124, 144506.

Nakamura, Y. & Ohno, T. (2011). Ferroelectric mobile water. *Physical Chemistry Chemical Physics,* 13, pp. 1064–1069.

Nakamura, Y. & Ohno, T. (2012a). Structure of water confined inside carbon nanotubes and water models. *Materials Chemistry and Physics,* 132, pp. 682–687.

Nakamura, Y. & Ohno, T. (2012b). Single-Domain Ferroelectric Water and its Concerted Diffusion in Nanotubes. *Materials Science Forum,* 700, pp. 108–111.

Neria, E. (1996). Simulation of activation free energies in molecular systems. *Journal of Chemical Physics,* 105, pp. 1902–1921.

Noon, W. H.; Ausman, K. D.; Smalley, R. E. & Ma, J. (2002). Helical ice-sheets inside carbon nanotubes in the physiological condition. *Chemical Physics Letters,* 355, pp. 445–448.

Pauling, L. (1935). The Structure and Entropy of Ice and of Other Crystals with Some Randomness of Atomic Arrangement. *Journal of the American Chemical Society,* 57, pp. 2680–2684.

Price, W. L.; Ide, H. & Arata, Y. (1999). Self-Diffusion of Supercooled Water to 238 K Using PGSE NMR Diffusion Measurements. *Journal of Physical Chemistry A,* 103, pp. 448–450.

Rick, S. W. (2004). A reoptimization of the five-site water potential TIP5P for use with Ewald sums. *Journal of Chemical Physics,* 120, pp. 6085–6093.

Slater, J. C. (1941). Theory of the Transition in KH2PO4. *Journal of Chemical Physics,* 9, pp. 16–33.

Su, X.; Lianos, L.; Shen, Y. R. & Somorjai, G. A. (1998). Surface-Induced Ferroelectric Ice on Pt(111). *Physical Review Letters,* 80, pp. 1533–1536.

Tajima, Y.; Matsuo, T. & Suga, H. (1982). Phase transition in KOH-doped hexagonal ice. *Nature,* 299, pp. 810–812.

Tajima, Y.; Matsuo, T. & Suga, H. (1984). Calorimetric study of phase transition in hexagonal ice doped with alkali hydroxides. *Journal of Physics and Chemistry of Solids,* 45, pp. 1135–1145.

Vega, C.; Sanz, E. & Abascal, J. L. F. (2005). The melting temperature of the most common models of water. *Journal of Chemical Physics,* 122, 114507.

## **Applications of All-Atom Molecular Dynamics to Nanofluidics**

Mauro Chinappi

*Department of Physics, Sapienza University of Rome, P.la Aldo Moro 5, 00185, Rome Italy*

#### **1. Introduction**

In recent years, progress in the fabrication of micro- and nanodevices has made nanofluidics an intense research field. The interest of the scientific community is evident from the large number of published papers and from the launch of dedicated journals. In this scenario, a better understanding of the key aspects of fluid motion in nanoscale systems is fundamental, and computational techniques able to provide deeper insight into the complex phenomena involved in nanoscale mass transport play a crucial role in nanofluidic research. The aim of this chapter is to introduce the reader to the application of classical all-atom molecular dynamics (MD) to nanofluidic problems. Nanofluidics is the study of fluid motion in confined structures whose characteristic size is of the order of nanometers, typically 1–100 nm (Eijkel & Berg, 2005). Confined fluids in nanoscale geometries exhibit physical behaviors that, in several cases, differ largely from macroscale dynamics. The crucial difference is that in nanoscale systems the usual mathematical description of continuum fluid dynamics often fails to reproduce the correct behavior. Here we deal with simple liquids (Hansen & McDonald, 2006), for which the appropriate macroscopic model is given by the incompressible Navier-Stokes equations for mass and momentum conservation

$$\nabla \cdot \mathbf{u} = 0 \tag{1}$$

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f} \tag{2}$$

where *ρ* is the constant fluid mass density, **u** the fluid velocity, *p* the pressure, *ν* the kinematic viscosity and **f** is the force per unit of mass due to external loads. Equations (1) and (2) are usually completed with the impermeability and the no-slip boundary conditions at solid walls that, taken together, can be written as

$$\mathbf{u} = \mathbf{0}, \quad \mathbf{x} \in D, \tag{3}$$

where *D* is the solid-liquid interface. The absence of slip at a rigid wall is largely confirmed by direct observations at the macroscale, and there are only a few well-documented cases (see Lauga et al. (2005)) where the no-slip boundary condition (3) at solid walls does not reproduce the correct fluid behavior at the macroscale.

Since fluids are composed of molecules it is, in principle, always possible to investigate the flow of a fluid in a nanoconfined system by simulating the motion of every single atom via all-atom molecular dynamics (MD) simulations. This approach, which is in general inefficient and more often inapplicable to usual fluid dynamics problems, is crucial for nanofluidics for two main reasons: i) it is not based on the continuum assumption, which often fails in nanoconfined geometries, and ii) it does not require assumptions on boundary conditions at the interfaces. In recent years MD has proved to be a powerful tool for the analysis of several nanofluidics problems such as, among others, meniscus and contact line dynamics (De Coninck & Blake, 2008; Gentner et al., 2003), the role of the precursor film in wetting (Chibbaro et al., 2008), the thermal exchange properties of carbon nanotubes (Chiavazzo & Asinari, 2011) and the interface dynamics of a system of two immiscible liquids (Orlandini et al., 2011). Stimulated by experimental results on the flow rate through carbon nanotubes that, for narrow channels (a few nanometers), exceeds predictions from the no-slip Poiseuille flow by up to several orders of magnitude (Majumder et al., 2005), several authors have used MD simulation to investigate liquid transport through nanopores. Both Lennard-Jones models (Cannon & Hess, 2010; Chinappi et al., 2006) and more realistic models, where water and pore are described with state-of-the-art classical force fields (Falk et al., 2010; Hummer et al., 2001; Thomas et al., 2010), have been used. Interpretations of the observed flow rate enhancement in terms of viscosity changes in the depletion region close to the wall (Myers, 2011) and of changes in bulk and interface properties due to confinement (Thomas et al., 2010) have been proposed and, in part, tested via MD techniques. Another nanofluidics topic that has attracted the interest of researchers is liquid slippage on solid walls, i.e. the presence of a finite velocity at the wall and, hence, the failure of the no-slip boundary condition (3). As we will see later in more detail, slippage is associated with both chemical and geometrical features of the solid surface. MD has been widely used to explore the role of surface hydrophobicity (Chinappi & Casciola, 2010; Huang et al., 2008), of surface roughness (Zhang et al., 2011), and of the shear rate (Niavarani & Priezjev, 2010; Priezjev et al., 2005) in liquid slippage.
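To give a feeling for how a finite wall velocity can raise the flow rate above the no-slip Poiseuille value, the following sketch uses the standard Navier slip model for pressure-driven flow in a circular pipe. The slip length is a quantity not defined in this passage and is introduced here only for illustration; the function names and the numerical values are likewise assumptions, not data from the cited studies.

```python
import math


def poiseuille_flow_rate(radius, length, dp, viscosity, slip_length=0.0):
    """Volumetric flow rate through a circular pipe under a pressure drop dp.

    With slip_length = 0 this is the classical no-slip Poiseuille result
    Q = pi R^4 dp / (8 mu L). A Navier slip length b at the wall multiplies
    it by (1 + 4 b / R). All quantities are in SI units.
    """
    q_no_slip = math.pi * radius**4 * dp / (8.0 * viscosity * length)
    return q_no_slip * (1.0 + 4.0 * slip_length / radius)


def enhancement(radius, slip_length):
    """Ratio of the slip flow rate to the no-slip flow rate."""
    return 1.0 + 4.0 * slip_length / radius


if __name__ == "__main__":
    # Hypothetical values: a pore of radius 1 nm and a slip length of 25 nm.
    print(f"enhancement = {enhancement(1e-9, 25e-9):.0f}")
```

Because the enhancement scales as the slip length divided by the radius, a slip length that is negligible in a macroscopic pipe can dominate the flow in a nanometer-sized pore.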

Despite the large number of applications and the increasing power of computing systems, for many nanofluidic phenomena the time and length scales accessible to MD simulations are still far from those of realistic systems. This implies that the Navier-Stokes equation (2) still has to be used for the description of nanofluidic phenomena. Two main issues arise when using the Navier-Stokes equations for nanofluidic systems: i) is the continuum assumption reasonable at the scale of the system? and ii) which boundary condition has to be applied at the wall? In this chapter we discuss these two issues for liquids in nanoconfined geometries, providing examples of how to employ MD simulations for their analysis. Concerning the former issue, we set up a model system for estimating the mass flux through a pore due to an external forcing and compare the results with the hydrodynamic prediction obtained via dimensional analysis of the Navier-Stokes equation (2). We show that for simple liquids the threshold for the validity of the continuum assumption is of the order of five times the molecule dimension and that, below this scale, the hydrodynamic prediction underestimates the flow rate. The second issue is addressed by estimating the slippage on smooth solid walls for different solid surfaces. In agreement with literature results (Huang et al., 2008) we find a relationship between surface wettability and slippage. The section ends with a discussion of the rough surface case and a comparison with experimental results for a specific system of technological interest consisting of water flowing on a hydrophobic coated surface.

The chapter is structured as follows. A first brief section concerns the continuum assumption and the sub-continuum behavior for a liquid flow through a nanopore. Then we introduce the concept of liquid slippage and we present a simulation set-up for MD characterization of slippage for both smooth and rough surfaces. The final section of the chapter is dedicated to a perspective on near future applications of MD to nanofluidic problems.

#### **2. Continuum vs single-file motion**


The chapter is structured as follows. A first brief section concerns the continuum assumption and the sub-continuum behavior for a liquid flow through a nanopore. Then we introduce As we discussed in the introduction, the main causes for the failure of the prediction of the standard continuum model – eq. (1), (2) and (3) – at the nanoscale are the inappropriate boundary conditions and the failure of the continuum assumption. In each MD simulation aimed at reproducing a nanoscale flow these effects are concomitant and hence it is not easy to isolate the two contributions, and, in particular, it is not possible to clearly identify a threshold for the validity of the continuum assumption and to understand how non-continuum features affect the flow. Here we introduce a MD set-up that overcomes the problem strictly controlling the boundary condition at the solid wall. The comparison of MD results with continuum model prediction is then used to estimate a threshold for the validity of the continuum assumption. As a first step in this program we need to recall a hydrodynamic prediction for the flow rate through pores.

The motion of a liquid in a macroscale system is described by the incompressible Navier-Stokes equation (2). For microscale systems the non-linear convective term (**u** · ∇**u**) is negligible with respect to the viscous term *ν*∇<sup>2</sup>**u** and eq. (2) reduces to the Stokes equation

$$\frac{\partial \mathbf{u}}{\partial t} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f}.\tag{4}$$

With a reference length *l*<sub>0</sub>, to be specified later, a reference diffusive time *t*<sub>0</sub> = *l*<sub>0</sub><sup>2</sup>/*ν*, speed *u*<sub>0</sub> = *l*<sub>0</sub>/*t*<sub>0</sub> = *ν*/*l*<sub>0</sub> and pressure *p*<sub>0</sub> = *ν*<sup>2</sup>*ρ*/*l*<sub>0</sub><sup>2</sup>, the typical reference force per unit mass is *f*<sub>0</sub> = *ν*<sup>2</sup>/*l*<sub>0</sub><sup>3</sup>. As a result, the dimensionless parameter

$$f^* = f / f_0, \tag{5}$$

where *f* = |**f**|, is a natural measure of the external load. With the above choices, the dimensionless form of eq. (4) reads

$$\frac{\partial \mathbf{u}^*}{\partial t^*} = -\nabla p^* + \nabla^2 \mathbf{u}^* + f^* \hat{\mathbf{f}}, \tag{6}$$

where **f̂** = **f**/*f* and stars indicate dimensionless quantities. Since we are interested in molecular flow across a pore, it is natural to identify *l*<sub>0</sub> with a length that characterizes the pore diameter, which we will indicate as the effective diameter *de*. Let us use the previous formalism to obtain a hydrodynamic prediction for the mass flux through a pore. The mass flux across a surface of area *S* is given by

$$\Phi = \int_{S} \rho \mathbf{u} \cdot d\mathbf{S} = \rho l_0 \nu \Phi^* \equiv \Phi_0 \Phi^*, \tag{7}$$

with Φ<sup>∗</sup> = ∫<sub>*S*<sup>∗</sup></sub> **u**<sup>∗</sup> · *d***S**<sup>∗</sup> the dimensionless flux. Since equation (6) is linear, **u**<sup>∗</sup> (and hence Φ<sup>∗</sup>) is proportional to *f*<sup>∗</sup> and the scaling law

$$
\Phi \propto \frac{\rho f l_0^4}{\nu} \tag{8}
$$

for the mass flux is found. Equation (8) is the well-known power-four law for the mass flux of a viscous fluid in a pipe which, in the particular case of an infinite pipe of diameter *de* with no-slip boundary condition at the wall, results in the Hagen-Poiseuille expression *φ* = *πP*′(*de*/2)<sup>4</sup>/(8*ν*), with *P*′ the pressure gradient. It is crucial to note that this dimensional argument is valid only if no other characteristic length scales appear in the problem and, in particular, in the boundary condition. This is the case, for instance, for the no-slip (the velocity at the wall is zero) and no-stress (the stress at the wall is zero) boundary conditions, but not for the partial slip condition, where a new characteristic length (the slip length *Ls*, see section 3 below) is present. Equation (8) is the continuum model prediction we will compare to MD simulation results.
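The power-four law can be checked numerically with a few lines of Python. The sketch below is only illustrative: the function names and the numerical values are assumptions introduced here, not part of the original set-up. It evaluates the reference force *f*<sub>0</sub>, the dimensionless forcing of eq. (5), and the Hagen-Poiseuille mass flux, and shows that doubling the diameter multiplies the flux by 16.

```python
import math


def reference_force(nu: float, l0: float) -> float:
    """Reference force per unit mass, f0 = nu**2 / l0**3."""
    return nu ** 2 / l0 ** 3


def dimensionless_forcing(f: float, nu: float, l0: float) -> float:
    """Dimensionless load f* = f / f0 of eq. (5)."""
    return f / reference_force(nu, l0)


def hagen_poiseuille_mass_flux(pressure_gradient: float, d_e: float, nu: float) -> float:
    """Mass flux pi * P' * (d_e / 2)**4 / (8 * nu) for a no-slip infinite pipe."""
    return math.pi * pressure_gradient * (d_e / 2.0) ** 4 / (8.0 * nu)


if __name__ == "__main__":
    # Hypothetical values in arbitrary consistent units.
    nu, pressure_gradient = 1.0, 1.0
    small = hagen_poiseuille_mass_flux(pressure_gradient, 1.0, nu)
    large = hagen_poiseuille_mass_flux(pressure_gradient, 2.0, nu)
    print("flux ratio for doubled diameter:", large / small)
```

Since the ratio depends only on the exponent, it is the same for any choice of viscosity and pressure gradient, which is the content of the scaling law (8).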

#### **2.1 System set-up**

The molecular dynamics set-up presented here is similar to the one presented in Chinappi et al. (2008). For the sake of clarity we report here the main features, while the interested reader can find the details in the above cited paper. We consider a cylindrical nanopore of height *h* and circular section of radius *r*. The nanopore connects two cylindrical reservoirs of radius *R* (see panel a of Fig. 1). A periodic boundary condition is applied in the *z*-direction, the box length being *Lz*. The liquid molecules are monoatomic and interact via a standard Lennard-Jones (LJ) potential *V<sub>LJ</sub>*(*r*) = 4*ε*[(*σ*/*r*)<sup>12</sup> − (*σ*/*r*)<sup>6</sup>] truncated at distance *r<sub>cut</sub>* = 3.1*σ*. In the rest of the section we will use LJ units. In all the simulations the density of the liquid is *ρ* = 0.83 and the temperature is *θ* = 0.725. The solid wall is modeled as a continuum that occupies a volume *Sw*, and the interaction between the wall and a fluid atom located at **r** is given by *Vw*(**r**) = ∫<sub>*Sw*</sub> *nw* *f*(|**r** − **r***w*|) *d***r***w*, where *nw* is a suitable density having the dimension of inverse volume, *Sw* is the wall domain and *f*(*r*) is a LJ potential truncated at its minimum. A slab of width *h* endowed with the cylindrical pore separates the two reservoirs. The aspect ratio of the domain and the wall is kept constant in all simulations, namely, *Lz*/*r* = 10, *R*/*r* = 4, *h*/*r* = 1. The origin of the reference system is set at the center of the pore with the *z*-coordinate running along its axis. A sketch of the simulation box is reported in panel a of Fig. 1. Due to the steepness of the wall-liquid potential *Vw*(**r**), the isosurface *Vw* = *kbθ* is a natural candidate for the boundary of the volume accessible to liquid atoms. Hence, we define the effective diameter *de* as the diameter of the narrow part of the pore delimited by the isosurface *Vw* = *kbθ*. At this virtual interface, no tangential forces are exerted on molecules. 
As a consequence, in the hydrodynamic description this virtual interface corresponds to a free-slip (no-stress) impermeable boundary. The flux across the pore is induced by a homogeneous forcing **f** acting on each liquid atom along the pore axis direction. The coupling to the heat bath in the non-equilibrium simulations is achieved via a Berendsen thermostat (Berendsen et al., 1984), and preliminary tests indicated that the presented results are only weakly sensitive to changes in the thermostat's characteristic time constant. All the simulations were performed with a molecular dynamics code based on the open-source code DL\_PROTEIN-2.1 (Melchionna & Cozzini, 2001).
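The two pair interactions used in this set-up can be sketched in Python as follows. This is a minimal illustration in LJ units, not the DL\_PROTEIN implementation: the function names are hypothetical, the liquid-liquid potential is assumed to be plainly truncated (no shift), and the optional `shift` flag of the repulsive core is an assumption added here for comparison, since the text only states that *f*(*r*) is truncated at its minimum.

```python
R_MIN = 2.0 ** (1.0 / 6.0)  # position of the LJ minimum in units of sigma


def lj(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Standard Lennard-Jones potential 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_truncated(r: float, r_cut: float = 3.1, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """LJ potential set to zero beyond r_cut * sigma (plain truncation assumed)."""
    if r >= r_cut * sigma:
        return 0.0
    return lj(r, epsilon, sigma)


def lj_repulsive_core(r: float, epsilon: float = 1.0, sigma: float = 1.0, shift: bool = False) -> float:
    """LJ potential truncated at its minimum, i.e. only the repulsive branch.

    With shift=True the potential is raised by epsilon so that it vanishes
    continuously at the cutoff; this option is an assumption of the sketch.
    """
    if r >= R_MIN * sigma:
        return 0.0
    value = lj(r, epsilon, sigma)
    return value + epsilon if shift else value


if __name__ == "__main__":
    for r in (0.95, 1.0, R_MIN, 2.0, 3.2):
        print(r, lj_truncated(r), lj_repulsive_core(r))
```

The wall potential *Vw*(**r**) is then the integral of the repulsive core over the wall domain weighted by the density *nw*; that integration depends on the pore geometry and is not reproduced here.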

Simulations were performed for different values of the effective pore diameter *de* with the number of atoms ranging from *N* = 435 for *de* = 1.83 to 31680 for the largest one at *de* = 9.25. Following Zhu et al. (2004), the mass flux is expressed by means of the collective variable *n*, defined in differential form by its increment in the time interval *dt*:

$$dn(t) = \sum_{i} \frac{dz_i}{h}, \tag{9}$$

where *dzi* is the displacement of the *i*-th molecule in a time step *dt*, and the sum runs over all the molecules within the channel at time *t*, i.e. −*h*/2 < *zi*(*t*) < *h*/2. Hence each molecule crossing the channel from left to right (right to left) is associated with an increase (decrease) *n* → *n* ± 1. The integer part of *n*(*t* + Δ*t*) − *n*(*t*) measures the number of molecules that cross the channel from left to right, minus the ones that cross it from right to left, in a time interval Δ*t*. In statistically stationary conditions, the flux of molecules is defined as Φ = ⟨*n*(*t* + Δ*t*) − *n*(*t*)⟩/Δ*t*, where the average is taken by sampling the system in time. For a given pore diameter *de*, simulations were performed at different forcing intensities *f* in order to verify that the system is in the linear response regime, i.e. Φ ∝ *f*. In the following only linear response results are considered.
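The bookkeeping behind eq. (9) can be sketched as follows. The function names are hypothetical and the sketch assumes that the positions passed in are unwrapped over the step (no periodic jump between `z_old` and `z_new`), which is reasonable for molecules inside the channel since they are far from the periodic boundary. The flux is computed from a series of *n* sampled at constant intervals; averaging the increments over consecutive windows of equal length telescopes to the total change divided by the total time.

```python
from typing import Sequence


def dn_increment(z_old: Sequence[float], z_new: Sequence[float], h: float) -> float:
    """Increment of the collective variable n over one time step, eq. (9).

    Only molecules that are inside the channel at the beginning of the step,
    -h/2 < z_old < h/2, contribute with their displacement divided by h.
    """
    half = 0.5 * h
    return sum((zn - zo) / h for zo, zn in zip(z_old, z_new) if -half < zo < half)


def molecular_flux(n_series: Sequence[float], dt: float) -> float:
    """Time-averaged flux from values of n sampled every dt."""
    if len(n_series) < 2:
        raise ValueError("need at least two samples of n")
    return (n_series[-1] - n_series[0]) / ((len(n_series) - 1) * dt)


if __name__ == "__main__":
    # Toy check: a single particle dragged across a channel of height h = 2.
    h, z, n = 2.0, -1.5, 0.0
    for _ in range(24):
        z_next = z + 0.125
        n += dn_increment([z], [z_next], h)
        z = z_next
    print("n after one crossing:", n)  # close to 1, up to the step discretization
```

With a finite step the accumulated value for one crossing differs from 1 by at most a fraction of the order of the step size divided by *h*, which is why the integer part of the increment is used to count crossings.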

#### **2.2 Flow through a cylindrical nanopore**

In order to test the validity of the continuum prediction, eq. (8), we plot in panel d of Fig. 1 the quantity Φ*d<sub>e</sub>*<sup>−4</sup>/*f* as a function of the effective diameter *de*. It is apparent that for large *de* the curve tends to a constant value, as expected from equation (8), which predicts a power-four scaling of the flux Φ with the diameter *de*. This hydrodynamic behavior sets in at an effective diameter between 5 and 6 van der Waals radii. This result is consistent with the existing literature (see, among others, Koplik & Banavar (1995) and Bocquet & Charlaix (2009)), which indicates that the continuum approach for simple liquids is valid on length scales one order of magnitude larger than the molecule size. Snapshots of the simulated systems for *de* = 1.83 and *de* = 9.25 are provided in Fig. 1b, where typical configurations of the molecules are reported for equilibrium (no forcing) simulations. The molecules are represented as van der Waals spheres. Those inside the pore are drawn in actual size, while the radii of those outside the pore have been arbitrarily reduced for clarity.

Decreasing *de*, the flux becomes larger than the hydrodynamic prediction and appears to scale roughly as *d<sub>e</sub>*<sup>3</sup>. Observing the snapshots of typical configurations of the molecules for small *de*, we see that in the extreme case (*de* = 1.83, lower panel of Fig. 1b) only a single atom can occupy the pore section. This evidence suggests that an appropriate framework for such a quasi-1D motion is the so-called single-file motion, namely a sequential, concerted motion of molecules along a line, with no possibility of overtaking. Since the molecules are densely packed within the nanopore, each time a molecule enters the inlet mouth, a molecule is kicked out from the outlet (see Fig. 1c). The many-body aspects of single-file motion can be described by a single parameter, the hopping rate *k* (Berezhkovskii & Hummer, 2002), that is the inverse of the characteristic time at which molecules hop into the inlet mouth of the nanopore with sufficient energy to move all the molecules inside the pore, so that the last one is expelled. As suggested by Zhu et al. (2004) in describing the flux of water through a carbon nanotube, the relevant parameter is the potential jump associated with the passage of a particle through the pore, Δ*μ* = *f Lz*. At low forcing intensity, the flux is linear in the potential jump

$$
\Phi = -k \frac{\Delta \mu}{k_b \theta}, \tag{10}
$$

where *kb* is the Boltzmann constant. In stochastic models for single-file transport (Berezhkovskii & Hummer, 2002), the hopping rate *k* is a phenomenological constant of the model. Roughly speaking, for slight deviations from equilibrium, the hopping rate *k* should depend on fluid density and temperature and, of course, on the cross section of the pore. Hence, at a given thermodynamic state, it is reasonable to argue that the hopping rate is proportional to the pore cross section, i.e. *k* ∝ *d<sub>e</sub>*<sup>2</sup>. Since, given **f**, Δ*μ* scales with the longitudinal dimension of the pore, the above expression indicates that the molecular flux in the single-file regime scales as *d<sub>e</sub>*<sup>3</sup>, a result in agreement with the molecular dynamics findings.

Fig. 1. a) Sketch of the system geometry. The isosurface *Vw*(**r**) = *kbθ* (dot-dashed line) represents a natural choice for the boundary of the volume available to the liquid atoms; hence, the effective diameter *de* is defined as the diameter of the narrow part of the pore delimited by *Vw* = *kbθ*. At this virtual interface, no tangential forces are exerted on molecules. b) Snapshots of equilibrium simulations for two different pore sizes: lower panel *de* = 1.83, upper panel *de* = 9.25. For each panel the image on the left is a projection on the *zy* plane, while the image on the right is the projection on the *xy* plane. For the sake of clarity, particles inside the pore (i.e. −0.5*h* < *z* < 0.5*h*) are drawn as spheres of diameter *σ*, while the other particles are drawn as spheres of diameter 0.1*σ*. The image was produced using the VMD software (Humphrey et al., 1996). c) Schematic representation of single-file motion. When a particle enters the pore (e.g. particle 1 of the upper panel), the last particle in the pore exits from the opposite side (particle 5 in the lower panel). d) Particle flux Φ divided by the forcing intensity *f* and the hydrodynamic scaling factor *d<sub>e</sub>*<sup>4</sup>, plotted as a function of *de*. The dashed line represents the single-file scaling Φ ∝ *d<sub>e</sub>*<sup>3</sup>.
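The single-file scaling argument can be made concrete with a short sketch. All prefactors below (`k0`, `aspect`, `kb_theta`) are hypothetical placeholders, and the proportionality between the longitudinal dimension and *de* is an assumption based on the fixed aspect ratios of the set-up. The code evaluates the magnitude of eq. (10) with *k* ∝ *d<sub>e</sub>*<sup>2</sup> and Δ*μ* ∝ *f* *de*, then measures the log-log slope of the flux and of the compensated quantity plotted in panel d of Fig. 1.

```python
import math


def loglog_slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Local scaling exponent between two points of a power law."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


def single_file_flux(d_e: float, f: float, k0: float = 1.0,
                     aspect: float = 1.0, kb_theta: float = 1.0) -> float:
    """Magnitude of eq. (10) with k = k0 * d_e**2 and delta_mu = f * aspect * d_e.

    k0, aspect and kb_theta are illustrative constants, not fitted values.
    """
    hopping_rate = k0 * d_e ** 2
    delta_mu = f * aspect * d_e
    return hopping_rate * delta_mu / kb_theta


def compensated_flux(flux: float, d_e: float, f: float) -> float:
    """Flux divided by f and by the hydrodynamic factor d_e**4."""
    return flux / (f * d_e ** 4)


if __name__ == "__main__":
    f, d1, d2 = 0.01, 1.5, 3.0
    phi1, phi2 = single_file_flux(d1, f), single_file_flux(d2, f)
    print("flux exponent:", loglog_slope(d1, phi1, d2, phi2))
    print("compensated exponent:",
          loglog_slope(d1, compensated_flux(phi1, d1, f),
                       d2, compensated_flux(phi2, d2, f)))
```

In the compensated representation a power-three law appears as a curve decreasing like 1/*de*, while the hydrodynamic regime appears as a plateau, which is how the two regimes are distinguished in the figure.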

The presented results can be qualitatively compared with recent MD results on mass flux through carbon nanotubes. Indeed, stimulated by experimental studies of water flowing through carbon nanotubes, which reported flow rates exceeding the predictions of the no-slip Hagen-Poiseuille flow by orders of magnitude (Holt et al., 2006; Majumder et al., 2005; Whitby et al., 2008), several research groups studied water flow in nanotubes via MD simulations. Falk et al. (2010) analyzed the water flux inside carbon nanotubes (CNTs) of different sizes and found a transition in the friction coefficient when the pore diameter is smaller than a couple of nanometers. In particular, they found that the water structure is affected by confinement for CNTs of radius below 1.6 nm, i.e. ∼5 times larger than the size of a water molecule, and that, below this regime, single-file motion sets in and the friction coefficient vanishes. In the system analyzed by Falk et al. (2010) both boundary (slippage) and confinement effects are present, and hence their results cannot be quantitatively compared to our case. However, it is apparent that the threshold separating the two regimes is in both cases ∼5 times the molecule dimension. A similar threshold was found by Thomas et al. (2010), who show that a continuum model is able to reproduce the MD results if the slip length and the water viscosity change with the pore size. In particular, the flow enhancement for narrow pores is interpreted in the continuum model as a decrease of the viscosity and an increase of the slippage at the boundary. In this respect it is interesting to point out that Myers (2011) recently proposed a continuum model based on a reduced viscosity in the depletion region (i.e. the viscosity is not uniform and decreases close to the wall) and a no-slip boundary. 
This model correctly predicts the flow enhancement in narrow nanotubes, the enhancement factor being analogous to the one obtained using a finite slip length. The presence in the literature of different theoretical models able to interpret the same phenomena underlines the role of MD as a powerful tool for nanofluidic research, since proper MD set-ups allow one to isolate the different causes that contribute to the observed behaviour.

#### **3. Liquid slippage on solid walls**

In the continuum framework, liquid slippage at the wall is described in terms of the Navier boundary condition, which, in the case of parallel flow over a non-moving planar wall, reads

$$v_w = L_s \frac{dv}{dz}, \tag{11}$$

where *v*(*z*) is the velocity profile with *z* the wall-normal coordinate and *vw* is the slip velocity, i.e. the value the velocity profile attains at the wall. The parameter *Ls* is called the slip length. It is geometrically interpreted as the distance below the wall where the extrapolated fluid velocity vanishes, see Fig. 2. Wall slippage was observed for both smooth and patterned surfaces by several groups; for a review see, among others, Lauga et al. (2005). Slippage is typically classified into two broad classes: intrinsic slip, sometimes called molecular slip, and apparent slip. In the first case (intrinsic slip) a non-vanishing slip velocity at the smooth wall results from the first few layers of liquid molecules sliding on the solid surface. In the second case the slip appears at scales intermediate between the characteristic size of the system and the molecular scale. A typical case of apparent slippage is the presence of gas nano-bubbles trapped in the surface roughness elements (center panel of Fig. 5, showing the so-called Cassie state), as often happens both for regularly patterned (Gogte et al., 2005) and randomly rough surfaces (Govardhan et al., 2009). The capability of a surface to trap gas bubbles is often associated with the so-called superhydrophobic condition, where the surfaces are characterized by a low wettability (large contact angle of a sessile water drop together with a low contact angle hysteresis) and slippage (Rothstein, 2010).

Fig. 2. Slippage on a planar wall. Left, no-slip surface: the velocity at the wall is zero. Right, partial slip surface, Navier boundary condition: the velocity at the wall *vw* is proportional to the velocity gradient *dv*(*z*)/*dz*, with *z* directed as the wall normal toward the liquid (eq. (11)). The constant of proportionality is called the slip length (*Ls*).

In the next section we present some results concerning slippage on smooth walls and discuss the relationship between slippage and wettability. The last section is dedicated to an example of a rough surface with nanoscopic defects.

#### **3.1 Intrinsic slippage on smooth surfaces**

On smooth walls the mechanism responsible for liquid slippage is the so-called intrinsic or molecular slippage, where the first few layers of liquid molecules slide on the solid surface. Intrinsic slippage has been extensively studied via MD simulations. The general picture emerging is that wettability and slippage are deeply related; in particular, the larger the contact angle *θ*, the larger the slip length *Ls*. In this respect the data presented by Huang et al. (2008) support the existence of a quasi-universal relationship between contact angle *θ* and slip length *Ls*. However, a recent study (Ho et al., 2011) reports MD simulation results that clearly indicate that water slippage can also occur on hydrophilic surfaces, suggesting that the connection between *θ* and *Ls* is purely coincidental. Here we use two simple simulation set-ups to measure the contact angle and the slip length on solid surfaces with different degrees of hydrophobicity, and discuss our results in the framework of the ongoing debate on the role of wettability in liquid slippage.


Table 1. Summary of the contact angle measurement simulations. The second column reports the value of the parameter *cSL* that rules the attractive part of the modified Lennard-Jones potential (12) for the solid-liquid interaction (*cSL* = 1 for standard LJ potential, *cSL* = 0 for repulsive potential).

#### **3.1.1 Contact angle measurement**

The system we consider is formed by a solid Lennard-Jones (LJ) wall and by a LJ liquid droplet (Fig. 4a). Each atom interacts with the others via a modified LJ potential

$$V(r_{ij}) = 4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - c_{ij} \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] \tag{12}$$

with *cij* the parameter used to change the affinity between atoms: *cij* = 0 corresponds to a completely repulsive interaction, while for *cij* = 1 the usual attractive tail of the LJ potential is recovered. For the liquid-liquid interaction we used the standard LJ potential, i.e. *εLL* = 1, *σLL* = 1 and *cLL* = 1. The solid is more self-attracting than the liquid, *εSS* = 10, *σSS* = 1 and *cSS* = 1; moreover, solid atoms are constrained to a face centered cubic (FCC) lattice by harmonic springs. Concerning the solid-liquid interaction, we have *εSL* = 1, *σSL* = 1, while *cSL* is varied from 0.1 to 1. In the initial configuration the solid atoms form an FCC slab of dimensions *Lx*, *Ly* and *h* in the *x*, *y* and *z* directions respectively, while the liquid atoms are arranged as a spherical cap, the center of the sphere being at a distance *b* from the last layer of solid atoms. Periodic boundary conditions are applied in the three directions, with *Lx*, *Ly* and *Lz* the dimensions of the periodic box. The initial positions of both solid and liquid atoms are on an FCC lattice of density *ρw* = *ρl* = 0.83. Initial velocities are assigned so as to give the system an initial kinetic energy corresponding to a temperature *T* = 0.75. During the equilibration phase a thermostat is applied to both wall and liquid atoms. The system temperature *T* = 0.75 (smaller than the LJ fluid critical temperature) and the volume available to fluid atoms correspond to a two-phase liquid-vapor system on the LJ phase diagram (Hansen & McDonald, 2006). During equilibration the initial FCC lattice structure used to assign the initial positions of the liquid atoms disappears, and some atoms leave the droplet surface and enter the vapor phase. Moreover, the droplet rearranges until it reaches a steady state where the contact angle *θ* does not change. 
The system is considered at equilibrium when the number of fluid particles in the vapor phase and the contact angle no longer change appreciably in time. At this point the thermostat is turned off and an NVE equilibrium simulation is run. In all the cases analyzed we did not notice further changes of the droplet contact angle during the NVE run. Fig. 3 reports snapshots of the equilibration phase of a droplet, from the initial configuration to the equilibrium state, for the cases *cSL* = 0.3 and *cSL* = 0.7, starting from an initial configuration where the center of the sphere (liquid phase) is located at a distance *b* = 2 from the surface. It is apparent that the equilibrium contact angle is strongly affected by the liquid-solid attraction parameter *cSL*, ranging from an almost complete dewetting condition for *cSL* = 0.3 (panels a, c and e of Fig. 3) to a weak hydrophobicity for *cSL* = 0.7 (panels b, d and f of Fig. 3). We observe (data not shown) that the final state is independent of the initial position of the droplet and, in particular, of the value of *b* that sets the contact angle of the initial configuration. However, the time needed to reach equilibrium increases dramatically if the initial guess of the contact angle is far from the equilibrium value; this effect is particularly relevant if the initial contact angle is much larger than the equilibrium one. The equilibrium contact angle is estimated as in Werder et al. (2003) and Chinappi & Casciola (2010), i.e. by calculating the density profile, defining the droplet surface as the set of points for which *ρ* is half of the liquid density inside the droplet, and fitting the surface points to a sphere. It is known that at the nanometric scale the contact angle of a drop may significantly differ from its macroscopic value due to line tension. 
For instance, in the case of a liquid water droplet on a hydrophobically coated surface (Chinappi & Casciola, 2010), the equilibrium contact angle is *θ* ≃ 120° for a droplet of radius *r* ≃ 34 Å and *θ* ≃ 112° for a droplet of radius *r* ≃ 60 Å, leading to a macroscopic contact angle *θmacro* ≃ 105°, obtained by fitting the modified Young equation, cos *θ* = cos *θmacro* − *τ*/(*rbγlv*), with *τ* the line tension at the three-phase line, *rb* its curvature radius and *γlv* the surface tension of the liquid-vapor interface. For the purpose of the present section, since we are interested in the correlation between contact angle *θ* and slippage and not in the measurement of the precise values of *θ*, we do not consider this systematic correction due to the line tension *τ*. Table 1 reports the values of *cSL* for the cases considered in this chapter and the measured contact angles *θ*. It is apparent that *θ* increases when the attraction between solid and liquid decreases and that, in particular, for *cSL* = 0.1 (case E) the thermal agitation is able to detach the drop from the solid wall.
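Once the droplet surface has been fitted to a sphere, the contact angle follows from elementary geometry. The sketch below assumes a perfectly spherical cap resting on a flat wall; the function names and the sign convention (`center_height` is the signed height of the fitted sphere center above the wall plane) are choices made here for illustration. The same relation links the distance *b* of the initial configuration to the initial contact angle.

```python
import math


def contact_angle_deg(center_height: float, radius: float) -> float:
    """Contact angle (degrees) of a spherical cap on a flat wall.

    center_height is the signed height of the sphere center above the wall
    plane and radius is the fitted sphere radius, so cos(theta) = -center_height / radius.
    """
    if radius <= 0.0:
        raise ValueError("radius must be positive")
    ratio = center_height / radius
    if abs(ratio) >= 1.0:
        raise ValueError("the fitted sphere does not intersect the wall plane")
    return math.degrees(math.acos(-ratio))


def contact_line_radius(center_height: float, radius: float) -> float:
    """Radius of the circular three-phase contact line of the cap."""
    if abs(center_height) >= radius:
        raise ValueError("the fitted sphere does not intersect the wall plane")
    return math.sqrt(radius ** 2 - center_height ** 2)


if __name__ == "__main__":
    # Hypothetical fitted values in LJ units.
    for height in (-2.5, 0.0, 2.5):
        print(height, contact_angle_deg(height, 5.0))
```

A center lying above the wall gives a contact angle larger than 90° (hydrophobic case), a center below the wall gives an angle smaller than 90°, and a fitted sphere that no longer intersects the wall plane corresponds to a detached drop, for which no contact angle is defined.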

#### **3.1.2 The Couette flow MD set-up and the slip length measurement**

A common way to measure intrinsic slippage by MD simulation is to induce a shear in the fluid and to estimate the slip length *Ls* by extrapolating the bulk velocity profile. For this purpose the Couette flow is a natural choice since the shear is homogeneous and, hence, the bulk velocity profile in the stationary state is linear. In order to induce a Couette flow we prepared a system formed by a channel where a liquid is confined between two solid walls. Periodic boundary conditions are implemented in the wall-parallel directions (*x* and *y*), with *Lx* and *Ly* the box dimensions. As in the case of the droplet simulations, the atoms of the lower wall are constrained to an FCC lattice by a harmonic potential. The upper wall is constrained only in the *z* direction, and a constant force is applied to each of its atoms in the *x* direction, resulting, after a transient, in a stationary Couette flow. The atoms of the lower wall are coupled to a thermostat in order to dissipate the heat produced by viscous drag in the liquid.
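The extrapolation of the bulk velocity profile can be sketched as below. This is an illustrative post-processing routine under stated assumptions, not the authors' code: the function name `slip_length`, the `bulk` fitting window and the nominal wall positions `z_wall` are hypothetical, and the binned profile is assumed to be already time-averaged in the stationary state.

```python
import numpy as np


def slip_length(z, v, z_wall, v_wall=0.0, side="lower", bulk=None):
    """Slip length at a wall from a linear fit of the bulk velocity profile.

    z, v    : bin centers and time-averaged streamwise velocity
    z_wall  : nominal wall position (a modeling choice)
    v_wall  : wall velocity (0 for the fixed wall)
    side    : "lower" if the fluid lies above the wall, "upper" otherwise
    bulk    : optional (z_min, z_max) window excluding the layered
              region close to the walls
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    if bulk is not None:
        keep = (z >= bulk[0]) & (z <= bulk[1])
        z, v = z[keep], v[keep]
    shear_rate, offset = np.polyfit(z, v, 1)
    # Position where the extrapolated linear profile equals the wall velocity.
    z_match = (v_wall - offset) / shear_rate
    return z_wall - z_match if side == "lower" else z_match - z_wall


# Synthetic Couette profile with a slip length of 3 at both walls.
z_bins = np.linspace(0.0, 20.0, 41)
v_bins = 0.02 * (z_bins + 3.0)
ls_lower = slip_length(z_bins, v_bins, z_wall=0.0, v_wall=0.0, bulk=(5.0, 15.0))
ls_upper = slip_length(z_bins, v_bins, z_wall=20.0, v_wall=0.52,
                       side="upper", bulk=(5.0, 15.0))
```

The returned value equals the slip velocity at the wall divided by the shear rate, i.e. the Navier definition; a negative value would indicate a stagnant liquid layer at the wall.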

#### **3.1.3 Results, effect of liquid-solid interaction**

In panel b of Fig. 4 the intrinsic slip length *Ls* is plotted as a function of the equilibrium contact angle *θ*. For the hydrophilic surface there is no slippage, and the slippage is found to increase with the contact angle. The figure also reports the data from a previous study on water slippage on hydrophobic coatings (Chinappi & Casciola, 2010). These results qualitatively confirm the picture of Huang et al. (2008), who propose a quasi-universal relationship between the contact angle *θ* and the intrinsic slip length *Ls*. As we pointed out in section 3.1, a recent study (Ho et al., 2011) reports a positive slip length also for hydrophilic surfaces. In that paper slippage is observed on a hydrophilic surface only when the wall lattice spacing is significantly smaller than the liquid molecule size. Here, as in Huang et al. (2008), the dimension of the liquid molecules is similar to the solid lattice spacing, hence it is not surprising that our results confirm the picture that associates hydrophobicity and slippage. On the other hand, this fact suggests a natural way to further investigate the issue: it is easy to systematically vary the solid lattice dimension and repeat both the contact angle and the slippage measurements. Such an analysis, performed with a minimal model such as the LJ system, could reveal whether slippage on hydrophilic surfaces is a general phenomenon to be ascribed only to the ratio between solid and liquid molecule sizes or is due to the specific choices for the force field implemented in Ho et al. (2011).

Fig. 3. Time evolution of a droplet from initial condition to steady state for a highly hydrophobic case, *cSL* = 0.3 (panels a, c, e), and a slightly hydrophobic one, *cSL* = 0.7 (panels b, d, f).

#### **3.2 Slippage on rough surfaces: the example of OTS coatings**

In order to address the role of nanoscale surface defects on slippage, we consider the case of an octadecyltrichlorosilane (OTS) coated surface. The OTS molecule is formed by a linear alkyl of 17 carbon atoms with a methyl group on one end and a SiCl<sub>3</sub> (silane) group on the other end. OTS molecules are able to spontaneously form layers (sometimes a monolayer) where the molecules are assembled in hexagonal cells with the silane group attached to the solid surface and the methyl group exposed toward the fluid. Due to their ability to form compact layers exposing the methyl (hydrophobic) group, OTS coatings represent a promising technology for surface functionalization. Several groups quantified the slip length *Ls* of liquid water on smooth OTS coated surfaces (Bouzigues et al., 2008; Cottin-Bizonne et al., 2008; Li & Yoda, 2010). Here our aim is to assess the effect of nanoscale roughness of the coating on the water slippage on an OTS layer.

Fig. 4. a) Sketch of the MD set-up for slip length measurement. The lower wall is constrained and thermalized while a constant forcing is applied to the upper wall in the wall-parallel direction; moreover, the *z* position of the upper wall is constrained. Periodic boundary conditions are applied in *x* and *y*. b) Slip length *Ls* vs contact angle *θ* for the simulations described in section 3.1 (filled squares). The empty circle corresponds to a result of a previous study on wettability and slippage of an octadecyltrichlorosilane (OTS) coated surface (Chinappi & Casciola, 2010). In order to map this result in LJ units we use the value 3.3 Å as van der Waals radius of the water molecule.

#### **3.2.1 Defected OTS coating: MD set-up**

The typical defect considered here consists of a hole of diameter *D*. In most of the simulations the defect is obtained by removing the OTS molecules in a circle of diameter *D*, hence exposing the uncoated (hydrophilic) LJ surface. For the largest system considered here, for reasons that will be clear later, the defect is obtained in a slightly different way, using alkyl molecules of different lengths: chains of 11 carbon atoms for the defect and chains of 29 carbon atoms for the other molecules. The two chain lengths are represented as dark and light blue molecules in the left panel of Fig. 6.

For the water molecules we used the TIP3P model (Jorgensen et al., 1983). This model fails to reproduce some of the water properties, in particular the viscosity (one third of the actual water viscosity) and the surface tension (slightly smaller). More accurate models exist, such as TIP4P/2005 (Abascal & Vega, 2005; Alejandre & Chapela, 2010), but they are more computationally demanding and, for the specific case of slippage on smooth OTS surfaces, they did not lead to results qualitatively different from the TIP3P model (Chinappi et al., 2011). A summary of the simulations, performed with the NAMD software (Phillips et al., 2005), is reported in Table 2. The box dimensions *Lx* and *Ly* are reported in columns 2 and 3, while column 4 reports the diameter of the defect. Equilibration was performed at 300 K and 1 atm. Systems were equilibrated following the same procedure described in Chinappi & Casciola (2010) for water slippage on ideal (not defected) OTS coatings, briefly reported here for the sake of clarity.

The wall is modeled as a Lennard-Jones (LJ) solid with the parameters *σ* and *ε* selected to have a melting point at 1 atm well above the simulation temperature. The solid-liquid interface is parallel to the *xy* plane and corresponds to a (111) plane of the LJ FCC structure. The alkyl chain head group binds the wall and is treated in a hybrid manner: as an LJ atom of the wall for non-bonded interactions, and as a carbon atom of the alkyl chain for the interactions with bonded atoms along the same chain. The equilibration phase was performed in a triply-periodic box where, by periodicity, the coated wall forms a single solid block with the upper wall. During equilibration a Langevin piston is applied in the wall-normal direction (*z*) in order to relax the system to the desired thermodynamic state. After equilibration the two walls were separated by inserting a void region (larger than the LJ cut-off radius of 12 Å) before starting the Couette simulation. Concerning the coated wall, the atoms of its lower plane were kept fixed, while the other LJ atoms were coupled to a Langevin thermostat. Concerning the uncoated wall, the atoms of its upper plane were constrained in the wall-normal direction by a harmonic spring and, as in the case of the smooth walls described in the previous section, a constant force parallel to the solid-liquid interface was applied to all the upper wall atoms.

Fig. 5. Wenzel and Cassie states. In the Wenzel state (left) the liquid fills all the asperities and wets the whole surface, while in the Cassie state (center) gas (or vapor) pockets are trapped in the surface grooves, resulting in a patterned interface (liquid-solid zones alternated with liquid-gas zones). The right panel illustrates the concept of apparent slip on a Cassie state (superhydrophobic) surface. The negligible stresses at the liquid-vapor interface result in a bulk velocity profile that could be reproduced by a continuum model with a Navier boundary condition with a uniform slip length *Ls* on an effective flat surface.

#### **3.2.2 Results**

Water molecules did not fill the hole during the equilibration phase in any of the simulated cases, with the exception of the largest hole considered (case E in Table 2). There, in the first stage of the equilibration process, a large number of water molecules entered the defect and became trapped at the hydrophilic LJ surface, resulting in a completely wetted (Wenzel) state (left panel of Fig. 5). In order to avoid this effect we introduced the slightly different system discussed in the previous section, where the defect is obtained by using alkyl chains of different lengths, resulting in a hole whose bottom surface is also hydrophobic (see the left panel of Fig. 6). This system shows a very stable Cassie state. For all the simulations the slip length *Ls* is measured from the velocity profile (see the right panel of Fig. 6). For the five cases considered, *Ls* is reported in the sixth column of Table 2. *Ls* increases with the hole diameter and with the system size. Moreover, *Ls* is larger than in the smooth case (*Ls* ≈ 5 to 7 Å; Chinappi & Casciola, 2010), as expected from the vanishing friction at the liquid-vapor interface.

Following the definition introduced in section 3, the observed slippage has to be classified as apparent slippage. In this context it is interesting to compare our MD data with continuum model results available in the literature. The simplest way to study apparent slippage with a continuum model is to consider a patterned surface where the interface is composed of solid areas, where the local slippage either vanishes or, alternatively, conforms to the small intrinsic slip at the solid-liquid interface (*Lin*), and of gaseous areas (corresponding to the nanobubbles), where no shear stress acts on the liquid phase (Ng & Wang, 2010; Ybert et al., 2007). Defining the solid fraction Φ*s* as the ratio between the area of the liquid-solid interface and the projected area (for the present case Φ*s* = 1 − π*D*<sup>2</sup>/(4*LxLy*), reported in column 5 of Table 2), Ng & Wang (2010) found that, if no intrinsic slip is assumed (*Lin* = 0 at the solid-liquid interface), a continuum model based on the Stokes equation leads to the relation

$$\frac{L_{\rm s,0}}{A} = -0.134 \ln(\Phi_{\rm s}) - 0.023 \tag{13}$$

where *A* is the cell side length of a square periodic lattice. The suffix "0" in expression (13) indicates that no-slip (*Lin* = 0) is assumed at the liquid-solid portion of the interface. For a partially slipping solid surface (*Lin* > 0) embedding the free-slip hole, the apparent slip length increases in proportion to the intrinsic slip length *Lin* and in inverse proportion to the solid fraction Φ*s* (Ybert et al., 2007),

$$
\Delta L_{\rm s} = L_{\rm s} - L_{\rm s,0} \simeq \frac{L_{\rm in}}{\Phi_{\rm s}}, \tag{14}
$$

with an order-one proportionality constant. Eqs. (13) and (14) allow us to predict the apparent slip length for the cases considered in our MD simulations; the prediction is reported in the last column of Table 2 for three values of the intrinsic slip length, namely *Lin* = 0, 5, 10 Å. The comparison between MD and the continuum model indicates that the continuum description of apparent slip on patterned surfaces is valid also at the nanoscale. Moreover, the value of the intrinsic slip length *Lin* that has to be used in equation (14) to obtain a quantitative agreement lies between 5 and 10 Å, which is consistent with the intrinsic slip measured for smooth OTS coated surfaces (Chinappi & Casciola, 2010). A similar quantitative agreement was evidenced by the comparison of MD results for LJ fluid slippage on superhydrophobic surfaces by Cottin-Bizonne et al. (2003) and lattice Boltzmann simulations by Benzi et al. (2006). As in the case of the simple liquid analyzed in section 2, the assessment of the capability of a continuum model to reproduce (or not) a nanoscale fluid dynamics system is a valuable contribution that MD is able to provide to nanofluidic research.
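The continuum prediction obtained by combining Eqs. (13) and (14) can be sketched as follows. This is a minimal illustration under explicit assumptions: the order-one proportionality constant of Eq. (14) is set to 1, *A* is taken as (*LxLy*)<sup>0.5</sup> as in the caption of Table 2, and the function names and the example box dimensions are hypothetical (they are not the values of Table 2).

```python
import math

# Range of solid fractions where Eq. (13) fits the data of Ng & Wang (2010),
# as quoted in the caption of Table 2.
PHI_S_FIT_RANGE = (0.22, 0.75)


def solid_fraction(D, Lx, Ly):
    """Solid fraction for one circular hole of diameter D per periodic cell."""
    return 1.0 - math.pi * D**2 / (4.0 * Lx * Ly)


def apparent_slip(D, Lx, Ly, L_in=0.0):
    """Apparent slip length from Eqs. (13) and (14).

    Returns (L_s, phi_s, in_fit_range); lengths share the units of D, Lx, Ly.
    The proportionality constant of Eq. (14) is assumed equal to 1.
    """
    phi_s = solid_fraction(D, Lx, Ly)
    A = math.sqrt(Lx * Ly)                                # characteristic length
    L_s0 = A * (-0.134 * math.log(phi_s) - 0.023)          # Eq. (13), L_in = 0
    L_s = L_s0 + L_in / phi_s                              # Eq. (14)
    in_fit_range = PHI_S_FIT_RANGE[0] < phi_s < PHI_S_FIT_RANGE[1]
    return L_s, phi_s, in_fit_range


# Hypothetical geometry (lengths in angstrom), for illustration only.
for L_in in (0.0, 5.0, 10.0):
    L_s, phi_s, ok = apparent_slip(D=50.0, Lx=60.0, Ly=60.0, L_in=L_in)
    print(f"L_in = {L_in:4.1f}  phi_s = {phi_s:.3f}  L_s = {L_s:.2f}  fit range: {ok}")
```

The `in_fit_range` flag is returned rather than enforced so that a caller can decide how to treat geometries outside the range where Eq. (13) was fitted.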

Table 2. Summary of the molecular dynamics simulations of the Couette flow on defected OTS-SAM coatings. Columns 2 and 3 report the periodic cell dimensions, *Lx* and *Ly* respectively, column 4 the diameter *D* of the circular defect and column 5 the solid fraction Φ*s* = 1 − π*D*<sup>2</sup>/(4*LxLy*). The apparent slip length *Ls* obtained from MD simulations is in the sixth column. The last column provides the value obtained combining expressions (13) and (14), where *A* = (*LxLy*)<sup>0.5</sup> is used as characteristic length. The three values correspond to an intrinsic slip at the solid-liquid interface of *Lin* = 0 (no-slip at the solid-liquid interface) and *Lin* = 5, 10 Å. The symbol −− for case C with the no-slip condition on the solid surface is due to the fact that, as pointed out by Ng & Wang (2010), expression (13) provides a good fit of their numerical data only in the range Φ*s* ∈ (0.22, 0.75).

Fig. 6. Left. Snapshot of the defected OTS coated surface for simulation E of Table 2. The LJ (hydrophilic, in green) surface is coated by alkyl chains of different lengths: 11 carbon atoms (dark blue) in the circular defect and 29 carbon atoms elsewhere. Right. Velocity profile obtained for the defect in the left panel; the observed slip length is ∼ 25 Å.

Moreover, the results presented here on apparent slippage on defected OTS coatings and, in particular, the stability of the superhydrophobic (Cassie) state observed for all the systems considered suggest a possible explanation for an interesting issue pointed out by Bocquet & Charlaix (2009) in their careful analysis of both experimental and numerical results on water slippage on hydrophobic surfaces. The issue is the following: while for smooth surfaces an increase in slip length with hydrophobicity was found in both MD and experiments, for a given contact angle *θ* the experimentally estimated *Ls* is larger than the MD result by about one order of magnitude. In particular, in the case of OTS coatings the most credited experimental data, obtained with different techniques (Bouzigues et al., 2008; Cottin-Bizonne et al., 2008; Li & Yoda, 2010), indicate a slip length of ∼ 20 nm, while the MD value is in the range 0.5 to 1.5 nm. Since the MD simulations were performed by different authors (Chinappi & Casciola, 2010; Chinappi et al., 2011; Huang et al., 2008; Sendner et al., 2009) using different force fields, computational codes and numerical set-ups, this discrepancy can hardly be ascribed to modeling inaccuracies affecting the MD simulations.

Hence we turn our attention to the putative presence of a non-negligible amount of wall roughness in the *smooth* surfaces analyzed in the cited experiments. Indeed, several studies on the structure of OTS coatings suggest that perfectly smooth coatings, such as those analyzed in ideal MD simulations, do not exist in practice. Cottin-Bizonne et al. (2008) and Joseph & Tabeling (2005) report a peak-to-peak distance between asperities in their OTS coatings of a few nanometers. Additionally, spectroscopic variable angle ellipsometry, neutron reflection and atomic force microscopy by Styrkas et al. (1999) suggest that OTS films often consist of adjacent multilayer domains of different thickness, ranging from 1 to 3 and even 4 times the monolayer thickness. Our conjecture is that apparent slip effects occurring on nanopatterned surfaces can, in principle, explain the difference observed between MD simulations and experiments. By being able to trap nanobubbles, such defects locally induce slippage and increase the apparent slip length. As we showed, the resulting apparent *Ls* can be estimated with a continuum model; hence, a value for the typical defect size needed to obtain the observed slippage can be estimated from equations (13) and (14).

In order to validate or reject the proposed scenario, the following issues need to be addressed with care: i) the presence of suitable nanoscale surface defects able to explain the experimentally measured *Ls*; ii) the stability of the superhydrophobic (Cassie) state for those nanoscale defects. The stable Cassie state we observed for all the MD simulations discussed provides a first step in the assessment of the second issue; however, further analyses are needed. In particular, the defect size needed to reconcile MD simulations with experiments is larger than the one analyzed here. Moreover, another crucial point to be clarified in the future is the robustness of the picture to changes in the defect geometry and the effect of meniscus curvature. All these issues could be analyzed with appropriate MD simulations.
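The estimate of the defect size needed to reach a given apparent slip length, mentioned above, amounts to inverting Eqs. (13) and (14). The sketch below shows one way to do this under several assumptions that are not in the original analysis: a square cell (*Lx* = *Ly* = *A*), a prescribed solid fraction Φ*s*, a proportionality constant of 1 in Eq. (14), and hypothetical function names and input values.

```python
import math


def eq13_factor(phi_s):
    """Right-hand side of Eq. (13), restricted to its quoted fit range."""
    if not 0.22 < phi_s < 0.75:
        raise ValueError("Eq. (13) is fitted only for 0.22 < phi_s < 0.75")
    return -0.134 * math.log(phi_s) - 0.023


def apparent_slip(A, phi_s, L_in):
    """Forward model: Eqs. (13) and (14) with unit proportionality constant."""
    return A * eq13_factor(phi_s) + L_in / phi_s


def required_defect(L_s_target, phi_s, L_in):
    """Cell side A and hole diameter D giving the target apparent slip.

    Assumes a square periodic cell with one circular hole, so that
    phi_s = 1 - pi D^2 / (4 A^2).
    """
    A = (L_s_target - L_in / phi_s) / eq13_factor(phi_s)
    if A <= 0.0:
        raise ValueError("target slip is already reached by intrinsic slip alone")
    D = 2.0 * A * math.sqrt((1.0 - phi_s) / math.pi)
    return A, D


# Illustrative inputs (angstrom): a target of 200 A, i.e. about 20 nm,
# an intrinsic slip of 7 A and an assumed solid fraction of 0.5.
A, D = required_defect(L_s_target=200.0, phi_s=0.5, L_in=7.0)
print(f"cell side A = {A:.0f} A, hole diameter D = {D:.0f} A")
```

The result depends strongly on the assumed solid fraction, which is not known for the experimental coatings, so such numbers are at most an order-of-magnitude guide.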

### **4. Conclusion and perspectives**

In this chapter we introduced the reader to some applications of classical all-atom molecular dynamics to nanofluidic problems, and in particular to two crucial issues: i) the validity of the continuum assumption at the nanoscale for simple liquids and the sub-continuum behavior, and ii) the role of hydrophobicity and surface roughness in liquid slippage at solid walls. For both problems dedicated MD set-ups were prepared, and some results were presented and compared with the existing literature. Concerning the former issue we showed, in agreement with previous findings, that a continuum model is appropriate for simple liquids when the characteristic size of the system is 5 to 6 times larger than the liquid molecule dimension. Below this threshold the molecular flow across a pore appears to be best described by a single-file model, which predicts a more efficient power-three scaling law for the mass flux. This result, achieved with a minimal model (Lennard-Jones liquid), provides a further ingredient in the interpretation of the high flow rates measured in recent experiments on water flux through carbon nanotubes.

Concerning the second issue, we discussed an ongoing debate on the connection between hydrophobicity and slippage. Our results suggest that the picture proposed by Huang et al. (2008) is valid when the sizes of liquid and solid molecules are similar; however, further investigations are needed to assess whether the slippage on hydrophilic surfaces recently observed by Ho et al. (2011) is a general feature that emerges when solid molecules are significantly smaller than liquid ones, or whether it is associated with the specific choices the authors made for the liquid and solid molecules. Concerning slippage, we also presented results on the role of surface roughness, discussed here for a specific system of technological relevance, namely OTS coatings, providing a further example of the use of all-atom MD in nanofluidics research. MD not only allowed us to measure the apparent slippage for the specific shape of the defect but, more interestingly, was used here to propose a different interpretation of the experimentally measured slippage for hydrophobic coatings on smooth surfaces. Future MD studies will provide (or not) further support for the proposed conjecture, eventually stimulating experimental validation.

Besides the results for the specific problems presented here, a secondary aim of the chapter was to use examples taken from ongoing research to give a picture of the capabilities (and the limits) of all-atom Molecular Dynamics simulation for nanofluidic applications. In general, the use of MD for nanofluidics problems can be divided into two broad classes. On the one hand, one can use state-of-the-art force fields to make quantitative predictions for a specific issue. Examples are the water flow through carbon nanotubes cited in section 2 and the water slippage on a given surface (such as the OTS coating discussed in section 3.2). On the other hand, MD can be used to perform in silico experiments on simple systems, employing a minimal model for the interatomic interaction (e.g. Lennard-Jones liquids), with the aim of isolating and analyzing new phenomena or addressing specific questions such as the validity of the continuum assumption (section 2), the role of hydrophobicity in wetting (section 3.1), and many other issues discussed in the literature. Both usages will be crucial for nanofluidics research, and the increase in computational power over the coming years should allow researchers to tackle problems on length and time scales that are beyond the range presently accessible to MD, such as the behavior of macromolecules in liquid flows (currently addressed via coarse-grained models) and large multiphase systems.
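To make the notion of a minimal model concrete, the sketch below evaluates the standard 12-6 Lennard-Jones pair potential, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], and the corresponding radial force. It is written in reduced units (ε = σ = 1 by default). The function names are illustrative, and no cutoff or shift is applied, although production MD codes typically truncate the interaction.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# Position of the potential minimum in units of sigma: r_min = 2^(1/6) * sigma.
R_MIN = 2.0 ** (1.0 / 6.0)

v_at_sigma = lj_potential(1.0)   # the potential crosses zero at r = sigma
v_at_min = lj_potential(R_MIN)   # well depth is -epsilon at r = R_MIN
f_at_min = lj_force(R_MIN)       # the force vanishes at the minimum
```

Only two parameters, ε and σ, define the liquid, which is what makes such a model convenient for isolating generic effects rather than predicting the behavior of a specific substance.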

### **5. Acknowledgment**

We would like to thank Mr. Alberto Giacomello for useful discussions and help with the LJ simulations of section 3.1, and Dr. Guido Bolognesi for useful discussions. Part of the simulations presented in section 3.2 were performed using computing resources made available by CASPUR under HPC Grant 2011.

### **6. References**


Bocquet, L. & Charlaix, E. (2009). Nanofluidics, from bulk to interfaces, *Chem. Soc. Rev.* 39(3): 1073–1095.

Bouzigues, C., Bocquet, L., Charlaix, E., Cottin-Bizonne, C., Cross, B., Joly, L., Steinberger, A., Ybert, C. & Tabeling, P. (2008). Using surface force apparatus, diffusion and velocimetry to measure slip lengths, *Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences* 366(1869): 1455.

Cannon, J. & Hess, O. (2010). Fundamental dynamics of flow through carbon nanotube membranes, *Microfluidics and Nanofluidics* 8(1): 21–31.

Chiavazzo, E. & Asinari, P. (2011). Enhancing surface heat transfer by carbon nanofins: towards an alternative to nanofluids?, *Nanoscale Research Letters* 6(1): 1–13.

Chibbaro, S., Biferale, L., Diotallevi, F., Succi, S., Binder, K., Dimitrov, D., Milchev, A., Girardo, S. & Pisignano, D. (2008). Evidence of thin-film precursors formation in hydrokinetic and atomistic simulations of nano-channel capillary filling, *EPL (Europhysics Letters)* 84: 44003.

Chinappi, M. & Casciola, C. (2010). Intrinsic slip on hydrophobic self-assembled monolayer coatings, *Physics of Fluids* 22: 042003.

Chinappi, M., De Angelis, E., Melchionna, S., Casciola, C., Succi, S. & Piva, R. (2006). Molecular dynamics simulation of ratchet motion in an asymmetric nanochannel, *Physical Review Letters* 97(14): 144509.

Chinappi, M., Gala, F., Zollo, G. & Casciola, C. (2011). Tilting angle and water slippage over hydrophobic coatings, *Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences* 369(1945): 2537.

Chinappi, M., Melchionna, S., Casciola, C. & Succi, S. (2008). Mass flux through asymmetric nanopores: microscopic versus hydrodynamic motion, *The Journal of Chemical Physics* 129: 124717.

Cottin-Bizonne, C., Barrat, J., Bocquet, L. & Charlaix, E. (2003). Low-friction flows of liquid at nanopatterned interfaces, *Nature Materials* 2: 237–240.

Cottin-Bizonne, C., Steinberger, A., Cross, B., Raccurt, O. & Charlaix, E. (2008). Nanohydrodynamics: The Intrinsic Flow Boundary Condition on Smooth Surfaces, *Langmuir* 24(4): 1165–1172.

De Coninck, J. & Blake, T. (2008). Wetting and molecular dynamics simulations of simple liquids, *Annu. Rev. Mater. Res.* 38: 1–22.

Eijkel, J. & Berg, A. (2005). Nanofluidics: what is it and what can we expect from it?, *Microfluidics and Nanofluidics* 1(3): 249–267.

Falk, K., Sedlmeier, F., Joly, L., Netz, R. & Bocquet, L. (2010). Molecular origin of fast water transport in carbon nanotube membranes: Superlubricity versus curvature dependent friction, *Nano Letters*.

Gentner, F., Ogonowski, G. & De Coninck, J. (2003). Forced wetting dynamics: a molecular dynamics study, *Langmuir* 19(9): 3996–4003.

Gogte, S., Vorobieff, P., Truesdell, R., Mammoli, A., van Swol, F., Shah, P. & Brinker, C. (2005). Effective slip on textured superhydrophobic surfaces, *Physics of Fluids* 17: 051701.

Govardhan, R., Srinivas, G., Asthana, A. & Bobji, M. (2009). Time dependence of effective slip on textured hydrophobic surfaces, *Physics of Fluids* 21: 052001.

Hansen, J. & McDonald, I. (2006). *Theory of simple liquids*, Academic Press.

Ho, T., Papavassiliou, D., Lee, L. & Striolo, A. (2011). Liquid water can slip on a hydrophilic surface, *Proceedings of the National Academy of Sciences* 108(39): 16170–16175.

