**Figure 2.** *A scheme of homology modeling.*

*2.3.2.2 Threading*

Threading, also known as fold recognition, searches a library of known folds for the structural template that gives the lowest energy for a given query sequence [15]. Fold recognition requires a precise alignment of the query sequence to the residue positions of a folding motif. The known structure establishes a set of possible positions for the amino acids in 3D space, and a similar structure is then built by placing the amino acids of the query sequence into their aligned positions. The goal of the method is either to choose the most probable fold for a given sequence or to identify the sequences that are likely to fold into a given structure. The method depends heavily on experimentally determined atomic details of the recognized protein folds and is therefore generally applicable only to proteins whose sequences adopt a fold that has already been established experimentally.
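The idea of scoring a query sequence against each fold in a library and keeping the lowest-energy match can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the two-state buried/exposed environment labels, the hydrophobic residue set, the ±1 energy values, the gapless sliding alignment and the names `residue_energy`, `thread` and `recognize_fold`. Real threading programs use gapped alignments and far more elaborate potentials.

```python
# Toy fold-recognition sketch. The energy rule and the fold library are
# hypothetical, illustrative values and are not taken from any real potential.
HYDROPHOBIC = set("AVILMFWC")


def residue_energy(residue, environment):
    """Toy one-body energy of placing `residue` at a template position.

    `environment` is "B" (buried) or "E" (exposed). Hydrophobic residues are
    rewarded in buried positions and polar residues in exposed ones.
    """
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "B"
    return -1.0 if is_hydrophobic == is_buried else 1.0


def thread(sequence, environments):
    """Slide the sequence along the template without gaps.

    Returns (lowest_energy, offset), or None if the sequence is longer than
    the template.
    """
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        energy = sum(
            residue_energy(res, environments[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or energy < best[0]:
            best = (energy, offset)
    return best


def recognize_fold(sequence, library):
    """Return (fold_name, energy, offset) for the lowest-energy template."""
    scored = []
    for name, environments in library.items():
        result = thread(sequence, environments)
        if result is not None:
            scored.append((result[0], name, result[1]))
    if not scored:
        raise ValueError("no template is long enough for the query")
    energy, name, offset = min(scored)
    return name, energy, offset


# Hypothetical two-fold library described by per-position environments.
library = {"fold_A": "BBEEBBEE", "fold_B": "EEEEBBBB"}
print(recognize_fold("VLKDILSE", library))
```

The query alternates hydrophobic and polar pairs, so it fits the environment pattern of the hypothetical `fold_A` better than that of `fold_B`.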

### *2.3.3 Approaches based on hierarchy*

The hierarchical approach is another strategy for predicting protein structure from sequence. In principle, it follows the hierarchy of protein structure, i.e., from primary to secondary structure and from secondary to tertiary structure. Thus, to relate the primary amino acid sequence to the tertiary 3D structure, the intermediate secondary structure is predicted first and then used to build the tertiary 3D structure. A number of algorithms have been developed for modeling secondary structure but, unfortunately, the accuracy of secondary structure prediction from sequence is only about 80%. The methods currently available for secondary structure modeling can be divided into those based on statistics, physicochemical properties, evolutionary information, combinatorial analysis and artificial intelligence [31–33].
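To make the "about 80%" figure concrete, the following minimal sketch computes a per-residue three-state accuracy, often reported as Q3. The text does not name the metric, so the choice of Q3, the H/E/C (helix, strand, coil) alphabet, the function name `q3_accuracy` and the example strings are all assumptions made for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the
    observed state. Both strings must have the same, non-zero length."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    if not observed:
        raise ValueError("empty secondary structure strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example with two mispredicted positions.
observed = "HHHHEEEECC"
predicted = "HHHCEEEECH"
print(q3_accuracy(predicted, observed))
```

Two of the ten states differ in this made-up example, so the score corresponds to the accuracy level quoted above.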

### **2.4 Structure prediction methods and benchmarking**

Assessing the performance of existing methods is one of the major difficulties in protein structure prediction, because methods have been, and still are, developed using different proteins and various evaluation criteria. Thus, in 1994, an open, worldwide experiment was launched to help the developers and users of these methods: the Critical Assessment of Protein Structure Prediction (CASP) (https://predictioncenter.org/) [34]. CASP is a community-wide experiment that has been conducted every two years since 1994. It allows research groups to test their structure prediction algorithms and establishes the current state of the art in protein structure prediction. The CASP experiments help to identify current progress and to highlight where future effort is needed.

## **3. Proteins: structure and function**

Proteins are simple polymers of amino acids. Short stretches of the polymer chain fold into secondary structures, which in turn give rise to the 3D structure of the protein. Secondary structures can be recognized either from the hydrogen-bonding (H-bond) patterns between the carbonyl and amide groups of the peptide backbone or from the backbone dihedral angles, viz. phi and psi. The two main secondary structures in proteins are α-helices and β-sheets, which tend to combine into small repeating arrangements termed 'supersecondary structures' or 'motifs'. These secondary structures assemble into larger structural subunits termed 'domains'. A domain can be understood as the smallest structural unit of a protein that can fold autonomously; for example, serine protease is made up of two β-barrel domains. A protein consists of either a single domain or multiple domains. Protein structures were first categorized into folds in 1976 [35]. Murzin et al. later adopted the idea and developed the publicly accessible database SCOP (Structural Classification of Proteins) [36]. Folds in SCOP are categorized by secondary structure class: all α, all β, α/β (in which helices and sheets are mixed) and α + β (separate helices and sheets).

Proteins are the most ubiquitous biomolecules, and they accomplish the vast majority of functions in all biological domains. The sequence-structure-function paradigm has attracted the interest of scientists all over the world. Because the proper functioning of all biological processes depends on proteins, and their malfunction leads to grave diseases and disorders, biologists have long studied them. In the 1970s, Anfinsen proposed that the 3D structure of a native protein is determined by its sequence in a specified environment [37].
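The backbone dihedral angles mentioned above are torsions defined by four consecutive atoms: phi by C(i-1), N, CA, C and psi by N, CA, C, N(i+1). A minimal sketch of the geometric calculation follows. The function name `dihedral_degrees` and the test coordinates are illustrative assumptions; real coordinates would come from a structure file.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated a quarter turn about the
# central bond relative to the first point.
print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Applying the same function to the backbone atoms of each residue yields the (phi, psi) pairs from which helices and sheets can be recognized.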


*Role of Force Fields in Protein Function Prediction DOI: http://dx.doi.org/10.5772/intechopen.93901*

*Homology Molecular Modeling - Perspectives and Applications*


Because proteins are dynamic in nature, experimental techniques fail to capture their different conformations and, especially, the transitions between them. Molecular dynamics (MD) simulation, one of the most widely used computational techniques, tackles this challenge efficiently.

### **3.1 Molecular dynamics: the computational microscope**

MD simulations help us to observe and understand the time-dependent behavior of proteins. Because MD simulations can show the dynamic behavior of proteins at the atomic level, the technique is also regarded as a computational microscope [38]. It requires an initial protein model, obtained either by experimental methods or by predictive modeling. Because life sustains itself in water, simulations are typically run in explicit solvent. Once the forces acting on all the atoms have been computed, Newton's laws of motion are used to calculate the accelerations and velocities and to update the atomic positions. A time step of 2 fs (femtoseconds) is usually applied in atomistic simulations when integrating the motion numerically. Finally, the MD engine generates a trajectory of the system, which can be analyzed further according to the objectives of the study.

The technique was first used in the early 1970s to study the most relevant biological challenge of the time, protein folding [39, 40]. The subsequent decades saw MD simulations applied to the folding and unfolding mechanisms of proteins [41]. In 1998, Duan and Kollman performed the first 1 μs MD simulation on a parallel supercomputer, investigating the folding mechanism of villin in explicit solvent [42]. Beyond proteins, the technique has been extended to other relevant biomolecules [43, 44] and to protein-nanoparticle interactions [45–49].
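The integration loop described in this section (compute forces, update velocities and positions, advance by one time step, record the trajectory) can be sketched with the velocity Verlet scheme, a common choice of integrator. The text does not say which integrator a given engine uses, so this choice is an assumption. The example is a single particle on a harmonic spring in reduced units, not an atomistic system with a 2 fs step, and the function name `velocity_verlet` is illustrative.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations for one particle in one dimension.

    Returns the list of positions (the trajectory) and the final velocity.
    """
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return trajectory, v


# Harmonic spring with unit force constant and unit mass (reduced units).
trajectory, v_final = velocity_verlet(
    x=1.0, v=0.0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=1000
)
x_final = trajectory[-1]
energy = 0.5 * v_final ** 2 + 0.5 * x_final ** 2
print(f"x(t=10) = {x_final:.4f}, exact = {math.cos(10.0):.4f}")
print(f"total energy = {energy:.6f} (initial 0.5)")
```

The total energy stays close to its initial value over the run, which is the property that makes this family of integrators attractive for long simulations. The time step has to be small compared with the fastest motion in the system, which is why atomistic simulations use steps of the order of femtoseconds.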

The simulation of any system depends on many factors. Early systems comprised a few thousand atoms. Advances in both experimental and computational techniques have made 3D data available for proteins, protein complexes, membrane proteins, etc., which has increased system sizes to several hundred thousand atoms when explicit solvent is included [50]. Meanwhile, the advent of high-performance computing (HPC) and algorithm parallelization has made it possible to run long-timescale simulations of such systems. Further advances in the algorithms of MD engines and/or the use of GPUs (graphics processing units) alongside CPUs have significantly improved the performance of MD simulations. Some of the most popular simulation engines are AMBER, CHARMM, DESMOND, GROMACS and NAMD. They have been integrated with the Message Passing Interface (MPI), which makes it possible to use all the available cores of a computer simultaneously during an MD run and so reduce the computation time.

### **3.2 Workhorse of simulation: the force fields**

Force fields (FFs) lie at the heart of MD simulation. To perform a simulation, one needs the parameters that define the potential energy function [51]. An FF is a set of equations and associated parameters designed to reproduce the molecular geometry and selected properties of a set of tested molecules. An FF consists primarily of two components, bonded and non-bonded terms, with which essentially any molecular feature can be represented. The bonded terms are represented by springs for bond lengths and angles, together with torsional angle terms; the non-bonded terms comprise Lennard-Jones potentials for van der Waals (vdW) interactions and Coulomb's law for electrostatic interactions. FFs were primarily developed to reproduce structural properties and were then applied to predict other properties, such as thermodynamic parameters. Furthermore, the energy functions used in molecular mechanics commonly include topological parameters obtained from experiments or quantum mechanical calculations. An important feature of an FF is the transferability of its parameters and functional form: a series of related molecules can be modeled with the same set of parameters, rather than defining a new set for each individual molecule. Although most FFs are additive, a number of them include higher-order terms and are called class II FFs. Some of the widely used FFs for biomolecular simulations are AMBER, CHARMM, GROMOS and OPLS [52].

It is also worth noting the application of FFs to predicting the structures of proteins and RNA. FFs were developed and benchmarked against experimentally solved structures and were later used to predict structures for molecules lacking experimental information. Another important role of an FF is to discriminate the near-native protein conformation among the generated 3D models [53].

FFs are subject to rigorous scrutiny and have been refined over time to improve their accuracy. One example is the improvement of the residue side-chain torsion potentials of the Amber ff99SB FF, which was also validated against available NMR experimental datasets [54]. A number of benchmark studies have been conducted from time to time to compare different FFs. One difference among the available FFs is a bias towards particular secondary structures of proteins. Man et al. recently concluded from their comparative simulation study that some FFs (AMBER94, AMBER99 and AMBER12SB) were not able to predict β-sheet formation, whereas others (AMBER96, GROMOS45a3, GROMOS53a5, GROMOS53a6, GROMOS43a1, GROMOS43a2 and GROMOS54a7) formed β-sheets rapidly. They further showed that the best FFs for investigating amyloid peptide assembly, based on structure and kinetics, were AMBER99-ILDN, AMBER14SB, CHARMM22\*, CHARMM36 and CHARMM36m [55].
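The bonded and non-bonded terms described at the start of this section can be written down as simple functions. The sketch below is illustrative only: the function names and all parameter values are hypothetical, the units (kJ/mol, nm, elementary charges) follow one common convention, and prefactor conventions such as the 1/2 in the harmonic term differ between force fields.

```python
import math

# Electrostatic conversion factor in kJ mol^-1 nm e^-2 for the unit system
# assumed here.
COULOMB_CONSTANT = 138.935458


def harmonic(value, equilibrium, force_constant):
    """Spring-like term used for bond lengths and bond angles."""
    return 0.5 * force_constant * (value - equilibrium) ** 2


def periodic_torsion(phi, force_constant, multiplicity, phase):
    """Periodic term for a torsional angle phi (radians)."""
    return force_constant * (1.0 + math.cos(multiplicity * phi - phase))


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones term for van der Waals interactions."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j):
    """Coulomb's law for two point charges a distance r apart."""
    return COULOMB_CONSTANT * q_i * q_j / r


# Hypothetical parameters, chosen only to exercise the functions.
sigma, epsilon = 0.3, 0.5
r_min = 2.0 ** (1.0 / 6.0) * sigma  # separation at the Lennard-Jones minimum
print(lennard_jones(r_min, sigma, epsilon))  # well depth, close to -epsilon
print(coulomb(1.0, 1.0, -1.0))               # opposite unit charges at 1 nm
print(harmonic(0.11, 0.10, 250000.0))        # bond stretched by 0.01 nm
```

In an additive FF the total potential energy is the sum of such terms over all bonds, angles, torsions and non-bonded atom pairs, and transferability means that the same `sigma`, `epsilon` and force constants are reused for chemically similar atoms in related molecules.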
