**2. The folding problem is an NP-hard problem involving a degenerate informational code**

© 2012 Martínez-Castilla and Rodríguez-Sotres, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As implied by the well-known Levinthal paradox (Levinthal, 1968), a full exploration of the entire conformational space theoretically available to a protein is out of the reach of current computational techniques. Equally inaccessible to nature is the sequence space available to polypeptide chains (Kono & Saven, 2001). Currently, the number of available protein structures (the PDB) represents a fraction of the known protein amino acid sequences, and if the available sample is grouped in terms of different folds, the diversity in the PDB is even smaller. In addition, protein structure and function can tolerate a significant number of mutations. Both facts suggest an important degree of degeneracy between the information in polypeptide sequences and the associated code leading to their native structure (Bowie et al., 1990). In other words, the so-called folding code is degenerate.
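The scale behind the Levinthal paradox and the size of sequence space can be made concrete with a short back-of-the-envelope calculation. The sketch below is only illustrative: the chain length (100 residues), the number of conformational states per residue (3), the sampling rate (10^13 conformations per second) and the function names are assumptions chosen for the example and are not taken from the works cited above. Logarithms are used so that the very large counts do not overflow.

```python
import math

# Julian year in seconds, used only to convert the search time to years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_count(n_positions: int, states_per_position: int) -> float:
    """log10 of states_per_position ** n_positions (size of the search space)."""
    return n_positions * math.log10(states_per_position)


def log10_search_years(n_residues: int,
                       states_per_residue: int,
                       samples_per_second: float) -> float:
    """log10 of the years needed to visit every conformation exactly once."""
    log10_seconds = (log10_count(n_residues, states_per_residue)
                     - math.log10(samples_per_second))
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Illustrative assumptions: 100 residues, 3 states each, 1e13 samples/s.
    n, k, rate = 100, 3, 1e13
    print(f"conformations      ~ 10^{log10_count(n, k):.1f}")
    print(f"exhaustive search  ~ 10^{log10_search_years(n, k, rate):.1f} years")
    # Sequence space for the same length with the 20 standard amino acids.
    print(f"sequences          ~ 10^{log10_count(n, 20):.1f}")
```

Under these assumed values, an exhaustive enumeration would take on the order of 10^27 years, and the sequence space of a 100-residue chain holds on the order of 10^130 sequences, which illustrates why neither space can be explored in full.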

However, even if the number of protein structural folds is smaller than the sequence space, the folding problem remains unsolved, because exploring the total number of conformations available to a protein, or its energy landscape, is an NP-hard problem (Hart & Istrail, 1997), and because the available methods to calculate the energy of a protein conformation carry a large systematic error (Faver et al., 2011).

The above facts set forth the intractability of solving the problem through an exhaustive search. Nevertheless, proteins in nature do reach a native structure in short times, and finding a native-like solution to the three-dimensional structure of a protein may not require a full examination of the conformational space, or its corresponding energy landscape. In fact, recent years have seen important progress in the search for solutions to the protein folding problem (Dill et al., 2008).

On the Assessment of Structural Protein Models with ROSETTA-Design and HMMer: Value, Potential and Limitations 217

Comparative modeling exploits the wealth of experimental structural information now available for proteins (Rose et al., 2011), and relies on powerful sequence alignment algorithms (Wallace et al., 2005). In CASP contests, comparative modeling servers, such as I-TASSER (Roy et al., 2010), ROBETTA (Kim et al., 2004) and SAM-T08 (Karplus, 2009), have achieved a high success rate in their predictions for protein 3D-structures of low to intermediate difficulty (as defined by the CASP staff). Yet, one major limitation of these methods lies in the strategies used to match each amino acid in a target sequence to its best hosting spot in the 3D-structure of the template and, again, this is an NP-complete problem (Lathrop, 1994).

In *ab initio* methods, the laws of physics and chemistry and/or artificial intelligence are used to generate a prediction for a native-like folding solution of a protein with known amino acid sequence (Dill et al., 2008). While *ab initio* methods have been less successful than comparative modeling, they are the only choice if no suitable homologous 3D-template is available for a given amino acid sequence (Kryshtafovych et al., 2009).

The above considerations are all fine when the question is to grade the methods and choose the one with the highest success rate but, to date, no single method gives the correct answer every time. Yet, the final aim of such methods is to produce good native-like protein 3D-predictions when experimental X-ray or NMR data are not available. How then is it possible to set apart models with a wrong fold assignment from those with a correct fold assignment but a mistraced sequence to 3D-fold alignment (Luthy et al., 1992)? Is it possible to identify cases where the fold assignment and the alignment are adequate, but the solution to the atom repacking of replaced amino acids is deficient? These questions lie behind the quality assessment of a protein 3D-structure prediction.

The quality assessment is of particular relevance in cases where a suitable 3D-template cannot be found, because the predicted 3D-model cannot be compared back to the starting template. Again, this problem can be tackled with a number of strategies, most of which have been implemented as computer programs and had their validity tested at the CASP contests (Shi et al., 2009).

Quality assessment methods for the predicted 3D-structures of proteins can be classified according to their underlying principles:

i. Physics-based methods use the regularities in chemical structures and the laws of physics and chemical bonding to find how much a 3D-structure deviates from the known canonical values. These methods may come in the form of force-fields that report energies (Hu & Jiang, 2010), or may look for abnormalities in geometrical and chemical features such as bond lengths, bond angles, dihedral torsion values, charge-charge distances and so on (Rodriguez et al., 1998).

ii. Statistics-based methods use the known 3D-structures to generate a set of probability distributions for a number of features of the experimentally solved structures. These distributions can be used as a reference to judge the quality of a prediction. When these probability distributions are transformed into energies, using the Boltzmann law, the
