234 Bioinformatics

**11. Conclusions and perspectives** 

The Rd.HMM protocol is highly sensitive, and its alignments become inaccurate as the HMM score decreases. Even so, it can be used to guide the comparative modeling of proteins, as the examples given in section 8 show. Even if the alignment employed is flawed, the flaw will become evident once the model is produced and analyzed with Rd.HMM; the model can then be discarded, and additional modeling rounds may be tried.

An additional advantage of Rd.HMM alignments as a guide to comparative modeling comes from the fact that Rd.HMM models are independent of the functional constraints reflected in the conservation of active and binding sites. Since the Rd. step removes all conservation due to ligand binding and functional sites, other than that required to keep the structure stable, geometrical differences in the organization of two related but not identical active sites will not affect the modeling process. In contrast, in classic comparative modeling methods, residue conservation at active and other functional sites is usually an important reference for the sequence-to-structure alignment. Thus, when a model is produced with the guidance of Rd.HMM and shows good quality and appropriateness, any coincidences in the active site geometry do not result from forcing the conserved residues of the target sequence onto the template's active site, but should be a consequence of meeting the structural requirements of the target.

From the above discussion, Rd.HMM is clearly a valuable tool, but it has some limitations. We speculate that some of these limitations derive from the inability of HMMs to incorporate long-range interactions, which can be detected as significant mutual information between distant positions in the sequence alignments. We are currently analyzing the mutual information in the Rosetta-designed sequence alignments using the statistical coupling analysis strategy (Socolich et al., 2005; Lockless et al., 1999). We hope this powerful statistical approach can extend Rd.HMM and provide a richer tool.

**Author details** 

León P. Martínez-Castilla and Rogelio Rodríguez-Sotres
*Facultad de Química, Universidad Nacional Autónoma de México, México* 

**Acknowledgement** 

Funding: PAPIIT-DGAPA-UNAM IN210212, CONACyT CB2008-1-101186, PAIP-FQ-UNAM 4290-09, PAIP-FQ-UNAM 4290-07.

**Abbreviations** 

NP-hard problem, a problem at least as hard to solve as an NP-complete problem; NP-complete problem, a problem for which no polynomial-time algorithm is known; Rd, ROSETTA-design; MM,

**12. References**


Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., & Ferkinghoff-Borg, J. (2010). Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized. *PLoS ONE,* Vol. 5, No. 11, (Nov 2010) pp. e13714, ISSN e1932-6203

Hart, W. E. & Istrail, S. (1997). Robust proofs of NP-hardness for protein folding: general lattices and energy potentials. *J Comput Biol,* Vol. 4, No. 1, (Jan 1997) pp. 1-22, ISSN 1066-5277 (print), 1557-8666 (electronic)

He, X. & Merz, K. M. (2010). Divide and Conquer Hartree-Fock Calculations on Proteins. *Journal of Chemical Theory and Computation,* Vol. 6, No. 2, (Jan 2010) pp. 405-411, ISSN 1549-9618 (print), 1549-9626 (electronic)

Hu, Z. & Jiang, J. (2010). Assessment of biomolecular force fields for molecular dynamics simulations in a protein crystal. *J Comput Chem,* Vol. 31, No. 2, (Jan 2010) pp. 371-80, ISSN 1096-987X

Humphrey, W., Dalke, A., & Schulten, K. (1996). VMD: visual molecular dynamics. *J Mol Graph,* Vol. 14, No. 1, (Feb 1996) pp. 33-38, ISSN 1093-3263

Ilyin, V. A., Abyzov, A., & Leslin, C. M. (2004). Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point. *Protein Science: A Publication of the Protein Society,* Vol. 13, No. 7, (Jul 2004) pp. 1865-1874, ISSN 0961-8368

Jiang, L., Althoff, E. A., Clemente, F. R., Doyle, L., Röthlisberger, D., Zanghellini, A., Gallaher, J. L., Betker, J. L., Tanaka, F., Barbas, C. F., Hilvert, D., Houk, K. N., Stoddard, B. L., & Baker, D. (2008). *De novo* computational design of retro-aldol enzymes. *Science (New York, N.Y.),* Vol. 319, No. 5868, (Mar 2008) pp. 1387-1391, ISSN 0036-8075 (print), 1095-9203 (electronic)

Kaplan, W. & Littlejohn, T. G. (2001). Swiss-PDB Viewer (Deep View). *Briefings in Bioinformatics,* Vol. 2, No. 2, (May 2001) pp. 195-197, ISSN 1477-4054 (print), 1467-5463 (electronic)

Karplus, K. (2009). SAM-T08, HMM-based protein structure prediction. *Nucleic Acids Res,* Vol. 37, No. Web Server issue, (Jul 2009) pp. W492-7, ISSN 1362-4962 (print)

Kim, D. E., Chivian, D., & Baker, D. (2004). Protein structure prediction and analysis using the Robetta server. *Nucleic Acids Res,* Vol. 32, No. Web Server issue, (Jul 2004) pp. W526-W531, ISSN 1362-4962 (print), 0305-1048 (electronic)

Kono, H. & Saven, J. G. (2001). Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. *Journal of Molecular Biology,* Vol. 306, No. 3, (Feb 2001) pp. 607-628, ISSN 0022-2836

Kryshtafovych, A., Krysko, O., Daniluk, P., Dmytriv, Z., & Fidelis, K. (2009). Protein structure prediction center in CASP8. *Proteins,* Vol. 77, No. Suppl 9, (Jul 2009) pp. 5-9, ISSN 1097-0134

Kuhlman, B., Dantas, G., Ireton, G. C., Varani, G., Stoddard, B. L., & Baker, D. (2003). Design of a novel globular protein fold with atomic-level accuracy. *Science,* Vol. 302, No. 5649, (Nov 2003) pp. 1364-1368, ISSN 0036-8075 (print), 1095-9203 (electronic)

Lathrop, R. H. (1994). The protein threading problem with sequence amino acid interaction preferences is NP-complete. *Protein Engineering,* Vol. 7, No. 9, (Sep 1994) pp. 1059-1068, ISSN 1741-0134 (print), 1741-0126 (electronic)

On the Assessment of Structural Protein Models with ROSETTA-Design and HMMer: Value, Potential and Limitations 237

Levinthal, C. (1968). Are there pathways for protein folding? *Journal de Chimie Physique et de Physicochimie Biologique,* Vol. 65, No. 1-4, (Jan 1968) pp. 44-45, ISSN 0021-7689

Lockless, S. W. & Ranganathan, R. (1999). Evolutionarily conserved pathways of energetic connectivity in protein families. *Science,* Vol. 286, No. 5438, (Oct 1999) pp. 295-299, ISSN 0036-8075 (print), 1095-9203 (electronic)

Luthy, R., Bowie, J. U., & Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. *Nature,* Vol. 356, No. 6364, (Mar 1992) pp. 83-85, ISSN 0028-0836 (print)

Martínez-Castilla, L. P. & Rodríguez-Sotres, R. (2010). A score of the ability of a three-dimensional protein model to retrieve its own sequence as a quantitative measure of its quality and appropriateness. *PLoS ONE,* Vol. 5, No. 9, (Sep 2010) pp. e12483, ISSN e1932-6203

Melo, F. & Feytmans, E. (1998). Assessing protein structures with a non-local atomic interaction energy. *J Mol Biol,* Vol. 277, No. 5, (Apr 1998) pp. 1141-1152, ISSN 0022-2836

Mengersen, K. L. & Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. *Annals of Statistics,* Vol. 24, No. 1, (Feb 1996) pp. 101-121, ISSN 0090-5364

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. *The Journal of Chemical Physics,* Vol. 21, No. 6, (Jun 1953) pp. 1087-1092, ISSN 0021-9606 (print), 1089-7690 (electronic)

Pawlowski, M., Gajda, M. J., Matlak, R., & Bujnicki, J. M. (2008). MetaMQAP: a meta-server for the quality assessment of protein models. *BMC Bioinformatics,* Vol. 9, No. Sep, (Sep 2008) pp. 403, ISSN 1471-2105

Raha, K. & Merz, Jr, K. M. (2005). Large-scale validation of a quantum mechanics based scoring function: predicting the binding affinity and the binding mode of a diverse set of protein-ligand complexes. *J Med Chem,* Vol. 48, No. 14, (Jul 2005) pp. 4558-4575, ISSN 0022-2623 (print), 1520-4804 (electronic)

Rodriguez, R., Chinea, G., Lopez, N., Pons, T., & Vriend, G. (1998). Homology modeling, model and software evaluation: three related resources. *Bioinformatics,* Vol. 14, No. 6, (Jul 1998) pp. 523-528, ISSN 1460-2059 (print), 1367-4803 (electronic)

Rosales-León, L., Hernández-Domínguez, E. E., Gaytán-Mondragón, S., & Rodríguez-Sotres, R. (2012). Metal binding sites in plant soluble inorganic pyrophosphatases. An example of the use of ROSETTA design and hidden Markov models to guide the homology modeling of proteins. *Journal of the Mexican Chemical Society,* Vol. 56, No. 1, (Jan-Mar 2012) pp. 23-31, ISSN 1665-9686

Rose, P. W., Beran, B., Bi, C., Bluhm, W. F., Dimitropoulos, D., Goodsell, D. S., Prlic, A., Quesada, M., Quinn, G. B., Westbrook, J. D., Young, J., Yukich, B., Zardecki, C., Berman, H. M., & Bourne, P. E. (2011). The RCSB Protein Data Bank: redesigned web site and web services. *Nucleic Acids Res,* Vol. 39, No. Database issue, (Jan 2011) pp. D392-D401, ISSN 1362-4962 (print), 0305-1048 (electronic)

Röthlisberger, D., Khersonsky, O., Wollacott, A. M., Jiang, L., DeChancie, J., Betker, J., Gallaher, J. L., Althoff, E. A., Zanghellini, A., Dym, O., Albeck, S., Houk, K. N., Tawfik, D. S., & Baker, D. (2008). Kemp elimination catalysts by computational enzyme design. *Nature,* Vol. 453, No. 7192, (May 2008) pp. 190-195, ISSN 0028-0836 (print), 1476-4687 (electronic)

**Section 6** 

**Intelligent Data Analysis** 


