**Knowledge Based Membrane Protein Structure Prediction: From X-Ray Crystallography to Bioinformatics and Back to Molecular Biology**

Alejandro Giorgetti1,2 and Stefano Piccoli1

*1 Applied Bioinformatics Group, Dept. of Biotechnology, University of Verona, Italy; 2 German Research School for Simulation Sciences, Jülich Research Center and RWTH-Aachen University, Jülich, Germany*

#### **1. Introduction**

348 Current Trends in X-Ray Crystallography


Life, at least as we know it, would not exist without the ability of living organisms to communicate with their surroundings and respond to changes within them. Cells are able to capture environmental stimuli and decode them into biological signals. Indeed, communication mechanisms able to detect stimuli coming from the outside world are of fundamental importance for the survival of living beings. A deep understanding of the molecular mechanisms underlying signal transduction is thus needed for a complete characterization of the way our cells communicate with the rest of the world.

Living cells are surrounded by a plasma membrane that forms a boundary between the cell interior and the external physical world. As a consequence, the cellular plasma membrane presents a major target for environmental stimuli acting upon a living cell. The membrane contains protein molecules that confer various functions on it.

Integral membrane proteins play a key role in detecting outside signals and conveying them into cells, allowing cells to interact with and respond to their environment in a specific manner. They are main players in several signaling pathways, and the majority of drug targets are therefore associated with the cell membrane. The original human genome sequence project estimated 20% of the total gene count of 31,778 genes to code for membrane proteins (Almen et al., 2009). Membrane proteins thus constitute a very large set of yet-to-be-characterized proteins mediating all the relevant life-related functions in both prokaryotes and eukaryotes. Moreover, the total number of membrane protein structures for which the three-dimensional structure is known is only about 842, corresponding to 298 unique proteins, as included in the Membrane Proteins with Known Structure database (http://blanco.biomol.uci.edu/mpstruc/listAll/list).

This extremely low number of membrane proteins with known structure reflects how difficult they are to study, because they are inserted into the lipid bilayers surrounding the cell and its sub-compartments. Under these conditions they expose portions of different sizes to the polar outer and inner environments, so their biophysics differs completely from that of soluble proteins. Thus, when isolated from membranes, membrane proteins are

(H-bonds). This suggests that such a delicate folding energy landscape might easily be altered by mutations, giving rise to unfolded proteins. However, since all the living beings we know today are indeed 'alive', the structure-function relationship must have been preserved during evolution: the proteins we observe today can only contain mutations that did not alter the global folding energy landscape too much with respect to their ancestor sequence. From these considerations it was concluded that evolutionarily related proteins, which diverged from a common ancestor through the accumulation of small changes, cannot but have similar structures, in which mutations have been accommodated by small local rearrangements only. If the number of changes, i.e. the evolutionary distance, is high, these local rearrangements can cumulatively affect the protein structure and produce relevant distortions, but the general architecture of the protein, i.e. the fold, has to be conserved. On the other hand, if two proteins have evolved from a common ancestor, it is likely that a sufficient proportion of their sequences has remained unchanged, so that an evolutionary relationship can be deduced from their comparative analysis. Therefore, if we can establish that two proteins are homologous, that is, evolutionarily related, the structure of one can be used as a template for building the structure of the other. This forms the basis of the technique known as comparative or homology modeling (Tramontano, 2006).

In this regard, 25 years ago, Chothia and Lesk, in a seminal article (Chothia & Lesk, 1986), aligned the sequences and structures of all the proteins with known structure, finding a correlation between evolutionary distance and structural divergence in evolutionarily related proteins. This work laid the foundations of the comparative (or homology) modeling technique, a method that allows the prediction of protein structures using as a template a member of the family for which the three-dimensional structure (3D structure) is known. If we assume that the sequence alignment between two protein sequences, one of unknown structure (the target) and one of known structure (the template), reflects the evolutionary relationship between their amino acids, we can assume that most of them have conserved the same relative position in the structure, and we can use the backbone coordinates of the template as a first approximation of the backbone coordinates of the target (Tramontano, 2006). We must then model the conformation of the side chains and the local rearrangements of the structure brought about by the amino acid substitutions (Tramontano, 2006). Templates for a comparative model are found among the structures present in databases such as the PDB, and can be retrieved by searching the database for proteins putatively homologous to the target protein.
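The first step of comparative modeling, transferring template backbone coordinates onto the target through the alignment, can be illustrated with a minimal Python sketch. It is not the procedure of any specific modeling package. The function names, the toy alignment and the evenly spaced C-alpha coordinates are hypothetical and chosen only for illustration. The expected core deviation uses the exponential fit commonly quoted from Chothia & Lesk (1986), Δ = 0.40·exp(1.87·H) Å, where H is the fraction of mutated residues; we assume here that H can be approximated by one minus the identity over the aligned columns.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
GAP = "-"


def aligned_identity(target_aln: str, template_aln: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(target_aln, template_aln)
               if a != GAP and b != GAP]
    if not columns:
        raise ValueError("the alignment has no gap-free columns")
    return sum(a == b for a, b in columns) / len(columns)


def expected_core_rmsd(identity: float) -> float:
    """Expected core main-chain RMSD in angstroms from the Chothia & Lesk fit.

    Assumption: H (fraction of mutated residues) is taken as 1 - identity.
    """
    return 0.40 * math.exp(1.87 * (1.0 - identity))


def seed_backbone(target_aln: str, template_aln: str,
                  template_coords: Sequence[Coord]) -> List[Optional[Coord]]:
    """Give each target residue the coordinates of its aligned template residue.

    Target residues aligned to a gap in the template (insertions) get None:
    they have no template counterpart and must be modeled separately.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    if len(template_coords) != sum(c != GAP for c in template_aln):
        raise ValueError("one coordinate is required per template residue")
    seeded: List[Optional[Coord]] = []
    template_index = 0
    for target_res, template_res in zip(target_aln, template_aln):
        coord = None
        if template_res != GAP:
            coord = template_coords[template_index]
            template_index += 1
        if target_res != GAP:
            seeded.append(coord)
    return seeded


# Toy example: hypothetical alignment and evenly spaced C-alpha positions.
target_alignment = "MKT-LLVAG"
template_alignment = "MRTALL-AG"
template_ca = [(3.8 * i, 0.0, 0.0) for i in range(8)]

identity = aligned_identity(target_alignment, template_alignment)
model = seed_backbone(target_alignment, template_alignment, template_ca)
print(f"identity over aligned columns: {identity:.2f}")
print(f"expected core RMSD: {expected_core_rmsd(identity):.2f} A")
print(f"target residues without template coordinates: {model.count(None)}")
```

In this sketch, insertions in the target are left as `None` rather than guessed, mirroring the point made above that local rearrangements and side chains have to be modeled after the backbone has been seeded from the template.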

Over the years, all these ideas have been shown to be valid for soluble proteins. The comparative modeling technique, however, has not always been considered valid when applied to membrane proteins (Floriano et al., 2006). The main criticisms concerned the small number of membrane proteins with known three-dimensional structure and the enormous evolutionary gap that must be bridged in order to produce models of the most studied membrane proteins. In the last few years, however, X-ray crystallography has reached a very high level of applicability in the field of membrane proteins, and many new and more refined structures have been solved by several groups. In the applications section we will review some of the most relevant structures and how they allowed a better characterization of the mechanisms underlying function. This great advance in crystallographic techniques has produced, as of today, 298 structures of unique membrane proteins out of a total of 842 structures. At this point, we have checked for the existence of a correlation between evolutionary relationship and structural similarity. We have recently (not published)

generally less stable than globular ones. It is therefore difficult to purify them in their native, functional form, and even more difficult to crystallize them. Crystallization of this type of protein is thus still a very difficult process, because they expose two chemically and physically different surfaces to the environment: one water-like and one lipid-like. On the other hand, the lipid environment constrains the stable folding of membrane proteins: it is evident from the structures deposited so far that only all-alpha and beta-barrel structural organizations are present in nature. Indeed, most membrane proteins in the Protein Data Bank (PDB: http://www.pdb.org), i.e. 67%, consist of bundles of transmembrane helices with different tilts with respect to the membrane plane and to each other.

In the last few years, several efforts have been made to determine new crystal structures of membrane proteins. Although technological improvements have allowed the determination of several structures, the gap between known sequences and solved structures is still enormous. Furthermore, despite their extreme importance, crystal structures only provide a static image of a protein under conditions that are sometimes far from physiological.

Thus, the combination of existing crystal structures, computational biology techniques and validating molecular biology experiments may be the key to bridging the gap between membrane proteins with and without a solved structure.

This and other issues may be resolved in the post-genomic era. To do so, we should take advantage of all the theoretical efforts aimed at developing tools, based on our present knowledge, that are capable of extracting selected structural and functional features from known sequences and structures and of computing the likelihood of their presence in never-before-seen sequences and structures. Moreover, once generated, the models should be considered as hypotheses to be tested. It is thus of fundamental importance to accurately validate the models and the conclusions drawn from their analysis.

Here we review some of the efforts made in recent years to characterize, at the structural level, different membrane proteins whose crystal structure is not known. We introduce state-of-the-art modeling techniques that use the most recent membrane protein crystal structures as input for modeling membrane proteins in different activation states. For each of the cases studied, we also review the experiments carried out to validate the proposed hypotheses.
