generally less stable than globular ones. It is therefore difficult to purify them in their native, functional form, and even more difficult to crystallize them. Crystallization of this type of protein remains very difficult because they expose two chemically and physically different surfaces to the environment: a water-like one and a lipid-like one. On the other hand, the lipid environment constrains the stable folding of membrane proteins: the structures deposited so far show that only all-alpha and beta-barrel structural organizations are present in nature. Indeed, most membrane proteins in the Protein Data Bank (PDB: http://www.pdb.org), i.e. 67%, consist of bundles of transmembrane helices with different tilting with respect to the membrane plane and to each other.

In the last few years, considerable effort has gone into determining new crystal structures of membrane proteins. Although technological improvements have allowed the determination of several structures, the gap between the known sequences and the solved structures is still enormous. Furthermore, despite their extreme importance, crystal structures only provide a static image of a protein, under conditions that are sometimes far from being physiological.

Thus, the combination of existing crystal structures, computational biology techniques and validating molecular biology experiments may be the key to bridging the gap between the characterized membrane proteins with and without solved structure. This and other issues may be resolved in the post-genomic era. To do this, we should take advantage of all the theoretical efforts aimed at developing tools, based on our present knowledge, that are capable of extracting selected structural/functional features from known sequences/structures and of computing the likelihood of their presence in never-before-seen sequences/structures. Moreover, once generated, the models can be considered as hypotheses to be tested. Thus, it is of fundamental importance to accurately validate the models and the conclusions drawn from their analysis.

Here we will review some of the efforts of recent years aimed at the structural characterization of different membrane proteins for which the crystal structure is not known. We will introduce state-of-the-art modelling techniques that use the most recent membrane protein crystal structures as input for the modelling of membrane proteins in different activation states. For each of the studied cases we will also review the experiments carried out in order to validate the proposed hypotheses.

**2. Protein structural bioinformatics**

Proteins, as defined in the glossary of Molecular Biology of the Cell, are just 'the major macromolecular constituent of cells. A linear polymer of amino acids linked together by peptide bonds in a specific sequence' (Alberts et al., 2002). This definition, albeit technically correct, lacks the most important part: proteins are the product of evolution. Proteins are indeed polypeptides, but they are formed following a very precise sequence-structure-function relationship, shaped by random variation and natural selection over millions of years of evolution, which introduced small changes in the protein sequences from one generation to the next. The structure (fold) of a protein is determined by its amino acid sequence; therefore, the mutation of one amino acid into another may affect the structure in different ways. Taking into account the energy landscape of folded proteins, it was observed that, although their stability results from summing the large contributions of a multitude of weak interactions between their atoms, functional proteins have a limited net stability, equivalent to just a few hydrogen bonds (H-bonds). This suggests that this extremely weak folding energy landscape might be easily altered by mutations, giving rise to unfolded proteins. However, since all the living beings that we know today are indeed 'alive', the structure-function relationship must have been preserved during evolution, so the proteins that we observe today can only contain mutations that did not alter the global folding energy landscape too much with respect to their ancestor sequence. From these considerations it was concluded that evolutionarily related proteins, which diverged from a common ancestor through the accumulation of small changes, must have similar structures, in which mutations have been accommodated by causing only small local rearrangements. If the number of changes, i.e. the evolutionary distance, is high, these local rearrangements can cumulatively affect the protein structure and produce relevant distortions, but the general architecture of the protein, i.e. the fold, has to be conserved. On the other hand, if two proteins have evolved from a common ancestor, it is likely that a sufficient proportion of their sequences has remained unchanged, so that an evolutionary relationship can be deduced from their comparative analysis. Therefore, if we can establish that two proteins are homologous, that is, evolutionarily related, the structure of one can be used as a template for building the structure of the other. This forms the basis of the technique known as comparative or homology modeling (Tramontano, 2006).
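
The two quantities behind this argument, evolutionary distance and structural divergence, can be made concrete with a small sketch. The Python below is an illustration only, not the protocol used by the authors: the function names are hypothetical, sequence identity is computed over the alignment columns in which neither sequence has a gap (other conventions for the denominator exist), and structural divergence is taken as the root mean square deviation (RMSD) between two sets of equivalent atoms that are assumed to be already optimally superposed.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the columns where neither
    aligned sequence has a gap (one common convention, assumed here)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip insertions/deletions
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumption: the two coordinate sets are already superposed;
    no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

For a pair of homologous proteins, a low percent identity corresponds to a large evolutionary distance, and the RMSD of their equivalent backbone atoms indicates how much the structures have drifted apart while keeping the same fold.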

In this regard, in a seminal article published 25 years ago (Chothia & Lesk, 1986), Chothia and Lesk aligned the sequences and structures of all the proteins of known structure, finding a correlation between the evolutionary distance and the structural divergence of evolutionarily related proteins. This work laid the foundations of the comparative (or homology) modeling technique, a method that allows the prediction of a protein structure using as a template a member of the same family for which the three-dimensional structure (3D structure) is known. If we assume that the sequence alignment between two protein sequences, one of unknown structure (the target) and one of known structure (the template), reflects the evolutionary relationship between their amino acids, we can assume that most of them have conserved the same relative position in the structure, and use the coordinates of the backbone of the template as a first approximation of the coordinates of the backbone of the target (Tramontano, 2006). We must then model the conformation of the side chains and the local rearrangements of the structure brought about by the amino acid substitutions (Tramontano, 2006). Templates for a comparative model can be found among the structures present in databases such as the PDB, and can be retrieved by searching the database for proteins putatively homologous to the target protein.
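
The first step described above, using the template backbone as a first approximation of the target backbone, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular modelling package: the function name `transfer_backbone` is hypothetical, the alignment is given as two gapped strings of equal length, and each template residue is represented by a single coordinate tuple (for example its C-alpha position).

```python
def transfer_backbone(aligned_target, aligned_template, template_coords, gap="-"):
    """Assign template coordinates to target residues through an alignment.

    Returns a list with one (residue, coords) pair per target residue.
    coords is None where the target residue is aligned to a gap in the
    template (an insertion that must be modelled separately).
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    n_template = sum(1 for res in aligned_template if res != gap)
    if n_template != len(template_coords):
        raise ValueError("one coordinate entry is needed per template residue")

    model = []
    template_index = 0
    for target_res, template_res in zip(aligned_target, aligned_template):
        coords = None
        if template_res != gap:
            # consume the next template residue, aligned or not
            coords = template_coords[template_index]
            template_index += 1
        if target_res != gap:
            model.append((target_res, coords))
    return model
```

Target residues that come back with `None` correspond to insertions relative to the template; together with the side chains, these are the regions where the local rearrangements mentioned above have to be modelled explicitly.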

All these ideas have been shown over the years to be valid for soluble proteins. However, the comparative modeling technique has not always been considered valid when applied to membrane proteins (Floriano et al., 2006). The main criticisms concerned the small number of membrane proteins with known three-dimensional structure and the enormous evolutionary gap that must be bridged in order to produce models of the most studied membrane proteins. In the last few years, however, X-ray crystallography has become widely applicable to membrane proteins, and many new and more refined structures have been solved by several groups. In the applications section we will review some of the most relevant structures and how they allowed a better characterization of the mechanisms underlying function. Indeed, this great advancement in crystallographic techniques has produced, as of today, 298 structures of unique membrane proteins for a total of 842 structures. At this point, we checked for the existence of a correlation between evolutionary relationship and structural similarity. We have recently (not published) followed the same protocol as Chothia and Lesk (Chothia & Lesk, 1986), but considering only membrane proteins. Using the LGA server (http://proteinmodel.org/), we aligned the structures and sequences of the cores of all the membrane proteins with known three-dimensional structure and produced the graph of Figure 1. The structural divergence between two evolutionarily related proteins is measured as their Root Mean Square Deviation (RMSD).

Fig. 1. RMSD versus percent sequence identity of membrane proteins. The core of all the membrane proteins found in the Membrane Proteins with Known Structure Database (http://blanco.biomol.uci.edu/mpstruc/listAll/list) was aligned at the structural level and at the sequence level.

Figure 1 shows the RMSD of the backbone of the core of pairs of evolutionarily related proteins as a function of the percent identity between their amino acid sequences. The definition of the "core" of a structure differs between methods. It can be intuitively seen as the internal, closely packed, evolutionarily conserved part of the structure that contains most of the repetitive secondary structure elements (Tramontano, 2006). For practical purposes we considered as the core of the proteins all the amino acids present in secondary structure elements and those regions not diverging by more than 3 Å, as Chothia and Lesk did (Chothia & Lesk, 1986).

As stated before, comparative modeling is based on the idea that evolutionarily related proteins share similar three-dimensional structures. That is, if we want to predict the structure of a protein, we can search a database for an evolutionarily related protein of known structure and use the latter as a template for building the structural model of our protein of interest (Tramontano, 2006). The important point is that, based on Figure 1, the procedure is also valid for membrane proteins. In this regard, the astonishing improvements in membrane protein crystallography, together with comparative modeling techniques, will allow the characterization of an enormous number of membrane proteins in the near future.

**3. Applications**

In the following sections we will present two cases in which homology modelling has been applied to membrane proteins, i.e. ion channels and G-protein-coupled receptors. Both cases are representative of the difficulties that structure determination and modelling of membrane proteins have faced for many years. Fortunately, in the last few years there has been an explosion of newly solved crystal structures that completely revolutionised the field. Indeed, several mechanisms were understood and functional features could be extended to several members of the families.

In both cases we will review the advancements in X-ray crystallography and how we have used the recently solved crystal structures, combined with homology modelling and molecular biology experiments, to characterize functional mechanisms.

**3.1 Ion channels**

Ion channels are integral membrane proteins that function as molecular sensors of physical and chemical stimuli and convert these stimuli into biological signals vital for the existence of every living organism. In other words, ion channels are the doors and windows of the cell, which open and close in response to precise stimuli and allow the entrance and exit of very carefully selected 'visitors'. As molecular transducers of mechanical, electrical, chemical, thermal or electromagnetic (light) stimuli, ion channels contribute to changes in electrical, chemical or osmotic activity within cells by gating between the two basic conformations in which they exist: open and closed. Through the gating mechanism, i.e. opening and closing, ion channels regulate the permeation of ions (and in some cases of other solutes), allowing ions to cross the hydrophobic core of the cell membrane and thereby affecting its activity. Because of the well-known difficulties in obtaining high-resolution 3D structures of ion channels by X-ray crystallography, alternative strategies based on computational biology tools are currently used to investigate their biophysical properties (for a review about ion channel modelling see: Giorgetti & Carloni, 2003).

The last two decades have been exceptionally exciting for research in the field of ion channels. Astonishing progress has resulted from the use of multidisciplinary approaches to gain insight into the structure and function of ion channels and their role in various aspects of cell physiology and signal transduction. Molecular biology and genetics have provided the sequences of a very large number of ion channel proteins and have helped identify their contribution to various cellular functions. The patch clamp technique has provided the means to study the functional properties of single ion channels with unprecedented precision. X-ray and electron crystallography have provided structural snapshots of a number of ion channel molecules at near-atomic resolution, whereas magnetic resonance spectroscopy and fluorescence spectroscopy have provided means to access the dynamics of these molecules. In detail, as of today there are 14 unique ion channel structures (http://blanco.biomol.uci.edu/mpstruc/listAll/list), for a total of about 45 crystal structures of ion channels solved in different activation states and co-crystallized with different ligands and ions. Moreover, more than two thirds of the solved structures were obtained in the last two years, showing an exponential development of the field.

Using the structural and functional information obtained by these experimental techniques, computer-assisted molecular modeling has brought ion channels to life by allowing the features underlying the molecular events that shape their function. Indeed, the
