**4. Flexibility and disorder in proteins**

Until recently, the classical structure–function paradigm, which states that protein function depends on a defined, if flexible, three-dimensional polypeptide structure, was widely accepted in protein science (Anfinsen, 1973). However, even in the early days of structural biology, when only approx. 20 protein crystal structures had been determined, some protein segments were known to yield weak or undetectable electron density and yet may be essential for function (Bloomer et al., 1978; Bode et al., 1978). A common reason (apart from crystal defects) for missing electron density is that the unobserved region fails to scatter X-rays coherently due to variation in position from one atom to the next, i.e. the unobserved atoms are disordered. In addition, during the last decade, many proteins have been described that fail to adopt a stable tertiary structure under physiological conditions and yet display biological activity (Dunker et al., 2008a; Uversky & Dunker, 2010). This state, defined as intrinsic disorder, has been found to be rather widespread; disordered regions lacking stable secondary and tertiary structure are often a prerequisite for biological activity, suggesting that structure–function relationships can frequently be understood only in a dynamic context in which function arises from conformational freedom. Fully or partly unstructured proteins are described as intrinsically disordered proteins (IDPs) or intrinsically unstructured proteins. The term natively unfolded proteins indicates that protein function is associated with a dynamic ensemble of different conformations (Gazi et al., 2008).

Structural plasticity and flexibility are believed to represent key functional features of IDPs (Dunker et al., 2008a, 2008b; Dunker & Uversky, 2008; Xie et al., 2007; Cortese et al., 2008), enabling them to interact with numerous binding partners, e.g. proteins, membranes, nucleic acids and small molecules (Durand et al., 2008; Uversky et al., 2009). Because of their functional importance, intrinsically disordered domains are very common in proteomes and play crucial roles in signaling, recognition, regulation and self-assembly (Namba, 2001). The extreme flexibility of IDPs has been suggested to represent a strategy for optimizing the search for, and the interaction with, their targets (Sugase et al., 2007). Intrinsically disordered proteins are substantially depleted in W, C, F, Y, V, L, N (order-promoting residues) and enriched in A, R, G, Q, S, P, E, K (disorder-promoting residues) (Dunker et al., 2002; Uversky, 2010). These biases in the amino acid composition of IDPs (which result in low overall hydrophobicity and high net charge) are used in various methods for predicting the propensity for intrinsic disorder (Prilusky et al., 2005). Such analyses suggest that approx. 45% of the proteins within a eukaryotic proteome contain a disordered region (Pentony & Jones, 2010). As a result of their frequent node positions in interactomes, many disordered proteins are tightly regulated at the levels of synthesis, degradation and posttranslational modification (Gsponer, 2008). It is noteworthy that extreme structural plasticity and ensembles of different conformations have occasionally been observed for coiled coils and α-helical bundles (Glykos et al., 1999, 2004); as is the case with other proteins, the plasticity of coiled coils may have functional implications, e.g. in the establishment of macromolecular assemblies based on coiled-coil interactions (Gazi et al., 2008).
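
The compositional bias described above can be illustrated with a short Python sketch. It only counts the order-promoting and disorder-promoting residues listed in the text and reports their fractions; it is not the method of any published predictor. The function name `composition_bias` and the simple difference score `bias` are assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Dict

# Residue classes exactly as listed in the text (Dunker et al., 2002; Uversky, 2010).
ORDER_PROMOTING = set("WCFYVLN")
DISORDER_PROMOTING = set("ARGQSPEK")


def composition_bias(sequence: str) -> Dict[str, float]:
    """Return the fractions of order- and disorder-promoting residues.

    The 'bias' value (disorder fraction minus order fraction) is a
    hypothetical, illustrative score and not a calibrated predictor.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    order = sum(counts[r] for r in ORDER_PROMOTING) / n
    disorder = sum(counts[r] for r in DISORDER_PROMOTING) / n
    return {"order": order, "disorder": disorder, "bias": disorder - order}
```

A positive `bias` merely indicates that disorder-promoting residues outnumber order-promoting ones in the sequence; real predictors combine such compositional information with further features and trained parameters.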

#### **5. Tools for the analysis of coiled-coils and intrinsic disorder**

#### **5.1** *In silico* **prediction and analysis of coiled-coil domains**

**Prediction of coiled coils from sequence:** The 'COILS' webserver assesses the probability that a residue in a sequence is part of a coiled-coil structure by comparing its flanking sequences with sequences of known coiled-coil proteins (Lupas et al., 1991) (http://www.ch.embnet.org/software/COILS\_form.html). In the 'Paircoil2' algorithm (McDonnell et al., 2006), pairwise residue probabilities are used to detect coiled-coil motifs in protein sequences (http://groups.csail.mit.edu/cb/paircoil2/paircoil2.html). 'Matcher' (http://cis.poly.edu/~jps/) determines whether a given sequence contains heptads and assigns heptad positions to residues (Fischetti et al., 1993). To predict the oligomerization states of coiled coils, 'Multicoil2' (Trigg et al., 2011) uses pairwise correlations and Hidden Markov Models (HMMs). To distinguish dimers, trimers and non-coiled-coil oligomerization states, the algorithm integrates sequence features through a multinomial logistic regression and devises an optimized scoring function that incorporates pairwise correlations localized in the sequence. A database comprising 2015 sequences with reliable structural annotation from experimental data is used (http://multicoil2.csail.mit.edu). 'SCORER' (Armstrong et al., 2011) also provides predictions of coiled-coil oligomerization (http://coiledcoils.chm.bris.ac.uk/Scorer).
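
The idea of assigning heptad positions to residues can be sketched in a few lines of Python. The example below is a toy illustration and does not reproduce the scoring of COILS, Paircoil2 or Matcher, which rely on statistics derived from known coiled-coil sequences. It slides a window along the sequence, tries all seven heptad registers and scores each register by the fraction of *a* and *d* positions occupied by hydrophobic residues. The hydrophobic residue set, the default window length of 28 and the function names `best_register` and `scan` are assumptions made for this sketch.

```python
from typing import List, Tuple

HEPTAD = "abcdefg"
# Assumption: an illustrative set of hydrophobic residues for the a/d core positions.
HYDROPHOBIC = set("AILMFV")


def best_register(window: str) -> Tuple[str, float]:
    """Try all seven heptad registers for a window.

    Returns the heptad letter assigned to the first residue of the window
    and the fraction of a/d positions occupied by hydrophobic residues.
    Ties are resolved in favour of the first register tried.
    """
    best = ("a", -1.0)
    for offset in range(7):
        # Residues that fall on positions a or d in this register.
        core = [res for i, res in enumerate(window)
                if HEPTAD[(i + offset) % 7] in "ad"]
        if not core:
            continue
        score = sum(res in HYDROPHOBIC for res in core) / len(core)
        if score > best[1]:
            best = (HEPTAD[offset], score)
    return best


def scan(sequence: str, window: int = 28) -> List[Tuple[int, str, float]]:
    """Return (start index, register of first residue, score) for each window."""
    seq = sequence.upper()
    return [(start, *best_register(seq[start:start + window]))
            for start in range(len(seq) - window + 1)]
```

For an idealized repeat such as `"LKELEDK" * 5`, in which leucine occupies every *a* and *d* position, each window reaches the maximum score of 1.0 and the reported register shifts by one letter as the window advances by one residue.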

**Assignment of the coiled-coil packing:** COILCHECK (Alva et al., 2008) can be used for the analysis and validation of coiled-coil structures by calculating the strength of interhelical interactions; it can also be used to rationalize the behaviour of single-residue mutations and to design mutations (http://caps.ncbs.res.in/coilcheck/). SOCKET (Walshaw & Woolfson, 2001) can be used to identify coiled coils through an analysis of the knobs-into-holes side-chain packing (http://coiledcoils.chm.bris.ac.uk/socket/).

**Databases:** For genome-wide predictions, the 'SpiriCoil' algorithm (Rackham et al., 2010) is employed, which uses hundreds of HMMs representing coiled-coil-containing domain families. Its results are available through the SpiriCoil Database (http://supfam.org/SUPERFAMILY/spiricoil), which includes results from all completely sequenced genomes. The CC+ database is a detailed, searchable repository accessible via the SOCKET program (Testa et al., 2009) (http://coiledcoils.chm.bris.ac.uk/ccplus/).

Several of the above tools have been used in sections 6 and 7 of this chapter. In addition, protein sequences were retrieved from NCBI/GenBank and from specialized databases, e.g. PPI: *P. syringae* Genome Resources (www.pseudomonas-syringae.org) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto, 2000). Secondary structure predictions were performed with 'PSIPRED' (Jones, 1999). Protein structures were retrieved from the Protein Data Bank (PDB).

#### **5.2** *In silico* **analysis of T3SS effectors and secretion signals**

A selection of bioinformatics tools is available for the prediction of T3SS effectors and secretion signals. 'Effective' is an on-line tool for the sequence-based prediction of secreted proteins, available from the TUM Genome Oriented Bioinformatics, University of Vienna (Arnold et al., 2009; Jehl et al., 2011), which can be used for effector prediction in bacterial protein sequences (http://www.effectors.org/). 'Effective' provides pre-calculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes and can also analyse sequence data provided by the user. Prediction of T3SS secretion signals from amino acid sequences is available from 'moblab' (http://gecco.org.chemie.unifrankfurt.de/T3SS\_prediction/T3SS\_prediction.html). The basic concepts of this tool are described by Lower & Schneider (2009). The 'SIEVE' server (http://www.sysbep.org/sieve/) for the prediction of type III secreted effectors was originally described by Samudrala et al. (2009) and recently reviewed by McDermott et al. (2011). Potential T3SS effectors are scored using a computational model developed with machine-learning methods.
