functions. An inevitable intermediate stage in this process would be the emergence of a pseudogene, as most mutations will disrupt or inactivate a gene rather than giving rise to new functions. Because this event is considered extremely unlikely, an extended hypothesis of neofunctionalization (NF) has been proposed, which includes the following possibilities: (1) a new gene acquires a new function but keeps the old function (NF-I), (2) a new gene completely loses the old function (NF-II), or (3) a new gene retains part of the old function (NF-III) (He & Zhang, 2005). Many examples of neofunctionalization have been described in recent years (see Hahn, 2009), although distinguishing neofunctionalization from subfunctionalization is sometimes difficult and has led to the creation of a "subneofunctionalization" model (He & Zhang, 2005).

### **3.4 Exon shuffling as a mechanism of neofunctionalization**

One of the options for neofunctionalization is the formation of "chimeric" or fusion genes (Long, 2000). This phenomenon is possible due to the duplication of a gene or part of a gene, because only then can the original gene remain functional. After gene duplication, one of the copies can capture one or more exons from an unrelated adjacent gene. Another possibility is the addition of flanking non-coding DNA as an additional open reading frame. The model known as "exon shuffling" (Gilbert, 1978) suggests that recombination in introns can provide a mechanism for exchanging exon sequences between genes. However, the event will be evolutionarily significant only if it involves a structural or functional domain. Moreover, the shuffling of domains can occur without the involvement of introns (Doolittle, 1995). It is thus more accurate to speak of the shuffling of domains rather than exons. Spliceosomal introns do not occur in prokaryotic genes, yet many cases of domain shuffling have been described in prokaryotes. The presence of introns, though, greatly facilitates the shuffling of domains, especially in vertebrates. In the 30 years since the discovery of introns, many examples of exon shuffling in a variety of organisms (vertebrates, invertebrates, plants) have been found. Only relatively recently have retrotransposition and illegitimate recombination been shown to be responsible for these phenomena (Long et al., 2003; van Rijk & Bloemendal, 2003).

### **4. Translation factors as examples of subneofunctionalization**

### **4.1 The main stages of translation**

In the process of protein synthesis, or translation, four distinct phases are usually distinguished: initiation, elongation, termination and recycling (Figure 2).

During **initiation,** the ribosome is assembled at the initiation codon of the mRNA, and the initiating methionyl-tRNA is attached to the peptidyl (P) center of the ribosome. The main objectives of the initiation of translation are identical in bacteria and eukaryotes, but initiation is much more complex in eukaryotes than in bacteria (Kapp & Lorsch, 2004). Three initiation factors occur in bacteria, but eukaryotes have at least 12, comprising about 23 different proteins (Sonenberg & Dever, 2003). Interestingly, the initiation of translation in archaea is intermediate in complexity between bacterial and eukaryotic translation.

During **elongation,** the aminoacyl-tRNA binds to the aminoacyl center (or A-site) of the ribosome, where the information recorded on the mRNA is translated into the language of proteins. This process involves elongation factor eEF1A (EF-Tu in bacteria) in complex with GTP. The ribosome catalyzes the formation of a peptide bond when the anticodon of the tRNA corresponds to the codon of the mRNA. After translocation, which is promoted by eEF2 (EF-G in bacteria) and moves the peptidyl-tRNA into the P-center, the next codon arrives in the A-center, and the process repeats. In contrast to initiation, the main components involved in elongation are highly conserved in all three domains. For example, the human elongation factor eEF1A and EF-Tu of *Escherichia coli* are 33% identical along their entire length, exhibiting a higher degree of similarity in the GTP-binding domains (Cavallius et al., 1993). The proteins a/eEF1A and a/eEF2 show significant structural similarities, both in the free state and in complex with the ribosome (Andersen et al., 2001; Stark et al., 2002; Valle et al., 2002; Jorgensen et al., 2003). The similarity of elongation factors in bacteria, archaea and eukaryotes suggests that the mechanisms of elongation in eukaryotes in many respects correspond to those in bacteria and archaea (Ramakrishnan, 2002).
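Sequence-identity figures such as the 33% quoted above for eEF1A and EF-Tu are computed from a pairwise alignment. The following is a minimal sketch of that calculation, not the procedure used in the cited study. The function name `percent_identity` and the two toy sequences are hypothetical, and counting only columns in which neither sequence has a gap is an assumed convention (other conventions divide by the full alignment length or by the shorter sequence).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are ignored,
    so the denominator is the number of residue-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns in which both sequences have a residue
    matches = 0   # compared columns in which the residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no residue-residue columns to compare")
    return 100.0 * matches / compared


# Toy, invented sequences (not real eEF1A or EF-Tu fragments).
toy_a = "MKT-AILG"
toy_b = "MRTQAI-G"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Because the result depends on both the alignment and the choice of denominator, identity values from different sources are comparable only when they follow the same convention.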

**Termination** of translation begins when a stop codon (UAA, UAG or UGA) enters the A-site of the ribosome. As a result of this process, the newly synthesized polypeptide chain is released. The stop codon is recognized by a release factor (RF1/RF2 in bacteria and eRF1 in eukaryotes) that triggers release of the nascent peptide from the ribosome. The efficiency of termination is enhanced by a GTPase release factor, RF3 in bacteria and eRF3 in eukaryotes (Kisselev et al., 2003). At least some stages of the termination of translation, such as recognition of the stop codon and hydrolysis of peptidyl-tRNA, are assumed to be similar in archaea and eukaryotes. This hypothesis is based on the homology of aRF1 and eRF1 and on the finding that aRF1 of *Methanococcus jannaschii* is able to function in an *in vitro* system containing mammalian ribosomes (Dontsova et al., 2000). Archaea, however, have no homologs of RF3 or eRF3, although this does not necessarily mean that proteins with similar functions are absent. Alternatively, these proteins may be absent due to a reduction of the translation apparatus during the evolution of archaea (Lecompte et al., 2002).
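The role of the reading frame in stop-codon recognition can be illustrated with a short sketch. It only scans an mRNA string codon by codon for the three stop codons named above and does not model release factors or the ribosome. The function name `first_stop_codon` and the example sequence are hypothetical, and the standard genetic code is assumed (some organisms, including certain ciliates, reassign stop codons).

```python
# Stop codons of the standard genetic code (an assumption; see lead-in).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(mrna: str, start: int = 0):
    """Return (position, codon) of the first in-frame stop codon, or None.

    `start` is the 0-based index of the first nucleotide of the reading
    frame. DNA-style input (T instead of U) is accepted and converted.
    """
    seq = mrna.upper().replace("T", "U")
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


# Invented example: AUG GCU UAA GGG
example = "AUGGCUUAAGGG"
print(first_stop_codon(example))           # frame 0 contains UAA at index 6
print(first_stop_codon(example, start=1))  # shifted frame: no stop codon found
```

Shifting the start position by one nucleotide changes which triplets are read, so the same sequence may contain a stop codon in one frame and none in another.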

During the final stage of translation, **recycling**, the dissociation of the ribosome occurs together with the release of the mRNA and deacylated tRNAs. An essential feature of this stage is the preparation of a new round of initiation. The details of this process are known only for bacteria.

### **4.2 Termination factors have arisen by the duplication of genes encoding elongation factors**

Comparison of amino acid sequences in the family of elongation factors raised speculation that the progenitors of EF-G and EF-Tu arose as a result of duplication and subsequent divergence of a gene encoding an ancient GTPase, and that further duplications led to the emergence of modern elongation and termination factors (Nakamura & Ito, 1998; Inagaki & Doolittle, 2000) (Figure 3). RF1, RF2 and RF3, as well as eRF1 and elongation factor eEF2, are assumed to have been derived from the bacterial elongation factor EF-G (Nakamura & Ito, 1998), while eRF3 arose from the duplication of the gene encoding eukaryotic elongation factor eEF1A (Inagaki & Doolittle, 2000).

The amino acid sequences of RF1 and RF2 are 36% identical, suggesting that the genes *prfA* and *prfB* arose from a common precursor by duplication (Craigen et al., 1990). Homologs of eRF1 are found in different species, and the eRF1 protein from different species is able to replace eRF1 of *S. cerevisiae*, indicating a high degree of functional conservation (Urbero et al., 1997). An almost complete lack of similarity in the sequences of bacterial and eukaryotic termination factors probably indicates their independent origin (Kisselev et al., 2003). On the other hand, the first class factors (RF1, RF2, aRF1 and eRF1) could be so divergent that they have lost any resemblance, with the exception of the GGQ motif (Frolova et al., 1999; Lecompte et al., 2002; Seit-Nebi et al., 2001). The lack of homology between the amino acid sequences of bacterial and eukaryotic termination factors does not mean that these proteins lack similarity at other levels of the organization of protein molecules. Indeed, the spatial structures of many translation factors are characterized by a number of common features that fit the hypothesis of "molecular mimicry" (Nissen et al., 2000; Nakamura & Ito, 2003).

In contrast to eRF1, eRF3 is a much less conserved protein, especially in its N-terminal domain, which can either be completely absent, as in the case of *Giardia lamblia* (Inagaki & Doolittle, 2000), or demonstrate species-specific differences in length (maximum length is 321 amino acids in *Leishmania major* (Atkinson et al., 2008)) and amino acid sequence. This lack of conservation may underlie species-specific regulation of the activity of this protein (Kodama et al., 2007). In some species of yeast, the N-terminus is enriched in Q and N residues and provides prionogenic properties to the protein (Kushnirov & Ter Avanesyan, 1998). The same amino acid composition is also detected in the N-terminal domains of eRF3 in the kinetoplastid protists *L. major* and *Trypanosoma cruzi*, but this similarity is unlikely to reflect homology (Atkinson et al., 2008). For termination of translation and maintenance of viability, only the C-terminal domain of eRF3 (homologous to elongation factor eEF1A) is necessary. eRF3 may have arisen in the early stages of eukaryotic evolution, since neither bacterial nor archaeal genomes contain homologs of eRF3 (Inagaki & Doolittle, 2000). Recent studies have shown that the functions of eRF3 can be performed in archaea by aEF1A (Saito et al., 2010).

The termination factor eRF3, preserving the functions typical of elongation factors (GTPase activity and interaction with the A-site of the ribosome), lost the capacity to bind tRNA but acquired the capacity to interact with eRF1 (Table 1). From this standpoint, elongation factor EF1A of archaea is functionally intermediate between elongation and termination factors: it acquired the ability to stimulate aRF1 while maintaining all the properties of an elongation factor (Saito et al., 2010). Termination factor eRF1 is a striking example of neofunctionalization, because it has acquired a variety of functions absent in elongation factors, including the ability to decode stop signals and to catalyze the release of nascent peptides from eukaryotic ribosomes in response to stop codons.

Gene Duplication and the Origin of Translation Factors 159

Fig. 2. Evolutionarily related proteins perform similar functions and interact with the same sites of the ribosome during translation. The most significant participants are shown. The arrows indicate the sequence of events. IF – initiation factor; EF – elongation factor; RF – release, or termination, factor; e – eukaryotic.

Fig. 3. The origin of the proteins involved in elongation, termination and mRNA quality control. The genes duplicated only in certain taxa are marked with asterisks: \* - duplication unique to mammals (Hoshino et al., 1998; Jakobsen et al., 2001); \*\* - duplication described only in *Saccharomyces* (Atkinson et al., 2008); \*\*\* - duplication specific to several species of ciliates (Liang et al., 2001; Atkinson et al., 2008) and *A. thaliana* (Chapman & Brown, 2004). Branch lengths are not to scale. The progenitors of prokaryotic EF-G and EF-Tu were proposed to have first diverged from a common ancestral GTPase, and then each gave rise to two protein families corresponding to the elongation and termination factors (Nakamura & Ito, 1998; Inagaki & Doolittle, 2000; Atkinson et al., 2008). EF – elongation factor, RF – release factor, e – eukaryotic, a – archaeal.

(Wang et al., 2010). The precise functions of each protein thus remain to be discovered. The plant *A. thaliana* has three paralogs of eRF1, all of which are able to rescue the *sup45-2(ts)* mutation in *SUP45* (encoding eRF1) in *S. cerevisiae* (Chapman & Brown, 2004).

Another example of duplication, found only in some taxonomic groups, is the presence of two paralogous genes encoding eRF3 in mammals. In mammals, proteins homologous to eRF3 can be divided into two subfamilies based on the sequence of their N-termini. The first subfamily includes human hGSPT1 (or eRF3a) and mouse mGSPT1 (Hoshino et al., 1989; Hoshino et al., 1998; Jean-Jean et al., 1996), while the second subfamily includes human hGSPT2 (eRF3b) and mouse mGSPT2 (Hoshino et al., 1998; Jakobsen et al., 2001). Complementation experiments have shown that only *mGSPT2* is able to complement a mutation in the *SUP35* gene (encoding eRF3) (Le Goff et al., 2002). *GSPT2* is a paralog of *GSPT1* that may have arisen by retrotransposition of the *GSPT1* transcript into the genome of the common ancestor of mouse and human. *GSPT2* may thus be a functional retrogene (Zhouravleva et al., 2006). Both eRF3a and eRF3b are able to serve as termination factors in mammalian cells and interact with eRF1 (Chauvin et al., 2005). However, eRF3a is considered the main factor and is expressed in all tissues, while eRF3b is detected only in the brain (Hoshino et al., 1998; Chauvin et al., 2005). This duplication event may not have led to the emergence of a new gene function but may have contributed to the complexity of regulatory processes through tissue-specific expression of these genes.

### **4.4 Subneofunctionalization in a family of termination factors gave rise to proteins participating in mRNA quality control**

Protein synthesis must yield functionally active proteins, so its accuracy is controlled at each stage of translation (Valente & Kinzy, 2003). The accuracy of initiation is achieved by proper identification of the start codon by a multifactorial initiation complex (Asano et al., 2001). Elongation requires the control of various events, including maintenance of the correct reading frame. Shifts in the reading frame occur at a frequency near 3 × 10⁻⁵ (Atkins et al., 1991) and may lead to the synthesis of non-functional products, because a shift in the reading frame will often create a premature termination codon (PTC).

Eukaryotic cells possess a mechanism known as nonsense-mediated mRNA decay (**NMD**) that recognizes and degrades mRNA molecules containing premature termination codons (Amrani et al., 2006) (Figure 4). NMD is mediated by the trans-acting factors Upf1, Upf2 and Upf3, all of which directly interact with eRF3; only Upf1 interacts with eRF1 (Czaplinski et al., 1998; Wang et al., 2001). In addition to NMD, eukaryotic cells contain two further mechanisms of mRNA quality control. No-go decay (**NGD**) releases ribosomes that are stalled on the mRNA (Doma & Parker, 2006). In yeast, NGD involves the proteins Hbs1 and Dom34 (Pelota in mammals). Another mechanism, non-stop decay (**NSD**), leads to the release of ribosomes that have read through the stop codon instead of terminating (Vasudevan et al., 2002). NSD has only been found in *S. cerevisiae* and involves the Ski7 protein (van Hoof et al., 2002). A common feature of these processes is that all involve the termination factors eRF1 and eRF3 (NMD) or their paralogs (Dom34/eRF1 and Hbs1/eRF3 in NGD; Ski7/eRF3 in NSD).

Hbs1 is a paralog of eEF1A and eRF3 (Wallrapp et al., 1998; Inagaki & Doolittle, 2000), while Dom34 is a paralog of eRF1 (Koonin et al., 1994; Davis & Engebrecht, 1998) (Figure 3). The C-terminus of Hbs1, homologous to that of eRF3, is sufficient to interact with Dom34, which suggests the same structure for the complexes formed by the two pairs of proteins (Hbs1-Dom34 and eRF3-


Table 1. Functional homology between elongation and termination factors in Archaea, Bacteria and Eukaryota
