**3. The fate of duplicated genes**

Tens of millions of years after WGD in *Arabidopsis thaliana* and *S. cerevisiae*, only about 30% and 10% of the genes, respectively, are preserved as duplicated copies (Seoighe & Wolfe, 1999; Wong et al., 2002; Blanc et al., 2003). Preservation of duplicated copies in evolution can be achieved by one of three processes: (1) conservation, in which the copies are retained in an unaltered state (Hahn, 2009); (2) subfunctionalization, in which both paralogs are necessary for performing the functions previously provided by the ancestral gene; and (3) neofunctionalization, in which one of the paralogs acquires a new function and the other preserves the old function (the latter two terms were proposed by Force et al., 1999). Characteristically, in (2) and (3), the regulatory and/or structural parts of the gene may be changed (Figure 1).
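The three outcomes listed above can be illustrated with a toy model in which each gene is reduced to the set of functions it performs. This is only a sketch under that simplifying assumption: the names `Fate` and `classify_fate` and the function labels are hypothetical, and real classification relies on sequence, expression and phenotypic data rather than on predefined function sets.

```python
from enum import Enum


class Fate(Enum):
    CONSERVATION = "conservation"
    SUBFUNCTIONALIZATION = "subfunctionalization"
    NEOFUNCTIONALIZATION = "neofunctionalization"
    OTHER = "other"


def classify_fate(ancestral, copy_a, copy_b):
    """Toy classification of a duplicate pair by comparing function sets.

    Each argument is an iterable of hypothetical function labels.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)

    def gains_new(copy):
        # True if the copy performs a function the ancestral gene lacked.
        return bool(copy - ancestral)

    # Neofunctionalization: one copy gains a new function while the
    # other keeps exactly the ancestral (old) function set.
    if (gains_new(copy_a) and copy_b == ancestral) or (
        gains_new(copy_b) and copy_a == ancestral
    ):
        return Fate.NEOFUNCTIONALIZATION
    # Conservation: both copies are unchanged relative to the ancestor.
    if copy_a == ancestral and copy_b == ancestral:
        return Fate.CONSERVATION
    # Subfunctionalization: neither copy suffices alone, but together
    # they cover exactly the ancestral functions.
    if (
        copy_a != ancestral
        and copy_b != ancestral
        and (copy_a | copy_b) == ancestral
    ):
        return Fate.SUBFUNCTIONALIZATION
    # Anything else (e.g. partial loss in one copy) is left unclassified.
    return Fate.OTHER
```

For example, `classify_fate({"f1", "f2"}, {"f1"}, {"f2"})` returns `Fate.SUBFUNCTIONALIZATION`, because both paralogs are needed to cover the ancestral functions.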

### **3.1 Conservation of duplicated copies**

Duplicated genes are retained unchanged in cases where the normal development of the organism needs many copies of genes with similar function, which allows the synthesis of a larger amount of specific RNA or protein (Ohno, 1970). An increase in the number of copies of these genes correlates with the increasing complexity of the organism (Chen et al., 2007). Amplification of genes in microorganisms leads to resistance to antibiotics and heavy metals, increased virulence and other adaptive properties (Romero & Palacios, 1997; Reams & Neidle, 2004; Andersson & Hughes, 2009). In plants, amplification of genes provides resistance to herbicides (Harms et al., 1992; Shyr et al., 1992). The best known examples of conservation of duplicated copies in various organisms are genes for rRNA, tRNA and histones, many of which are organized in tandem repeats, which allows the maintenance of homogeneity by unequal crossing over or gene conversion (Hurles, 2004).

Fig. 1. Possible consequences of gene duplication (modified from Hahn, 2009). A and C – regulatory sequence changes; B and D – coding sequence changes. Since conservation does not change the duplicated copies, it is not represented in the diagram. OF (grey) – old function, NF (black) – new function, LF (white) – lost function (attributed to both regulatory and structural sequences).

One of the most interesting questions related to the preservation of duplicated copies of genes is whether the loss of genes is a random event or is subject to natural selection. Which duplicates are lost, and which persist after polyploidization? About 10% of yeast genes are preserved as duplicated copies, and most are not needed for viability (Z. Gu et al., 2002). The most frequently duplicated genes encode cyclins, components of signal transduction pathways, and cytoplasmic (but not mitochondrial) ribosomal proteins. Most are characterized by high levels of expression. Selection for increased expression may therefore have been the major factor in the preservation of duplicated genes (Seoighe & Wolfe, 1999).

Analysis of the most recent WGD in *Arabidopsis* showed a preferential retention of genes involved in transcription and signal transduction, whereas genes involved in DNA repair or encoding proteins of organelles were characterized by more frequent loss (Blanc & Wolfe, 2004). Interestingly, genes preserved as paralogs after duplication have a high probability of remaining duplicated after the next round of duplication (Seoighe & Gehring, 2004). Loss of duplicates is thus not a random process.
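One common way to ask whether a functional category is over-represented among retained duplicates is a one-sided hypergeometric test. The sketch below is illustrative only and is not the method used in the studies cited above: the function name `hypergeom_enrichment_p` is hypothetical, and it assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_enrichment_p(total, in_category, retained, retained_in_category):
    """One-sided hypergeometric tail probability P(X >= retained_in_category).

    Null model: `retained` genes are drawn at random, without replacement,
    from `total` genes, of which `in_category` belong to the functional
    category of interest. A small value suggests the category is retained
    more often than expected by chance.
    """
    upper = min(in_category, retained)
    denom = comb(total, retained)
    # Sum the probabilities of observing k category genes among the
    # retained set, for every k at least as large as the observed count.
    tail = sum(
        comb(in_category, k) * comb(total - in_category, retained - k)
        for k in range(retained_in_category, upper + 1)
    )
    return tail / denom


# Invented counts, for illustration only (not data from the cited studies):
# 10 genes, 5 in the category, 5 retained, all 5 retained ones in the category.
p = hypergeom_enrichment_p(10, 5, 5, 5)  # equals 1/252
```

With real data the four counts would come from an annotated genome, and the p-values would need correction for the number of categories tested.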

### **3.2 Subfunctionalization**


The subfunctionalization hypothesis is to some extent the opposite of Ohno's hypothesis, because it assumes that both functions existed before the duplication (Figure 1). The first evidence for it was the discovery of the phenomenon of "gene sharing" (Piatigorsky et al., 1988; Piatigorsky, 2003).

This model explains the emergence of new genes by the duplication of multifunctional genes. Such genes encode proteins that already perform different functions. Gene sharing was discovered in crystallins, proteins found in the lens of the eye. Crystallins make up 70% of the contents of lens cells but remain soluble without forming aggregates (aggregation leads to cataracts). Under such conditions, most proteins would form insoluble aggregates within seconds. Another feature of these proteins is their record longevity (equal to the lifetime of the individual, for example 80 years); most proteins last only minutes or hours. The eyes of all vertebrates have a standard set of crystallins (α, β, γ), and additional species-specific crystallins are encoded by genes that in other tissues encode enzymes. In most cases, this double life is ensured not by duplication but by a "division of functions": enzyme and crystallin are encoded by the same gene, and the protein performs an additional function without any change in its amino acid sequence. This phenomenon was thus called gene sharing (Piatigorsky et al., 1988; Piatigorsky, 2003). In gene sharing, a gene acquires a second function without duplication and without loss of its primary function. A change in tissue specificity or developmental regulation, however, may occur.

Acquisition of a new function without duplication was first detected for crystallin ε in birds and crocodiles (up to 23% of the total protein of the lens). The amino acid sequence of crystallin ε was identical to that of lactate dehydrogenase B (LDH), and the protein had an activity similar to LDH. Subsequent work showed that both proteins are encoded by the same gene. Similarly, crystallin τ in lampreys, bony fishes, reptiles and birds is identical to, and encoded by the same gene as, α-enolase. Crystallin ζ is identical to quinone reductase. Crystallins δ, ε and τ thus illustrate the "division of functions", in which a gene has acquired additional functions without duplication.

Multifunctional genes are severely limited in their capacity for adaptive change, since a mutation that improves one function may disturb another. Duplication could resolve this "adaptive conflict". The molecular mechanisms leading to subfunctionalization had not been studied in detail until recently. Such analyses became possible only with the comparative analysis of genes in closely related species, for example the genes involved in galactose utilization in *S. cerevisiae* and *K. lactis* (Hittinger & Carroll, 2007). Divergence in the expression of duplicated genes over long periods of time has long attracted interest as an important stage in the emergence of a new gene by duplication (Ohno, 1970; Ferris & Whitt, 1979). Thus in some cases, duplicates may have identical coding sequences but different regulatory sequences (Figure 1). Some pairs of duplicated genes can diverge in concert, forming two groups that are expressed in different tissues or under different conditions (Blanc & Wolfe, 2004). This process, which explains the divergence of metabolic pathways, is called "concerted divergence".

### **3.3 Neofunctionalization**

The stable maintenance of duplicated copies in the genome requires functional divergence. From Ohno's (1970) position, functional divergence is achieved when one copy of the gene retains the old function while the other copies acquire new functions. An inevitable intermediate stage in this process would be the emergence of a pseudogene, as most mutations will disrupt or inactivate a gene rather than give rise to new functions. Because such a route to a new function is considered extremely unlikely, an extended hypothesis of neofunctionalization (NF) has been proposed, which includes the following possibilities: (1) a new gene acquires a new function but keeps the old function (NF-I), (2) a new gene completely loses the old function (NF-II), or (3) a new gene retains part of the old function (NF-III) (He & Zhang, 2005). Many examples of neofunctionalization have been described in recent years (see Hahn, 2009), although distinguishing neofunctionalization from subfunctionalization is sometimes difficult and has led to the creation of a "subneofunctionalization" model (He & Zhang, 2005).

process repeats. In contrast to initiation, the main components involved in elongation are highly conserved in all three domains. For example, the human elongation factor eEF1A and EF-Tu of *Escherichia coli* are 33% identical along their entire length, exhibiting a higher degree of similarity in the GTP-binding domains (Cavallius et al., 1993). The proteins a/eEF1A and a/eEF2 reveal significant structural similarities, both in the free state and in complex with the ribosome (Andersen et al., 2001; Stark et al., 2002; Valle et al., 2002; Jorgensen et al., 2003). The similarity of elongation factors in bacteria, archaea and eukaryotes suggests that the mechanisms of elongation in eukaryotes in many respects correspond to those in bacteria and archaea (Ramakrishnan, 2002).

**Termination** of translation begins when the stop codon (UAA, UAG or UGA) enters the A-site of the ribosome. As a result of this process, the newly synthesized polypeptide chain is released. The stop codon is recognized by a release factor (RF1/RF2 in prokaryotes and eRF1 in eukaryotes) that triggers release of the nascent peptide from the ribosome. The efficiency of termination is enhanced by the GTPase release factor, RF3 in prokaryotes and eRF3 in eukaryotes (Kisselev et al., 2003). At least some stages of the termination of translation, such as recognition of the stop codon and hydrolysis of peptidyl-tRNAs, are assumed to be similar in archaea and eukaryotes. This hypothesis is based on the homology of aRF1 and eRF1 and on the finding that aRF1 of *Methanococcus jannaschii* is able to function in an *in vitro* system containing mammalian ribosomes (Dontsova et al., 2000). Archaea, however, do not have homologs of RF3 and eRF3, which does not necessarily mean the absence of proteins with similar functions. Alternatively, these proteins may be absent due to a reduction of the translation apparatus during the evolution of archaea (Lecompte et al., 2002).

During the final stage of translation, **recycling**, the ribosome dissociates and releases the mRNA and deacylated tRNAs. An essential feature of this stage is the preparation for a new round of initiation. The details of this process are known only for bacteria.

### **4.2 Termination factors have arisen by the duplication of genes encoding elongation factors**

Comparison of amino acid sequences in the family of elongation factors raised speculation that the progenitors of EF-G and EF-Tu arose as a result of duplication and subsequent divergence of a gene encoding an ancient GTPase, and that further duplications led to the emergence of modern elongation and termination factors (Nakamura & Ito, 1998; Inagaki & Doolittle, 2000) (Figure 3). RF1, RF2 and RF3, as well as eRF1 and elongation factor eEF2, are assumed to have been derived from the bacterial elongation factor EF-G (Nakamura & Ito, 1998), while eRF3 arose from the duplication of the gene encoding eukaryotic elongation factor eEF1A (Inagaki & Doolittle, 2000).

The amino acid sequences of RF1 and RF2 are 36% identical, suggesting that the genes *prfA* and *prfB* arose from a common precursor by duplication (Craigen et al., 1990). Homologs of eRF1 are found in different species, and the eRF1 protein from different species is able to replace eRF1 of *S. cerevisiae*, indicating a high degree of functional conservation (Urbero et al., 1997). An almost complete lack of similarity in the sequences of bacterial and eukaryotic termination factors probably indicates their independent origin (Kisselev et al., 2003). On the other hand, the first class factors (RF1, RF2, aRF1 and eRF1) could be so divergent that they have lost any resemblance, with the exception of the GGQ motif (Frolova et al., 1999; Lecompte et al., 2002; Seit-Nebi et al., 2001). The lack of homology between the amino acid
