**2. Types of duplications**

Duplication of DNA can occur in many ways: (1) partial duplication of a gene (an internal duplication), (2) duplication of a single gene, (3) partial duplication of a chromosome, (4) duplication of an entire chromosome, and (5) duplication of the genome, or polyploidy. The first four types are sometimes combined under the term SSD (smaller scale duplication) (Davis & Petrov, 2005). Other authors prefer the terms "paralogon" (derived from "paralog") for extended duplicated regions containing paralogs and SGD (single gene duplication) for duplications of individual genes (Durand & Hoberman, 2006). Duplication of the entire genome is designated WGD (whole genome duplication) (Davis & Petrov, 2005). According to Ohno, duplication of the whole genome is more important for evolution than duplication of its individual parts, because the partial duplication of regulatory genes or other restricted elements of the genome may lead to regulatory imbalances (Ohno, 1970).

**2.1 Whole genome duplications**

Ancient polyploidizations of the genome have been identified in all four eukaryotic kingdoms: plants, animals, fungi and protists. In all cases, the proportion of genes retained in the form of duplicated copies ranges from 10 to 50% and often correlates with the time elapsed since duplication (Scannell et al., 2006).

WGD is widespread in plants (Vision et al., 2000; Adams & Wendel, 2005). Estimates of the incidence of polyploidy in angiosperms vary from 30 to 80%, and about 3% of speciation events are explained by genome duplications (Otto & Whitton, 2000). Many, if not all, species of plants may thus have at least one polyploid ancestor. Most eudicots are assumed to have an ancient hexaploid ancestor, with subsequent tetraploidization in some taxa (Jaillon et al., 2007).

Duplication of the entire genome in the yeast *Saccharomyces cerevisiae* led to an initial increase in the number of genes from 5000 to 10 000, but the subsequent loss of paralogs has left modern *Saccharomyces* with about 5500 protein-coding genes, of which 1102 form 551 paralogous pairs (Byrne & Wolfe, 2005). A special term, ohnologs, coined in honor of S. Ohno, was proposed for paralogs resulting from WGD (Wolfe, 2000).

Detection of natural polyploidy is a difficult task, especially for ancient events. Recent duplications can be detected by comparing closely related species, one of which underwent polyploidization and therefore contains twice as many chromosomes as species that did not undergo WGD. For example, a comparison of the genomes of *Ashbya gossypii* and *S. cerevisiae* revealed that both species evolved from a single ancestor that had seven or eight chromosomes (Dietrich et al., 2004). Changes in chromosome number due to mutations (in particular translocations) gave rise to the ancestors of *A. gossypii* and *S. cerevisiae*. WGD in *S. cerevisiae* has provided this species with new opportunities for functional divergence that are absent in *A. gossypii*. A similar comparative analysis was also carried out for *S. cerevisiae* and its closest non-WGD relative, *Kluyveromyces waltii* (Kellis et al., 2004).

The older the duplication, the harder the analysis, because a period of diploidization often follows polyploidization and "transforms" the polyploid genome back to the diploid state. Diploidization is achieved by an intensive loss of genes, rearrangements of the genome and the divergence of duplicated genes. Recent analyses have also shown that the duplication of individual genes in evolution has occurred much more frequently than was previously thought (Lynch & Conery, 2000; Lynch et al., 2001). Diploidization has been studied in many genomes, including those of plants (Chapman et al., 2006; Jaillon et al., 2007; Tuskan et al., 2006), bony fishes (Brunet et al., 2006), yeasts (Piskur, 2001; Kellis et al., 2004; Scannell et al., 2006; Scannell et al., 2007), *Paramecium* (Aury et al., 2006) and vertebrates (Blomme et al., 2006).

Plants have repeatedly undergone polyploidization during evolution, presumably aided by their ability to propagate vegetatively and by the existence of specific regulatory mechanisms in plant cells. In particular, model polyploids have been characterized by a rapid loss of some genes and the specific inactivation of others by methylation (Kashkush et al., 2002; Comai et al., 2000; Lee & Chen, 2001). Epigenetic silencing may protect the duplicated copies from pseudogenization, thus facilitating the acquisition of new functions (Rodin & Riggs, 2003).

Vertebrate genomes contain many families of genes that are not found in invertebrates, and many gene duplications apparently occurred early in the evolution of the chordates (Taylor & Raes, 2004). Ohno suggested that the complex genome of vertebrates arose as a result of two rounds (2R) of WGD (Ohno, 1970). This view was once supported by the belief that the human genome contained about 100 000 genes, which was four times more than the estimated number of genes in the genomes of invertebrates. Sequencing of the human genome has since reduced the estimate of the number of genes to 20 000-25 000 but has not yet answered the question of the number of duplications of the ancestral genome. Some authors continue to support the 2R hypothesis (Larhammar et al., 2002; Spring, 1997; Meyer & Schartl, 1999; Wang & Gu, 2000; Dehal & Boore, 2005), others find evidence of only one round of WGD (X. Gu et al., 2002; Guigo et al., 1996; McLysaght et al., 2002), while others reject the possibility of WGD entirely and discuss only duplications of a limited number of segments (Friedman & Hughes, 2001; 2003).

**2.2 Smaller scale duplication**

Ohno (1970) argued that duplication of the genome rather than its individual parts is more important for evolution, because partial duplications can lead to regulatory imbalances. Nevertheless, partial and complete duplications of genes also play very important roles in evolution. WGDs have occurred several times during the evolutionary history of organisms, while SSDs arise continuously through multiple mechanisms. Several mechanisms have been suggested for the improvement in function of existing proteins and for the creation of new functions. One such mechanism is the internal (partial) duplication of genes, which is important for increasing the functional complexity of genes in evolution (Li, 1997). Such duplications are believed to have played a key role in the emergence of complex genes. Many proteins of modern organisms contain internal repeats of amino acids, and these repeats often correspond to functional or structural domains of proteins. These data suggest that the genes encoding these proteins were formed by internal duplications (Lavorgna et al., 2001). Internal duplication provides the possibility of improving protein function by increasing the number of active sites. Internal duplications can also lead to the acquisition of new functions by the modification of duplicated regions or the reorganization of modules. Numerous data on the role of intragenic duplications in the early stages of evolution of proteins were obtained by comparative analyses of sequenced genomes (Marcotte et al., 1999; Lavorgna et al., 2001; Conant & Wagner, 2005; Chen et al., 2007). Duplicated regions can accumulate mutations that contribute to the divergence of the repeated fragments, which can then become fixed. Often, only traces of duplications in the form of imperfect repeats can be detected in contemporary amino acid sequences (Li, 1997). Eukaryotic proteins have more repeats than do prokaryotic proteins (Marcotte et al., 1999; Chen et al., 2007).

**3. The fate of duplicated genes**

Tens of millions of years after WGD in *Arabidopsis thaliana* and *S. cerevisiae*, only about 30% and 10%, respectively, of the genes are preserved in the form of duplicated copies (Seoighe & Wolfe, 1999; Wong et al., 2002; Blanc et al., 2003). Preservation of duplicated copies in evolution can be achieved by one of three processes: (1) conservation, in which the copies are stored in an unaltered state (Hahn, 2009); (2) subfunctionalization, in which both paralogs are necessary for performing the functions previously provided by the ancestral gene (both terms were proposed by Force et al., 1999); and (3) neofunctionalization, in which one of the paralogs acquires a new function and the other preserves the old function. Characteristically, in (2) and (3), the regulatory and/or structural parts of the gene may be changed (Figure 1).

**3.1 Conservation of duplicated copies**

Duplicated genes are retained unchanged in cases where the normal development of the organism needs many copies of genes with similar function, which allows the synthesis of a
