**2. Mechanisms of duplication**

There are several different mutational mechanisms through which gene duplicates can be produced. Depending on the type of event, the nature and scale of what is duplicated can differ significantly: single genes may be copied, with or without their peripheral regulatory elements, or an entire genome can be duplicated. While each mechanism ultimately results in the duplication of one or more genes, the mechanisms differ in three key respects: how much regulatory information the duplicated genes retain, where the duplicates are integrated into the genome, and how many interaction partners are duplicated. Duplication mechanisms can be broadly categorized into three groups: DNA/RNA-mediated transposition, unequal recombination, and genome doubling/hybridization. These mechanisms all produce paralogs, that is, homologous genes that are both present in and native to the same genome (in contrast to orthologs, where speciation acts as a 'duplication event' and the homologous genes are components of different genomes). Figure 1 provides a diagram depicting various modes of duplication.

Detection and Analysis of Functional Specialization in Duplicated Genes 39

Fig. 1. Modes of gene duplication. Upper left: transposition mediated by either a DNA or RNA intermediate can produce gene duplicates at distant locations in the genome. RNA intermediates retain little of the regulatory sequence surrounding the parent gene. Upper right: Errors during homologous recombination can produce tandem arrays of genes, situated in series. Bottom left: Doubling of all chromosomes will produce duplicates of all genes in the genome. Bottom right: Allopolyploids contain genomes from two compatible species, with duplicate gene pairs from formerly orthologous genes.

### **2.1 DNA/RNA transposition**

DNA/RNA transposition refers to mechanisms by which a specific short nucleotide sequence, either mRNA (as in retrotransposition) or DNA (e.g., transposon-mediated duplication), is copied from one location in the genome to another. The insertion location is essentially random; any compatible destination locus will do, and thus the duplicate need not be located near its progenitor template. RNA-mediated retrotransposition is unique in that it uses the post-transcriptional sequence as a template for the nascent duplicate. Hence, upstream and downstream regulatory sequences lying outside the transcribed gene sequence are not preserved, and the newly produced gene will have most or all introns (and possibly some exons) spliced out. The new gene may also possess a genetically encoded poly-A tail. Since RNA-mediated retrotransposition does not preserve most non-coding elements, the duplicate gene must depend on the a priori availability or later acquisition of promoter/regulatory sequences in order to be transcribed. Absence of these elements effectively means the new gene duplicate is a pseudogene.

DNA-mediated duplications, such as those mediated by transposons, often retain regulatory information and intron/exon structure. Nonetheless, they still operate on a very specific subsequence of DNA, and elements relocated by DNA-mediated transposition can be inserted in any eligible location in the genome.
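The contrast between the two transposition routes can be illustrated with a toy model. The following Python sketch is not drawn from the chapter: the `GeneModel` class, the function names, and the sequences are all hypothetical, and the DNA-mediated copy is simply assumed to carry its flanking sequence along (the text only says this is often the case).

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Toy gene: flanking regulatory sequence plus ordered exon/intron segments."""
    upstream: str    # promoter/regulatory sequence outside the transcript
    segments: list   # ordered (kind, sequence) pairs; kind is "exon" or "intron"
    downstream: str  # downstream regulatory sequence outside the transcript


def retrocopy(gene, poly_a_length=10):
    """RNA-mediated copy: spliced exons plus an encoded poly-A tail, no flanks."""
    mature = "".join(seq for kind, seq in gene.segments if kind == "exon")
    return mature + "A" * poly_a_length


def dna_copy(gene):
    """DNA-mediated copy: assumed here to keep flanks and intron/exon structure."""
    body = "".join(seq for _, seq in gene.segments)
    return gene.upstream + body + gene.downstream


# Hypothetical sequences, chosen only to make the difference visible.
parent = GeneModel(
    upstream="TATAAA",
    segments=[("exon", "ATGGCC"), ("intron", "GTAAGT"), ("exon", "GGATAA")],
    downstream="CCGG",
)

retro = retrocopy(parent, poly_a_length=5)
dna = dna_copy(parent)

print("retrocopy:", retro)
print("DNA copy: ", dna)
```

In this sketch the retrocopy contains neither the intron nor the upstream sequence, which is why it would depend on regulatory elements already present at, or later acquired by, its insertion site.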

### **2.2 Segmental duplication/unequal cross-over**

Errors during homologous recombination can produce serial duplications of genetic sequence. Unequal crossing-over is an error stemming from the misalignment of homologous chromosomes during mitosis/meiosis. Ordinarily, homologous sequences are aligned, and cross-over events result in balanced exchanges of sequence information across chromosomes. An abundance of repetitive sequences can, however, cause chromosomes to misalign, in which case a segment of one chromosome is inserted into its misaligned partner (thus producing a duplication on one and a reciprocal deletion on the other).

Since multiple rounds of unequal crossing over tend to gradually inflate the number of candidate repeat regions, some genomic regions are hotbeds for sequence duplication and can give rise to a large number of duplicate genes in series. These serially arranged duplicates are referred to as "tandem duplicates". These tandem gene arrays are highly localized in the genome, and tandem duplicates retain most or all of their intron/exon structure and peripheral non-coding elements. Unequal crossing-over also plays a role in the generation of copy number variations (Redon et al., 2006).
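The duplication and reciprocal deletion described above can be traced with a small simulation. This is a minimal sketch under simplifying assumptions: chromosomes are reduced to lists of labels, `R` marks a repeat, `G` is the gene between two repeats, and the `crossover` function and breakpoint indices are illustrative rather than a model taken from the chapter.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the segments to the right of each breakpoint.

    Equal breakpoints model a normal, aligned exchange; unequal breakpoints
    model a crossover between misaligned repeats.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two copies of the same chromosome: gene G flanked by repeats R.
chromosome = ["A", "R", "G", "R", "B"]

# Aligned crossover: both products are identical to the parent.
balanced_1, balanced_2 = crossover(chromosome, chromosome, 2, 2)

# Misaligned crossover: the second repeat of one copy pairs with the first
# repeat of the other, so the breakpoints fall at different positions.
duplicated, deleted = crossover(chromosome, chromosome, 4, 2)

print("balanced:  ", balanced_1, balanced_2)
print("duplicated:", duplicated)
print("deleted:   ", deleted)
```

In the misaligned case one product carries two copies of `G` in series and three repeats instead of two, while the other has lost `G` altogether. The extra repeat is what makes further rounds of misalignment more likely in this toy picture.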

### **2.3 Whole genome duplication/allopolyploids**

In some circumstances, errors during segregation can produce diploid gametes, and the fusion of these diploid gametes can result in a complete doubling of genomic content (all chromosomes present in duplicate). While very rare, these whole genome duplication (WGD) events have a dramatic impact on the content of the genome, and a number of WGD events have been hypothesized in the history of various lineages (Van de Peer et al., 2009). By their nature, WGD events result in the duplication of all loci, preserving non-coding elements, intron/exon structure, and even overall stoichiometry within gene/protein interaction networks. Interestingly, it has been observed that lineages that undergo separate, distinct WGD events (in this case, *Xenopus tropicalis* and zebrafish) often ultimately retain similar (i.e. orthologous) duplicates; that is to say, WGD duplicates that became fixed in one lineage were also often fixed in the other (Semon & Wolfe, 2008). Pairs of duplicate genes that arose through WGD are sometimes referred to as ohnologs (Turunen et al., 2009). WGD events are relatively common in plant lineages, which may have interesting implications for the evolution of gene regulation (Lockton & Gaut, 2005).

Allopolyploids are a variant of whole genome duplications in which the diploid gametes come from two different species. These genomic hybrids contain two formerly independent complete genomes. The most commonly studied allopolyploids are plants, though a number of examples have been documented in the animal kingdom (including the model organism *Xenopus laevis*). Duplicates produced through allopolyploidy (i.e. formerly orthologous genes now present in the same organism) are often referred to as "homeologs" (Flagel et al., 2008).

Nonfunctionalization describes the situation where one duplicate's expression is abolished, making it invisible to natural selection and thus free to accumulate mutations. While it is technically possible for a nonfunctionalized gene to have its function restored, the vast majority become relics progressively crippled by the accumulation of disabling and deleterious mutations. There has been some interest in studying the impact that losing a duplicate via nonfunctionalization has on sibling genes; for example, whole genome duplication events can lead to cases of "ohnologs gone missing", where a WGD duplicate has been lost (Canestro et al., 2009). Reciprocal duplicate loss has been hypothesized as one means of speciation. Figure 2 depicts two hypothetical duplications and their respective functional specializations.

### **4.1 Neofunctionalization**

Neofunctionalization refers to the scenario where one duplicate gene acquires mutations that allow it to take on previously unexplored functions, either through changes in regulation (e.g. tissue localization) or coding sequence. Claims of neofunctionalization tend to focus on the generation of new functions, though it should be noted that these developments may also result in the loss of ancestral function(s) (Turunen et al., 2009). A specific example of neofunctionalization can be found in a recent study of the MADS-box gene family in angiosperms. MADS-box genes are well known for their role in developmental processes, but the functions of some gene family members have been difficult to determine. Viaene et al. (2010) provide evidence that a group of these genes, the AGL6 subfamily, can be neatly divided into two groups based on duplication history. One of these groups retains the ancestral function of guiding reproductive development, while the other seems to have acquired a novel role in regulating the growth of vegetative tissues. A second example, describing the functional differentiation of two paralogs in maize, shows that ancestral functions can still be retained even when one duplicate acquires a novel function (Goettel & Messing, 2010). Two paralogs, named p1 and p2, both drive the synthesis of maysin, which in turn contributes to resistance against earworm. In addition, the p1 gene has a secondary role in controlling the accumulation of red pigments. The authors propose a series of recombination events that describe how these genes acquired their distinct characters.

### **4.2 Subfunctionalization**

Subfunctionalization involves each gene taking on a complementary subset of the parental gene's functions, such that neither is independently capable of fulfilling all the parental gene's roles. Subfunctionalization is conceptually synonymous with the Duplication, Degeneration, and Complementation (DDC) model. Regulatory subfunctionalization could result in non-overlapping tissue distributions for the nascent duplicates, with the union of the expression profiles matching the parental gene's range. Jarinova et al. (2008) describe an instance of subfunctionalization in the Hox genes of zebrafish. Through a careful analysis of peripheral non-coding elements, the authors show how the two zebrafish paralogs hoxb5a and hoxb5b acquired non-overlapping expression profiles. In particular, the experimental removal of one regulatory element unique to hoxb5a resulted in the two paralogs (re)acquiring a similar expression profile.

The idea of structural subfunctionalization is perhaps best captured in the "Escape from Adaptive Conflict" (EAC) hypothesis. Consider a hypothetical gene product with multiple
