40 Gene Duplication

Whole genome duplication (WGD) events have a dramatic impact on the content of the genome, and a number of WGD events have been hypothesized in the history of various lineages (Van de Peer et al., 2009). By their nature, WGD events result in the duplication of all loci, preserving non-coding elements, intron/exon structure, and even overall stoichiometry within gene/protein interaction networks. Interestingly, lineages that have undergone separate, distinct WGD events (in this case, *Xenopus tropicalis* and zebrafish) often ultimately retain similar (i.e. orthologous) duplicates: WGD duplicates that became fixed in one lineage were often also fixed in the other (Semon & Wolfe, 2008). Pairs of duplicate genes that arose through WGD are sometimes referred to as ohnologs (Turunen et al., 2009). WGD events are relatively common in plant lineages, which may have interesting implications for the evolution of gene regulation (Lockton & Gaut, 2005).

Allopolyploids are a variant of whole genome duplication in which the diploid gametes come from two different species. These genomic hybrids contain two formerly independent complete genomes. The most commonly studied allopolyploids are plants, though a number of examples have been documented in animals (including the model organism *Xenopus laevis*). Duplicates produced through allopolyploidy (i.e. formerly orthologous genes now present in the same organism) are often referred to as "homeologs" (Flagel et al., 2008).

**3. Defining gene function**

One significant hurdle to the discussion of duplicate functional specialization is defining gene function. Gene function may be broken into two broad categories: regulation and gene product (MacCarthy & Bergman, 2007). Regulation encompasses the "when, where, why, and how much" aspects of a gene's transcription; non-coding elements around a gene (such as enhancers and signaling sequences) can direct when a gene should be expressed, and in what quantity. These non-coding elements are responsive to various cellular and environmental triggers. Changes to regulation alone may be sufficient to bring about the specialization of a new duplicate.

Gene products, on the other hand, primarily dictate the "how" of a gene's function (along with some regulatory and subcellular localization information present in the 5' and 3' untranslated regions (UTRs)). Studies of duplicated genes have focused on changes to various coding sequence properties, such as binding sites, eligible cofactors, indels, and catalytic residues (Turunen et al., 2009). It is theoretically possible for a duplicate gene to become functionally specialized without any change to its regulation (Des Marais & Rausher, 2008).

It is worth noting that changes falling into these two categories can occur serially or in concert. For example, a change in tissue localization may precede structural mutations adapting a protein to a new environment.

**4. Theoretical models for duplicate retention and functional specialization**

A number of theoretical models have been proposed to describe how a parental gene's functions can be partitioned between its offspring, and how this partitioning affects the chances of these genes avoiding pseudogenization and eventual deletion.

Three archetypal outcomes, namely nonfunctionalization, subfunctionalization, and neofunctionalization, are based on concepts typically attributed to Ohno (1970). Nonfunctionalization describes the situation where one duplicate's expression is abolished, making it invisible to natural selection and thus free to accumulate mutations. While it is technically possible for a nonfunctionalized gene to have its function restored, the vast majority become relics, progressively crippled by the accumulation of disabling and deleterious mutations. There has been some interest in studying the impact that losing a duplicate via nonfunctionalization has on sibling genes; for example, whole genome duplication events can lead to cases of "ohnologs gone missing", where one WGD duplicate has been lost (Canestro et al., 2009). Reciprocal loss of different duplicate copies in separate populations has been hypothesized as one mechanism of speciation. Figure 2 depicts two hypothetical duplications and their respective functional specializations.

### **4.1 Neofunctionalization**

Neofunctionalization refers to the scenario where one duplicate gene acquires mutations that confer previously unexplored functions, either through changes in regulation (e.g. tissue localization) or in coding sequence. Claims of neofunctionalization tend to focus on the generation of new functions, though it should be noted that these developments may also result in the loss of ancestral function(s) (Turunen et al., 2009).

A specific example of neofunctionalization can be found in a recent study of the MADS-box gene family in angiosperms. MADS-box genes are well-known for their role in developmental processes, but the functions of some gene family members have been difficult to determine. Viaene et al. (2010) provide evidence that a group of these genes, the AGL6 subfamily, can be neatly divided into two groups based on duplication history. One of these groups retains the ancestral function of guiding reproductive development, while the other seems to have acquired a novel role in regulating the growth of vegetative tissues.

A second example, describing the functional differentiation of two paralogs in maize, shows that ancestral functions can still be retained even when one duplicate acquires a novel function (Goettel & Messing, 2010). Both paralogs, named p1 and p2, drive the synthesis of maysin, which in turn contributes to resistance against the corn earworm. In addition, p1 has a secondary role in controlling the accumulation of red pigments. The authors propose a series of recombination events that describe how these genes acquired their distinct characters.

### **4.2 Subfunctionalization**

Subfunctionalization involves each gene taking on a complementary subset of the parental gene's functions, such that neither is independently capable of fulfilling all of the parental gene's roles. Subfunctionalization is conceptually synonymous with the Duplication, Degeneration, and Complementation (DDC) model. Regulatory subfunctionalization could result in non-overlapping tissue distributions for the nascent duplicates, with the union of their expression profiles matching the parental gene's range.
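The three archetypal fates can be made concrete at the regulatory level with a toy classifier over tissue expression sets. This is a minimal sketch, not a method used in the studies cited here: the function name `classify_regulatory_fate`, the tissue labels and the decision rules are hypothetical, and it assumes the parental expression range is known (in practice it might be approximated by an unduplicated outgroup ortholog). Expression levels and coding-sequence changes are ignored entirely.

```python
from typing import Set


def classify_regulatory_fate(parent: Set[str], dup_a: Set[str], dup_b: Set[str]) -> str:
    """Toy classification of a duplicate pair from tissue expression sets.

    Each argument is the set of tissues in which a gene is expressed.
    `parent` is the (assumed known) expression range of the pre-duplication gene.
    """
    union = dup_a | dup_b
    # Nonfunctionalization: one copy has lost expression in every tissue.
    if not dup_a or not dup_b:
        return "nonfunctionalization"
    # Regulatory neofunctionalization: some expression falls outside the parental range.
    if union - parent:
        return "neofunctionalization"
    # Subfunctionalization (DDC): together the copies cover the parental range,
    # but neither copy covers it alone.
    if union == parent and dup_a != parent and dup_b != parent:
        return "subfunctionalization"
    # Both copies still cover the full parental range: fully redundant.
    if dup_a == parent and dup_b == parent:
        return "redundant"
    # Anything else (e.g. a tissue lost from both copies) fits none of the archetypes.
    return "unresolved"


if __name__ == "__main__":
    parent = {"root", "leaf", "flower"}
    print(classify_regulatory_fate(parent, {"root"}, {"leaf", "flower"}))  # subfunctionalization
    print(classify_regulatory_fate(parent, parent, set()))                 # nonfunctionalization
    print(classify_regulatory_fate(parent, parent, {"root", "seed"}))      # neofunctionalization
```

Treating expression as simply present or absent per tissue keeps the rules readable; real expression data would need thresholds and replicates before a tissue could be called expressed or not.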

Jarinova et al. (2008) describe an instance of subfunctionalization in the Hox genes of zebrafish. Through a careful analysis of peripheral non-coding elements, the authors show how two paralogs from the duplicated zebrafish hoxb complexes, hoxb5a and hoxb5b, acquired non-overlapping expression profiles. In particular, the experimental removal of one regulatory element unique to hoxb5a resulted in the two paralogs (re)acquiring a similar expression profile.

Detection and Analysis of Functional Specialization in Duplicated Genes 43

Fig. 2. Possible functional specializations following duplication. Two hypothetical examples showing how retention models can apply either to regulatory regions or gene products. A) Duplicated genes subfunctionalize at the regulatory level, partitioning their parental regulatory domains and suggesting subdivided roles. The gene product, however, has acquired a novel element (i.e. new exon), suggesting neofunctionalization at the coding sequence level. B) Following duplication, one gene loses its regulatory domains and is interrupted by an early stop codon, reflecting nonfunctionalization both at the regulatory and gene product levels.

The idea of structural subfunctionalization is perhaps best captured in the "Escape from Adaptive Conflict" (EAC) hypothesis. Consider a hypothetical gene product with multiple interaction partners (e.g. an enzyme with two possible substrates). Selection for bifunctionality in this enzyme may limit the binding/catalytic efficiency of either specific reaction: mutations that improve one may inhibit the other, hence the "adaptive conflict".

Should this gene be duplicated, however, each offspring gene could be free to acquire mutations that optimize binding to one specific substrate, thus escaping the conflict without a loss of functionality. The EAC model essentially describes this process, where a single enzyme with multiple interaction partners gives rise to duplicate genes with more specific but enhanced functionality.
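The adaptive conflict behind EAC can be illustrated with a toy trade-off model. In this sketch all names and numbers are hypothetical: a single enzyme's relative efficiencies on substrates A and B are assumed to lie on a linear trade-off (`eff_a + eff_b <= 1`), each reaction is assumed to be catalyzed by whichever copy is best at it, and fitness is assumed to be limited by the weaker of the two reactions. Under these assumptions a single bifunctional copy is stuck at a compromise, while two specialized duplicates each reach the maximum for their own reaction.

```python
from typing import List, Tuple

# Hypothetical trade-off: an enzyme's relative efficiencies on substrates A and B
# must satisfy eff_a + eff_b <= BUDGET, so improving one reaction costs the other.
BUDGET = 1.0
Enzyme = Tuple[float, float]  # (efficiency on substrate A, efficiency on substrate B)


def feasible(enzyme: Enzyme) -> bool:
    a, b = enzyme
    return a >= 0 and b >= 0 and a + b <= BUDGET + 1e-9  # small tolerance for float rounding


def pathway_fitness(copies: List[Enzyme]) -> float:
    """Assumed fitness: each reaction uses the best copy for it; the weaker reaction limits fitness."""
    if not all(feasible(c) for c in copies):
        raise ValueError("an enzyme violates the assumed trade-off")
    best_a = max(a for a, _ in copies)
    best_b = max(b for _, b in copies)
    return min(best_a, best_b)


def best_single_copy(steps: int = 100) -> Enzyme:
    """Grid-search the best compromise for one bifunctional copy along the trade-off line."""
    candidates = [(BUDGET * i / steps, BUDGET * (steps - i) / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda enz: pathway_fitness([enz]))


if __name__ == "__main__":
    ancestor = best_single_copy()
    print(ancestor, pathway_fitness([ancestor]))       # (0.5, 0.5) 0.5
    print(pathway_fitness([ancestor, ancestor]))       # 0.5: redundancy alone does not help
    print(pathway_fitness([(1.0, 0.0), (0.0, 1.0)]))   # 1.0: specialized copies escape the conflict
```

Under this fitness rule two unspecialized copies score no better than one, so the gain comes from specialization rather than from extra dosage, which is the distinction EAC draws.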

EAC is interesting in that it lies somewhere on the boundary between subfunctionalization and neofunctionalization. Three claims are required to invoke the model: i) both duplicates accumulate adaptive changes after duplication, ii) these mutations enhance ancestral functions, and iii) the ancestral gene was constrained from improving these functions (Barkman & Zhang, 2009). The key difference from other models (and the key challenge) lies in proving that the ancestral form was bifunctional. Studies demonstrating the EAC model in action are still relatively uncommon. An early attempt to apply the model to genes from the anthocyanin biosynthetic pathway of morning glories has been criticized for not clearly providing these three lines of supporting evidence (Barkman & Zhang, 2009; Des Marais & Rausher, 2008).

The EAC process has also been invoked to describe the evolution of a novel antifreeze protein in an Antarctic zoarcid fish (Deng et al., 2010). The authors demonstrate that an ancestral gene had a rudimentary ice-binding affinity in addition to its primary catalytic function (a sialic acid synthase), and that a duplication event allowed one copy of this gene to lose the ancestral catalytic function and refine its ice-binding capability. The discussion includes a careful analysis of the three EAC criteria listed above.

While duplicated genes are generally relegated to one of the fates listed above, a number of case studies have shown that recent duplicates can maintain identical functional profiles. One possible explanation is that the duplicates have acquired mutations that restore the "status quo" present prior to duplication. If mutations cause the summed expression of the two duplicates to equal the expression level of their ancestor, both genes could experience some level of selective pressure to maintain expression despite being fully redundant. Ganko et al. (2007) observe that the vast majority of duplicates, regardless of duplication mechanism, showed asymmetric expression, with one gene consistently expressed at higher levels than its sibling across all tissues. This suggests that a limited form of subfunctionalization may play an initial role in the retention of duplicates. Asymmetric expression divergence was also observed in a study of duplicated genes in the fly, with a tendency for the "parent" gene of the duplicate pair to have higher expression levels (Langille & Clark, 2007). Interestingly, Qian et al. (2010) point out that many gene duplicates are synthetically lethal or deleterious (i.e. losing both copies is lethal or harmful), and suggest that expression load may be shared by both genes after duplication.
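The asymmetric pattern described by Ganko et al. (2007), in which one copy is more highly expressed than its sibling in every tissue, and the idea of the pair jointly restoring ancestral dosage can both be phrased as simple checks on per-tissue expression values. This is a hypothetical sketch rather than the authors' procedure: `consistent_asymmetry` and `dosage_ratio` are illustrative names, the numbers are arbitrary toy values, and the ancestral level is assumed to be approximated by an outgroup ortholog. A real analysis would test differences statistically across replicates instead of comparing single values.

```python
from typing import Dict, Optional

Expression = Dict[str, float]  # tissue -> expression level (arbitrary units)


def consistent_asymmetry(expr_a: Expression, expr_b: Expression) -> Optional[str]:
    """Return 'a' or 'b' if that copy is more highly expressed in every shared tissue, else None."""
    tissues = set(expr_a) & set(expr_b)
    if not tissues:
        return None
    if all(expr_a[t] > expr_b[t] for t in tissues):
        return "a"
    if all(expr_b[t] > expr_a[t] for t in tissues):
        return "b"
    return None


def dosage_ratio(expr_a: Expression, expr_b: Expression, expr_ancestor: Expression) -> Dict[str, float]:
    """Per-tissue ratio of summed duplicate expression to the (proxy) ancestral level.

    Values near 1.0 mean the pair jointly matches the ancestral dosage.
    """
    tissues = sorted(set(expr_a) & set(expr_b) & set(expr_ancestor))
    return {t: (expr_a[t] + expr_b[t]) / expr_ancestor[t] for t in tissues if expr_ancestor[t] > 0}


if __name__ == "__main__":
    # Toy values only; not data from any study.
    copy_a = {"root": 10.0, "leaf": 8.0, "flower": 5.0}
    copy_b = {"root": 2.0, "leaf": 1.0, "flower": 4.0}
    outgroup = {"root": 12.0, "leaf": 9.0, "flower": 9.0}
    print(consistent_asymmetry(copy_a, copy_b))    # a
    print(dosage_ratio(copy_a, copy_b, outgroup))  # {'flower': 1.0, 'leaf': 1.0, 'root': 1.0}
```

In this toy pair one copy dominates in every tissue while the two together exactly match the outgroup level, which is the combination of asymmetry and shared expression load discussed above.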

### **4.3 Alternative models and odd cases**

A number of other subtle variations have been proposed to augment these three primary fates. Subneofunctionalization, for example, is a model that argues that subfunctionalization followed by neofunctionalization is a common and sequential process. Subfunctionalization permits a relaxation of selection on various subregions of the gene, which in turn allows the (eventual) evolution of novel functions, suggesting that subfunctionalization is more of a midstep than an endpoint (Johnson & Thomas, 2007).

In addition, while subfunctionalization does not make any a priori claims about the proportion of functions lost by each duplicate, it appears that in some cases the losses are highly asymmetrical. Panchin et al. (2010) show that in human duplicate genes one duplicate appears to remain totally unchanged, while its sibling accumulates the majority of functional (in this case, amino acid) mutations.

Contrary to expectations, many gene duplicate pairs appear to be retained despite apparently complete functional redundancy. A relatively recent model has been proposed to explain this phenomenon. The theory, termed "originalization", uses arguments based on purifying selection and recombination to explain the preservation of both duplicate copies (i.e. to prevent nonfunctionalization) for an extended period of time (Xue & Fu, 2009; Xue et al., 2010).

It has also been suggested that models of duplicate retention focus on too small a unit, and that protein interaction networks (themselves composed of a number of coexpressed and functionally related proteins) provide a more coherent perspective on the size of perturbation required to have a phenotypically relevant effect (MacCarthy & Bergman, 2007). The authors argue that cases of regulatory subfunctionalization and neofunctionalization often have no phenotypic consequence on the output of a protein network, and are thus effectively neutral for longer than duplicate-oriented models would suggest. A number of studies have reported an unexplained level and duration of retention for redundant duplicates (Skamnioti et al., 2008).

The relative importance of these models to the retention of duplicates is a subject of continued interest. Whole genome duplication events, which effectively introduce a paralogous copy of every gene in the genome, present an opportunity to tally the cases to which each model applies best. For a review of the relative importance of these retention models as they pertain specifically to duplicates produced in plant WGD events, see Edger & Pires (2009).

particularly high sequence similarity also showed a tendency towards retaining similar regulation profiles. The authors suggest that this was a consequence of the promoter regions being propagated via gene conversion acting on the locally similar subsequences (Yim et al., 2009).

### **5.2 Alternative splicing**

In general, multiple splice forms (and the potential for these splice variants to have distinct functions) have not received much attention in studies of gene duplication and functional divergence. As a first step towards addressing this oversight, Zhan et al. (2011) studied the potential for alternative splicing in *Drosophila* duplicates. New genes tended to show lower levels of alternative splicing, and the subset of duplicates that retained the potential for multiple splice forms were expressed in fewer tissues, at lower levels, and had their expression shifted towards preferential expression in the testes. The authors also noted that a duplicate's alternative splicing potential depended on duplication mode, with retrotransposed genes being copied with a specific and frozen configuration of exons/introns.

### **5.3 Properties of genes that influence duplicate specialization**

The rate at which duplicated genes acquire novel functions is of great interest. Several studies have compared the standard metrics of gene evolution (synonymous and nonsynonymous distance) with measures of functional differentiation between duplicate genes. While initial studies demonstrated only a weak correlation between expression divergence and sequence divergence, subsequent studies have drawn attention to a number of gene parameters that strongly influence the rate and extent of functional differentiation across duplicates.

The mode of duplication has been cited in multiple instances as an important determinant of eventual retention and functional specialization. In a study comparing the functional evolution of genes duplicated through different mechanisms, Ganko et al. (2007) found that WGD-derived duplicates tended to be expressed at higher levels and more broadly than duplicates derived from smaller-scale duplications. Wang et al. (2010) found that tandem gene duplicates tended to have conserved function, whereas retrotransposed genes were more likely to have undergone EAC.

Duplicate genes can differ in their tissue distribution, and certain tissues seem to have a greater propensity than others to adopt genes with novel functions. In particular, novel duplicates show a tendency towards expression in the testes. Langille and Clark (2007) found that retrotransposed duplicates in particular showed testis-biased expression. Mikhaylova et al. (2008) also found that duplicated genes expressed in the testes tended to show particularly divergent expression across species. A trend for genes encoding mitochondrial-associated proteins to become fixed preferentially on autosomes (i.e. avoiding the X chromosome) and to have a strong tissue bias towards testis expression has also been described (Gallach, 2010).

Han et al. (2009) revealed an interesting trend for duplicate genes that had been relocated (i.e. created by transposition). Where these duplicate pairs showed asymmetric sequence evolution, the relocated gene showed stronger support for positive selection in more than 80% of cases. This suggests an important role for chromosomal distribution in the evolution of gene function and duplicate divergence. A study by Tsankov et al. (2010) also showed that local chromatin organization (i.e.
