**8. Making a case for duplicate specialization**

The most important techniques for studying functional specialization focus on different aspects of gene function, but all are generally associated with the task of distinguishing the roles and fates of duplicate genes. Figure 3 provides a diagram summarizing the various aspects of gene function that are amenable to these techniques.

### **8.1 Biochemical function and analysis**

Unequivocal evidence for functional specialization can be drawn from studies of enzyme kinetics. By measuring the substrate affinity and catalytic rates of enzymes, for example, it is possible to quantify differences in performance between duplicate genes. Biochemical approaches are very labor intensive and only readily applicable to certain classes of genes, but the evidence they provide is direct and readily interpreted.

In a study highlighting the potential importance of the EAC model, Des Marais and Rauscher (2008) used enzyme affinity assays to demonstrate the enzymatic function of paralogous anthocyanin biosynthetic pathway genes in morning glories. Enzyme kinetics were compared across species that differed by a duplication of a specific enzyme, with the unduplicated ortholog acting as a proxy for the ancestral function.

A recent innovative approach to studying gene function used directed (laboratory) evolution to try to encourage a derived gene to revert to a hypothesized ancestral function (Bershtein & Tawfik, 2008). The authors studied the rate of 'reversion' and how this rate varied under different selective pressures (selecting for the ancestral function, the current function, or both). By studying the transitional states the gene underwent as its function shifted, the authors found evidence that best fit the subfunctionalization model of duplicate gene evolution.

### **8.2 Expression profiling and reconstruction**

Expression profiles (mined from expression assays like high-throughput sequencing) can provide immediate evidence of functional differentiation between duplicated genes (based on divergent, non-overlapping expression behaviour). For example, Yasukawa et al. (2010) used RNA in situ and reporter analysis to determine the precise expression localization and timing of duplicates in *Drosophila*. Rajashekar (2007) used microarray expression profiles to analyze the similarity of duplicates in the hydrophobin gene family in a fungus, *Paxillus involutus*.

By augmenting comparisons of gene expression profiles with reconstructed estimates of parental gene expression (via ancestral character estimation projected onto phylogenetic trees, see section 7), it is further possible to estimate how each daughter gene specialized from its parent following duplication. Case examples are the studies by Doxey et al. (2007) and Zou et al. (2009), both mentioned previously. Zou et al. (2009) reconstructed the expression behaviour of stress response genes in *Arabidopsis*; with the additional information provided by the estimates of ancestral behaviour, the authors were able to infer the patterns of subfunctionalization and neofunctionalization that led to the expression behaviour of extant duplicates. Karanth et al. (2009) reconstructed the ancestral tissue expression patterns of fatty acid binding proteins in zebrafish, revealing an apparent neofunctionalization event followed by subfunctionalization in a subsequent duplication.

### **8.3 Comparing with a non-duplicated ortholog**

As discussed earlier, one effective technique for estimating the function of the ancestor of a pair of duplicate genes is to refer to a related species in which the locus is unduplicated. The assumption is that the orthologous gene behaves in the related genome as the parental gene behaved prior to the duplication event. This point of reference makes it possible to distinguish between models of duplicate retention, for example by lending support to subfunctionalization over neofunctionalization.
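The logic of using an unduplicated ortholog as an ancestral proxy can be illustrated with a minimal sketch. It assumes that expression has already been reduced to presence/absence calls per tissue; the function name, the category labels, and the decision rules are illustrative simplifications and are not drawn from any of the studies cited here.

```python
def classify_duplicate_fate(ortholog, copy_a, copy_b):
    """Classify a duplicate pair from sets of tissues with detected expression.

    The unduplicated ortholog stands in for the pre-duplication gene
    (an assumption that weakens as the species diverge).
    """
    ancestral = set(ortholog)
    a, b = set(copy_a), set(copy_b)

    # Expression in a tissue absent from the ancestral proxy is treated as a gain.
    if (a | b) - ancestral:
        return "neofunctionalization-like"
    # Both copies retain the full ancestral profile.
    if a == ancestral and b == ancestral:
        return "conserved"
    # Each copy has lost part of the profile, but together they still cover it.
    if a != ancestral and b != ancestral and (a | b) == ancestral:
        return "subfunctionalization-like"
    # Anything else, e.g. only one copy losing expression domains.
    return "other"


if __name__ == "__main__":
    proxy = {"brain", "liver", "gill"}
    print(classify_duplicate_fate(proxy, {"brain"}, {"liver", "gill"}))
    print(classify_duplicate_fate(proxy, proxy, proxy | {"fin"}))
```

In practice the presence/absence calls depend on an expression threshold, and a gain relative to the ortholog may equally reflect a loss in the reference lineage, so labels of this kind are best read as hypotheses rather than conclusions.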

In a study of zebrafish-specific WGD-produced duplicates, Kassahn et al. (2009) used unduplicated mouse orthologs as a reference, despite the considerable evolutionary distance separating these two organisms. Multiple gene properties were compared between paralogs and their mouse ortholog, including sequence, structure, and expression information. The authors found support for neofunctionalization in a number of duplicates, and found that regulatory changes were far more common than changes to gene products.

In a study of human genes, Panchin et al. (2010) chose to use distantly related gene family members as proxies for ancestors of recent paralogs. They demonstrated that, in many cases, the recent duplicates are evolving asymmetrically, with one duplicate accumulating sequence mutations much faster than its sibling.
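One common way to test for asymmetric sequence evolution of this kind is a relative rate test against an outgroup. The sketch below follows the single-category form of Tajima's relative rate test; it is offered as an illustration only and is not necessarily the method used by Panchin et al. (2010). The function name and the toy sequences are hypothetical, and the input is assumed to be three pre-aligned sequences of equal length.

```python
import math


def relative_rate_test(copy_a, copy_b, outgroup):
    """Compare lineage-specific changes in two duplicates against an outgroup.

    Returns (m_a, m_b, chi2, p), where m_a counts aligned sites at which only
    copy_a differs from the outgroup, and m_b likewise for copy_b.
    """
    if not (len(copy_a) == len(copy_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    m_a = m_b = 0
    for x, y, o in zip(copy_a, copy_b, outgroup):
        if "-" in (x, y, o):
            continue  # skip alignment gaps
        if x != o and y == o:
            m_a += 1  # change unique to copy_a
        elif y != o and x == o:
            m_b += 1  # change unique to copy_b

    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0

    # Under equal rates, m_a and m_b are expected to be equal (1 d.f.).
    chi2 = (m_a - m_b) ** 2 / (m_a + m_b)
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return m_a, m_b, chi2, p


if __name__ == "__main__":
    out = "AAAAAAAAAAAA"
    fast = "CCCCCCCCAAAA"
    slow = "AAAAAAAAAAG-"
    print(relative_rate_test(fast, slow, out))
```

A small p-value suggests that one copy has accumulated changes faster than its sibling. The chi-square approximation is unreliable when the counts are small, and the result depends on the outgroup being a reasonable proxy for the ancestor.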

Semon and Wolfe (2008) compared the fate of WGD duplicates in *X. laevis*, an allopolyploid, to their orthologs in *X. tropicalis*, a related species that did not undergo any WGD. Expression patterns were compared across 11 tissue types, and losses of tissue breadth were related to possible subfunctionalization. The authors also compared the fate of duplicated genes produced through two different large-scale duplication mechanisms by comparing *X. laevis* to zebrafish, a species with a well-studied WGD that did not stem from allopolyploidy. They found that genes retained in duplicate after the *X. laevis* duplication were also frequently retained in duplicate in zebrafish, suggesting common influences on the duplicability of these gene varieties.
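Whether two lineages retain the same genes in duplicate more often than expected by chance can be framed as an overlap test. The following sketch uses a one-sided hypergeometric test; the source does not state which statistic Semon and Wolfe (2008) applied, so this is an assumed, generic formulation. The function name and its parameters are hypothetical, and the counts are assumed to refer to gene families with identifiable orthologs in both species.

```python
from math import comb


def overlap_enrichment_p(n_genes, n_retained_a, n_retained_b, n_both):
    """One-sided hypergeometric p-value for the overlap of two retained sets.

    n_genes       gene families comparable between the two species
    n_retained_a  families retained in duplicate in species A
    n_retained_b  families retained in duplicate in species B
    n_both        families retained in duplicate in both species
    """
    max_overlap = min(n_retained_a, n_retained_b)
    if max(n_retained_a, n_retained_b) > n_genes or not 0 <= n_both <= max_overlap:
        raise ValueError("inconsistent counts")

    # Probability of an overlap at least as large as observed when the
    # retained families of species B are drawn at random.
    total = comb(n_genes, n_retained_b)
    tail = sum(
        comb(n_retained_a, k) * comb(n_genes - n_retained_a, n_retained_b - k)
        for k in range(n_both, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers chosen for illustration only.
    print(overlap_enrichment_p(n_genes=1000, n_retained_a=200,
                               n_retained_b=150, n_both=60))
```

A small p-value indicates more shared retention than expected under independence, which is consistent with, but does not by itself demonstrate, common influences on duplicability.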

Another example of a well-studied allopolyploid, cotton, has been discussed in previous sections (Chaudhary et al., 2009; Flagel et al., 2008; Flagel & Wendel, 2010). One unique observation made possible in this system is the phenomenon of transgressive segregation, where the expression profiles of homeologous genes eventually evolve to resemble neither of the parental strains, suggesting a unique adaptation to the presence of two essentially complete genomes within a single cell (Flagel & Wendel, 2010).

### **8.4 Comparing gene product properties**

While not as easily assayed as gene expression, the encoded products of genes (i.e. proteins) can also suggest the gain and loss of functions. As a simple example, the rate of protein sequence evolution can be compared between duplicates by comparing their respective rates of synonymous and non-synonymous substitution. While not necessarily illustrative of the nature of the difference, this method can provide evidence for asymmetrical selection, suggesting that one duplicate is acquiring amino acid altering mutations faster than the other (Ganko et al., 2007). Working from a list of 15 of the most asymmetrically diverged WGD-derived protein sequences in *S. cerevisiae*, Turunen et al. (2009) noted substantial indels in addition to changes in important catalytic residues and active/cofactor binding sites. A literature search appeared to support the view that many of these highly modified duplicates had acquired novel functions.

Other aspects of gene function, such as the position and number of introns or methylation sites, have also been used to characterize divergence between duplicated genes. For example, Xiong et al. (2010) included intron position in a study of the expansion of the ABC transporter gene family in the ciliate *Tetrahymena thermophila*. In addition to comparing expression profiles constructed by clustering gene expression data, the authors compared intron positions to group the family members into functional subcategories (Xiong et al., 2010). A similar study has examined differential splicing forms in duplicate genes of *Drosophila* (Zhan et al., 2011).

Yang et al. (2006) compared the "DNA-binding with one finger" (DOF) gene family across three plant species: rice, *Arabidopsis thaliana*, and poplar. Their multifaceted approach to describing gene function included an analysis of protein motif gain/loss and changes to methylation patterns. Combined with information about gene regulation drawn from microarrays, PCR and massively parallel signature sequencing, the authors compared the relative applicability of various duplicate retention models to the DOF family.

When the information is available, the protein-protein interaction partners of duplicates can also be compared to study duplicate specialization. Nielsen et al. (2010) compared a set of residues in the tail ends of tubulin genes in fruit flies, noting divergence in these regions which may reflect changes in protein-protein interaction partners. Studying the applicability of models like subfunctionalization and neofunctionalization at the level of gene networks has helped integrate duplicate specialization into a broader systems biology context (Gibson & Goldberg, 2009; MacCarthy & Bergman, 2007).

Fig. 3. Gene properties that can be examined for evidence of functional specialization. The top set (orange) are approaches that check for differences in gene regulation; expression levels reflect measurements of transcription in tissues in response to a series of stresses (e.g. as obtained from microarrays). The bottom set (blue) are aspects of the gene product that may differ between duplicates. Sequence logos may be generated using the WebLogo software (Crooks et al., 2004).

**9. Conclusion**

Studies of the evolution of duplicate genes are pushing the field towards more exacting standards and definitions for gene function. Since the rate and extent of duplicate gene specialization depend on so many factors, and since novel functions can emerge in so many different ways, integrative approaches will be of paramount importance to understanding this key aspect of genomic evolution. Future studies can benefit in particular from the inclusion of data from gene families as a whole, as this additional information helps both with estimating ancestral gene functions and with evaluating the breadth of function previously covered by related genes.

While empirical evidence of differential catalytic function remains the gold standard for proving functional specialization of duplicated genes, high-throughput studies exploiting the vast quantities of minable expression data provide a cheap and effective means of studying functional specialization at the level of whole gene families. Genomes with annotations beyond expression profiles (such as gene-by-gene interaction profiles and essentiality data) should be helpful for determining the extent to which functional changes at the regulatory level actually impact phenotype.

**10. References**

Arnaiz, O., Gout, J. F., Betermier, M., Bouhouche, K., Cohen, J., Duret, L., Kapusta, A., Meyer, E. & Sperling, L. (2010). Gene expression in a paleopolyploid: a
