**5.3 Properties of genes that influence duplicate specialization**

The rate at which duplicated genes acquire novel functions is of great interest. Several studies have compared the standard metrics of gene evolution (synonymous and nonsynonymous distance) with measures of functional differentiation between duplicate genes. While initial studies demonstrated only a weak correlation between expression divergence and sequence divergence, subsequent studies have drawn attention to a number of gene properties that strongly influence the rate and extent of functional differentiation between duplicates.
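The comparison described above can be sketched in a few lines. This is a minimal illustration, not the method of any study cited here: it assumes expression divergence is measured as one minus the Pearson correlation of two expression profiles, and that it is compared against synonymous distance (Ks) with a Spearman rank correlation. The function names and all values in `pairs` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def expression_divergence(profile_a, profile_b):
    """Assumed divergence measure: 1 - Pearson r between two profiles."""
    return 1.0 - pearson(profile_a, profile_b)

# Hypothetical duplicate pairs: (Ks, profile of copy A, profile of copy B),
# each profile being expression measured across the same four tissues.
pairs = [
    (0.10, [5, 9, 2, 7], [6, 10, 3, 8]),
    (0.45, [5, 9, 2, 7], [7, 6, 3, 8]),
    (0.80, [5, 9, 2, 7], [2, 3, 9, 4]),
    (1.20, [5, 9, 2, 7], [8, 2, 7, 3]),
]

ks = [p[0] for p in pairs]
divergence = [expression_divergence(p[1], p[2]) for p in pairs]
rho = spearman(ks, divergence)
print(f"Spearman rho between Ks and expression divergence: {rho:.2f}")
```

A rank correlation is used here because the relationship between sequence and expression divergence need not be linear; other distance measures (e.g. Euclidean distance on normalized profiles) could be substituted in `expression_divergence`.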

The mode of duplication has been cited in multiple instances as an important determinant of eventual retention and functional specialization. In a study comparing the functional evolution of genes duplicated through different mechanisms, Ganko et al. (2007) found that WGD-derived duplicates tended to be expressed at higher levels and more broadly than duplicates derived from smaller-scale duplications. Wang et al. (2010) found that tandem gene duplicates tended to have conserved function, whereas retrotransposed genes were more likely to have undergone EAC.

Duplicate genes can differ in their tissue distribution, and certain tissues seem to have a greater propensity than others to adopt genes with novel function. Notably, novel duplicates tend to be expressed in the testes. Langille and Clark (2007) found that retrotransposed duplicates in particular showed testis-biased expression. Mikhaylova et al. (2008) also found that duplicated genes expressed in the testes tended to show particularly divergent expression across species. Gallach (2010) illustrated a trend for mitochondrial-associated proteins to become preferentially fixed on autosomes (i.e. avoiding the X chromosome), and to have a strong tissue bias towards testis expression.
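Tissue bias of the kind discussed above is often summarized with a single number. The sketch below uses the tissue-specificity index tau, a commonly used measure that is swapped in here for illustration and is not taken from the studies cited in this section. Tau is 0 for uniformly expressed genes and 1 for genes expressed in a single tissue. The tissue names and expression values are hypothetical.

```python
def tau_index(expression):
    """Tissue-specificity index tau for a {tissue: expression level} mapping.

    tau = sum(1 - x_i / x_max) / (n - 1), where x_max is the highest
    expression level across the n tissues.
    """
    values = list(expression.values())
    if len(values) < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(values)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - v / x_max for v in values) / (len(values) - 1)

def top_tissue(expression):
    """Tissue with the highest expression level."""
    return max(expression, key=expression.get)

# Hypothetical expression levels for a parental gene and its retrocopy.
parental = {"testis": 10.0, "ovary": 10.0, "brain": 10.0, "gut": 10.0}
retrocopy = {"testis": 40.0, "ovary": 2.0, "brain": 0.0, "gut": 2.0}

for name, profile in (("parental", parental), ("retrocopy", retrocopy)):
    print(name, round(tau_index(profile), 3), top_tissue(profile))
```

Under these assumed values the parental copy is broadly expressed while the retrocopy is strongly testis-biased, which is the pattern a testis-bias analysis would look for across many duplicate pairs.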

Within the past decade, however, a number of high-throughput technologies have become available that allow the localization and abundance of gene products to be measured empirically on a genomic/proteomic scale. At present, the most widely used platform is the microarray, an assay with a very large number of transcript-specific probes. Each probe is specific to a known transcript, allowing the potential for complete coverage of all known and predicted genes in a known genome sequence. Custom arrays can also be built from cDNA libraries when working with non-model organisms. Databases replete with microarray data are now publicly available for data mining, allowing a gene's expression (or lack thereof) to be profiled across tissues, timepoints, and stimuli. This aggregate gene behaviour is referred to as an "expression profile", and can serve as an empirical proxy of overall gene function. As more microarray data become available, the quality of this proxy will improve.

Expression measurement technologies measure gene activation directly, and are agnostic to the regulatory inputs/mechanisms that lead to transcription. In some cases, cis-regulatory regions can undergo substantial changes/shuffling without having much effect on the ultimate transcription behaviour of a gene; transcription measurement technologies can help distinguish these cases from those that have actually changed a gene's expression phenotype (Comelli & Gonzalez, 2009).

In addition to general purpose (i.e. gene, exon) microarrays, several arrays have been designed to be maximally sensitive to differences between closely related genes. Microarrays use probes that measure targets by hybridizing to nucleotides directly via base complementarity. Studies have previously demonstrated that the nucleotides at the center of the probe have the most influence on binding strength. In order to minimize the potential for cross-hybridization, some researchers have designed microarrays for comparing closely related genes (e.g. homeologs) by using probes that feature a known distinguishing SNP at the central position of the probe (Chaudhary et al., 2009; Flagel & Wendel, 2010; Flagel et al., 2008; Udall et al., 2006). This design should minimize cross-hybridization, though it should be noted that previous studies have found that cross-hybridization is only of concern when target sequences are >90-95% identical (Rajashekar et al., 2007). For duplicate genes that have highly similar sequences, alternative measurement technologies like deep sequencing can be used to obtain unbiased paralog-specific expression profiles.

Quantitative proteomics techniques such as iTRAQ (Burkhart et al., 2011) or 2D differential in-gel electrophoresis provide a similarly high-throughput platform for the quantitation of protein abundance. The data differ from microarray data in two respects: the identities of quantified proteins are often not known in advance, and the coverage of the proteome is incomplete and sensitive to experimental parameters. However, protein abundances may be a more accurate reflection of gene action, as proteins are the active products of genes in most cases and mRNA abundance does not always correlate with protein abundance.

Gibson and Goldberg (2009) conducted a study on yeast duplicates using a novel metric of functional differentiation: the number and type of protein interactions. The authors used both affinity-precipitation mass spectrometry and yeast two-hybrid assays to construct networks of protein interactions, and then sought to test whether the patterns of functional differentiation better fit models of subfunctionalization or neofunctionalization. Their work expands on previous studies that describe the functional evolution of the genome/proteome in terms of the growth of (novel) protein interactions. They illustrate how existing methods

Han et al. (2009) revealed an interesting trend for duplicate genes that had been relocated (i.e. created by transposition). In instances where these duplicated genes showed asymmetric sequence evolution, it was the relocated gene that showed stronger support for positive selection in more than 80% of cases. This suggests an important role for chromosomal distribution in the evolution of gene function and duplicate divergence. A study by Tsankov et al. (2010) also showed that local chromatin organization (i.e. nucleosome positions) has a strong effect on gene expression, which suggests that translocated duplicates may show expression divergence by virtue of chromosomal position alone. In addition, Ren et al. (2005) found that tandem duplicates that shared expression domains tended to have dissimilar sequence-based functions. Shoja et al. (2007) noted that tandem gene duplicates tended to show a relationship between expression divergence and chromosomal distance.
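The asymmetry result for relocated duplicates is a statement about a proportion: among pairs with asymmetric sequence evolution, how often is the relocated copy the faster-evolving one? One simple way to ask whether such a proportion departs from the 50% expected by chance is an exact binomial (sign) test. The sketch below is an illustration under stated assumptions, not the analysis performed by Han et al. (2009); the counts are hypothetical and the helper `binomial_two_sided_p` is introduced here for illustration only.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null success probability p.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that equally likely outcomes are included.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-12))
    return min(1.0, total)

# Hypothetical calls: for each asymmetric pair, which copy evolves faster.
faster_copy = ["relocated"] * 17 + ["parental"] * 3

n_pairs = len(faster_copy)
n_relocated = faster_copy.count("relocated")
fraction = n_relocated / n_pairs
p_value = binomial_two_sided_p(n_relocated, n_pairs)

print(f"{n_relocated}/{n_pairs} pairs ({fraction:.0%}), p = {p_value:.4f}")
```

The test treats duplicate pairs as independent, which may not hold for pairs arising from the same duplication event; that assumption would need checking on real data.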

In their work on the possible action of gene conversion on the evolution of duplicated segments in *Drosophila*, Osada and Innan (2008) noted that duplications lying near the edges of duplicated segments showed more sequence divergence, suggesting that sublocation within a duplicated segment is an additional factor to consider in studies of duplicate divergence.

The broad functional category to which a gene belongs can also influence its freedom to explore divergent functions. In an analysis of genes in the rice genome produced through a specific WGD, Yim et al. (2009) found that duplicate genes with divergent functions were significantly enriched for metabolism-related activity. Langille and Clark (2007) showed that "cell physiological process" genes were particularly amenable to duplication via transposition. Perhaps reflecting similar functional pressures, Li et al. (2010) found that subcellular localization also influenced the divergence of expression between duplicate genes.
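Enrichment claims of this kind are typically assessed by asking whether a functional category is over-represented among a subset of genes relative to the background. A minimal sketch using a one-sided hypergeometric test follows; it is a generic illustration rather than the procedure used in any study cited above, and the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total, in_category, sample, sample_in_category):
    """One-sided hypergeometric p-value for over-representation.

    Probability of drawing at least `sample_in_category` category members
    when `sample` genes are drawn without replacement from `total` genes,
    of which `in_category` belong to the category.
    """
    upper = min(in_category, sample)
    tail = sum(
        comb(in_category, k) * comb(total - in_category, sample - k)
        for k in range(sample_in_category, upper + 1)
    )
    return tail / comb(total, sample)

# Hypothetical counts: 1000 duplicate pairs in total, 200 annotated as
# metabolism-related; of 50 functionally divergent pairs, 20 are
# metabolism-related (10 would be expected by chance).
p_value = hypergeom_enrichment_p(
    total=1000, in_category=200, sample=50, sample_in_category=20
)
print(f"enrichment p-value: {p_value:.2e}")
```

When many categories are tested at once, as in a typical functional-annotation survey, the resulting p-values would also need a multiple-testing correction.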

The mode of retention may also depend on the amount of selective pressure acting on a gene's coding sequence. Semon and Wolfe (2008) showed that duplicates undergoing slow rates of sequence evolution seemed particularly prone to regulatory subfunctionalization. This observation is echoed in Arnaiz et al. (2010), who found that duplicate pairs in *Paramecium tetraurelia* with divergent expression profiles were unlikely to undergo sequence subfunctionalization. Li et al. (2009) found that the mode of duplication had a substantial effect on the degree of expression divergence between duplicates, based on microarray expression profiles of rice tissues.

Nielsen et al. (2010) suggest that genes under strong selective pressure produce duplicates that are quickly nonfunctionalized, implying a low tolerance for (poisonous) isoforms of essential products. Thus, a gene's essentiality and, consequently, its age may both determine the extent to which its duplicates are retained.
