acidic clusters have been shown to mediate protein nucleolar retention (Ochs et al., 1996; Shu-Nu et al., 2000; Ueki et al., 1998). In FAM9A, the low-complexity sequence is located within the Cor1/Xlr/Xmr conserved region, perhaps interfering with its function. In fact, FAM9A shows higher sequence divergence from the common ancestor than FAM9B.

**3. Discussion**

The role of partial gene duplication in the formation of novel genes is still poorly understood, although recent reports in *Drosophila* (Chen et al., 2010; Zhou et al., 2008) and *C. elegans* (Katju & Lynch, 2003, 2006) indicate that partially duplicated gene copies are very frequent. The present study analyses a set of primate-specific genes formed by partial gene duplication. We find that the rate of divergence of the partially duplicated copy is, in all cases, higher than that of the parental copy, generalizing previous observations for XAGE1-A (Toll-Riera et al., 2009a). This, together with the fact that most partially duplicated genes recruit additional sequences, strengthens the notion that partial duplication is a major process in the formation of genes with novel structures and functions. In these genes, any remaining similarity to the homologous proteins is rapidly erased by high sequence turnover. As a consequence, distant homologues are difficult to identify and these proteins end up being classified as orphans. This fits the model proposed by Domazet-Loso and Tautz to explain the high number of orphan genes in *Drosophila*: orphan genes are created by gene duplication followed by a period of rapid sequence divergence that erases their similarity to their homologues (Domazet-Loso & Tautz, 2003). Although there is now evidence that not all orphan genes are generated in this manner (Toll-Riera et al., 2009a; Toll-Riera et al., 2009b; Zhou et al., 2008), a significant portion is.

A large fraction of the duplicated gene copies that become fixed in a population is subsequently lost, presumably because the new copy is completely redundant and thus dispensable. However, the formation of chimeric gene structures, encoding part of an existing protein together with additional sequences, could in principle favour their retention, as such genes are not functionally equivalent to the ancestral gene (Patthy, 1999; Zhou et al., 2008). In support of this, in *Drosophila* the proportion of novel genes corresponding to complete gene duplications was found to decrease with gene age, suggesting that complete gene duplications have a shorter lifespan than partial gene duplications (Zhou et al., 2008).

Fig. 4. Phylogenetic tree of the FAM9 gene family. Branch lengths correspond to the estimated number of amino acid substitutions per site, using the alignment in Fig. 3. The protein alignment shown corresponds to exon 5 in FAM9B and FAM9C and to exon 6 in FAM9A, human SYCP3 and mouse SYCP3. The expanded low-complexity region in FAM9A is depicted above the alignment.
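The branch lengths in Fig. 4, like the divergence comparisons discussed in this section, are expressed as estimated amino acid substitutions per site. As an illustration only, the sketch below computes the simplest pairwise version of that quantity: the observed proportion of differing sites p, corrected for multiple hits at the same site with the Poisson model d = -ln(1 - p). It assumes a uniform substitution rate across sites and skips gapped columns; the function names and sequences are hypothetical toy examples, not the FAM9 data, and tree-building programs estimate branch-specific values from the whole alignment under their own substitution models.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned protein sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Estimated amino acid substitutions per site, d = -ln(1 - p).

    Corrects the observed proportion p for multiple substitutions at one site.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


# Hypothetical toy fragments (not the FAM9 alignment).
parental = "MKTAYIAKQR-QISFVKSHFSRQ"
orphan = "MKSAYLAKQRDQISYVKAHFNRE"

p = p_distance(parental, orphan)
d = poisson_distance(p)
print(f"p = {p:.3f}, d = {d:.3f}")  # p = 0.273, d = 0.318
```

Because d = -ln(1 - p) always exceeds p once p > 0, the correction matters most for fast-evolving copies such as orphans, whose raw identity to their homologues underestimates how much change has accumulated.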

Orphan genes are generally poorly annotated, and in most cases their function is unknown (Kuo & Kissinger, 2008). Because organisms lived perfectly well without them until their relatively recent appearance, orphan genes have been thought to be, for the most part, dispensable. However, a recent study by Chen and colleagues (Chen et al., 2010) has challenged this view. The authors identified young genes in *Drosophila melanogaster* (around 34 million years old) and designed RNA interference (RNAi) lines to knock down each of them. Surprisingly, knockdown of 30% of these young genes was lethal, showing that *Drosophila* could not survive without them. These young genes had mainly arisen through duplication and evolved faster than their parental genes, indicating the action of positive selection or a relaxation of functional constraints. The authors hypothesized that new genes are quickly integrated into existing pathways, and hence many of them soon become essential for the viability of the organism.

Capra and colleagues (Capra et al., 2010) compared the evolutionary patterns of genes that arose by duplication with those that did not (termed novel genes). They argued that the evolutionary pressures should differ in each case because, unlike novel genes, duplicated genes are functionally and structurally well formed from birth. They showed that, although duplicated genes are initially more integrated into cellular networks, both types of new genes gain functions and interactions over time, with novel genes doing so more rapidly than duplicated genes. Additionally, novel genes increase in length through the incorporation of transposable elements or surrounding sequences. This increase in length could be related to the rapid gain of functions and interactions experienced by novel genes. They also found that genes tended to interact with genes of similar age and mode of origin. Thus, the mechanism by which a gene originates seems to have a significant impact on its subsequent evolution.

Several studies have demonstrated that duplicated genes show increased protein evolutionary rates relative to non-duplicated genes in the same lineage (Castillo-Davis et al., 2004; Cusack & Wolfe, 2007; Kondrashov et al., 2002; Lynch & Conery, 2000; Nembaware et al., 2002; Scannell & Wolfe, 2008; Van de Peer et al., 2001). Here we identified a very strong asymmetry between the rates of evolution of the newly evolved copy (orphan) and the well-conserved copy (parental), the former evolving much faster than the latter. Surprisingly, the parental protein did not evolve consistently faster than the non-duplicated outgroup protein. This indicates that we are dealing with a special type of gene duplication, in which the copy containing the partially duplicated segment rapidly departs from the ancestral family while the ancestral family itself remains essentially unaffected.
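The comparison above involves three proteins: the orphan copy, the parental copy and a non-duplicated outgroup. The text does not state which statistical test underlies it; one standard choice for this three-sequence design is Tajima's relative rate test, sketched below as an assumption rather than as the method used here. It counts the sites at which only one of the two copies departs from the outgroup and asks whether those counts are balanced. The function name and toy sequences are hypothetical.

```python
import math


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> tuple[int, int, float, float]:
    """Tajima's one-degree-of-freedom relative rate test on three aligned sequences.

    m1: sites where only seq1 differs (seq2 == outgroup != seq1).
    m2: sites where only seq2 differs (seq1 == outgroup != seq2).
    Under equal rates in the two lineages E[m1] == E[m2], and
    chi2 = (m1 - m2)**2 / (m1 + m2) is approximately chi-square with 1 df.
    """
    if not len(seq1) == len(seq2) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # gapped columns are not used
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites, no evidence of a rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square variable with 1 df is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value


# Hypothetical toy alignment: orphan vs parental copy, rooted with a non-duplicated outgroup.
outgroup = "MKTAYIAKQRQISFVKSHFS"
parental = "MKTAYIAKQRQISFVKSHFA"
orphan = "MRTAYLAKERQISYVKAHFS"

m1, m2, chi2, p_value = tajima_relative_rate(orphan, parental, outgroup)
print(f"orphan-specific = {m1}, parental-specific = {m2}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With only six informative sites, the toy imbalance (5 versus 1) is not significant (p ≈ 0.10), which illustrates why such tests need full-length alignments. The outgroup is what makes the design work: provided it diverged before the duplication, it is equally related to both copies, so an excess of copy-specific changes reflects a difference in rate rather than in elapsed time.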

Increased evolutionary rates may reflect relaxed purifying selection, positive selection, or the combined effect of both. The orphan genes under study predate the split of the human and macaque lineages, which occurred approximately 25 million years ago. If relaxed selection were the only cause of their increased rates, these genes would by now be expected to have become pseudogenes and to no longer be expressed. However, all of them were expressed at the RNA level in one or several tissues. We therefore hypothesize that positive selection has, at least to some extent, influenced the evolution of these genes.
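Relaxed purifying selection and positive selection are commonly distinguished with the ratio of nonsynonymous to synonymous substitution rates, ω = dN/dS: values well below 1 indicate purifying selection, values near 1 neutral evolution, and values above 1 positive selection. A gene-wide ω that is elevated but still below 1 is compatible with both explanations, which is why an independent argument such as sustained expression is useful. The sketch below shows only the final step of the Nei-Gojobori approach, assuming that the counts of nonsynonymous and synonymous sites and differences between two coding sequences have already been obtained; the function names and counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits: d = -(3/4) ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from pre-computed counts, following the final step of Nei-Gojobori.

    Counting nonsynonymous/synonymous sites and differences along the codons
    of two aligned coding sequences is assumed to have been done already.
    """
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous divergence; dN/dS is undefined")
    return d_n / d_s


# Hypothetical human-macaque counts for a single gene (illustrative only).
w = omega(nonsyn_diffs=18, nonsyn_sites=900, syn_diffs=15, syn_sites=300)
print(f"dN/dS = {w:.2f}")  # dN/dS = 0.39
```

In this toy case ω ≈ 0.39, well below 1; because ω is averaged over the whole gene, such a value on its own could not distinguish moderately relaxed constraint from positive selection acting on a small number of sites.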

We compared the rates of evolution of the protein regions that were conserved between orphan and parental proteins, but what about the unique sequences contained in the orphan proteins? These sequences lacked any similarity to other protein-coding genes, so they may be ancestral non-coding sequences that have been co-opted for a coding function (Long et al., 2003). Genes generated *de novo* from non-coding sequences are among the fastest evolving genes (Levine et al., 2006), and there is no reason to believe that unique sequences in orphan proteins evolve more slowly than the conserved protein regions; if anything, the contrary would seem more likely. In a previous study we showed that the ratio of nonsynonymous to synonymous nucleotide substitution rates (dN/dS) of primate-specific genes, measured for human and macaque orthologues, was on average twice as high as that of mammalian-specific genes and five times higher than that of deeply conserved eukaryotic proteins (Toll-Riera et al., 2009a). The differences in amino acid substitution rates between orphan and parental genes described here reinforce the idea that the evolution of a new gene is strongly associated with very rapid sequence change.

Partial Gene Duplication and the Formation of Novel Genes 107

Castillo-Davis, C.I., Hartl, D.L. & Achaz, G. 2004. cis-Regulatory and protein evolution in orthologous and duplicate genes. *Genome Res* 14(8): 1530-1536.

Castresana, J., Guigo, R. & Alba, M.M. 2004. Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. *J Mol Evol* 59(1): 72-79.

Chen, S., Zhang, Y.E. & Long, M. 2010. New genes in Drosophila quickly become essential. *Science* 330(6011): 1682-1685.

Cusack, B.P. & Wolfe, K.H. 2007. Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates. *Mol Biol Evol* 24(3): 679-686.

Domachowske, J.B. & Rosenberg, H.F. 1997. Eosinophils inhibit retroviral transduction of human target cells by a ribonuclease-dependent mechanism. *J Leukoc Biol* 62(3): 363-368.

Domazet-Loso, T. & Tautz, D. 2003. An evolutionary analysis of orphan genes in Drosophila. *Genome Res* 13(10): 2213-2219.

Drosophila 12 Genomes Consortium. 2007. Evolution of genes and genomes on the Drosophila phylogeny. *Nature* 450(7167): 203-218.

Farre, D. & Alba, M.M. 2010. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. *Mol Biol Evol* 27(2): 325-335.

Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle.

Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K., Holm, L., Sonnhammer, E.L., Eddy, S.R. & Bateman, A. 2010. The Pfam protein families database. *Nucleic Acids Res* 38(Database issue): D211-222.

Fondon, J.W., 3rd & Garner, H.R. 2004. Molecular origins of rapid and continuous morphological evolution. *Proc Natl Acad Sci U S A* 101(52): 18058-18063.

Gilad, Y., Man, O. & Glusman, G. 2005. A comparison of the human and chimpanzee olfactory receptor gene repertoires. *Genome Res* 15(2): 224-230.

Gruber, J.J., Zatechka, D.S., Sabin, L.R., Yong, J., Lum, J.J., Kong, M., Zong, W.X., Zhang, Z., Lau, C.K., Rawlings, J., Cherry, S., Ihle, J.N., Dreyfuss, G. & Thompson, C.B. 2009. Ars2 links the nuclear cap-binding complex to RNA interference and cell proliferation. *Cell* 138(2): 328-339.

Gu, Z., Nicolae, D., Lu, H.H. & Li, W.H. 2002. Rapid divergence in expression between duplicate genes inferred from microarray data. *Trends Genet* 18(12): 609-613.

Guo, W.J., Li, P., Ling, J. & Ye, S.P. 2007. Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. *Comp Funct Genomics*: 21676.

Haldane, J.B.S. 1932. The causes of evolution. London: Longmans and Green.

Heinen, T.J., Staubach, F., Haming, D. & Tautz, D. 2009. Emergence of a new gene from an intergenic region. *Curr Biol* 19(18): 1527-1531.

Hughes, A.L. 1994. The evolution of functionally novel proteins after gene duplication. *Proc Biol Sci* 256(1346): 119-124.

International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. *Nature* 432(7018): 695-716.
