
Duplicated Gene Evolution Following Whole-Genome Duplication in Teleost Fish

duplication. First, many duplicated genes that resulted from the FSGD event were preserved in teleost genomes. Second, five teleost genomes have been sequenced and more teleost genomes are being sequenced. Here, we investigate retention, loss, and molecular evolution of duplicate genes after the FSGD in the five available teleost genomes: zebrafish (*Danio rerio*), stickleback (*Gasterosteus aculeatus*), medaka (*Oryzias latipes*), Takifugu (*Takifugu rubripes*), and Tetraodon (*Tetraodon nigroviridis*).

**2. Identifying duplicated genes that resulted from the FSGD event throughout the teleost genomes**

We obtained 23,155 gene families from the database HOMOLENS version 4 (ftp://pbil.univ-lyon1.fr/databases/homolens4.php) (Penel et al. 2009), which is based on Ensembl release 49. We chose HOMOLENS because it allowed us to reliably retrieve sets of orthologous genes for our evolutionary analysis. HOMOLENS is devoted to metazoan genomes and contains gene families from the complete animal genomes found in Ensembl. It has the same architecture as HOVERGEN (Duret et al. 1994), in which genes are organized in families and include precalculated alignments and phylogenies. In HOMOLENS 4, alignments are computed using MUSCLE (Edgar 2004) with default parameters; phylogenetic trees are computed with PHYML, using the JTT amino acid substitution model (Jones et al. 1992). Phylogenies are computed based on conserved blocks of the alignments selected with Gblocks (Castresana 2000). Each phylogenetic tree is reconciled with a species tree using the program RAP (Dufayard et al. 2005), which, combined with the tree pattern search functionality, allows detection of ancient gene duplications or selection of orthologous genes (Penel et al. 2009). Several studies on duplicated gene evolution have been performed with data retrieved from HOMOLENS (Brunet et al. 2006; Studer et al. 2008).

We employed a topology-based method to identify duplicated genes that resulted from the FSGD event in the five teleost genomes we study. Briefly, if two teleosts have been subject to the same whole genome duplication event, a gene *X* that has been duplicated in this event and retained in both genomes should form two gene lineages, *Xa* and *Xb* (Figure 1A). We identified gene trees with the topology shown in Figure 1A using the TreePattern functionality (Dufayard et al. 2005) of the FamFetch client for HOMOLENS. We required duplicated genes to exist in at least two species to increase the likelihood that they result from the FSGD event (Figure 1B). In total, we identified 1,500 gene families with duplicated genes in this way.

Fig. 1. (A) Expected phylogenetic relationship of duplicated genes *Xa* and *Xb* in two related species A and B when speciation occurred after the duplication event; (B) Tree topology we used for duplicated gene identification in the database HOMOLENS 4. 'N ≥ 2' means that duplicated gene pairs must exist in at least two species to increase the likelihood that the duplicated genes actually resulted from the FSGD event.

**3. Differential retention and loss of duplicated genes during teleost diversification**

The most common fate of a duplicated gene is nonfunctionalization (pseudogenization). After a whole genome duplication event, many genes share this fate, so that a genome's gene content may appear only slightly increased long after the duplication (Wolfe and Shields 1997; Jaillon et al. 2004). Our data suggest that only 3.3 percent (zebrafish) to 7.2 percent (Takifugu) of genes in current teleost genomes result from the FSGD event (Table 1). These percentages are lower than the 13 percent of retained duplicates in yeast (Wolfe and Shields 1997). One possible reason for this difference might lie in our topology-based method to identify likely FSGD duplicates (Figure 1), which requires duplicated genes to exist in at least two teleost genomes. Thus, our method would overlook duplicated genes that result from the FSGD and that are retained in only one teleost genome. While we cannot exclude this possibility, we note that our observations are consistent with a genome-wide study of Tetraodon, in which Jaillon et al. (2004) showed that up to 3 percent of duplicated genes may have been retained since the FSGD event. In addition, Kassahn et al. (2009) suggested that a minimum of 3 to 4 percent of protein-coding loci have been retained in two copies in each of the five model fish genomes. One plausible explanation for the difference in duplicated gene retention between teleosts and yeast is the different ages of the two genome duplication events. The FSGD occurred between 253 and 404 million years ago (MYA) (Hoegg et al. 2004; Vandepoele et al. 2004), whereas the yeast whole genome duplication may have occurred more recently, between 100 and 150 MYA (Sugino and Innan 2005). More time has elapsed since the FSGD, allowing more duplicate genes to be lost.
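The filtering rule discussed above, under which each duplicate lineage must be present in at least two teleost genomes, can be illustrated with a small sketch. It is not the TreePattern implementation. The nested-tuple tree encoding, the `"Species|gene_id"` leaf labels, and all function names are assumptions made for this example. Reading 'N ≥ 2' as "each of the two paralog clades contains at least two species" is also an interpretation of Figure 1B.

```python
# Genus names stand in for the five teleost genomes used in the chapter.
TELEOSTS = {"Danio", "Gasterosteus", "Oryzias", "Takifugu", "Tetraodon"}


def species_in(tree):
    """Collect the species found at the leaves of a nested-tuple tree.

    Leaves are strings of the (hypothetical) form "Species|gene_id".
    """
    if isinstance(tree, str):
        return {tree.split("|")[0]}
    found = set()
    for child in tree:
        found |= species_in(child)
    return found


def split_family(tree, outgroup="Homo"):
    """Return the two ingroup clades if tree is (outgroup, (clade_a, clade_b))."""
    if isinstance(tree, str) or len(tree) != 2:
        return None
    # Accept the outgroup leaf on either side of the root.
    for out, ingroup in (tree, tree[::-1]):
        is_outgroup_leaf = isinstance(out, str) and out.split("|")[0] == outgroup
        if is_outgroup_leaf and not isinstance(ingroup, str) and len(ingroup) == 2:
            return ingroup
    return None


def matches_fsgd_pattern(tree, min_species=2):
    """True if both paralog clades are teleost-only and span >= min_species species."""
    clades = split_family(tree)
    if clades is None:
        return False
    return all(
        species_in(c) <= TELEOSTS and len(species_in(c)) >= min_species
        for c in clades
    )


def retention_state(tree, species):
    """Classify one species as 'duplicates', 'singleton' or 'double loss'."""
    clades = split_family(tree)
    if clades is None:
        raise ValueError("tree does not have the expected outgroup/ingroup shape")
    copies = sum(species in species_in(c) for c in clades)
    return {2: "duplicates", 1: "singleton", 0: "double loss"}[copies]


# A family in which both lineages survive in at least two species.
retained = (
    "Homo|X",
    ((("Danio|xa", "Oryzias|xa"), "Takifugu|xa"), ("Danio|xb", "Tetraodon|xb")),
)

# A family whose second lineage survives in a single genome only.
overlooked = ("Homo|X", (("Danio|xa", "Oryzias|xa"), "Danio|xb"))

if __name__ == "__main__":
    print(matches_fsgd_pattern(retained))    # accepted by the filter
    print(matches_fsgd_pattern(overlooked))  # rejected by the filter
    for sp in sorted(TELEOSTS):
        print(sp, retention_state(retained, sp))
```

The second example family shows the limitation noted in the text: a duplicate retained in only one genome fails the two-species requirement and would not be counted.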

Differential retention and loss of duplicated genes is a common phenomenon during speciation after genome duplication. It has been observed in yeast (Scannell et al. 2006) as well as in teleosts (Semon and Wolfe 2007), and is believed to lead to speciation. We thus expected that our dataset would contain many gene families with differential gene retention and loss, and fewer families in which both copies are retained in all five teleost genomes. Indeed, when we consider all five species together, 90.4 percent of the 1,500 gene families we identified show differential retention and loss of duplicated genes, and in only 9.6 percent (144 gene families) are both copies retained in all five teleost genomes. Figure 2 and Table 1 show the relevant data, broken down by species. Depending on the species, both duplicates were retained in 45.4 to 89.3 percent of the 1,500 gene families, and one copy was lost in 9.9 to 44.6 percent. Our data also indicate that differences in duplicate retention between two teleost species are associated with their phylogenetic position and relatedness (Figure 2). Taken together, these observations indicate that differential duplicated gene retention and loss are pervasive in teleosts, that the loss of duplicated genes is an ongoing process that has continued for hundreds of millions of years after the FSGD event, and that this process may be associated with teleost diversification.
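The percentages quoted above follow directly from the per-species family counts in Table 1. The short sketch below recomputes them as a consistency check. The counts are copied from Table 1 and the figure of 144 families from the text, while the variable and function names are hypothetical.

```python
TOTAL_FAMILIES = 1500

# Per species: (both copies retained, one copy retained, both copies lost),
# copied from Table 1.
TABLE1 = {
    "D. rerio": (731, 541, 228),
    "G. aculeatus": (681, 669, 150),
    "O. latipes": (1162, 311, 27),
    "T. rubripes": (1340, 148, 12),
    "Te. nigroviridis": (1047, 397, 56),
}


def retention_percentages(counts, total=TOTAL_FAMILIES):
    """Convert one Table 1 row into percentages of all gene families."""
    if sum(counts) != total:
        raise ValueError("counts must add up to the number of gene families")
    return tuple(round(100 * c / total, 1) for c in counts)


percentages = {sp: retention_percentages(c) for sp, c in TABLE1.items()}

# Range across species for "both copies retained" and "one copy retained".
both = [p[0] for p in percentages.values()]
single = [p[1] for p in percentages.values()]
both_range = (min(both), max(both))        # (45.4, 89.3)
single_range = (min(single), max(single))  # (9.9, 44.6)

# Families in which all five species kept both copies (value from the text).
ALL_FIVE_RETAINED = 144
share_all_retained = round(100 * ALL_FIVE_RETAINED / TOTAL_FAMILIES, 1)  # 9.6
share_differential = round(100 - share_all_retained, 1)                  # 90.4

if __name__ == "__main__":
    for sp, p in percentages.items():
        print(f"{sp}: both={p[0]}%, one={p[1]}%, none={p[2]}%")
    print("both copies retained:", both_range)
    print("one copy retained:", single_range)
    print("differential retention:", share_differential)
```

The check that each row sums to 1,500 guards against transcription errors when copying the table.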

| Species | Number of genes\* | FSGD duplicates | Singleton | Double loss |
|---|---|---|---|---|
| *D. rerio* | 21,420 | 731 | 541 | 228 |
| *G. aculeatus* | 20,839 | 681 | 669 | 150 |
| *O. latipes* | 19,687 | 1,162 | 311 | 27 |
| *T. rubripes* | 18,709 | 1,340 | 148 | 12 |
| *Te. nigroviridis* | 27,991 | 1,047 | 397 | 56 |

\* Total gene number in each genome, data based on Ensembl release 49.

Table 1. Summary of differential gene retention and loss in the 1,500 duplicated gene families we identified. The last three columns count gene families with likely FSGD duplicates.

Fig. 2. Differential retention and loss of duplicated genes during teleost diversification. The topology is adopted from Negrisolo et al. (2010). \*: retention of both copies; \*\*: retention of one copy; \*\*\*: loss of both copies.

We next discuss an illustrative example of differential duplicate gene retention and loss. It involves *Hox* genes, which encode a subclass of homeodomain transcription factors that help determine the anterior–posterior axis of bilaterian animals (McGinnis and Krumlauf 1992). In vertebrates, *Hox* genes have evolved a highly compact organization, where genes are arranged in clusters on chromosomes. *Hox* gene clusters are one of the best-studied systems for assessing gene retention and loss after the FSGD event (Amores et al. 1998; Prohaska and Stadler 2004; Hoegg et al. 2007; Guo et al. 2010), due to their genomic architecture and gene complement variation in teleosts. Seven or eight *Hox* clusters with different complements of *Hox* genes exist in extant diploid teleosts. They are a result of the FSGD event, which was followed by loss of some *Hox* gene duplicates. The putative *Hox* cluster complement of the teleost ancestor and the *Hox* clusters of several model teleost species are shown in Figure 3. *Hox* clusters exhibit remarkably different gene complements in different teleost lineages after the FSGD event. Theoretically, 8 *Hox* clusters containing at least 80 *Hox* genes may have existed in the ancestor of teleosts after the FSGD event. Up to now, 66 of these *Hox* genes have been found in different teleost species, and extant evolutionarily diploid teleosts usually have 45 to 49 *Hox* genes in their genome (Figure 3). According to the summary of Hoegg et al. (2007) (Figure 3), the Ostariophysi have lost seven *Hox* genes since their hypothetical common ancestor with the Neoteleosts; during the evolution of the Neoteleosts eight *Hox* genes were lost; and the pufferfishes lost three genes in the common lineage leading to Takifugu and Tetraodon. Some *Hox* genes are specifically preserved in particular teleosts; for example, *HoxA1b* has been identified thus far only in the Japanese eel (Guo et al. 2010). At the cluster level, eight *Hox* clusters were retained in basal species such as the Japanese eel (Guo et al. 2010) and the goldeye (Chambers et al. 2009), whereas one *Hox* cluster (C or D) was lost in the Otocephala (Amores et al. 1998) and in the Euteleostei (Kurosawa et al. 2006), respectively. Based on the phylogeny of teleosts, Guo et al. (2010) proposed that the *HoxDb* cluster was lost independently in the Otocephala and Euteleostei after the FSGD event. The ongoing process of *Hox* gene loss and retention in teleosts illustrates again that degeneration of functionally important duplicated genes can last for hundreds of millions of years after the FSGD event.

Fig. 3. *Hox* gene clusters, the best-studied examples of differential duplicate gene retention and loss in teleosts. Hypothetical *Hox* clusters of the teleost ancestor (modified from Guo et al. 2010), and *Hox* clusters of teleost model fish species, together with specific gene loss events shown on a phylogenetic tree of selected fish species (adapted from Hoegg et al. 2007).

2004), and calculated DNA alignments from protein alignments with RevTrans (Wernersson and Pedersen 2003). The following computations were then done on the new DNA alignments. We estimated the nucleic acid evolutionary distance between fish genes and their human orthologs using the LogDet nucleotide substitution model (Tamura and Kumar 2002) in PHYLIP-3.6b (Felsenstein 2004).

Previous studies show that duplicated genes in yeast often diverge asymmetrically (Kellis et al. 2004), meaning that one copy evolves significantly faster than the other. We asked whether this is also the case for teleost duplicates. To this end, we compared the evolutionary distances of duplicated genes to their human orthologs within the 1,500 gene families we had identified. There is indeed evidence for asymmetric evolution between duplicated gene pairs from the FSGD event (Table 2). The average evolutionary distances to the human ortholog differ significantly between the two members of duplicated gene pairs in each of our five teleost species (paired *t*-test: *P* ≤ 4.8 × 10⁻⁹⁵). Because all duplicated gene pairs stemming from the FSGD diverged from their human orthologs at the same time, we can directly convert differences between evolutionary distances into differences between evolutionary rates. Taken together, our observations suggest that duplicate genes tend not to accumulate sequence change at the same rate. Our results are consistent with previous work in teleosts (Brunet et al. 2006; Steinke et al. 2006) and yeast (Kellis et al. 2004), and confirm that asymmetric sequence evolution between duplicated genes is a frequent pattern of duplicated gene evolution after a genome duplication event.

| | *D. rerio* | *G. aculeatus* | *O. latipes* | *T. rubripes* | *Te. nigroviridis* |
|---|---|---|---|---|---|
| Duplicate\_L | 0.613 ± 0.243 | 0.607 ± 0.229 | 0.621 ± 0.230 | 0.623 ± 0.229 | 0.614 ± 0.224 |
| Duplicate\_S | 0.529 ± 0.213 | 0.526 ± 0.200 | 0.536 ± 0.195 | 0.535 ± 0.195 | 0.505 ± 0.182 |
| *P*-value\* | 4.1 × 10⁻¹⁰⁵ | 4.8 × 10⁻⁹⁵ | 1.9 × 10⁻¹⁶⁵ | 7.3 × 10⁻¹⁷⁵ | 8.9 × 10⁻¹³³ |

\* paired *t*-test

Table 2. Average evolutionary distances of duplicated genes in five teleost species to their human orthologs. Duplicate\_L: duplicated gene in each duplicate pair that has the larger distance to the human ortholog (distances averaged over all duplicate gene families); Duplicate\_S: duplicated gene in each duplicate pair that has the smaller distance to the human ortholog (distances averaged over all duplicate gene families). All values are means ± one standard deviation.

**5. Conclusion**

In summary, we used a phylogenetic method to identify 1,500 duplicated gene families in five teleost species that are likely to have resulted from the FSGD event. Only a small fraction of the genes in extant teleost genomes are duplicates retained from the FSGD event. Differential retention and loss of duplicated genes is pervasive in the five species we studied, as illustrated by the genes in the teleost *Hox* gene clusters. Sequence analysis suggests that some duplicated gene pairs may evolve asymmetrically. Our work provides a framework for future studies of the evolutionary trajectory of duplicated genes in the teleost genome.
