reduced oogenesis (~20% compared to *Met+*), consistent with a role for JH in this physiology. However, since absence of a JH receptor is expected to preclude normal development, the viability of *Met27* flies challenged the notion of *Met* as a *bona fide* JH receptor. Some evidence supports alternative mechanism(s) of JH signaling (see Flatt *et al*., 2008; Riddiford *et al*., 2010). In this chapter, we review data that support the notion of *germ cell expressed* (*gce*), the paralog of *Met* in higher Diptera, as conferring at least partial functional redundancy.

**3.** *Met* **homologs across Holometabola**

Reports of methoprene resistance in mosquito populations (Dame *et al*., 1998; Cornel *et al*., 2000; Cornel *et al*., 2002) led us to investigate the *Met* orthologs of three mosquito species: *Aedes aegypti*, *Culex pipiens*, and *Anopheles gambiae*. Using a combination of degenerate RT-PCR and genomic database mining, we isolated a single *Met* homolog from each of these mosquitoes. Sequence analysis showed that, as expected, these genes share high identity with both *Met* and the closely related *gce* from *Drosophila*. However, a comparison of the genomic structures of *DmMet*, *Dmgce*, and the three putative mosquito *Met* genes revealed higher structural conservation between each mosquito *Met* and *Dmgce*. Importantly, the intron number of these genes is more consistent with that of *Dmgce* than with that of *DmMet* (from six to nine, *versus* one in *DmMet*). Furthermore, several introns in each mosquito gene are positionally conserved with those in *Dmgce*. This led to our proposal that the *Met* gene of higher Diptera originated via retrotransposition of a basal, *gce*-like gene of lower Diptera (Wang *et al*., 2007).

Retrotransposition, or retroposition, is a mechanism of gene duplication that proceeds through an mRNA intermediate. Following post-transcriptional splicing, the parental message is reverse-transcribed and reintegrated into the genome. For the duplicate copy to escape the fate of becoming a pseudogene, it must either reintegrate with its associated regulatory elements intact or insert into a suitable transcriptional environment elsewhere in the genome. Following duplication, the increase in copy number of the parental gene affords a relaxation of selective constraint, facilitating functional divergence. This may manifest as subfunctionalization, in which a modification of the parental function evolves, or as neofunctionalization, in which a novel function is attained (MacCarthy & Bergman, 2007). *DmMet* retains a strong diagnostic feature of retroposition: a paucity of introns relative to *gce*, which is consistent with splicing and genomic reintegration of an ancestral *gce*-like transcript.

A *gce*-like gene appears to be conserved across holometabolan genomes, including those of the red flour beetle, *Tribolium castaneum*, and the honeybee, *Apis mellifera*. An independent gene duplication within the Lepidoptera has given rise to two *Met*-like proteins, presently called Methoprene tolerant proteins I and II, whose functions are currently under investigation (e.g., Li *et al*., 2010). Although sequence conservation indicates that the *Met*-like genes of more basal Holometabola are ancestral to *gce*, we will continue to refer to these genes as *Met*-like in this text.

## **3.1** *Met* **and** *gce* **within the genus** *Drosophila*

When the genomes of 12 representative *Drosophila* species became available (Ashburner, 2007), we chose to examine the molecular evolution of *Met* and *gce* within this genus of flies. Both paralogs are conserved in each species, indicating that the origin of *Met* predates that of the genus *Drosophila*, some 63 million years ago (Tamura *et al*., 2004). The architecture of these genes is generally conserved in each species, with a few notable exceptions. A single conserved intron is present in the PAS B domain of *Met* in 11 species. In addition to this conserved intron, independent intron gains have occurred in the lineages leading to *D. simulans* and *D. willistoni*. A single *Met* ortholog exists in each *Drosophila* genome examined except that of *D. persimilis*, which harbors two separate, consecutive loci on the X chromosome, currently called GL13106 and GL13107, that align to distinct regions of *DmMet*. The 5' putative gene, GL13106, contains a complete PAS A domain followed by a severely truncated PAS B domain. We performed RT-PCR across these two genes and did not obtain a single product spanning both, suggesting that GL13106 and GL13107 indeed code for two distinct open reading frames. Eleven of the 12 representative *gce* orthologs contain at least six conserved introns, with independent intron gains evident in the lineages leading to *D. melanogaster*, *D. pseudoobscura*, and *D. mojavensis*, whereas a substantial deletion in *D. persimilis gce* has eliminated the central portion of this gene, including the PAS repeats.

In addition to the bHLH, PAS, and PAC domains, putative transactivation domains (TADs) are evident in *Met* and *gce* orthologs. TADs are glutamine- and/or aspartic acid-rich motifs whose amino acid sequences are broadly defined and which generally reside in the C-terminal region of PAS proteins (Ramadoss & Perdew, 2005). *Met* homologs show Q- and D-rich motifs between the PAS B and PAC domains, while alignments of *gce* homologs indicate a D-rich region C-terminal to the PAC domain. Miura *et al*. (2005) suggest the presence of a C-terminal TAD in recombinant MET protein, but this region has yet to be functionally defined.

Using *DmMet* and *Dmgce* as query sequences, we conducted tBLASTx homology searches (a translated nucleotide query searched against a translated nucleotide database) of the publicly available EST library of the tsetse fly, *Glossina morsitans*. Our search recovered several clones, which were assembled in the Sequencher program into two independent contigs. These composite nucleotide sequences were used to infer a gene tree with other holometabolan *Met* and *gce* orthologs, including those of two representative *Drosophila* species (Figure 2). This preliminary analysis reveals the presence of distinct *Met* and *gce* orthologs in the *G. morsitans* genome, indicating that the origin of *Met* predates the divergence of the Aschiza and Schizophora. These two taxonomic groups, which are estimated to have diverged more than 85 million years ago (Bertone & Wiegmann, 2009), reside within the brachyceran infraorder Muscomorpha.
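A translated search of this kind compares every reading frame of the query with every reading frame of the database sequences. As a rough illustration of that translation step only, the following Python sketch produces the six conceptual translations of a nucleotide sequence using the standard genetic code. The function names (`translate`, `six_frame_translations`) are hypothetical and are not part of BLAST; seeding, alignment, scoring, and statistics are omitted entirely.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(nucleotides):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    forward = nucleotides.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for strand, seq in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, peptide in six_frame_translations("ATGGCC").items():
        print(label, peptide)
```

Comparing translations rather than raw nucleotides lets a search tolerate synonymous divergence, which is one reason a translated search suits distantly related sequences such as ESTs from another dipteran family.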

## **3.2 Evidence for differential selective constraint imposed on** *Met* **and** *gce*

Based on an *a priori* hypothesis that *Met* and *gce* were subject to differential post-duplication selective constraint, we analyzed nonsynonymous-to-synonymous (dN/dS) substitution ratios on codon alignments of these *Drosophila* paralogs. Datasets were analyzed using the Datamonkey tool (Kosakovsky Pond & Frost, 2005), a web-based implementation of the HyPhy package (Kosakovsky Pond *et al*., 2005). dN/dS analyses can be used to infer the relative selective pressure along entire coding sequences or in a site-specific manner. A substantially depressed dN/dS ratio (i.e., zero or close to zero) implies purifying (negative) selection; that is, nonsynonymous changes are stringently selected against. In contrast, when dN/dS is nearly one, neutral evolution is inferred. A dN/dS value far in excess of one implies positive selection, or adaptive evolution; in this case, nonsynonymous substitutions confer a selective advantage.
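To make the distinction between synonymous and nonsynonymous change concrete, the sketch below classifies codon differences between two aligned coding sequences and applies the interpretation of dN/dS given above. It is a simplified illustration, not the method implemented in Datamonkey or HyPhy: it counts raw codon differences rather than estimating substitution rates per site, and the function names and the `tolerance` cutoff are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs must be in-frame, gap-free, and of equal length.
    Codons containing unrecognized characters are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in-frame, equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


def interpret_omega(omega, tolerance=0.1):
    """Map a dN/dS value to a selection regime.

    The tolerance around 1 is an arbitrary illustrative cutoff, not a
    statistical test.
    """
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "neutral evolution"


if __name__ == "__main__":
    # CTT -> CTC keeps leucine; GAT -> GAA changes aspartate to glutamate.
    print(count_codon_differences("ATGCTTGAT", "ATGCTCGAA"))
    print(interpret_omega(0.05))
```

In practice, dN and dS are normalized by the numbers of nonsynonymous and synonymous sites and corrected for multiple substitutions, and significance is assessed statistically rather than with a fixed cutoff.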


The results of our dN/dS analyses showed dramatic dissimilarity in the relative selective pressures that have shaped the coding sequences of *Met* and *gce*. In the case of *Met*, dN/dS was generally suppressed along the entirety of the coding sequence, indicating strong selection against nonsynonymous codon substitution. This is perhaps surprising, since MET deficiency has no effect on viability (Wilson & Ashok, 1998). Possibly, mutations that alter amino acid identities are selected against in *Met* due to its involvement in reproduction. In the absence of methoprene selection, *Met* mutants are quickly out-competed by wild type flies despite the seemingly slight fitness cost of *Met* loss (Minkhoff III & Wilson, 1992). In contrast, dN/dS values close to one dominate the C-terminal half of *gce*, indicating a substantial relaxation of selective constraint in this region. The N-terminal region of this gene, containing the canonical bHLH and PAS functional domains, shows a strongly depressed dN/dS. Based on functional data from other PAS proteins, this region is assumed to harbor DNA and ligand binding activity, whereas the C-terminal region contains putative TADs. C-terminal degeneracy was shown to confer differential target gene specificity between the *Ahr* homologs of mice and humans (Ramadoss & Perdew, 2005; Flaveny *et al*., 2010). Similarly, the disparate selective constraints evident in the C-terminal regions of *Met* and *gce* may partially define these genes' functions.

Fig. 2. A gene tree of some holometabolous *Met*-like genes, showing placement of two distinct *G. morsitans* sequences as putative *Met* and *gce* orthologs. *D. melanogaster Tango* (*tgo*), the homolog of the vertebrate *Aryl hydrocarbon receptor* (*Ahr*), is used as an outgroup sequence.

**4. Toward a functional definition of** *Dmgce*

A functional characterization of *gce*, named for its expression in a subset of embryonic germ cells (Moore *et al*., 2000), is in its infancy. Column pulldown assays showed that MET, in addition to forming homodimers, forms heterodimers with GCE, and that addition of JH or either of two JHAs significantly impaired these interactions (Godlewski *et al*., 2006). It is unknown whether GCE forms homodimers, like MET, or whether GCE can bind JH or its analogs. *GAL4*/*UAS*-driven (Brand & Perrimon, 1993) overexpression of *Met+* from *actin* or *tubulin* promoters results in larval lethality in the absence of methoprene (Barry *et al*., 2008), perhaps by upsetting the stoichiometry of MET and GCE dimers, favoring MET homodimerization at inappropriate times or in inappropriate tissues. Recently, JH was shown to inhibit MET and GCE in *D. melanogaster*, thereby preventing caspase-driven programmed cell death (PCD) and histolysis of the larval fat body. DRONC and DRICE, evolutionarily conserved caspase genes involved in this physiology at the onset of metamorphosis, were shown to be downregulated in *Met*- and *gce*-deficient flies (Liu *et al*., 2009). Similarly, methoprene interferes with caspase-driven midgut remodeling in *A. aegypti* (Nishiura *et al*., 2003; Wu *et al*., 2006) and *T. castaneum* (Parthasarathy *et al*., 2008; Parthasarathy *et al*., 2009), showing that this mechanism of JH action is evolutionarily conserved. It is noteworthy that recombinant MET can repress reporter gene expression in the absence of JH (presumably, MET forms homodimers in this system; Miura *et al*., 2005); transcriptional repression has previously been reported in other PAS proteins (Dolwick *et al*., 1993). Therefore, the JH-dependent, stage-specific formation of alternative MET/GCE dimers may have unique regulatory consequences on distinct suites of target genes.

## **4.1** *Dmgce* **substitution for** *DmMet*

To evaluate the notion that *gce* might confer viability to *Met* null flies, we manipulated *gce* expression using a binary *UAS/GAL4* system to drive either a *gce* cDNA or an RNAi construct designed to target the *gce* transcript. We carried out these experiments in a variety of genotypic contexts in order to examine the effect of *gce* transcript abundance on several methoprene-conditional and non-conditional phenotypes (Baumann *et al*., 2010b). First, we explored the effect of *gce* over- and under-expression on a *Met*-specific non-conditional phenotype that manifests as a variable number of grossly malformed posterior facets of the compound eye (Figure 3). This phenotype is visible in *Met27* and *Metw3* flies, and is enhanced in the latter genotype. In our experiments, we found that *gce* overexpression in a *Metw3* genetic background can rescue the *Met*-specific eye phenotype, suggesting functional overlap of *gce* and *Met*. Notably, when *gce* was overexpressed in a *Met27* background from the *P{GawB}dan[AC116]* promoter, targeting transgene expression to the compound eye, the eye phenotype was completely rescued (Baumann *et al*., 2010b).

The *Met27* phenotype mimics a set of defects resulting from genetic ablation of the JH-producing corpus allatum (CAX), including a heterochronic shift in EcR-B1 expression in the optic lobe (Riddiford *et al*., 2010). Exogenous JH application rescues the entire suite of defects in CAX prepupae, while JH provision to *Met27* flies rescues only a subset of these defects, suggesting an alternate mechanism of JH signal transduction (Riddiford *et al*., 2010). Based on our findings that *gce* can substitute for *Met* in the compound eye, further study of GCE involvement in eye development may provide a link between these phenomena. For instance, GCE may partially substitute for MET as a ligand binder to mediate JH signaling when this hormone is supplied in excess.

