**1. Introduction**

314 Gene Duplication


Malaria, one of the most serious infectious diseases prevalent in the tropics, is caused by parasites of the genus *Plasmodium*. Despite considerable global efforts to control this parasitic disease, at least 90% of deaths still occur in sub-Saharan Africa (WHO, 2010). The rising threat of drug-resistant parasites, together with the dependence of key interventions on a limited class of insecticides, underscores the fragility of malaria control. A better understanding of parasite biology is required to develop cost-effective tools and strategies, including malaria vaccines and new antimalarial drugs, that can be instrumental for sustained control, if not elimination, of malaria.

Malaria parasites comprise a diverse group of over 200 *Plasmodium* species that infect mammals, birds and reptiles (Levine, 1988). Each *Plasmodium* species exhibits a restricted host range, such that primate parasites cannot infect rodent, bird or reptile hosts. Finding the genomic factors that determine host range is of interest beyond malaria research; it is also essential for the control and conservation of wildlife. Recently, genome projects on several *Plasmodium* species from different hosts have been completed (Gardner et al., 2002; Carlton et al., 2002, 2008; Pain et al., 2008), with gene information available in a public database (http://plasmodb.org/plasmo/). Comparing the genomes of different species provides basic molecular-level information on how *Plasmodium* has evolved and allows us to infer the function of genes and noncoding regions in the genome. One of the prominent features of *Plasmodium* genomes is the presence of various unique multigene families. A multigene family is a group of related genes that are presumed to share a common ancestor and to have arisen from one another by duplication and subsequent divergence. One such example, the largest family identified so far in human, primate and rodent malaria parasites, is the *Plasmodium* interspersed repeat (pir) family (Janssen et al., 2004). The pir gene family members are highly species-specific, suggesting evolution of lineage-specific immune evasion mechanisms. The *P. falciparum* var gene family is by far the best documented multigene family of the most virulent human malaria parasite. Products of var genes appear on the surface of infected erythrocytes and are involved in antigenic variation to evade host immunity. Other species-specific gene families encode proteins involved in host cell invasion, e.g. rhoptry proteins and the parasite surface antigens merozoite surface protein-3 and -7. There are also examples of families with few gene members.
In sharp contrast to the several hundred tandemly arrayed rRNA gene family members in other eukaryotes, *Plasmodium* has 4-7 gene units physically separated in the genome (Nei & Rooney, 2005; Carlton et al., 2008). Thus, *Plasmodium* possesses unique multigene families with distinctive evolutionary conundrums.

For more than 10 years after the first description of a gene family member, the existence of the *Plasmodium* serine repeat antigen (SERA) multigene family was overlooked. Serine repeat antigen family proteins share homology with the papain family of cysteine proteases (Kiefer et al., 1996; Gor et al., 1998; Bourgon et al., 2004; Arisue et al., 2007, 2011). Almost all SERA genes are clustered in a head-to-tail manner, and the number of SERA genes in the clustered region varies among parasite species (Bourgon et al., 2004; McCoubrie et al., 2007; Arisue et al., 2007, 2011). This leads us to infer that gene duplication occurred repeatedly during evolution. Some SERA genes were confirmed to play essential role(s) in the parasite life cycle (Miller et al., 2002; Aly & Matuschewski, 2005; McCoubrie et al., 2007). In addition, one family member in *P. falciparum*, SERA5, is a vaccine candidate now in a phase Ib clinical trial (Horii et al., 2010). Two observations support the promise of SERA5 as a vaccine candidate: (1) unlike other antigen-encoding gene families such as var and rifin, which show antigenic variation to evade the host immune response, SERA genes are not differentially expressed (Aoki et al., 2002; Miller et al., 2002; Palacpac et al., 2006; Schmidt-Christensen et al., 2008; Putrianti et al., 2010; Arisue et al., 2011); and (2) *P. falciparum* SERA5 is less polymorphic (Fox et al., 1997; Morimatsu et al., 1997; Liu et al., 2000) than other vaccine candidate genes such as merozoite surface protein 1 (McBride et al., 1985) and apical membrane antigen 1 (Polley et al., 2003; Cortés et al., 2003). These characteristics are appealing and reflect the unique biological features of *Plasmodium* SERA. Here, we summarize current reports and our recent findings to understand the evolution of the SERA gene family.

### **2. SERA gene repertoires in** *Plasmodium* **species**

We refer briefly to the research history of the SERA multigene family: (i) the identification of SERA5 and the proteolytic processing of the protein; (ii) the discovery of the gene family following chromosome 2 sequencing of the *P. falciparum* genome; (iii) currently known SERA genes from different species; and (iv) the resulting analyses of the multigene family in various malaria parasites.

### **2.1 Research history of the SERA multigene family**

SERA was first found in *P. falciparum* as an abundant, exported, soluble protein of the late-trophozoite to schizont stages (Perrin et al., 1984). The protein, independently isolated by different groups, was described under various names: Pf140 (Perrin et al., 1984), p113 (Chulay et al., 1987), p126 (Delplace et al., 1987) and SERP (Knapp et al., 1989). All groups identified a gene with a long stretch of repeated serine residues in the N-terminal domain, to which the family owes its name (Bzik et al., 1988). In the central domain, SERA possesses a motif that aligns with two active site-determining regions of cysteine proteinases. The secreted protein was described to accumulate in the parasitophorous vacuole and to be released into the culture medium at the time of schizont rupture. Notably, before the sequence of *P. falciparum* chromosome 2 was released, Knapp et al. (1991) discovered a SERA homolog (*serp H*) lacking the characteristic serine homopolymer, and subsequently Fox and Bzik (1994) reported SERA as one of a series of three consecutive homologous genes. The complete genome sequence of *P. falciparum* revealed that SERA belongs to a multigene family (Gardner et al., 1998). The originally described SERA gene was renamed SERA5 according to the gene arrangement order in *P. falciparum* (Aoki et al., 2002; Miller et al., 2002). Interestingly, SERA5 is the only one of the nine family members with repeated serine residues. The defining characteristic of the family is therefore not richness in serine residues but the motifs that generate the framework of a cysteine protease. SERA homologs were also identified in other *Plasmodium* species. Kiefer et al. (1996) found five SERA genes in another human parasite, *P. vivax*, and Gor et al. (1998) identified three SERA genes in the rodent parasite *P. vinckei*. Completed or ongoing genome projects on eight *Plasmodium* species (two human malaria parasites, *P. falciparum* and *P. vivax*; the chimpanzee parasite *P. reichenowi*; the macaque parasite *P. knowlesi*; three rodent parasites, *P. berghei*, *P. yoelii* and *P. chabaudi*; and the avian parasite *P. gallinaceum*) confirmed the gene organization and allowed phylogenetic analysis of the SERA gene family (Burgon et al., 2005; Arisue et al., 2007). In addition, Arisue et al. (2011) newly identified SERA genes in 11 *Plasmodium* species, which further elaborates the genome organization of the gene family.

### **2.2 Processing of** *P. falciparum* **SERA5**


The *in vitro* observation that *P. falciparum* SERA5 is released into the culture supernatant at the time of schizont rupture/merozoite release corresponds to its specific processing into several polypeptides. The full-length 120 kDa precursor accumulates in the parasitophorous vacuole during the late trophozoite and schizont stages. As shown in Fig. 1, during the course of schizont rupture/merozoite release, SERA is proteolytically processed into a 47 kDa N-terminal domain (P47), a 50 kDa central domain (P50), an 18 kDa C-terminal domain (P18) and a 6 kDa domain (Delplace et al., 1987, 1988; Debrabant et al., 1992; Li et al., 2002a). The N-terminal P47 fragment is further processed into two 25 kDa fragments (P25n and P25c) in some allelic types (Li et al., 2002a). P47 is linked to the C-terminal P18 via a disulfide bond, forming a complex that is localized at the merozoite surface (Delplace et al., 1987; Li et al., 2002a; Okitsu et al., 2007). The proteolytic processing is mediated by the subtilisin-like serine protease subtilase 1, or SUB1 (Yeoh et al., 2007). Inhibition of SERA maturation blocks parasite egress from the host erythrocyte (Li et al., 2002b; Yeoh et al., 2007).

Fig. 1. Processing of *P. falciparum* SERA5 during parasite egress from host erythrocyte.
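As a quick sanity check on the fragment sizes quoted above, the following minimal sketch adds up the reported apparent masses and compares them with the parent polypeptide. The label `P6` is a hypothetical name used here for the 6 kDa domain, which the text leaves unnamed, and the function `mass_balance` is likewise illustrative only. The values are approximate apparent masses, so exact agreement is not expected.

```python
# Apparent molecular masses in kDa, exactly as quoted in the text above.
# They are approximate estimates, so sums are only expected to agree roughly.
PRECURSOR_KDA = 120

# "P6" is a hypothetical label for the 6 kDa domain, which the text leaves unnamed.
PRIMARY_FRAGMENTS = {"P47": 47, "P50": 50, "P18": 18, "P6": 6}

# Further processing of P47, reported only for some allelic types.
P47_FRAGMENTS = {"P25n": 25, "P25c": 25}


def mass_balance(parent_kda, fragments):
    """Return (sum of fragment masses, difference from the parent mass) in kDa."""
    total = sum(fragments.values())
    return total, total - parent_kda


if __name__ == "__main__":
    # Precursor versus its four primary fragments.
    print(mass_balance(PRECURSOR_KDA, PRIMARY_FRAGMENTS))
    # P47 versus its two 25 kDa sub-fragments.
    print(mass_balance(PRIMARY_FRAGMENTS["P47"], P47_FRAGMENTS))
```

The differences are small in both cases (1 kDa and 3 kDa), which is consistent with the quoted sizes being rounded estimates rather than exact masses.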

Clues to Evolution of the SERA Multigene Family in the Genus *Plasmodium* 319

| Species (strain) | Accession No. | Gene | Reference |
|---|---|---|---|
| *P. vivax* (SalI) | U51723 | V\_SERA1-V\_SERA5 | Kiefer et al., 1996 |
| *P. vinckei vinckei* | U59860 | SERAvin-1 | Gor et al., 1998 |
| *P. vinckei vinckei* | U59861 | SERAvin-2 | Gor et al., 1998 |
| *P. vinckei vinckei* | U59862 | SERAvin-3 | Gor et al., 1998 |
| *P. malariae* (Kisii67) | AB576870 | SERA1-SERA10 | Arisue et al., 2011 |
| *P. ovale* (Nigeria II) | AB576871 | SERA1-SERA7 | Arisue et al., 2011 |
| *P. cynomolgi* (Mulligan) | AB576872 | SERA1-SERA11 | Arisue et al., 2011 |
| *P. fieldi* (N-3) | AB576873 | SERA1-SERA9 | Arisue et al., 2011 |
| *P. simiovale* | AB576874 | SERA1-SERA9 | Arisue et al., 2011 |
| *P. inui* (Celebes) | AB576875 | SERA1-SERA7 | Arisue et al., 2011 |
| *P. hylobati* (WAK) | AB576876 | SERA1-SERA7 | Arisue et al., 2011 |
| *P. coatneyi* (CDC) | AB576877 | SERA1-SERA7 | Arisue et al., 2011 |
| *P. knowlesi* (ATCC30158) | AB576878 | SERA1-SERA6 | Arisue et al., 2011 |
| *P. fragile* (Hackeri) | AB576879 | SERA1-SERA5 | Arisue et al., 2011 |
| *P. gonderi* | AB576880 | SERA1-SERA4 | Arisue et al., 2011 |
| *P. gonderi* | AB576881 | SERA5-SERA9 | Arisue et al., 2011 |

Table 2. Accession numbers of SERA genes in the public NCBI database.

Fig. 2. Organization of the *Plasmodium* SERA multigene family and the characteristic features of each SERA group. **C**, conserved hypothetical protein gene.

Because of limited information for *P. reichenowi*, *P. gonderi* and *P. vinckei*, gene numbers are tentative for these species. The total number of SERA pseudogenes, truncated genes and gene fragments in the Group IV SERA gene region is still undetermined in primate parasites (Arisue et al., 2011). Except for *P. gallinaceum*, all parasite species have one each of the Group I to III SERA genes. The number of Group IV SERA genes increased remarkably in the primate parasite lineage. The bird parasite, *P. gallinaceum*, has the fewest SERA genes: two from Group I, and one from a branched common ancestor of the Group II to IV SERA genes.

### **2.3 SERA genes in the database**

The discovery of the SERA gene family in *P. falciparum* sparked the subsequent identification of a number of SERA genes in different *Plasmodium* species. Currently known SERA genes in the PlasmoDB database (http://plasmodb.org/plasmo/) are summarized in Table 1 and accession numbers of SERA genes found in the public database (NCBI, http://www.ncbi.nlm.nih.gov/) are summarized in Table 2.

We opted not to list the SERA genes of *P. reichenowi* and *P. gallinaceum* in either table. For our analysis, their SERA gene sequences were assembled from various reads in the partial genome shotgun database of *Plasmodium* at The Sanger Institute. The BLAST programs at the following web sites were used to search for SERA-coding reads: *P. reichenowi*, http://www.sanger.ac.uk/cgi-bin/blast/submitblast/p\_reichenowi; and *P. gallinaceum*, http://www.sanger.ac.uk/cgi-bin/blast/submitblast/p\_gallinaceum. The identified SERA gene sequences are described in Arisue et al. (2007).
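For readers who wish to retrieve the sequences listed in Table 2, the sketch below shows one possible way to assemble a download request. It is not the procedure used for the analyses described here. It assumes the NCBI E-utilities `efetch` endpoint and the `nuccore` database; the helper names `accession_range` and `efetch_url` are hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint (an assumption of this sketch, not taken from the chapter).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accession_range(prefix, first, last):
    """Expand a run of consecutive accession numbers, e.g. AB576870..AB576881."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def efetch_url(accessions, rettype="fasta"):
    """Build (but do not send) an efetch request for nucleotide records."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


# Accession numbers as listed in Table 2.
TABLE2_ACCESSIONS = (
    ["U51723", "U59860", "U59861", "U59862"]
    + accession_range("AB", 576870, 576881)
)

if __name__ == "__main__":
    print(len(TABLE2_ACCESSIONS), "accessions")
    print(efetch_url(TABLE2_ACCESSIONS))
```

The resulting URL can be opened in a browser or passed to any HTTP client. Requesting `rettype="gb"` instead of `"fasta"` should return annotated GenBank records, which is useful when a single accession covers several SERA genes.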
