**2.3 SERA genes in the database**

The discovery of the SERA gene family in *P. falciparum* sparked the subsequent identification of a number of SERA genes in different *Plasmodium* species. Currently known SERA genes in the PlasmoDB database (http://plasmodb.org/plasmo/) are summarized in Table 1, and accession numbers of SERA genes found in the public NCBI database (http://www.ncbi.nlm.nih.gov/) are summarized in Table 2.

We opted not to list the SERA genes of *P. reichenowi* and *P. gallinaceum* in either table; for our analysis, their SERA gene sequences were assembled from various reads in the partial genome shotgun database of *Plasmodium* at The Sanger Institute. The BLAST programs at the following web sites were used to search for SERA-coding reads: *P. reichenowi*, http://www.sanger.ac.uk/cgi-bin/blast/submitblast/p\_reichenowi; and *P. gallinaceum*, http://www.sanger.ac.uk/cgi-bin/blast/submitblast/p\_gallinaceum. The identified SERA gene sequences can be found in Arisue et al. (2007).

| Gene | *P. vivax* (SalI) | *P. falciparum* (3D7) | *P. knowlesi* (H) | *P. berghei* (ANKA) | *P. yoelii* (17XNL) | *P. chabaudi* (AS) |
|---|---|---|---|---|---|---|
| SERA1 | PVX\_003850 | PFB0360C | PKH\_041200 | PB000108.03.0 | PY00291 | PCAS\_030730 |
| SERA2 | PVX\_003845 | PFB0355C | PKH\_041210 | PB107093.00.0 | PY00292 | PCAS\_030720 |
| SERA3 | PVX\_003840 | PFB0350C | PKH\_041230 | PB000107.03.0 | PY00292 | PCAS\_030710 |
| SERA4 | PVX\_003835 | PFB0345C | PKH\_041250 | PB000352.01.0 | PY02062 / PY00294\* | PCAS\_030700 |
| SERA5 | PVX\_003830 | PFB0340C | PKH\_041260 | PB000649.01.0 | PY02063 | PCAS\_030690 |
| SERA6 | PVX\_003825 | PFB0335C | PKH\_041270 | | | |
| SERA7 | PVX\_003820 | PFB0330C | | | | |
| SERA8 | PVX\_003810 | PFB0325C | | | | |
| SERA9 | PVX\_003805 | PFI0135C | | | | |
| SERA10 | PVX\_003800 | | | | | |
| SERA11 | PVX\_003795 | | | | | |
| SERA12 | PVX\_003790 | | | | | |

\*The gene region of PY02062 and PY00294 was re-annotated and used as *P. yoelii* SERA4.

Table 1. GeneID of SERA genes in the PlasmoDB (http://plasmodb.org/plasmo/).

Table 2. Accession numbers of SERA genes in the public NCBI database.

**2.4 Characteristic features of the SERA multigene family**

Almost all SERA genes are arranged in a tandem cluster between the conserved hypothetical protein gene and the iron-sulfur assembly protein gene. According to their genetic background, SERA genes can be categorized into Groups I to IV (Arisue et al., 2007, 2011). Characteristic features of each group are summarized in Fig. 2. Group I to Group III SERA genes possess the protease motif that includes an active-site cysteine residue, in contrast to Group IV SERA genes, in which the cysteine residue is replaced by a serine (Bourgon et al., 2005; Arisue et al., 2007, 2011). The mRNA transcription and/or protein expression of Group I SERA genes has been observed in the mosquito vector, while that of Group II to Group IV SERA genes has been observed in the vertebrate host (Ali & Matuschewski, 2005; Arisue et al., 2007, 2011; Putrianti et al., 2010). The difference in the gene repertoire among species is due to the number of Group IV SERA genes. SERA gene repertoires are summarized in Table 3.

Fig. 2. Organization of *Plasmodium* SERA multigene family and the characteristic features of each SERA group.

Because of limited information for *P. reichenowi*, *P. gonderi* and *P. vinckei*, gene numbers are tentative for these species. The total number of SERA pseudogenes, truncated genes and gene fragments in the Group IV SERA gene region is still undetermined in primate parasites (Arisue et al., 2011). Except for *P. gallinaceum*, all parasite species have one each of Group I to III SERA genes. The number of Group IV SERA genes has increased remarkably in the primate parasite lineage. The bird parasite, *P. gallinaceum*, has the fewest SERA genes: two from Group I, and one that branches at the common ancestor of Group II to IV SERA genes.
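The comparison of repertoire sizes can be made concrete with a small tally. The sketch below is illustrative only: it transcribes the Group I to IV counts from Table 3 for a subset of species (those with tentative "?" entries are left out), and the names `TABLE3_COUNTS`, `GALLINACEUM_TOTAL` and `total_sera_genes` are hypothetical helpers introduced here, not part of the original analysis.

```python
# Group counts transcribed from Table 3 as (Group I, II, III, IV).
# The 'Degenerate' column is not included in these totals.
TABLE3_COUNTS = {
    "P. falciparum": (1, 1, 1, 6),
    "P. vivax": (1, 1, 1, 9),
    "P. knowlesi": (1, 1, 1, 3),
    "P. cynomolgi": (1, 1, 1, 8),
    "P. berghei": (1, 1, 1, 2),
    "P. yoelii": (1, 1, 1, 2),
    "P. chabaudi": (1, 1, 1, 2),
}

# P. gallinaceum does not fit the four-group layout: it has two Group I genes
# plus one gene that branches at the common ancestor of Groups II to IV.
GALLINACEUM_TOTAL = 2 + 1


def total_sera_genes():
    """Return {species: total SERA gene count across the groups}."""
    totals = {species: sum(groups) for species, groups in TABLE3_COUNTS.items()}
    totals["P. gallinaceum"] = GALLINACEUM_TOTAL
    return totals


totals = total_sera_genes()
fewest = min(totals, key=totals.get)

# Print species from smallest to largest repertoire.
for species, n in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{species}: {n}")
```

Because Groups I to III contribute one gene each in every species except *P. gallinaceum*, the differences between totals in this tally come entirely from the Group IV column.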

| Species | Natural host | Group I | Group II | Group III | Group IV | Degenerate\* |
|---|---|---|---|---|---|---|
| *P. falciparum* | human | 1 | 1 | 1 | 6 | 0 |
| *P. vivax* | human | 1 | 1 | 1 | 9 | 2 |
| *P. malariae* | human | 1 | 1 | 1 | 7 | 3 |
| *P. ovale* | human | 1 | 1 | 1 | 4 | 1 |
| *P. knowlesi* | human/macaque | 1 | 1 | 1 | 3 | 2 |
| *P. cynomolgi* | macaque | 1 | 1 | 1 | 8 | 3 |
| *P. coatneyi* | macaque | 1 | 1 | 1 | 4 | 3 |
| *P. fragile* | macaque | 1 | 1 | 1 | 2 | 3 |
| *P. simiovale* | macaque | 1 | 1 | 1 | 6 | 3 |
| *P. fieldi* | macaque | 1 | 1 | 1 | 6 | 3 |
| *P. inui* | macaque | 1 | 1 | 1 | 4 | 5 |
| *P. hylobati* | gibbon | 1 | 1 | 1 | 4 | 1 |
| *P. berghei* | rat | 1 | 1 | 1 | 2 | 0 |
| *P. yoelii* | rat | 1 | 1 | 1 | 2 | 0 |
| *P. chabaudi* | rat | 1 | 1 | 1 | 2 | 0 |
| *P. gallinaceum* | bird | 2 | 1† | | | 0 |
| *P. gonderi* | mangabey, guenon | 1 | 1 | 1 | 6 | ? |
| *P. reichenowi* | chimpanzee | 1 | 1 | 1 | 5? | ? |
| *P. vinckei* | rat | ? | 1 | 1 | 1? | ? |

†A single gene that branches at the common ancestor of Groups II to IV.

Table 3. The number of SERA genes that belong to each group from several *Plasmodium* species. 'Degenerate'(\*) denotes defective gene copies, *i.e.*, the total number of pseudogenes, truncated genes and gene fragments found.

**2.5 Primary structure of SERA molecules and genes**

A schematic representation of SERA gene structure is shown in Fig. 3A. Group I SERA genes code for around 700 amino acids. They share a similar six-exon, five-intron structure, except for *P. falciparum* SERA8 and *P. vivax* SERA12, which both lack one intron. Group II to Group IV SERA genes code for about 1000 amino acid residues and, like Group I, share a common structure (four exons and three introns) with few exceptions. All SERA genes have the structural context of cysteine proteinases; however, it is important to note that the canonical Cys-His-Asn triad of the active proteinase is not present in all of them. The relatively small number of amino acid residues in Group I SERA corresponds to a shorter N-terminal region compared with Group II to IV SERA. Multiple amino acid sequence alignments revealed the consensus primary structure of SERA, which consists of six putative domains shown in Fig. 3B.

Amino acid sequences of the putative pro-enzyme and enzyme domains are remarkably conserved, but extensive sequence variations are found in variable domains 1 and 2. In the C-terminal cysteine-rich conserved domain, seven cysteine residues are perfectly conserved in all SERA genes.

The pro-enzyme and enzyme domains of *P. falciparum* SERA5 were identified by functional genetic and structural analyses (Hodder et al., 2003, 2009). These domains, corresponding to P50 in Fig. 1, are flanked by the reported SUB1 cleavage sites (Yeoh et al., 2007). The consensus sequence of the cleavage site is (Val/Leu/Ile)-Xaa-(Gly/Ala)-Paa, in which Xaa is any amino acid residue and Paa is any non-polar residue except for Leu (Yeoh et al., 2007). This consensus sequence is well conserved, with slight modifications, in all Group II to IV *Plasmodium* SERA genes that we have analyzed. *In vitro*, *P. falciparum* SERA4 (Group IV) and SERA6 (Group II) were cleaved by recombinant PfSUB1 (Yeoh et al., 2007). Peculiarly, Group I SERA genes lack most of the N-terminal variable domain 1 and the SUB1 cleavage sites.

Fig. 3. Schematic representation of the SERA gene structure (A) and their putative domain organization (B).

**3. Phylogenetic relationships of SERA genes**

The categorization of SERA genes into four groups, Group I to IV, was based on phylogenetic analysis (Arisue et al., 2007, 2011). SERA amino acid sequences from 18 *Plasmodium* species were aligned using the CLUSTAL W program (http://clustalw.ddbj.nig.ac.jp/top-j.html) under default options, with manual corrections. Unambiguously aligned amino acid positions corresponding to the putative pro-enzyme domain, enzyme domain and cysteine-rich conserved domain were selected and used for the phylogenetic analysis. A maximum likelihood tree was inferred using the PROML program in PHYLIP version 3.69 (Felsenstein, 1996). Except for the number of genes and number of amino acid sequences included in the analysis, the same method was used to infer the phylogenetic trees shown in Figs. 4 and 5.

A simplified maximum likelihood tree inferred from 134 SERA genes with 392 amino acid positions is shown in Fig. 4. Bootstrap proportion values are shown only on the common ancestor branch of each group. The monophyletic grouping of Group I SERA genes was supported with a bootstrap value of 100%. The long internal branch separating Group I from Groups II to IV suggests that the root of the tree is located on the branch leading to the common ancestor of Group I SERA genes. It is thus likely that Group I genes appeared early in the evolution of the SERA gene family. *P. gallinaceum* SERA1 branches at the common ancestor of Groups II to IV, suggesting that the gene duplication events that produced Groups II, III and IV likely occurred after the divergence of *P. gallinaceum* from the common ancestral lineage of *Plasmodium*.
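For the phylogenetic analysis described in Section 3, only unambiguously aligned amino acid positions were kept before tree inference, a step that involved manual correction in the original work. As a rough illustration of that kind of column selection, the sketch below keeps only alignment columns with no gap characters. Treating "gap-free" as a stand-in for "unambiguously aligned" is an assumption made here for simplicity, and the function name `gap_free_columns` and the toy sequences are hypothetical.

```python
def gap_free_columns(alignment, gap_chars="-."):
    """Return the alignment restricted to columns that contain no gap character.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if every sequence has a residue (not a gap) there.
    keep = [i for i in range(length)
            if all(seq[i] not in gap_chars for seq in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment (made-up sequences, not real SERA data).
toy = {
    "seqA": "MK-LVCG",
    "seqB": "MKALV-G",
    "seqC": "MRALVCG",
}
trimmed = gap_free_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

In practice the selection would also be restricted to the pro-enzyme, enzyme and cysteine-rich conserved domains, as stated in the text; the filter above only shows the column-wise mechanics.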
