**2.3** *YPS genes* **in clinically relevant** *Candida* **species**

The genome sequencing projects of *Candida* species allow exploration of whether *YPS* genes are harboured in these opportunistic pathogenic yeasts. *C. dubliniensis* sequences were obtained from the Sanger Institute Microorganisms Sequencing Group (http://www.sanger.ac.uk/sequencing/Candida/dubliniensis/). Sequences from *C. guilliermondii*, *C. lusitaniae*, *C. tropicalis* and *C. parapsilosis* were obtained from http://www.broad.mit.edu/annotation/genome/candida\_group/MultiHome.html. The GenBank database (http://www.ncbi.nlm.nih.gov) was also used. The queries were the *YPS* and *SAP* genes previously detected in the *S. cerevisiae* (http://www.yeastgenome.org), *C. glabrata* (http://cbi.labri.fr/Genolevures/elt/CAGL) and *C. albicans* (http://www.candidagenome.org) genomes, together with the proteins detected by BLAST analysis at NCBI. The different motif patterns obtained were also used as new queries. Only one *YPS* gene was detected in *C. lusitaniae* and in *C. guilliermondii*, whereas four were detected in *C. dubliniensis* and in *C. albicans*, two in *C. tropicalis* and six in *C. parapsilosis*. Theoretical isoelectric point, molecular weight and amino acid content were calculated using Antheprot 2000 version 5.2 (Table 6).
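As a rough illustration of the kind of sequence statistics reported in Table 6, the following minimal sketch computes amino acid composition and an approximate average molecular mass directly from a protein sequence. It is not Antheprot's implementation: the residue masses are standard average values, the function names are chosen here for illustration, and the example peptide is simply one of the active-site motifs listed in Table 6.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once per chain for the free N- and C-termini


def composition(seq):
    """Return the fraction of each amino acid present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def molecular_mass_kda(seq):
    """Approximate average molecular mass in kDa (unmodified polypeptide)."""
    seq = seq.upper()
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def ser_thr_fraction(seq):
    """Combined fraction of serine and threonine residues."""
    comp = composition(seq)
    return comp.get("S", 0.0) + comp.get("T", 0.0)


if __name__ == "__main__":
    peptide = "VLVDTGSSDLWI"  # illustrative input only
    print(molecular_mass_kda(peptide), ser_thr_fraction(peptide))
```

Values computed this way ignore glycosylation and GPI addition, so they describe only the unmodified polypeptide.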

Prediction of motif sequences was performed with PROSITE (http://www.expasy.org) (Falquet et al. 2002). Some of the proteins possess the typical molecular structure of aspartyl proteases, but others differ in composition (Fig. 4; Table 6). Some possess a high Ser/Thr content in the amino-terminal region, suggesting that this zone is exposed at the surface of the protein. The Ser/Thr-rich region at the carboxyl terminus of almost all Yps proteins is postulated to be heavily O-glycosylated. The exact function of this Ser/Thr-rich domain in yapsins has not been investigated. However, O-mannosylation is important for proper cell wall biogenesis and integrity. It has also been proposed that clustered O-glycans create rigid stalks that keep protein domains away from membranes or wall surfaces (Lipke & Ovalle, 1998).

Evolution of GPI-Aspartyl Proteinases (Yapsines) of *Candida* spp 299

| Yps (AN) | Amino acid residues | MM (kDa) | IP | Motif | Signal peptide (aa) | C |
|---|---|---|---|---|---|---|
| ScYps1 (YLR120C) | 569 | 60 | 4.5 | 98-109: VLVDTGSSDLWI; 368-379: ALLDSGTTLTYL | 21 | XII |
| ScYps2 (YDR144C) | 596 | 64.2 | 4.3 | 96-107: VLVDTGSSDLWV; 356-368: VLLDSGTTISYM; 496-570: SER | 18 | IV |
| ScYps3 (YLR121C) | 508 | 54.5 | 8.4 | 78-89: VLLDTGSADLWV; 285-296: ALLDSGTTLTYL; 439-470: THR | 20 | XII |
| ScYps6 (YIR039C) | 537 | 58.2 | 3.9 | 82-93: LQLDTGSSDMIV; 321-332: VMLDSGTTFSYL | 24 | IX |
| ScYps7 (YDR349C) | 596 | 64.4 | 4.6 | 71-82: LLVD**VIIQ**PYINL; 318-329: ALLDS**T**SSVSYL | 16 | IV |
| ScBar1 (YIL015W) | 587 | | | 60-71: VLFDTGSADFWV; 284-295: VLLDSGTSLLNA | | |
| CgYps1 (CAGL0M04191g) | 601 | 63.8 | 5.0 | 88-99: VLVDTGSSDLWI; 375-386: ALLDSGTTLTYL | 18 | M |
| CgYps2 (CAGL0E01419g) | 591 | 63.2 | 4.4 | 82-92: LLLDTGSSDMWV; 366-377: ALLDSGTTVSYL | 18 | E |
| CgYps3 (CAGL0E01727g) | 539 | 58.9 | 6.4 | 66-79: V**Q**LDTGSSDLWF; 305-316: VLLDTGTTL**A**YA | 14 | E |
| CgYps4 (CAGL0E01749g) | 482 | 53.2 | 8.4 | 65-76: V**Q**LDTGSSDLWF; 303-314: **T**LLDTGVTTSVL | 15 | E |
| CgYps5 (CAGL0E01771g) | 519 | 57.2 | 5.5 | 66-77: V**Q**LDTGSSDLWF; 304-315: ALLDTGTTYTYM; 60-70: LECT | 15 | E |
| CgYps6 (CAGL0E01793g) | 516 | 55.9 | 4.6 | 65-76: V**Q**LDTGSADLWF; 301-312: ALIDSGTTISEF; 62-68: LECT | 15 | E |
| CgYps7 (CAGL0A02431g) | 587 | 63.4 | 4.7 | 64-75: L**G**L**GLAQ**PYVWV; 302-313: VLLD**PSF**ALSYL | 18 | A |
| CgYps8 (CAGL0E01815g) | 519 | 56.7 | 6.8 | 65-76: V**Q**LDTGSSDLWF; 304-315: ALLDSGTTLTVV | 15 | E |
| CgYps9 (CAGL0E01837g) | 521 | 56.9 | 5.1 | 65-76: L**Q**IDTGSSDLFV; 300-311: **T**LLDSGSTISLL | 16 | E |
| CgYps10 (CAGL0E01859g) | 505 | 55.3 | 7.3 | 61-72: A**Q**LDTGSSDLWF; 298-309: ALFDSGTSYSYV | 13 | E |
| CgYps11 (CAGL0E01881g) | 508 | 55.6 | 5.0 | 63-74: LLVDTGSSDFWV; 310-321: ALLDTGSTDTHL | 29 | E |
| CgYps12 (CAGL0J02288g) | 541 | 59.5 | 4.6 | 68-79: LVLDTGSSDLWV; 279-290: ALLDTGSTLI**E**L; 448-495: SER | 19 | J |
| CaYps (orf19.852) | 365 | 39.6 | 5.4 | 73-84: LAADTGS**W**LIQI; 245-256: **Y**TIDTG**GR**Y**G**FL | 17 | 2 |
| CaYps (orf19.6481) | 702 | 75.9 | 4.4 | 159-170: L**R**LD**LIQ**PEIWV; 406-417: VLLDS**R**ASNFYL; 565-662: SER | 20 | 7 |
| CaYps (orf19.853) | 364 | 39.1 | 5.7 | 72-83: LSIDTGSWLTHI; 244-255: **Y**TLDTG**GG**T**G**FL; 42-44: RGD | 17 | 2 |
| CaYps (orf19.2082\*) | 436 | 47.7 | 3.8 | 53-64: VIVDSGSSDLMI; 229-240: **YQ**IDSGTN**G**FV**P** | 14 | 2 |
| CdYps (Cd36\_18360) | | | | 72-82: VVIDTGS**WLT**HI; 848-859: **Y**TLDTG**GG**NGYL | 17 | 2 |
| CdYps (Cd36\_18370) | 365 | 40 | 5.6 | 73-84: IAADTGS**W**LT**Q**I; 246-257: **Y**T**M**DTG**GG**Y**G**YL | 17 | 2 |
| CdYps (Cd36\_72090) | 697 | 76.6 | 4.6 | 149-150: LRLDLIQPEIWVM; 402-412: VILDSRASNFY | 13 | 7 |
| CdYps (Cd36\_15430) | 442 | 48.7 | 4.2 | 60-71: VIVDSGSSDLMI; 236-247: **YQ**IDSGTN**G**FV**P** | 27 | 2 |
| CtYps (CTRG\_05014) | 690 | 74.8 | 4 | 151-162: L**R**LD**LIQ**PEIWV; 401-412: VLIDS**R**SS**Y**FYL | 20 | 7: 407395-409464 - |
| CtYps (CTRG\_01112) | 432 | 47.9 | 3.8 | 55-66: VIVDSGSSDLMI; 232-243: **YQ**IDSGSN**G**FL**P**; 392-423: THR | 20 | 2: 57814-59109 - |
| CpYps (CPAG\_04785) | 369 | 40.5 | 4.5 | 72-83: VMIDTGS**W**RLNV; 245-256: I**G**IDSG**N**PRLAF | 20 | 139: 296423 |
| CpYps (CPAG\_04801) | 374 | 40.5 | 5.7 | 76-77: V**F**IDTGS**W**ALNF; 251-262: LALDTG**N**P**G**I**G**L | 19 | 139: 334234 |
| CpYps (CPAG\_04802) | 371 | 40.4 | 6.1 | 76-87: VVIDTGS**W**ALNF; 248-259: LAFDTGSA**G**LIL | 19 | 139: 337822 |
| CpYps (CPAG\_03253) | 366 | 40.9 | 6.4 | 74-85: VLLDT**A**STVLNV; 246-257: VL**H**DSGTPTM**E**L | 15 | 126: 70630-71727 + |
| CpYps (CPAG\_02564) | 366 | 40.5 | 6.5 | 74-85: VLLDT**A**S**I**VLNV; 246-255: VL**H**DSGTPTMAL | 15 | 116: 138449 |
| CpYps (CPAG\_04713) | 700 | 75.5 | 4.5 | 157-168: L**R**LD**LIQ**PEVWV; 417-428: VLLDS**RIL**YSYL; 19-55: SER | 27 | 139: 123872 |
| CguYps (PGUG\_04882) | 583 | 63.5 | 4.1 | 81-92: LRLD**LTQ**PEIWV; 224-235: LV**QQGVII**KSS**AY** | 16 | 6: 496161-497909 |
| ClYps (CLUG\_00903) | 722 | 74 | 3 | 63-74: VLLDTGSSDLWV; 275-286: ALLDSGTSL**Q**YL; 470-701: SER; 540-638: THR | 14 | 1: 1836367-1838532 + |


Table 6. Aspartyl proteases GPI-linked to the cell membrane in pathogenic *Candida* spp. AN: accession number in the respective genome; MM: molecular mass in kilodaltons (kDa); IP: isoelectric point; C: chromosome, contig or supercontig. Atypical amino acids in the PROSITE motif (eukaryotic and viral aspartyl protease active site) are shown in bold.
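Table 6 reports Ser- and Thr-rich regions (marked SER and THR) as residue ranges. The sketch below shows one simple way such compositionally biased regions could be located with a sliding window. It is an illustrative heuristic, not the PROSITE procedure used in this work; the function name, window size and threshold are assumptions chosen for the example.

```python
def rich_regions(seq, residues="ST", window=20, threshold=0.5):
    """Return merged 1-based (start, end) spans enriched in `residues`.

    A window counts as enriched when at least `threshold` of its positions
    belong to `residues`; overlapping or adjacent enriched windows are merged.
    """
    seq = seq.upper()
    spans = []
    for i in range(0, len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if aa in residues)
        if hits / window >= threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                # Extend the previous span instead of opening a new one.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Synthetic sequence: a serine block flanked by alanine stretches.
    toy = "A" * 30 + "S" * 20 + "A" * 30
    print(rich_regions(toy))
```

Because windows that only partly overlap an enriched stretch still pass the threshold, the reported spans extend beyond the stretch itself; a stricter threshold narrows them.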

The presence of a GPI attachment site, a characteristic feature of the yapsin family, was assessed with the big-PI predictor (http://mendel.imp.univie.ac.at/gpi/gpi\_server.html) and with GPI-SOM (http://gpi.unibe.ch/), which identifies GPI-anchor signals with a Kohonen self-organizing map. A total of 36 protein sequences were analyzed, but GPI sites were recognized in only 21 of them. GPI sites were not detected in ScYps2 and CgYps2, although both have previously been confirmed as Yps proteins. The prediction programs therefore need to be improved, and an experimental approach is necessary to confirm the cellular location.

PSORTII (http://www.psort.org/) and Softberry (http://www.softberry.com) programs were used to predict subcellular localization. All the proteins detected appear to be extracellular, possibly because of the signal peptide at the amino terminus. Nevertheless, during their synthesis yapsins are translocated into the lumen of the endoplasmic reticulum (ER) and modified there by the addition of GPI. The proteins are then glycosylated in the Golgi apparatus, associated with membrane vesicles and sent to the plasma membrane or the cell wall (Mayor & Riezman, 2004; Caro et al. 1997). The Softberry program was also used to search for introns, which were absent from all the genes studied. A search was made for internal protein sequence repeats to detect possible internal duplication events, but none were detected by TRUST (Szklarczyk & Heringa, 2004), even though it is likely that the Yps and Sap superfamilies have duplicated aspartyl protease motifs.

Fig. 4. Motifs of *Candida* spp. GPI-anchored aspartyl proteases (Yps). Rectangle boxes (SP): amino-terminal signal peptide; pentagon (ASP): aspartyl protease domains in agreement with PROSITE; circles (ASP): atypical aspartyl protease domains proposed as [LIVMFGACTPSYF]-[LIVMTADNQSFH]-[LIVFSAE]-D-[STP]-[GS]-[STAV]-[STAPDENQY]-X-[LIVMFSTNCGQ]-[LIVMFGTAW]; hexagons: serine (SER), threonine (THR), lecithin (LEC) rich regions; star: RGD motif; rhombus (C): cysteine residues, semicircles. Ca, *C. albicans*; Cd, *C. dubliniensis*; Cg, *C. glabrata*; Cgu, *C. guilliermondii*; Cl, *C. lusitaniae*; Cp, *C. parapsilosis*; Ct, *C. tropicalis*; Sc, *S. cerevisiae*. **A)** ScYps1 (YLR120C), ScYps6 (YIR039C), CgYps1 (CAGL0M04191g), CgYps2 (CAGL0E01419g), CgYps11 (CAGL0E01881g); **B)** ScYps2 (YDR144C); **C)** ScYps3 (YLR121C); **D)** ScYps7 (YDR349C), CdYps (Cd36\_18370), CpYps (CPAG\_04785), CpYps (CPAG\_04801), CpYps (CPAG\_04802), CpYps (CPAG\_03253), CpYps (CPAG\_02564), CguYps (PGUG\_04882), CgYps3 (CAGL0E01727g), CgYps4 (CAGL0E01749g), CgYps7 (CAGL0A02431g), CgYps9 (CAGL0E01837g), CaYps (orf19.852), CdYps (Cd36\_72090); **E)** CdYps (Cd36\_15430), CaYps (orf19.2082); **F)** CtYps (CTRG\_01112); **G)** CpYps (CPAG\_04713); **H)** CgYps8 (CAGL0E01815g), CgYps10 (CAGL0E01859g); **I)** ClYps (CLUG\_00903); **J)** CtYps (CTRG\_05014); **K)** CgYps5 (CAGL0E01771g), CgYps6 (CAGL0E01793g); **L)** CgYps12 (CAGL0J02288g); **M)** CaYps (orf19.6481); **N)** CaYps (orf19.853), CdYps (Cd36\_18360).
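The atypical active-site pattern proposed in the Fig. 4 caption can be applied to a sequence by translating it into a regular expression. The sketch below is a minimal illustration, assuming that every bracketed or parenthesised group in the caption is a set of allowed residues and that X stands for any residue; the function names are hypothetical and the test sequence is a made-up peptide containing one motif from Table 6.

```python
import re

# Pattern transcribed from the Fig. 4 caption; one element per position.
ATYPICAL_PATTERN = (
    "[LIVMFGACTPSYF]-[LIVMTADNQSFH]-[LIVFSAE]-D-[STP]-[GS]-"
    "[STAV]-[STAPDENQY]-X-[LIVMFSTNCGQ]-[LIVMFGTAW]"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern (sets, literals, X) to a regex."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element.upper() == "X":
            parts.append(".")  # X matches any residue
        elif element.startswith("(") and element.endswith(")"):
            parts.append("[" + element[1:-1] + "]")  # accept (ABC) written for [ABC]
        else:
            parts.append(element)  # [ABC] sets and single letters pass through
    return re.compile("".join(parts))


def find_sites(seq, pattern=ATYPICAL_PATTERN):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = prosite_to_regex(pattern)
    lookahead = re.compile("(?=(" + regex.pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_sites("MKVQLDTGSSDLWFAA"))  # illustrative peptide, not a real protein
```

The pattern as written spans 11 positions, so a hit covers one residue fewer than the 12-residue motif ranges listed in Table 6.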


To analyze the evolutionary and molecular events that gave rise to the different numbers of *YPS* genes in each pathogenic *Candida* species, clusters of orthologous groups (COGs) were established among the Yps proteins. Phylogenetic analysis was performed on an alignment of the *YPS* homologues identified *in silico* and those previously characterized. The alignment was carried out using MUSCLE in the SeaView 2.4 program (Galtier et al. 1996) with default alignment parameters. The phylogenetic analyses were performed in the MEGA4 program (Tamura et al. 2007) using the minimum evolution method with distances computed with the Poisson correction. A similarity and identity matrix was computed with the MatGAT4.50.2 software (Campanella et al. 2003). To assess support for the branches of the trees, bootstrap analysis (1,000 replicates) was performed. Synteny analysis was carried out to recognize the putative COGs (Fig. 5).
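The distance and resampling steps named above can be sketched in a few lines. The Poisson correction converts the proportion of differing sites p between two aligned sequences into a distance d = -ln(1 - p), and each bootstrap replicate resamples alignment columns with replacement. This is a minimal sketch, not the MEGA4 or MatGAT implementation: gap handling by pairwise deletion is an assumption, MatGAT derives identity from its own pairwise alignments, and all function names are chosen here for illustration.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)


def percent_identity(a, b):
    """Percentage of identical sites over the compared columns."""
    return 100.0 * (1.0 - p_distance(a, b))


def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate built by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Two active-site motifs from Table 6, used only as a toy alignment.
    aln = {"ScYps1": "VLVDTGSSDLWI", "ScYps2": "VLVDTGSSDLWV"}
    print(poisson_distance(aln["ScYps1"], aln["ScYps2"]))
    print(bootstrap_columns(aln, random.Random(0)))
```

In a full analysis a tree would be rebuilt from each of the 1,000 pseudo-replicates, and branch support would be the percentage of replicate trees containing that branch.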

Fig. 5. Minimum evolution phylogenetic tree of the GPI-anchored aspartyl proteinase (Yps) superfamily of opportunistic pathogenic *Candida* species. Ca, *C. albicans*; Cd, *C. dubliniensis*; Cg, *C. glabrata*; Cgu, *C. guilliermondii*; Cl, *C. lusitaniae*; Cp, *C. parapsilosis*; Ct, *C. tropicalis*; Sc, *S. cerevisiae*. Bootstrap values > 50% are shown on branches. Curly brackets and arrows indicate the Yps protein families defined by phylogenetic relationships, similarity percentage (> 50%), synteny and motif array. Yps are grouped into 8 families. Family A, CgYps2-6 and 8-11; family B, CgYps1, ScYps1-3 and ScYps6; family C, CgYps12 and ScBar1; family D, ClYps (*C. lusitaniae*); family E, CgYps7, ScYps7, CaYps (orf19.6481), CdYps (Cd36\_72090), CtYps (CTRG\_05014), CpYps (CPAG\_04713) and CguYps (PGUG\_04882); family F, CaYps (orf19.2082), CdYps (Cd36\_15430) and CtYps (CTRG\_01112); family G, CaYps (orf19.852), CdYps (Cd36\_18370), CaYps (orf19.853) and CdYps (Cd36\_18360); family H, CpYps (CPAG\_03253, CPAG\_02564, CPAG\_04785, CPAG\_04801 and CPAG\_04802).


Fig. 6. Synteny of *YPS* genes of *S. cerevisiae* (Sc), *C. glabrata* (Cg), *C. albicans* (Ca) and *C. dubliniensis* (Cd). **A)** *ScYPS1* and *CgYPS1*; **B)** *ScYPS2* and *CgYPS2*; **C)** *ScYPS7* and *CgYPS7*; **D)** *CaYPS7* (orf19.6481) and *CdYPS7* (Cd36\_72090); **E)** *CaYPS* and *CdYPS* (Sap99, orf19.853, and Sap98, orf19.852); **F)** *CaYPS* and *CdYPS* (Bar1); **G)** *CgYPS* and *ScBAR1*. *CgYPS1*, *CgYPS7*, *ScYPS1*, *ScYPS3*, *CaYPS7*, *CaSAP98*, *CaSAP99* and *BAR1* are GPI-anchored aspartyl proteases; *APC2* and CAGL0M04235g, subunit of the anaphase-promoting complex; *APT1*, acyl-protein thioesterase; *ATP22*, mitochondrial inner membrane protein; CAGL0M04147g, similar to a low-affinity vacuolar membrane-localized monovalent cation/H+ antiporter; CAGL0M04169g, similar to a cell wall glycoprotein involved in beta-glucan assembly; *CDH1*, cell-cycle regulated activator of the anaphase-promoting complex/cyclosome (APC/C); *CLF1*, crooked neck-like factor; *DOP1*, protein essential for viability; *EKL1*, ethanolamine kinase; *FAF1*, protein required for pre-rRNA processing and 40S ribosomal subunit assembly; *HAT*, histone acetyltransferase; *HXT3*, low affinity glucose transporter of the major facilitator superfamily; *LDG3* and *LDG4*, leucine, aspartic acid, glycine rich; *MNN42*, putative positive regulator of mannosylphosphate transferase; *MNT3*, alpha-1,3 mannosyltransferase; *MTQ2*, S-adenosylmethionine-dependent methyltransferase; *MRP1*, mitochondrial ribosomal protein of the small subunit; *NOP16*, constituent of 66S pre-ribosomal particles; *NTA1*, amidase; orf19.2088, shared subunit of DNA polymerase (II) epsilon and of the ISW2/yCHRAC chromatin accessibility complex; *PDR11*, ATP-binding cassette transporter; *PEX7*, peroxisomal signal receptor; *PFK27*, 6-phosphofructo-2-kinase; *PHHB*, transposon mutation affects filamentous growth; *PMI40*, mannose-6-phosphate isomerase; *POM152*, nuclear pore membrane glycoprotein; *RPL2B*, protein component of the large ribosomal subunit; *SAN1*, ubiquitin-protein ligase; *SBE2*, protein involved in the transport of cell wall components from the Golgi to the cell surface; *SEC1*, Sm-like protein involved in docking and fusion of exocytic vesicles through binding to assembled SNARE complexes at the membrane; *SNL1*, putative protein involved in nuclear pore complex biogenesis and maintenance; *SRN2*, component of the ESCRT-I complex; *SVF1*, protein with a potential role in cell survival pathways; *SWI5*, transcription factor that activates transcription of genes expressed at the M/G1 phase boundary and in G1 phase; *TAF12*, subunit (61/68 kDa) of TFIID and SAGA complexes; *TIR3*, cell wall mannoprotein of the Srp1p/Tip1p family of serine-alanine-rich proteins; *TMA20*, protein associated with ribosomes with a putative RNA binding domain; *UGA11*, gamma-aminobutyrate transaminase (4-aminobutyrate aminotransferase); *VID28*, protein involved in proteasome-dependent catabolite degradation of fructose-1,6-bisphosphatase (FBPase); *v-SNARE*, component of the vacuolar SNARE complex involved in vesicle fusion; *YCF1*, putative glutathione S-conjugate transporter; YLR126C, protein with similarity to glutamine amidotransferase proteins; *YMD8*, putative nucleotide sugar transporter; *APM2*, *BSC6*, CAGL0M04125g, Cd36\_72050, Cd36\_72080, *FM02*, *IFK2*, orf19.6482, *RTC1*, tRNA-Glu, *YDR352W*, YDR348C, YLR125W and ORF, unknown predicted open reading frames.
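Synteny comparisons such as those drawn in Fig. 6 can be summarised numerically by asking how many genes flanking one locus have an orthologue flanking the other locus. The sketch below is a hypothetical illustration of that idea only: the scoring function, the gene labels and the orthologue map are all invented for the example and do not reproduce the neighbourhoods shown in the figure.

```python
def synteny_score(neighbours_a, neighbours_b, ortholog_of):
    """Fraction of genes flanking locus A whose orthologue flanks locus B.

    neighbours_a, neighbours_b: gene identifiers flanking each locus.
    ortholog_of: mapping from a gene at locus A to its orthologue in species B.
    """
    if not neighbours_a:
        return 0.0
    flank_b = set(neighbours_b)
    shared = [g for g in neighbours_a if ortholog_of.get(g) in flank_b]
    return len(shared) / len(neighbours_a)


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, not genes from Fig. 6.
    flank_a = ["a1", "a2", "a3", "a4"]
    flank_b = ["b1", "b2", "b9"]
    orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
    print(synteny_score(flank_a, flank_b, orthologs))
```

A score near 1 would correspond to the extensive synteny described for some gene pairs, and a score of 0 to the absence of conserved neighbours; gene order and orientation are ignored in this simplification.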

The lack of *SAP* genes and the expansion to 12 *CgYPS* genes in *C. glabrata*, together with the extended family of *SAP* genes in *C. albicans*, support the hypothesis that both protein superfamilies are an example of convergent evolution. Although more research is necessary to reach definite conclusions, *YPS* of *C. glabrata* and *SAP* of *C. albicans* have apparently developed some equivalent physiological functions and roles in virulence. The remaining pathogenic *Candida* species are less virulent and, curiously, harbour fewer of these genes in their genomes than *C. albicans*. These facts lead to the supposition that *SAP* and *YPS* have evolved independently for at least 700 million years. However, more *SAP* duplication events have occurred in *C. albicans* (Parra et al. 2009).

Phylogenetic analyses of the deduced Yps protein sequences of *Candida* spp. and *S. cerevisiae* allow the definition of eight Yps families, A-H (Fig. 5). In particular, the CgYps1-12 proteins of *C. glabrata* clustered in four families. Family A consisted exclusively of nine Yps of *C. glabrata* (CgYps2-6 and CgYps8-11) encoded on chromosome E. With the exception of CgYps2, all the genes encoding these proteins are organized in tandem and possibly derived from at least eight recent duplication events that occurred exclusively in the *C. glabrata* genome. Apparently these recent duplications led to the emergence of a paralogous gene family with novel or slightly different functions. No pseudogenes were detected among the *CgYPS1-11* genes, but their deduced proteins retained only moderate amino acid similarity (48-53%) and identity (36-38%). Very high similarity is frequently maintained by concerted evolution among paralogous members of some multigene families (László, 1999). However, this evolutionary phenomenon is not evident in the *CgYPS* genes. Previously, *CgYPS4* and *CgYPS11* were recognized as GPI-anchored aspartyl proteases (Kaur *et al.,* 2007), but comparative studies of the regulatory region and expression of each *CgYPS* gene are necessary to clearly define the physiological role and orthology relationships of each gene. Family B was formed by a set of Yps proteins detected exclusively in *S. cerevisiae* (ScYps2-3 and ScYps6) and a highly similar putative orthologous pair (*ScYPS1*/*CgYPS1*) (Fig. 6A). Also, the partial synteny observed between the *ScYPS2*/*CgYPS2* gene pair supports the hypothesis that these protein-coding genes are probably orthologous (Fig. 6B). Family C was formed by CgYps12 and ScBar1 of *S. cerevisiae*, a putative orthologous pair with low similarity and synteny but with a clear ancestor-descendant relationship (Fig. 6G). Finally, family E was formed by Yps representatives of the *Candida* spp., together with CgYps7 and ScYps7. This family forms a subtree with the same topology as phylogenies constructed with ribosomal and other protein sequences (Diezman *et al.,* 2004). The *CgYPS7* and *ScYPS7* genes exhibited extensive synteny (Fig. 6C), but no synteny was observed with *CaYPS* (orf19.6481) and *CdYPS* (Cd36\_72090) (Fig. 6D). In the *C. albicans* and *C. dubliniensis* genome databases these *YPS* genes are described as orthologues of *ScYPS7* (Schaefer *et al*., 2007). Nevertheless, both exhibited low similarity with *ScYPS7* (37.2-38.7%) and no synteny. The final decision to consider family E an orthologous family will depend on comparative analyses of functional features that have not yet been performed.

... *in vivo* conditions when alpha pheromone is degraded (Hull *et al.,* 2000; Magee & Magee, 2000), and *C. glabrata* harbours homologues of the *S. cerevisiae* genes that control mating (Srikantha *et al.,* 2003). Nevertheless, a sexual cycle has not been demonstrated in *C. glabrata*, nor has the participation of CgYps7 in alpha pheromone inactivation. No gene possibly orthologous to ScBar1 was detected in *C. guilliermondii*, *C. lusitaniae*, *C. parapsilosis* or *C. tropicalis*. All these yeasts have a heterothallic sexual cycle (cross-mating only), but mating has never been observed in *C. parapsilosis* and *C. tropicalis* (Butler *et al.,* 2009).

Family G is formed by two *C. albicans*/*C. dubliniensis* Yps protein pairs with high similarity (>88%), located in tandem on chromosome 2 and with very similar synteny. All these data are evidence of the recent speciation of both species (Fig. 6E). According to the *Candida* Genome Database (http://www.candidagenome.org/cgi-bin/locus.pl?locus=orf19.852), the Cal orf19.852 and Cdu Cd36\_18370 sequences are described as the *CaSAP98* and *CdSAP98* genes, respectively, and have their best hits with *PEP4* of *S. cerevisiae* (PrA protein). *S. cerevisiae* PrA is a vacuolar protease, and the *C. albicans*/*C. dubliniensis* Yps are clearly not phylogenetically grouped with PrA. In our opinion, no orthology relationship exists among these proteins. Cal orf19.853 and Cdu Cd36\_18360 formed a second pair, described as the *CaSAP99* and *CdSAP99* genes, which had their best hits with *ScYPS3* of *S. cerevisiae*. Similarly, it is clear that *CaSAP99* has no synteny, phylogenetic relationship or possible common physiological role with *ScYPS3*.

**3. Conclusion**

Why have *C. albicans*/*C. dubliniensis* and *C. glabrata*/*S. cerevisiae* undergone genetic duplication events in their Sap and Yps superfamilies? This question has not been resolved, but the decrease in virulence of null mutants in both *CaSAP* and *CgYPS* endorses the idea that the presence and expansion of the *SAP* and *YPS* families is necessary for adaptation to the host, and therefore for survival and virulence. Also, species with broad aspartyl protease families are more virulent than those with a limited number of these proteins. *C. glabrata* belongs to a phylogenetic group with no other pathogenic yeasts, and its virulence attributes could be evolving independently from the CTG clade, where *C. albicans* is the main opportunistic pathogenic species. The expansion of the *CgYPS* gene superfamily of *C. glabrata* parallels the expansion of the *SAP* gene superfamily of *C. albicans* and constitutes a possible example of convergent evolution. The transition from a commensal lifestyle to a successful opportunistic pathogen could be related to the expansion of the genes that encode each kind of aspartyl protease. Many experimental approaches will be required to recognize the orthologous gene families, as well as the roles of aspartyl proteases, including Sap and Yps, in virulence and in the commensal-to-pathogen transition.

**4. Acknowledgment**

We are grateful for the financial support from CONACyT-CB-13695, CONACyT-69984, SIP201005214 and SIP20113066. BPO is a fellow of CONACyT and PIFI-IPN. Thanks to Dr. Bernard Dujon (Institut Pasteur and Université Pierre et Marie Curie) for donating strains. Thanks also to Bruce Allan Larsen for reviewing the use of English.

Families C, F, G and H have not any *C. glabrata* or *S. cerevisiae* Yps representative protein*.*  Families C and H were formed only by one ClYps gene of *C. lusitaniae* and seven CpYps genes of *C. parapsilosis*, respectively (Fig. 5). Curiously, *C. lusitaniae* is the species that harbours the fewest Cl*YPS* (n=1) and *SAP* (n=3) genes, and its isolation frequency from clinical samples ,as well as its virulence, are lower than the other *Candida* species (Abi-Said et al. 1997)*.* This evidence supports a hypothesis of relevance of aspartyl proteases in virulence. That is, species with numerous aspartyl proteases in virulence; species with broad aspartyl proteases are more virulent than those with a limited number of these proteins.

Family F harboured *C. albicans, C. dubliniensis* and *C. tropicalis* yapsins organized congruently according to the ribosomal phylogenetic tree. The *C. albicans* CaBar1 (orf19.2082) and *C. dubliniensis* CdBar1 (Cd36\_15430) gene, found in family F, has been described as orthologous to *S. cerevisiae BAR1* (Schaefer *et al.,* 2007) found in family C. In both species, *C. albicans* and *S. cerevisiae*, the protein is involved in alpha pheromone degradation and secreted to the periplasmic space of mating alpha-type cells. These proteins help cells find mating partners by cleaving and inactivating the alpha factor, which allows cells to recover from alpha-factor-induced cell cycle arrest (Mackay *et al.,* 1988). The *in silico* analysis performed in this work established that these proteins and the Bar1 from *C. dubliniensis* are extracellular, but anchored to the cell wall or cell membrane. Also, phylogenetic analysis shows that Bar1 from *C. albicans* and *C. dubliniensis* belongs to the Yps superfamily, with a similarity of 40%, and are not grouped with CgYps12 of *C. glabrata*  (CgYps12 or CgBar1) and Bar1 of *S. cerevisiae*. The reason for which an aspartyl protease, that apparently is secreted, is groupedwith the yapsines superfamily could be a mistake in the cell location method because almost all software use the signal peptide, transmembranal regions, and the GPI site in the C-terminal, to predict the cell location. In *C. albicans* it has been detected that aspartyl proteases are associated with the plasmatic membrane, or to both the plasmatic membrane and cell wall. This makes the experimental corroboration of the cell location necessary. The Bar1 protein of *C. albicans* has been described as a protein with three domains: 2 aspartyl protease domains and another unidentified. 
Apparently, this GPI-membrane anchored domain determines that Bar proteins are not secreted, but anchored to cellular membranes, and their two actives sites are oriented to cellular membranes, and their two actives sites are oriented to the exterior to inactivate alpha pheromone, which is secreted by Mat-alpha cells. In *C. albicans*, the degradation of secreted alpha pheromone is not exclusive to Bar1. CaYPS7 (orf19.6481) of family E also encodes for this function with lesser efficiency (Schaefer *et al*., 2007). This physiological redundancy has not been demonstrated in *S. cerevisiae* ScYps7. *C. albicans* can mate under some *in vitro* and *in*  *vivo* conditions when alpha pheromone is degraded (Hull *et al.,* 2000; Magee & Magee*,* 2000) and *C. glabrata* harbours homologous genes of *S. cerevisiae* that control the mating (Srikantha *et al.,* 2003). Nevertheless, in *C. glabrata* a cell cycle has not been demonstrated, and the participation of CgYps7 of *C. glabrata* in alpha pheromone inactivation has not been demonstrated. No possible gene orthologous to possible gene orthologous to ScBar1 was detected in *C. guilliermondii*, *C. lusitaniae, C. parapsilosis, C. tropicalis, C. guilliermondii* or *C. lusitaniae*. All these yeasts have a heterothallic sex cycle (cross-mating only), but *C. parapsilosis* and *C. tropicalis* mating has never been observed (Butler *et al.,* 2009).

Family G is formed by two *C. albicans*/*C. dubliniensis* Yps protein pairs with high similitude (>88%), located in tandem in chromosome 2 and with very similar synteny. All this data is evidence from the recent speciation of both species (Fig. 6E). According to the *Candida* genome database (http://www.candidagenome.org/cgi-bin/locus.pl?locus=orf19.852) Cal orf19.852 and Cdu Cd36\_18370 sequences are described as *CaSAP98* and *CdSAP98* genes, respectively, and have their best hits with *PEP4* of *S. cerevisiae* (Pra protein). *S. cerevisiae* PrA is a vacuolar protease, and clearly *C. albicans*/*C. dubliniensis* Yps are not phylogenetically grouped with PrA. In our opinion no orthology relationship among these proteins exists. Cal orf19.853 and Cdu Cd36\_18360 formed a second pair, described as *CaSAP99* and *CdSAP99* genes, which had their best hits with *ScYPS3* of *S. Cerevisiae.* Similarly, it is clear that *CaSAP99* has no synteny, phylogenetic relationship, or possible common physiological role with *ScYPS3*.
