**4. Structural evolution of the substrate specificity in the ADP-dependent sugar kinase family**

The ADP-dependent sugar kinases have been found in several members of the *Pyrococcus*, *Thermococcus*, *Methanosarcina*, *Methanosaeta*, *Methanococcoides*, *Methanococcus*, *Methanocaldococcus*, and *Archaeoglobus* genera (Hansen & Schönheit, 2004; Kengen et al., 1994; Koga et al., 2000; Tuininga et al., 1999; Verhees et al., 2001). It has also been possible to identify a distant homolog of these enzymes in the genomes of higher eukaryotes, which has been shown to be an ADP-dependent glucokinase (Ronimus & Morgan, 2004). The metabolic role of the eukaryotic ADP-dependent glucokinases is unclear, but it has been suggested that they are used under ischemic conditions (Ronimus & Morgan, 2004).

<sup>3</sup> These enzymes have a slightly different fold compared with the other nucleoside kinases (such as inosine-guanosine kinases) from the superfamily mentioned by Bork et al. (1993).

On the Specialization History of the ADP-Dependent Sugar Kinase Family 245

To date, the crystallographic structures of the ADP-dependent glucokinases from *Thermococcus litoralis* (Ito et al., 2001), *Pyrococcus horikoshii* (Tsuge et al., 2002), and *Pyrococcus furiosus* (Ito et al., 2003), and of the ADP-dependent phosphofructokinase from *Pyrococcus horikoshii* (Currie et al., 2009), have been solved. In contrast to the vitamin kinase and PfkB-like branches of the ribokinase superfamily, only two specificities have been observed so far in the ADP-dependent branch (see Table 1). Considering that the ribokinase superfamily contains enzymes that catalyze the transfer of the terminal phosphate of a nucleotide to the hydroxymethyl end of a large number of small molecules, including pyridoxal, pyrimidine derivatives, nucleosides, and several sugars, the PfkC family seems to be the one with the narrowest range of substrates in this group.

While there are many phosphoryl acceptor substrates in this superfamily, only two nucleotides, ADP and ATP, are described as the primary phosphoryl donors. Given the metabolic importance of the phosphoryl donor, this specificity problem has received more attention in the literature than the acceptor problem. The donor specificity is not strict, however, and some other nucleotides can replace ADP or ATP. For instance, it has been shown that several ADP-dependent enzymes can use other purines (such as GDP) or even pyrimidines (such as UDP; Currie et al., 2009) as phosphoryl donors (Guixé & Merino, 2009). Also, GTP can be used by phosphofructokinase-2 from *E. coli*, where it even produces substrate inhibition (unpublished results). Yet only nucleotides with the right number of phosphates (two for the ADP-dependent enzymes, three for the ATP-dependent ones) can be used, as we have reviewed elsewhere (Guixé & Merino, 2009). This indicates that any trace of the transition between nucleotide specificities has been obscured by evolution and specialization.

From an evolutionary perspective, while Zhang's hypothesis (Zhang et al., 2004) may oversimplify the problem, it captures the most common trend in the evolution of protein families: newer members of a group tend to increase in structural complexity (Fong et al., 2007). Following this reasoning, the ADP-dependent kinases should be most closely related to the monomeric adenosine kinases. What has not been properly noted in the literature is that, while the tertiary structure of the PfkC enzymes is almost equivalent to that of the PfkB enzymes, the topology of their C-terminal region is completely different (Figure 2). Indeed, this is why it was not possible to group the ADP-dependent enzymes with the other members of the ribokinase superfamily based on sequence comparison alone.

A BLAST search of the genome sequence of the archaeon *P. furiosus* reveals three PfkB-like enzymes of unknown function. Since vitamin kinase-like enzymes can also be found in the genomes of the *thermococcales* (see, for instance, Table 1), it is possible to infer that all three modern branches of the ribokinase superfamily originated from ancient gene duplication events followed by extensive topological modifications. While the addition of the small domain can be viewed as a trivial modification, since it only involves the insertion of sequence, the C-terminal topological reordering involves a non-cyclic permutation. Now, if the ADP-dependent enzymes are the modern ones, then, for the topological reordering to be possible, an ATP-dependent enzyme should present an extra strand at the C-terminal end of the protein extending the central *β*-sheet. Figure 2 shows that this requirement is indeed fulfilled by some PfkB-like enzymes. Quite surprisingly, it is the sugar-phosphate kinases, and not the adenosine kinases, that show the extra strand. This suggests that the ADP-dependent enzymes are most closely related to the other glycolytic enzymes present in the superfamily, which seems reasonable given the similarity of their substrates.

Fig. 2. *β*-meander region of several members of the ribokinase superfamily: Pfk-2 from *E. coli* (**A**), a fructose-1-phosphate kinase from *B. halodurans* (**B**), a putative phosphofructokinase from *S. aureus* (**C**), and the ADP-dependent glucokinase from *P. furiosus* (**D**). The C-terminal extension thought to be needed for the circular permutation is shown in red.

Indeed, this C-terminal reordering is quite interesting, since this same region constitutes almost the entire nucleotide binding site. However, while this permutation almost certainly alters the dynamics of the binding pocket, we are not sure whether it alters the specificity of the enzyme. This question requires empirical testing, which is now being performed in our laboratory.

Interestingly, the *α* and *β* phosphates of ADP are accommodated in the binding site of the PfkC enzymes in almost the same way as the *β* and *γ* phosphates of ATP in the remaining members of the superfamily. This led Ito et al. (2001) to suggest that the bulky side chain of Y357 in the ADP-dependent glucokinase from *T. litoralis*, which is located below the ribose moiety of ADP, pushes the nucleotide forward, thereby rendering the enzyme unable to use ATP. However, we have indirectly shown that this is not the case: in the ADP-dependent phosphofructokinase from *P. horikoshii*, the presence of a significantly less bulky side chain at the equivalent position (I340) does not produce an enzyme with ATP-dependent activity (Currie et al., 2009).

While most members of the *Euryarchaeota* have two ADP-dependent enzymes encoded in their genomes, the archaeon *M. jannaschii* has just one copy of these genes. Surprisingly, this enzyme is able to catalyze the transfer of the *β*-phosphate of ADP to either glucose or fructose-6-phosphate (Sakuraba et al., 2002). Based on this feature, it was proposed that this enzyme represents an ancestral state of the family, which later gave rise to the separate specificities through a gene duplication event (Sakuraba et al., 2002). However, this hypothesis had to wait six years to be tested (Merino & Guixé, 2008).

To test this hypothesis, we used the Bayesian method of phylogenetic inference implemented in the MrBayes 3.1 software (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003). First, a structure-based sequence profile was built from a structural alignment of the ADP-dependent glucokinases from *T. litoralis*, *P. horikoshii*, and *P. furiosus* and the ADP-dependent phosphofructokinase from *P. horikoshii*. Then all the ADP-dependent kinases of archaeal origin were aligned to this profile. After several rounds of alignment refinement, the eukaryotic ADP-dependent enzymes were added. As they share only about 15 to 20% sequence identity with the archaeal versions, their alignment was guided by secondary structure predictions.


Fig. 3. Phylogenetic tree of the archaeal part of the ADP-dependent sugar kinase family. The eukaryotic sequences were used as outgroup. For clarity, only the posterior probabilities of the most important nodes are shown. The node shown in bold corresponds to the bifunctional enzyme from *M. jannaschii*. Modified from (Merino & Guixé, 2008).

Considering that in the Archaea most organisms have both ADP-dependent glucokinases and phosphofructokinases, while in the Eukarya only glucokinases can be found, we reasoned that the divergence between both domains happened before the gene duplication event. In this light, the eukaryotic enzymes seem a reasonable choice of outgroup. Nevertheless, for consistency, other members of the ribokinase superfamily were also tested as outgroups.

Figure 3 shows the resulting evolutionary tree. It clearly shows four main clades: one containing the glucokinases from the *methanosarcinales*, one containing the glucokinases from the *thermococcales*, one containing the phosphofructokinases from the *thermococcales*, and one containing the phosphofructokinases from the *methanococcales* and *methanosarcinales*. Surprisingly, irrespective of the outgroup used, the root of the tree falls between the two glucokinase groups, and not between the two specificities as would be expected from the bifunctional ancestor hypothesis. This demonstrates that the bifunctional enzyme from *M. jannaschii* does not represent an ancestral state of the family. However, given its basal position within the phosphofructokinases, it could still represent a transitional form between both specificities.

Given the tree topology, it is possible to infer that the first separation between *thermococcales* and *methanosarcinales*, i.e., the glucokinase split close to the root, was produced by a speciation event. Most likely, the gene duplication event is located close to the last common ancestor of both specificities. As this node is located after the speciation event, an extra horizontal gene transfer event is necessary to explain the presence of both ADP-dependent glucokinases and phosphofructokinases in the genomes of *thermococcales* and *methanosarcinales*. Indeed, a similar scenario for the generation of paralogous genes has been proposed before (Gogarten et al., 2002).

Fig. 4. Dendrogram grouping the archaeal ADP-dependent genes according to their average difference in relative synonymous codon usage. Modified from (Merino & Guixé, 2008).

To test this hypothesis, we analyzed the relative synonymous codon usage (RSCU) of the archaeal PfkC genes (McInerney, 1998). With this method, the frequency of each codon in a gene is calculated relative to the frequency expected if all of its synonymous codons were used equally. Figure 4 shows that, in general, genes are grouped very close to their paralogs. When this is not the case, they are at least inside a group that contains closely related species. The only exception is the glucokinase from *Methanosaeta thermophila*, which is located inside the *thermococcales* group (Figure 4). Indeed, when the codon usage of this gene is compared with the codon usage of the archaeal genomes, it appears more similar to the genome of *T. litoralis* than to its own genome (not shown).
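To make the RSCU measure concrete, the minimal Python sketch below computes RSCU values for a coding sequence under the standard genetic code and a pairwise distance between two genes. The distance, taken here as the mean absolute RSCU difference over codons of multi-codon amino acids present in both genes, is our assumption about what "average difference in relative synonymous codon usage" (Figure 4) means; it is not necessarily the measure computed by McInerney's (1998) software or used by Merino & Guixé (2008). The function names `rscu` and `rscu_distance` are hypothetical, and the sequences are invented toy data.

```python
from collections import Counter, defaultdict
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Synonymous codon families; stop codons ('*') are excluded.
SYNONYMS: Dict[str, List[str]] = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds: str) -> Dict[str, float]:
    """Observed codon count divided by the count expected if all synonyms were used equally."""
    cds = cds.upper()
    counts = Counter(cds[k:k + 3] for k in range(0, len(cds) - 2, 3))
    values: Dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def rscu_distance(cds_a: str, cds_b: str) -> float:
    """Assumed metric: mean |RSCU_a - RSCU_b| over codons informative in both genes."""
    ra, rb = rscu(cds_a), rscu(cds_b)
    # Skip single-codon amino acids (Met, Trp), whose RSCU is always 1.
    shared = [c for c in ra if c in rb and len(SYNONYMS[CODON_TABLE[c]]) > 1]
    if not shared:
        raise ValueError("no shared multi-codon amino acids to compare")
    return sum(abs(ra[c] - rb[c]) for c in shared) / len(shared)


# Toy genes: lysine encoded mostly by AAA in one, only by AAG in the other.
print(rscu("AAAAAAAAG"))
print(rscu_distance("AAAAAAAAG", "AAGAAGAAG"))
```

A matrix of such pairwise distances could then be clustered, for example with average linkage (UPGMA), to obtain a dendrogram of the kind shown in Figure 4.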

While the data presented above are not enough to prove the horizontal transfer hypothesis, they strongly suggest that this process has been involved in the evolution of the ADP-dependent sugar kinase family. It is important to stress that, if a horizontal gene transfer event is ancient enough, the accumulation of mutations should have masked it. If this is the case here, then, to our knowledge, no sequence-based technique can prove the hypothesis.

Sakuraba et al. (2002) demonstrated that when the bifunctional enzyme uses fructose-6-phosphate as substrate, glucose acts as a competitive inhibitor. They proposed that this occurs because both sugars bind to the same site. Competitive inhibition does not necessarily indicate that substrate and inhibitor bind to the same site, but in this case that is certainly so. To take advantage of this fact, we modeled the bifunctional enzyme and its interaction with both sugars, aiming to extract as much information as possible about the structural determinants of sugar specificity.

Figure 5 shows the predicted interaction geometries for both substrates. For clarity just the residues ina5Å radius are shown. As it was inferred by Sakuraba et al. (2002) the interaction between the protein and both substrates are very similar. Indeed, just three of the residues seems to differ significantly in the way they interact with the sugars. For instance, while

the sequences. By this nomenclature one can define a measurement of the conservation of

On the Specialization History of the ADP-Dependent Sugar Kinase Family 249

It is clear that if a residue is conserved from the root of the tree, then it will have a *ri* of 1. As it gets less and less conserved *ri* will be higher. To account for the sequence conservation within

> − 20 ∑ *aa*=1 *f g ia* log *f g ia*

*ia* stands for the frequency of appearance of amino acid *a* inside the group *g*. Figure 5 (bottom) shows the result of the ranking applied to the whole PfkC family, and both separated specificities. It is clear from the figure that most of the residues are conserved in the whole family. Interestingly, E82 is only conserved inside the glucokinase specificity, which is in good agreement with the role proposed above. Also, K170 and R203 are only conserved inside the phosphofructokinase specificity which makes them the inverse case of

Interestingly, N172 is conserved inside both specificities, but it is not in the whole family. The reason for this is that within phosphofructokinases this residue is strictly an asparagine while inside the glucokinases is always a histidine. This suggests that this residue is also related

Recently, we used a more elegant method known as explicit likelihood of subset covariation (ELCS) (Dekker et al., 2004) to explore the correlation between mutations to search for the

Figure 6 shows the group of side chains with the highest ELCS score. Surprisingly, the group contain a side chain that belongs to a highly conserved motif called NXXE which has been related with metal binding to the enzymes of the superfamily (Maj et al., 2002; Parducci et al., 2006; Rivas-Pardo et al., 2011). In fact, we have demonstrated that this motif is related to the binding of the catalytic and regulatory metals in the ADP-dependent sugar kinase family (To be published). Also, the first group found by the ELCS method contains some residues that we proposed before as specificity related. The role of the R48/D654, R65/S76, P73/F90 mutations is not clear, but seem to be related to the dynamics of the small domain. K158/C174 (equivalent to K170 in the bifunctional enzyme), N160/H176 (equivalent to N172 in the bifunctional enzyme), and R191/D203 (equivalent to R203 in the bifunctional enzyme) are clearly interacting with the sugars. Interestingly, when the position R191/D203 presents an arginine, this positive side chain coordinates the phosphate group present in the fructose-6-phosphate molecule. On the other hand, when it presents an aspartic acid, this side chain interacts with the histidine in the N160/H176 position, allowing the histidine to be correctly positioned to make an h-bond with the O2 hydroxyl group of glucose. Curiously, the position equivalent to E82 from the bifunctional enzyme does not appear to be correlated with other positions by the ELCS method. However this could be due to the small amount of

<sup>4</sup> We use the numbering of *Ph*PFK/*Pf*GK for the correlated mutations. See Figure 6 for clarity.

each group, the *ri* value was weighted by sequence entropy given the expression

1 *n*

*n* ∑ *g*=1

*N*−1 ∑ *n*=1

with sugar specificity, but the reason is not as clear as the above examples.

0 if position i conserved within each group g

1 otherwise (1)

(2)

each position in the alignment *i* where

*ri* = 1 +

where *f*

*g*

the E82 residue.

structural specificity determinants.

sequence information used for the analysis.

*N*−1 ∑ *n*=1

*ρ<sup>i</sup>* = 1 +

Fig. 5. **Left.** Glucose docked to the bifunctional enzyme. **Right.** Fructose-6-phosphate docked to the bifunctional enzyme. **Bottom.** Results of the real value evolutionary trace analysis for all the residues within 5 Å from the ligands. The results for the glucokinase specificity are shown in black, those for the phosphofructokinase specificity in red, and those for the whole family in green. Modified from (Merino & Guixé, 2008).

E82 makes a hydrogen bond with the hydroxyl located at C2 in glucose it does not seem to interact in any specific way with fructose-6-phosphate. Indeed, this side chain has been proposed by other authors as key for the glucokinase specificity (Ito et al., 2003; Sakuraba et al., 2002). On the other hand, R203 is making a close salt bridge with the phosphate moiety of fructose-6-phosphate while it does not interact with glucose. Although K170 is not in the 5 Å radius we had strong evidence that, as in the R203 case, this side chain was also involved in the phosphate binding (see below).

To quantify the conservation degree of the residues inside the sugar binding site we used a tree-based residue ranking system called real value evolutionary trace (Mihalek et al., 2004). Briefly, the method ranks the residues as follows:

First, let us consider a rooted evolutionary tree with *N* leaves (sequences). If we number the nodes in the tree starting with the root being 1 then, using as example Figure 3, the node number 2 should be the one with 0.98 posterior probability, and so on. Using this method it is possible to number *N* − 1 nodes. Each node defines some groups *g* of sequences. The root node of course creates a group with all of them. Node number 2 creates a group that contains the ADP-dependent glucokinases from *methanosarcinales* and other with the rest of 12 Will-be-set-by-IN-TECH

Fig. 5. **Left.** Glucose docked to the bifunctional enzyme. **Right.** Fructose-6-phosphate docked to the bifunctional enzyme. **Bottom.** Results of the real value evolutionary trace analysis for all the residues within 5 Å of the ligands. The results for the glucokinase specificity are shown in black, those for the phosphofructokinase specificity in red, and those for the whole family in green. Modified from (Merino & Guixé, 2008).

E82 makes a hydrogen bond with the hydroxyl located at C2 of glucose, it does not seem to interact in any specific way with fructose-6-phosphate. Indeed, this side chain has been proposed by other authors as key for glucokinase specificity (Ito et al., 2003; Sakuraba et al., 2002). On the other hand, R203 makes a close salt bridge with the phosphate moiety of fructose-6-phosphate, while it does not interact with glucose. Although K170 is not within the 5 Å radius, we had strong evidence that, as in the case of R203, this side chain was also involved in phosphate binding (see below).

To quantify the degree of conservation of the residues inside the sugar binding site, we used a tree-based residue ranking system called real value evolutionary trace (Mihalek et al., 2004).

Briefly, the method ranks the residues as follows:

First, let us consider a rooted evolutionary tree with *N* leaves (sequences). If we number the nodes in the tree starting with the root as 1, then, using Figure 3 as an example, node number 2 is the one with 0.98 posterior probability, and so on. In this way it is possible to number *N* − 1 nodes. At node *n* the sequences are divided into *n* groups *g*. The root node, of course, creates a single group with all of them. Node number 2 creates two groups: one that contains the ADP-dependent glucokinases from *Methanosarcinales* and another with the rest of the sequences. With this nomenclature one can define a measure of the conservation of each position *i* in the alignment as

$$r\_i = 1 + \sum\_{n=1}^{N-1} \begin{cases} 0 & \text{if position i conserved within each group g} \\ 1 & \text{otherwise} \end{cases} \tag{1}$$
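Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Mihalek et al. (2004): it assumes the tree has already been cut into its *N* − 1 levels and passed in as `partitions`, where `partitions[n - 1]` lists the *n* groups of sequence indices at node *n*. The function name, the input format and the toy alignment are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical input format: partitions[n - 1] holds the n groups (lists of
# row indices into the alignment) obtained when the tree is cut at node n.
Partition = List[List[int]]


def integer_trace_rank(alignment: Sequence[str],
                       partitions: Sequence[Partition],
                       i: int) -> int:
    """Integer evolutionary-trace rank r_i of Eq. (1) for alignment column i."""
    rank = 1
    for groups in partitions:
        # Column i counts as conserved at this node only if every group
        # shows a single residue type in that column.
        conserved = all(len({alignment[s][i] for s in group}) == 1
                        for group in groups)
        if not conserved:
            rank += 1
    return rank


# Toy example with N = 4 sequences, hence N - 1 = 3 nodes.
toy_alignment = ["ENA", "ENG", "EHA", "EHA"]
toy_partitions = [
    [[0, 1, 2, 3]],        # node 1 (root): one group with every sequence
    [[0, 1], [2, 3]],      # node 2: two groups
    [[0], [1], [2, 3]],    # node 3: three groups
]

for column in range(3):
    print(column, integer_trace_rank(toy_alignment, toy_partitions, column))
```

In this toy case column 0 is invariant and gets *r* = 1, column 1 differs only between the two groups created at node 2 and gets *r* = 2, and column 2 also varies inside one of those groups and gets *r* = 3.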

It is clear that if a residue is conserved from the root of the tree, it will have an *r<sub>i</sub>* of 1; the less conserved it is, the higher *r<sub>i</sub>* becomes. To account for the sequence conservation within each group, the *r<sub>i</sub>* value was weighted by the sequence entropy, giving the expression

$$\rho\_i = 1 + \sum\_{n=1}^{N-1} \frac{1}{n} \sum\_{g=1}^{n} \left( - \sum\_{a=1}^{20} f\_{ia}^{g} \log f\_{ia}^{g} \right) \tag{2}$$

where *f<sub>ia</sub><sup>g</sup>* stands for the frequency of appearance of amino acid *a* at position *i* inside the group *g*.
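A minimal Python sketch of Equation (2) follows, assuming the same hypothetical input format as above: `partitions[n - 1]` lists the *n* groups of sequence indices at node *n*. The natural logarithm is assumed (changing the base only rescales the entropy terms), and the function names and toy data are illustrative, not taken from Mihalek et al. (2004).

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical input format: groups of alignment row indices at one node.
Partition = List[List[int]]


def group_entropy(residues: Sequence[str]) -> float:
    """Shannon entropy -sum_a f_a log f_a of the residues within one group."""
    total = len(residues)
    return -sum((count / total) * math.log(count / total)
                for count in Counter(residues).values())


def real_value_trace(alignment: Sequence[str],
                     partitions: Sequence[Partition],
                     i: int) -> float:
    """Real-valued evolutionary-trace score rho_i of Eq. (2) for column i."""
    rho = 1.0
    for n, groups in enumerate(partitions, start=1):
        if len(groups) != n:
            raise ValueError(f"node {n} should define {n} groups")
        # Within-group entropies at node n, summed and weighted by 1/n.
        rho += sum(group_entropy([alignment[s][i] for s in group])
                   for group in groups) / n
    return rho


toy_alignment = ["ENA", "ENG", "EHA", "EHA"]
toy_partitions = [
    [[0, 1, 2, 3]],        # node 1 (root)
    [[0, 1], [2, 3]],      # node 2
    [[0], [1], [2, 3]],    # node 3
]

for column in range(3):
    score = real_value_trace(toy_alignment, toy_partitions, column)
    print(column, f"{score:.3f}")
```

Column 1 of the toy alignment (Asn in one group, His in the other) reproduces a pattern like the one described for N172: each group has zero entropy, so only the root level contributes and the score is 1 + ln 2 ≈ 1.693, whereas the invariant column 0 scores exactly 1 and column 2 scores about 1.909.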

Figure 5 (bottom) shows the result of the ranking applied to the whole PfkC family and to each specificity separately. Most of the residues are conserved across the whole family. Interestingly, E82 is conserved only within the glucokinase specificity, which is in good agreement with the role proposed above. Conversely, K170 and R203 are conserved only within the phosphofructokinase specificity, which makes them the inverse case of E82.

Interestingly, N172 is conserved within both specificities, but not across the whole family. The reason is that within the phosphofructokinases this residue is strictly an asparagine, while within the glucokinases it is always a histidine. This suggests that this residue is also related to sugar specificity, although the reason is not as clear as in the examples above.

Recently, we used a more elegant method known as explicit likelihood of subset covariation (ELCS) (Dekker et al., 2004) to explore correlated mutations in search of the structural determinants of specificity.

Figure 6 shows the group of side chains with the highest ELCS score. Surprisingly, the group contains a side chain that belongs to a highly conserved motif called NXXE, which has been associated with metal binding in the enzymes of the superfamily (Maj et al., 2002; Parducci et al., 2006; Rivas-Pardo et al., 2011). In fact, we have demonstrated that this motif is related to the binding of the catalytic and regulatory metals in the ADP-dependent sugar kinase family (to be published). The first group found by the ELCS method also contains some residues that we previously proposed as specificity related. The role of the R48/D65<sup>4</sup>, R65/S76 and P73/F90 mutations is not clear, but they seem to be related to the dynamics of the small domain. K158/C174 (equivalent to K170 in the bifunctional enzyme), N160/H176 (equivalent to N172 in the bifunctional enzyme) and R191/D203 (equivalent to R203 in the bifunctional enzyme) clearly interact with the sugars. Interestingly, when the R191/D203 position holds an arginine, this positive side chain coordinates the phosphate group of fructose-6-phosphate. On the other hand, when it holds an aspartic acid, this side chain interacts with the histidine at the N160/H176 position, allowing the histidine to be correctly positioned to form a hydrogen bond with the O2 hydroxyl group of glucose. Curiously, the position equivalent to E82 of the bifunctional enzyme does not appear to be correlated with other positions by the ELCS method. However, this could be due to the small amount of sequence information used for the analysis.

<sup>4</sup> We use the numbering of *Ph*PFK/*Pf*GK for the correlated mutations. See Figure 6 for clarity.
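The mutagenesis results that follow are reported as *kcat*, *KM* and *kcat*/*KM* (Table 2), and the glucose parameters of the N160H mutant are estimated from a Lineweaver-Burk plot. The Python sketch below illustrates both calculations under stated assumptions: the function names are hypothetical, rates are expressed per enzyme (s<sup>−1</sup>) so that *V*max equals *kcat*, and the glucose concentrations and rates in the second example are generated from the Michaelis-Menten equation itself, not taken from the experiments.

```python
from typing import Sequence, Tuple


def catalytic_efficiency(kcat_per_s: float, km_uM: float) -> float:
    """kcat/KM in M^-1 s^-1, converting KM from micromolar to molar."""
    return kcat_per_s / (km_uM * 1e-6)


def lineweaver_burk_fit(substrate: Sequence[float],
                        rate: Sequence[float]) -> Tuple[float, float]:
    """Estimate (Vmax, KM) from a least-squares line through 1/v versus 1/[S].

    Double-reciprocal form: 1/v = (KM / Vmax) * (1/[S]) + 1/Vmax.
    KM comes back in the units of `substrate`, Vmax in the units of `rate`.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rate]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    return vmax, slope * vmax


# Wild-type entries of Table 2: kcat = 45.5 s^-1, KM(F6P) = 15.2 uM.
print(f"{catalytic_efficiency(45.5, 15.2):.3g} M^-1 s^-1")

# Self-consistency check only: noise-free rates generated from the
# Michaelis-Menten equation with the N160H estimates quoted in the text
# (kcat = 2.42 s^-1, KM = 25.3 mM). These are NOT measured data.
glucose_mM = [2.0, 5.0, 10.0, 15.0, 20.0, 25.0]  # hypothetical concentrations
rates = [2.42 * s / (25.3 + s) for s in glucose_mM]
kcat_fit, km_fit = lineweaver_burk_fit(glucose_mM, rates)
print(f"kcat ~ {kcat_fit:.2f} s^-1, KM ~ {km_fit:.1f} mM")
```

The first line gives about 3·10<sup>6</sup> M<sup>−1</sup> s<sup>−1</sup>, consistent with the wild-type value within rounding of the inputs. The double-reciprocal fit is shown only because it is the method named in the text; it weights low-substrate points heavily, so with noisy data a direct non-linear fit of the Michaelis-Menten equation is usually preferred.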

Fig. 6. First cluster of correlated mutations in the PfkC family identified by the ELCS method. **A.** Crystal structure of the glucokinase from *P. furiosus*. Glucose and AMP are shown. **B.** Structural model for the ternary complex between the phosphofructokinase from *P. horikoshii*, ADP and fructose-6-phosphate. The coordinates were derived from the molecular dynamics simulation performed by Currie et al. (2009).

We tested the predictions made by the evolutionary trace and ELCS methods by means of mutagenesis, using the ADP-dependent phosphofructokinase from *P. horikoshii* as a model (Currie et al., 2009).

Table 2 shows the effect of each mutation on the kinetic parameters for fructose-6-phosphate and MgADP. As predicted, each of the R191A, R191E and K158A mutations produces a large increase in the *KM* value for fructose-6-phosphate, with little effect on *kcat* or on the *KM* for MgADP.

On the other hand, the N160A mutation increases *kcat* three-fold, with a concomitant large increase in the *KM* value for fructose-6-phosphate. The reason for the increase in activity is not clear, but it suggests that while this interaction increases the affinity of the protein for the sugar, it imposes a strain on the transition state, which results in a decrease of *kcat*. However, none of these mutations produces an enzyme with glucokinase activity.

| **Enzyme** | *kcat* (s<sup>−1</sup>) | *KM* F6P (μM) | *kcat*/*KM* F6P (M<sup>−1</sup> s<sup>−1</sup>) |
|---|---|---|---|
| **Wild Type** | 45.5 ± 4.0 | 15.2 ± 2.5 | 2.98 · 10<sup>6</sup> |
| **A71E** | 39.3 ± 9.9 | 22.2 ± 2.6 | 1.77 · 10<sup>6</sup> |
| **K158A** | 41.0 ± 6.7 | 6500 ± 1300 | 6.30 · 10<sup>3</sup> |
| **N160A** | 151 ± 16 | 415 ± 13 | 3.65 · 10<sup>5</sup> |
| **N160Q** | 14.7 ± 0.6 | 6300 ± 720 | 2.33 · 10<sup>3</sup> |
| **R191A** | 27.4 ± 1.5 | 254.4 ± 26.1 | 1.07 · 10<sup>5</sup> |
| **R191E** | 42.5 ± 1 | 4870 ± 170 | 8.73 · 10<sup>3</sup> |

Table 2. Kinetic parameters of wild type and mutant versions of the ADP-dependent phosphofructokinase from *P. horikoshii*. Modified from (Currie et al., 2009). All experiments were performed at 50 °C.

The A71E mutation affects neither the catalytic constant nor the *KM* for fructose-6-phosphate. Surprisingly, it produces an enzyme that can now catalyze the transfer of the *β*-phosphate of ADP to glucose, with a *kcat* of 2.7 ± 0.05 s<sup>−1</sup> and a *KM* of 3.95 ± 0.2 mM<sup>5</sup>. We have also recently produced an N160H mutant. It dramatically increases the *KM* value for fructose-6-phosphate to 6.3 ± 0.72 mM and decreases *kcat* almost four-fold (Table 2). As in the A71E case, it also produces a bifunctional enzyme that can use glucose as substrate. However, for this mutant no clear saturation is seen even at 25 mM glucose. Based on a Lineweaver-Burk plot, it is possible to estimate a *kcat* of 2.42 s<sup>−1</sup> and a *KM* value of 25.3 mM. Clearly, this mutation produces a much stronger effect than A71E.

The last two mutations are key to understanding the specialization problem, since they enable the phosphofructokinase from *P. horikoshii* to use glucose as substrate. Competition experiments have shown that glucose does not bind to the wild type enzyme, which demonstrates that the mutations somehow unblock the binding site for glucose. Curiously, both mutations point to the interaction between the protein and the hydroxyl group at C2 of glucose. This suggests that the specificity determinants are not evenly distributed across the binding site, but rather concentrated in hot spots. In this light, only a couple of mutations are needed to revert the specificities.

<sup>5</sup> Glucokinase experiments were performed at 40 °C given the instability of the auxiliary enzyme used.

**5. Are two ADP-dependent kinases better than one?**

Considering that glycolysis in *M. jannaschii* is functional with just one enzyme in charge of the phosphorylation of both glucose and fructose-6-phosphate, it is not clear, at first glance, why two genes were selected by nature in the other members of the *Euryarchaeota*. As mentioned above, glucokinases sit at the top of several pathways, and hence the modification of their activity affects a large part of the metabolism. Indeed, this enzyme generally exerts strong control over the carbon flux. On the other hand, phosphofructokinases seem to be closely related to the balance between glycolysis and gluconeogenesis. In the archaeon *P. furiosus* it has been shown that the switch between these two metabolic pathways is controlled at the expression level (Schut et al., 2003). When the ADP-dependent phosphofructokinase is expressed, the fructose-1,6-bisPase is repressed, and *vice versa*. Of course, just shutting the
