convergent evolution the ribokinase family is the only one that contains enzymes able to phosphorylate sugar phosphates.

In particular, glucokinases have been extensively studied because they sit at the top of many metabolic pathways, acting as a kind of metabolic hub, and because they are responsible for most of the flux control in glycolysis (Torres et al., 1988). On the other hand, while under normal conditions the phosphofructokinase from rat liver shows almost no control over the glycolytic flux, under starving conditions it becomes almost as important as glucokinase (Torres et al., 1988), which suggests that it becomes key under gluconeogenic conditions. Moreover, phosphofructokinases are extensively studied because they are highly regulated enzymes. In this light, phosphofructokinases have also been recognized as one of the key enzymes of glycolysis.

Within the ribokinase family, one of the most studied enzymes is the phosphofructokinase-2 from *Escherichia coli*, which is often referred to as a member of the PfkB subfamily (Cabrera et al., 2010). The genome of *E. coli* also encodes a second phosphofructokinase, called phosphofructokinase-1<sup>1</sup>, which belongs to another family called PfkA. In this family, the most extensively studied members are the phosphofructokinase-1 from *E. coli* and the phosphofructokinase from *Bacillus stearothermophilus* (Evans et al., 1981; Schirmer & Evans, 1990). Initially it was thought that the PfkB and PfkA groups had a common origin (Wu et al., 1991), but now we know that they are two non-homologous families. Interestingly, although they are not phylogenetically related, both phosphofructokinase-1 and phosphofructokinase-2 from *E. coli* show strong inhibition at high concentrations of their substrate MgATP (Atkinson & Walton, 1965; Kotlarz & Buc, 1981), which suggests that this is a key requirement of this metabolic step. This reinforces the idea that these enzymes are strongly related to the balance between glycolysis and gluconeogenesis.

Indeed, it has already been demonstrated that substrate inhibition is needed to avoid a futile cycle of phosphorylation/dephosphorylation of fructose-6-phosphate/fructose-1,6-bisP, which would ultimately lead to a net hydrolysis of ATP (Torres et al., 1997). Interestingly, some microorganisms present phosphofructokinases (also members of the PfkA family) which use polyphosphates as a source of phosphate and hence they do not appear to be regulated (Peng & Mansour, 1992).

<sup>1</sup> Sometimes phosphofructokinase-1 and phosphofructokinase-2 from *E. coli* are called the major and minor enzyme, respectively.

**2. Glucose degradation in the members of the Archaea domain**

Nowadays, organisms can be classified into three principal domains of life: Bacteria, Eukarya, and Archaea (Woese & Fox, 1977; Woese et al., 1990). Interestingly, although some known archaea grow under mesophilic conditions, most of them are extremophiles. Two main phylogenetic groups can be found within Archaea: *Euryarchaeota* and *Crenarchaeota* (Allers & Mevarech, 2005). In addition, based on environmental samples, two more groups called *Korarchaeota* and *Nanoarchaeota* have recently been proposed (Allers & Mevarech, 2005).

Considering the potential for technological applications, most attention has been directed to the study of archaea able to grow at extreme temperatures (known as thermophiles or hyperthermophiles), at extremely high salinities (known as halophiles), at extremely low pH (known as acidophiles), and, most commonly, under a combination of these conditions. Within the *Crenarchaeota*, the *Sulfolobus* and *Aeropyrum* genera receive a lot of attention since both are aerobic thermophilic organisms. In the *Euryarchaeota*, the methanogenic organisms are intensively studied. One of the most studied organisms here is *Methanocaldococcus jannaschii*<sup>2</sup> (Jones et al., 1983), since it is one of the few organisms known to produce methane at extreme temperatures. In addition, the *Halobacterium* and *Haloferax* genera are used as models for halophilic organisms, while organisms from the *Thermococcus* and *Pyrococcus* genera are used as models of hyperthermophilic organisms. Among the latter, by far the most studied organism is *Pyrococcus furiosus*.

In these organisms, sugar degradation proceeds through either the Entner-Doudoroff or the Embden-Meyerhof pathway (Verhees et al., 2003). For instance, members of the *Thermoproteus*, *Thermoplasma*, and *Sulfolobus* genera degrade glucose through a modified version of the Entner-Doudoroff pathway in which sugars are phosphorylated only at the 2-keto-3-deoxygluconate or the glycerate level. While the former version is still able to produce one ATP molecule per glucose, the latter does not produce any ATP (for a review see Verhees et al. (2003)). On the other hand, up until the early 90s it was thought that some archaea of the *Euryarchaeota* used a modified unphosphorylated version of the Entner-Doudoroff pathway, called pyroglycolysis, to degrade glucose (Mukund & Adams, 1991). However, in 1994 it was demonstrated that the flux to pyruvate in fact proceeds through a highly modified version of the Embden-Meyerhof pathway (Kengen et al., 1994). Here, although all the intermediates are present, only four of the ten textbook enzymes are conserved (Verhees et al., 2003). In this pathway, the redox reactions are carried out by ferredoxin-containing enzymes, which later use the electrons to reduce protons (producing hydrogen) and couple the proton motive force to ATP synthesis by means of a membrane-bound hydrogenase (Sapra et al., 2003). Among the oxidoreductases present in these organisms, perhaps the most interesting is the glyceraldehyde-3-phosphate oxidoreductase. This enzyme is responsible for the single-step conversion of glyceraldehyde-3-phosphate to 3-phosphoglycerate in a phosphate-independent manner (Mukund & Adams, 1995). Besides the redox reactions, one of the most striking modifications seen in this version of the Embden-Meyerhof pathway is that the phosphorylation of glucose and fructose-6-phosphate is carried out by enzymes that use ADP, and not ATP or polyphosphates, as the phosphoryl donor (Kengen et al., 1994).
These ADP-dependent enzymes are, in fact, homologous to each other, and they show no sequence identity above the noise level with any of the hitherto known ATP- or polyphosphate-dependent kinases (Tuininga et al., 1999). For this reason it was initially proposed that they belong to a new protein family called PfkC.

Given that these ADP-dependent enzymes were initially discovered in the hyperthermophilic archaeon *P. furiosus* (Kengen et al., 1994), it has been argued in the literature that the main reason for this "ADP-dependence" is that ADP is more thermostable than ATP, and also that both nucleotides are essentially equivalent since they have a similar standard ΔG of hydrolysis. However, these arguments are highly misleading since (i) metabolism is a non-equilibrium process, so the free energy change upon phosphoryl transfer depends on the concentrations of the metabolites, (ii) several ATP-dependent enzymes can be found in hyperthermophilic organisms, (iii) the ADP-dependent enzymes are also present in mesophilic organisms (see below), and (iv) the half-life of ATP at high temperatures is longer than that of some other metabolic intermediates present in the Embden-Meyerhof pathway (Dörr et al., 2003).
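Point (i) can be made explicit with the standard relation between the actual and the standard free energy change. The block below is a generic textbook sketch and is not taken from the works cited above. It uses glucose phosphorylation as the example and assumes the reactions Glc + ATP → G6P + ADP and Glc + ADP → G6P + AMP; the symbols $R$ (gas constant), $T$ (absolute temperature), and the subscripts ATP and ADP (labelling the phosphoryl donor) are introduced here for illustration.

```latex
\begin{align}
\Delta G_{\mathrm{ATP}} &= \Delta G^{\circ\prime}_{\mathrm{ATP}}
  + RT \ln \frac{[\mathrm{G6P}][\mathrm{ADP}]}{[\mathrm{Glc}][\mathrm{ATP}]} \\
\Delta G_{\mathrm{ADP}} &= \Delta G^{\circ\prime}_{\mathrm{ADP}}
  + RT \ln \frac{[\mathrm{G6P}][\mathrm{AMP}]}{[\mathrm{Glc}][\mathrm{ADP}]} \\
\Delta G_{\mathrm{ADP}} - \Delta G_{\mathrm{ATP}} &=
  \left(\Delta G^{\circ\prime}_{\mathrm{ADP}} - \Delta G^{\circ\prime}_{\mathrm{ATP}}\right)
  + RT \ln \frac{[\mathrm{AMP}][\mathrm{ATP}]}{[\mathrm{ADP}]^{2}}
\end{align}
```

Under these assumptions, even if the two standard terms were identical, the two donors would only be energetically equivalent when the ratio $[\mathrm{AMP}][\mathrm{ATP}]/[\mathrm{ADP}]^{2}$ in the cell happens to equal one.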

<sup>2</sup> This organism was initially named *Methanococcus jannaschii* and was later renamed *Methanocaldococcus jannaschii* to acknowledge the fact that the organisms from the *Methanococcus* genus are not thermophilic.

On the Specialization History of the ADP-Dependent Sugar Kinase Family 241

| Group | PDB Code | Organism | Function |
|---|---|---|---|
| **PfkC like** | 1UA4 | *Pyrococcus furiosus* | Glucokinase |
| | 1GC5 | *Thermococcus litoralis* | Glucokinase |
| | 1L2L | *Pyrococcus horikoshii* | Glucokinase |
| | 1U2X | *Pyrococcus horikoshii* | Fructose-6-phosphate kinase |
| **Vitamin kinase like** | 1JXH | *Salmonella typhimurium* | 4-amino-5-hydroxymethyl-2-methylpyrimidine phosphate kinase |
| | 1EKQ | *Bacillus subtilis* | Hydroxyethylthiazole kinase |
| | 1V8A | *Pyrococcus horikoshii* | Hydroxyethylthiazole kinase |
| | 1UB0 | *Thermus thermophilus* | Phosphomethylpyrimidine kinase |
| | 1LHP | *Ovis aries* | Pyridoxal kinase |
| | 1TD2 | *Escherichia coli* | Pyridoxal kinase (PdxY) |
| | 2DDM | *Escherichia coli* | Pyridoxal kinase (PdxK) |
| | 2F7K | *Homo sapiens* | Pyridoxal kinase |
| | 2I5B | *Bacillus subtilis* | Pyridoxal kinase |
| | 1KYH | *Bacillus subtilis* | Unknown function |
| | 2AX3 | *Thermotoga maritima* | Unknown function |
| | 2R3B | *Enterococcus faecalis* | Unknown function |
| **PfkB like** | 2AFB | *Thermotoga maritima* | 2-keto-3-deoxygluconate kinase |
| | 2VAR | *Sulfolobus solfataricus* | 2-keto-3-deoxygluconate kinase |
| | 2DCN | *Sulfolobus tokodaii* | 2-keto-3-deoxygluconate kinase |
| | 1V1A | *Thermus thermophilus* | 2-keto-3-deoxygluconate kinase |
| | 2QCV | *Bacillus halodurans* | 5-dehydro-2-deoxygluconate kinase |
| | 1TZ6 | *Salmonella enterica* | Aminoimidazole riboside kinase |
| | 1BX4 | *Homo sapiens* | Adenosine kinase |
| | 1LII | *Toxoplasma gondii* | Adenosine kinase |
| | 2PKN | *Mycobacterium tuberculosis* | Adenosine kinase |
| | 2C49 | *Methanocaldococcus jannaschii* | Nucleoside kinase |
| | 1RKD | *Escherichia coli* | Ribokinase |
| | 1VM7 | *Thermotoga maritima* | Ribokinase |
| | 2FV7 | *Homo sapiens* | Ribokinase |
| | 2QHP | *Bacteroides thetaiotaomicron* | Fructokinase |
| | 2HW1 | *Homo sapiens* | Ketohexokinase |
| | 2ABQ | *Bacillus halodurans* | Fructose-1-phosphate kinase |
| | 2F02 | *Enterococcus faecalis* | Tagatose-6-phosphate kinase |
| | 2JG1 | *Staphylococcus aureus* | Tagatose-6-phosphate kinase |
| | 3CQD | *Escherichia coli* | Fructose-6-phosphate kinase |
| | 3BF5 | *Thermoplasma acidophilum* | Unknown function |
| | 2NWH | *Agrobacterium tumefaciens* | Unknown function |
| | 2RBC | *Agrobacterium tumefaciens* | Unknown function |
| | 1VK4 | *Thermotoga maritima* | Unknown function |
| | 2AJR | *Thermotoga maritima* | Unknown function |
| | 2JG5 | *Staphylococcus aureus* | Unknown function |

Table 1. Crystal structures of the ribokinase superfamily found in the PDB database.

The adaptive value of the appearance of the ADP-dependent enzymes has been a matter of great debate. As we have argued before (Guixé & Merino, 2009), it is most likely unrelated to the temperature at which most of the *Thermococcales* grow. The most intriguing question arising here is what happens with the adenylate charge inside these archaea. As they present a glyceraldehyde-3-phosphate ferredoxin oxidoreductase (Mukund & Adams, 1995), which produces 3-phosphoglycerate in a single step that does not produce ATP, and considering that both glucose and fructose-6-phosphate are phosphorylated using ADP as the phosphoryl donor, it was thought that this modified glycolysis had a net ATP production of zero. However, Sakuraba et al. (2004) demonstrated that the pyruvate kinase from *P. furiosus* catalyzes the synthesis of ATP from AMP, phosphoenolpyruvate, and Pi. In this way, the pathway from glucose to pyruvate produces two ATP molecules for every glucose molecule degraded. Up until now, we have three protein families that contain phosphofructokinases: PfkA, the ribokinase family (which contains the PfkB-like kinases), and PfkC. While the first PfkA crystal structure (the phosphofructokinase from *B. stearothermophilus*) was solved in the 80s (Evans et al., 1981), the first PfkB-like crystal structure (the ribokinase from *E. coli*) was solved in the late 90s (Sigrell et al., 1998), and the first PfkC crystal structure (the ADP-dependent glucokinase from *Thermococcus litoralis*) only in 2001 (Ito et al., 2001). As all of them were discovered before the mid 90s, most of the phylogenetic analyses were performed only on the basis of sequence data. Quite surprisingly, despite the extremely low sequence identity, the PfkC family can be structurally classified as another member of the ribokinase group (Ito et al., 2001), which is now known as the ribokinase superfamily.
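The nucleotide bookkeeping behind the figure of two ATP per glucose can be checked with a small stoichiometric tally. The Python sketch below is illustrative only. It assumes a simplified reading of the text: the two ADP-dependent kinase steps each release AMP, the single-step oxidation of glyceraldehyde-3-phosphate forms no ATP, and the final step forms ATP from AMP, phosphoenolpyruvate, and Pi, with the last two steps occurring twice per glucose. The names `STEPS` and `net_balance` are hypothetical.

```python
from collections import Counter

# Each step: (name, times per glucose, {species: stoichiometric change}).
# Only adenine nucleotides and Pi are tracked; sugar intermediates are omitted.
STEPS = [
    ("ADP-dependent glucokinase", 1, {"ADP": -1, "AMP": +1}),
    ("ADP-dependent phosphofructokinase", 1, {"ADP": -1, "AMP": +1}),
    # Single-step oxidation of glyceraldehyde-3-phosphate: no ATP is formed.
    ("GAP ferredoxin oxidoreductase", 2, {}),
    # ATP formation from AMP, phosphoenolpyruvate and Pi, as described in the text.
    ("PEP to pyruvate step", 2, {"AMP": -1, "Pi": -1, "ATP": +1}),
]


def net_balance(steps):
    """Sum the stoichiometric changes over all steps, per glucose degraded."""
    total = Counter()
    for _name, times, changes in steps:
        for species, delta in changes.items():
            total[species] += times * delta
    # Keep only the species with a non-zero net change.
    return {species: n for species, n in total.items() if n != 0}


if __name__ == "__main__":
    print(net_balance(STEPS))
```

Under these assumptions the AMP released by the two kinase steps is fully consumed by the final step, and the tally shows a net gain of two ATP per glucose at the expense of two ADP and two Pi.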

### **3. The ribokinase superfamily**

Structurally, the PfkC and PfkB-like groups contain enzymes that present two domains. The large domain, which contains the core ribokinase-like fold, is an *αβα* structure in which a central *β*-sheet, mainly composed of parallel strands, is flanked by *α*-helices on both sides. They also present a smaller *β* domain, which in general is used as a scaffold for dimerization (Sigrell et al., 1998). However, some of the enzymes are monomers. In this case, the hydrophobic core of the small domain is formed by the insertion of some *α*-helices (Ito et al., 2001; Mathews et al., 1998). While not all the known PfkC enzymes are monomers (Jeong et al., 2003; Koga et al., 2000; Tuininga et al., 1999), all of them present those *α*-helices in the small domain. Interestingly, the way in which many of them form multimers is not known, but it seems to be highly enzyme-specific.

The active site of these enzymes is located in a cleft between both domains (Ito et al., 2001; Sigrell et al., 1998). For some members of the ribokinase family, it has been shown by means of X-ray crystallography that the relative orientation of the domains can be modified by the binding of the phosphoryl acceptor ligand (Schumacher et al., 2000; Sigrell et al., 1999), which has been suggested as a key step in the catalytic mechanism of these enzymes. In the PfkC case, a similar scenario has been suggested (Ito et al., 2003; Tsuge et al., 2002). Here, although the evidence is also crystallographic, it is indirect, because the only enzyme crystallized both in the apo form and complexed with a substrate is the ADP-dependent phosphofructokinase from *Pyrococcus horikoshii* (Currie et al., 2009), which does not show any domain movement. However, it was not possible to obtain a crystalline form of this enzyme in the presence of fructose-6-phosphate, which could be the key component needed to induce the domain closing. In fact, we have previously shown, based on molecular modeling, that the open conformation of these enzymes is most likely inactive (Merino & Guixé, 2008).

Table 1 shows most of the members of the ribokinase superfamily with known crystallographic structures. Based on these structural data it is possible to add other specificities to the superfamily, such as adenosine kinase<sup>3</sup> (Mathews et al., 1998), 2-keto-3-deoxygluconate kinase (Ohshima et al., 2004), and aminoimidazole riboside kinase (Zhang et al., 2004). Beyond the sugar-containing molecules, three-dimensional structure comparison showed that kinases such as 4-methyl-5-*β*-hydroxyethylthiazole kinase (Campobasso et al., 2000), pyridoxal kinase (Li et al., 2002), and 4-amino-5-hydroxymethyl-2-methylpyrimidine phosphate kinase (Cheng et al., 2002) are also members of the ribokinase superfamily. Interestingly, these enzymes lack the small domain.

Based on substrate specificities alone, three major branches can be recognized (Figure 1). One of them contains those enzymes that catalyze the transfer of the *γ*-phosphate of ATP to molecules such as pyridoxal or pyrimidine derivatives, which we know as the vitamin kinase like branch. The second contains all the enzymes that catalyze the transfer of the *γ*-phosphate of ATP to sugar-containing molecules, such as fructose-6-phosphate, adenosine, aminoimidazole riboside, etc. We know this as the PfkB like branch. The last contains the enzymes that catalyze the transfer of the *β*-phosphate of ADP to glucose and fructose-6-phosphate, which, as mentioned before, is known as the PfkC family or ADP-dependent sugar kinase family.

Fig. 1. Schematic representation of the three branches of the ribokinase superfamily. For the vitamin kinase like branch the pyridoxal kinase (pdxK) from *E. coli* (PDBID 2DDM) is used as example, for the PfkB like branch the ribokinase from *E. coli* (PDBID 1RKD) is used, and for the ADP-dependent branch the glucokinase from *T. litoralis* (PDBID 1GC5) is shown.

Based mainly on the presence of the small domain and the complexity of the monomer, Zhang et al. (2004) proposed that the most ancient activity of the superfamily should be the one catalyzed by the simplest enzyme, which is 4-methyl-5-*β*-hydroxyethylthiazole kinase. In this view, an increase in the complexity of the monomer fold indicates a newer enzyme. By this hypothesis, the ADP-dependent enzymes and the monomeric adenosine kinases should be the newest acquisitions of the superfamily. However, this hypothesis was never tested. Moreover, although it could capture the essence of the evolutionary history of this group, given the linearity of the hypothesis it is rather unlikely that the true history of the group is entirely represented by it.

As can be inferred from Figure 1 and Table 1, the ribokinase superfamily is an excellent example of how gene duplication has been used several times by nature to produce new specificities. This process has been recognized before as one of the most important steps in the creation of new protein functions (Chothia et al., 2003). Indeed, most of the proteins encoded in a genome belong to a few protein families or to a combination of them (see for example Chothia et al. (2003)). Because of this degeneracy, the number of protein families represented in a genome is much smaller than the number of genes.

Just as an example, a simple PSI-BLAST search on the genome of *E. coli* using phosphofructokinase-2 as the query finds 28 non-redundant proteins, including 6-phosphofructokinase, 1-phosphofructokinase, ribokinase, 2-keto-3-deoxygluconate kinase, and several proteins of unknown function. All of them present the PfkB-like fold (see Figure 1), which shows that this family is a very interesting example of gene duplication. However, the study of this feature is complicated by the lack of information on the function of several of the PfkB-like proteins.

<sup>3</sup> These enzymes have a slightly different fold compared with the other nucleoside kinases (such as inosine-guanosine kinases) from the superfamily mentioned by Bork et al. (1993).

**4. Structural evolution of the substrate specificity in the ADP-dependent sugar kinase family**

The ADP-dependent sugar kinases have been found in several members of the *Pyrococcus*, *Thermococcus*, *Methanosarcina*, *Methanosaeta*, *Methanococcoides*, *Methanococcus*, *Methanocaldococcus*, and *Archaeoglobus* genera (Hansen & Schönheit, 2004; Kengen et al., 1994; Koga et al., 2000; Tuininga et al., 1999; Verhees et al., 2001). Also, it has been possible to
