**5. Future directions**

The UniProt proteome database (http://www.uniprot.org/proteomes/) [129] contains 4306 predicted *E. coli* K12 protein sequences. An initial analysis of their compartmentalization within the cell using the prediction software TOPCONS2 (http://topcons.cbr.su.se/pred/) [130] allowed us to putatively assign each of these proteins to one of three subcellular compartments: cytoplasmic, transmembrane in the inner membrane (referred to as transmembrane hereafter), or secreted. To home in on proteins with possible oxidoreductase activity, we used the CXXC motif as a signature and identified 406 proteins. Approximately 10% of all predicted *E. coli* proteins therefore contain this motif, which demonstrates its relative ubiquity. Of these 406 proteins, ~75% are cytoplasmic, ~18% are transmembrane, and ~7% are secreted (see **Table 2**). The pool of non-CXXC-containing proteins comprises the remaining 3900 proteins, of which ~63% are cytoplasmic, ~23% are transmembrane, and ~14% are secreted (omitted from **Table 2**). The transmembrane and secreted compartments have a lower fraction of CXXC-containing proteins, in keeping with the exclusion of cysteine residues from these compartments in aerobes [77]. Comparing the two pools shows that a somewhat larger share of CXXC proteins (~75%) than of non-CXXC proteins (~63%) is cytoplasmic. The shares of CXXC and non-CXXC proteins in the transmembrane compartment are similar (~18% and ~23%, respectively); however, the secreted fraction of non-CXXC proteins (~14%) is about twice that of CXXC proteins (~7%). Approximately 22% (90 of 406) of CXXC proteins are annotated in the UniProt data as binding metal ions or as iron-sulfur cluster-containing proteins.
While 46% of all CXXC proteins have been functionally characterized, the remaining majority (54%) should be characterized to develop a better understanding of the reactions they catalyze, to determine how those identified as oxidoreductases contribute to the redox biology of bacteria, and to identify novel targets for therapeutics.
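
The motif screen described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this analysis: it assumes that each protein is already available as a pair of amino acid sequence and predicted compartment (for example, parsed from a UniProt FASTA file and TOPCONS2 output), and the function names `has_cxxc` and `compartment_fractions` as well as the toy sequences are hypothetical. The motif is taken literally as a cysteine, any two residues, and a second cysteine.

```python
import re
from collections import Counter

# CXXC: cysteine, any two residues, cysteine (one-letter amino acid codes).
CXXC = re.compile(r"C[A-Z]{2}C")


def has_cxxc(sequence: str) -> bool:
    """Return True if the sequence contains at least one CXXC motif."""
    return CXXC.search(sequence.upper()) is not None


def compartment_fractions(proteins):
    """Split proteins into CXXC and non-CXXC pools and return, for each
    pool, the fraction of its proteins found in each compartment.

    `proteins` is an iterable of (sequence, compartment) pairs; the
    compartment labels are assumed to come from a separate predictor.
    """
    counts = {True: Counter(), False: Counter()}
    for sequence, compartment in proteins:
        counts[has_cxxc(sequence)][compartment] += 1

    fractions = {}
    for contains_motif, counter in counts.items():
        total = sum(counter.values())
        fractions[contains_motif] = (
            {name: n / total for name, n in counter.items()} if total else {}
        )
    return fractions


if __name__ == "__main__":
    # Toy input for illustration only; these are not real E. coli proteins.
    toy = [
        ("MKCPHCGA", "cytoplasmic"),
        ("MACGGCL", "cytoplasmic"),
        ("MCAACT", "secreted"),
        ("MKLV", "transmembrane"),
    ]
    for contains_motif, by_compartment in compartment_fractions(toy).items():
        print("CXXC" if contains_motif else "non-CXXC", by_compartment)
```

Because fractions are computed within each pool, the output corresponds to the per-pool percentages quoted above (for example, ~75% of CXXC proteins being cytoplasmic) rather than to absolute protein counts.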


Secreted refers to proteins located in the periplasm or secreted outside of the cell. Compartment location was predicted using topological and signal sequence input data on the TOPCONS server. Gene Ontology (GO) evidence codes EXP and IDA were used to identify proteins with experimentally verified function in the UniProt database; proteins lacking these codes were defined as having unknown function. GO annotations were also used to identify CXXC proteins annotated to bind metals [129].

**Table 2.** The *E. coli* proteome separated by compartment, the presence of CXXC motifs, and known function.
