**11. The** *E. coli* **genome and proteome**

The full genome of *E. coli* K-12 was published in *Science* in 1997, making it one of the first organisms to have its genome completely sequenced. *E. coli* has a circular DNA molecule with 4288 annotated protein-coding genes (arranged into 2584 operons), 7 ribosomal RNA (rRNA) operons, and 86 transfer RNA (tRNA) genes (data for the *E. coli* laboratory strain K-12 derivative MG1655) [8]. However, the *E. coli* core genome (i.e., genes found in all strains) accounts for less than 20% of the pan-genome's genes

**Figure 12.** *Phylogenetic tree of* E. coli *strains [47].*

or nearly all (90%) of the genomes, leaving only a tiny fraction of genes found in roughly half of the genomes [54]. The *E. coli* core genome is estimated to have fewer than 1500 genes, whereas the pan-genome is huge, with more than 22,000 genes [55]. According to genomic analysis, many of the pan-genome genes could be as yet unidentified but crucial virulence factors [56]. To date, 27,621 *E. coli* genome assemblies and annotation sequences are available, and each genome comprises between 4000 and 5500 genes [57]. The *E. coli* genome as a whole is remarkably ordered in terms of local replication direction and of the oligonucleotides that may be involved in replication and recombination [58].
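The core and pan-genome definitions used here can be illustrated with a minimal Python sketch. The strain names and gene labels below are hypothetical toy data, not taken from any cited dataset, and the helper names `core_and_pan` and `gene_frequencies` are introduced only for illustration. The last lines check the figures quoted in the text: fewer than 1500 core genes out of more than 22,000 pan-genome genes implies a core share below about 6.8%, which is consistent with the "less than 20%" statement.

```python
from collections import Counter


def core_and_pan(strain_genes):
    """Return (core, pan) gene sets from a mapping strain -> set of gene names."""
    gene_sets = list(strain_genes.values())
    # Pan-genome: every gene seen in at least one strain.
    pan = set().union(*gene_sets)
    # Core genome: genes present in every strain (empty if no strains given).
    core = set.intersection(*gene_sets) if gene_sets else set()
    return core, pan


def gene_frequencies(strain_genes):
    """Return the fraction of strains that carry each gene."""
    n_strains = len(strain_genes)
    counts = Counter(g for genes in strain_genes.values() for g in genes)
    return {gene: count / n_strains for gene, count in counts.items()}


# Hypothetical toy data: three strains with made-up gene labels.
toy_strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6"},
}

core, pan = core_and_pan(toy_strains)
toy_core_share = len(core) / len(pan)

# Upper bound on the core share implied by the figures quoted in the text.
text_core_share_bound = 1500 / 22000

print(sorted(core), len(pan), round(toy_core_share, 3))
print(f"core share implied by the text: below {text_core_share_bound:.1%}")
```

With real data, the input would be a presence/absence table derived from annotated assemblies; how orthologous genes are matched across strains is a separate problem that this sketch does not address.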

The diverse behavior of this species is explained by its enormous genetic and phenotypic diversity. With a mean distance between genes of only 118 base pairs, the coding density was found to be extremely high. Several factors contribute to this high gene density: bacterial genes lack introns, and neighboring genes lie close together, i.e., there are few large non-coding DNA sections between genes. The genome also contains several transposable genetic elements, repetitive elements, cryptic prophages, and bacteriophage remnants, as well as a variety of additional patches with unique composition, indicating genome plasticity due to horizontal gene transfer [58, 59].
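As a rough back-of-the-envelope check of the coding density, the intergenic share of the chromosome can be estimated from the gene count and the mean intergenic distance given in this section. The sketch below is an approximation under stated assumptions: the chromosome length of 4,639,221 bp for MG1655 is not stated in this section and is supplied here as an assumption, RNA genes are ignored, and every gap is treated as having the mean length.

```python
# Figures quoted in the text for E. coli K-12 MG1655.
N_PROTEIN_CODING_GENES = 4288
MEAN_INTERGENIC_DISTANCE_BP = 118

# Assumption (not stated in this section): MG1655 chromosome length in bp.
GENOME_LENGTH_BP = 4_639_221


def intergenic_fraction(n_genes, mean_gap_bp, genome_length_bp):
    """Estimate the fraction of a circular chromosome lying between genes.

    On a circular chromosome, n genes are separated by n gaps, so the total
    intergenic length is approximated as n_genes * mean_gap_bp.
    """
    return n_genes * mean_gap_bp / genome_length_bp


frac = intergenic_fraction(
    N_PROTEIN_CODING_GENES, MEAN_INTERGENIC_DISTANCE_BP, GENOME_LENGTH_BP
)
print(f"intergenic: {frac:.1%}, genic: {1 - frac:.1%}")
```

Under these assumptions roughly one ninth of the chromosome is intergenic, leaving close to nine tenths occupied by genes, which agrees with the description of the coding density as extremely high.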

*E. coli* is an excellent model for studying the general characteristics of the bacterial proteome, such as its dynamics under different physiological conditions, its dynamic range of expression, and the changes it undergoes. According to genomic sequence data, the *E. coli* K-12 W3110 strain contains 4364 ORFs or ORF fragments. In recent years, the *E. coli* proteome has been used as a standard for evaluating and validating new technologies and methodologies, including sample prefractionation, protein enrichment, two-dimensional gel electrophoresis (2-DE), protein detection, biological mass spectrometry (MS), combinatorial assays with n-dimensional chromatography, and image analysis. Compared with the proteomes of other organisms such as plants and animals, the *E. coli* proteome is much smaller and carries fewer protein modifications, and hence provides an excellent model for various research needs. The use of the *E. coli* proteome as a model is further supported by public databases such as SWISS-PROT (http://www.expasy.ch/ch2d/) and NCBI (http://www.ncbi.nlm.nih.gov/), which contain rich information on *E. coli* proteins and their corresponding genes, and by the *E. coli* SWISS-2DPAGE maps, which are based on a large amount of biochemical and biological data [60].
