**3.2 The** *E. coli* **genome**

The first complete *E. coli* genome sequence, that of the K-12 MG1655 strain, was published in 1997. The sequenced strain has been maintained as a laboratory strain with minimal genetic manipulation, having only been cured of the temperate bacteriophage lambda and the F plasmid. The published genome has 4,639,221 base pairs. Protein-coding genes account for 87.8% of the genome, 0.8% encodes stable RNAs, and 0.7% consists of noncoding repeats. About 11% of the genome is involved in the regulation of gene expression and in other functions [9]. A circular map of the *E. coli* genome is shown in **Figure 4**.

The map is based on the K-12 MG1655 sequence data as deposited in GenBank (Accession number NC\_000913) [10]. The multiplier for the ticks is 1e-6 (1.0 represents 1,000,000). In blue, the forward genes are shown, in purple the reverse genes, tRNA genes in orange, and rRNA genes in red. The map was drawn with the online tool ClicO FS, available at the Internet site http://www.codoncloud.com:3000/home [11, 12].

**Figure 4.** *Circular map of the E. coli K-12 MG1655 strain.*
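To give a feel for the genome composition quoted above for K-12 MG1655, the percentages can be converted into approximate base-pair counts. The following is a minimal sketch, not part of the original analysis: the function and variable names are illustrative, and the percentages are the rounded values from the text, so they do not add up to exactly 100%.

```python
# Figures quoted in the text for E. coli K-12 MG1655.
GENOME_SIZE_BP = 4_639_221

# Rounded percentages of the genome as given in the text
# (they are approximate and do not sum to exactly 100%).
FRACTIONS_PERCENT = {
    "protein-coding genes": 87.8,
    "stable RNA genes": 0.8,
    "noncoding repeats": 0.7,
    "regulatory and other functions": 11.0,
}


def approx_base_pairs(genome_size_bp: int, percent: float) -> int:
    """Return the approximate number of base pairs covered by a percentage."""
    return round(genome_size_bp * percent / 100)


if __name__ == "__main__":
    for label, percent in FRACTIONS_PERCENT.items():
        bp = approx_base_pairs(GENOME_SIZE_BP, percent)
        print(f"{label}: ~{bp:,} bp ({percent}%)")
    total = sum(FRACTIONS_PERCENT.values())
    print(f"sum of quoted percentages: {total:.1f}%")
```

Because the inputs are rounded, the resulting base-pair counts should be read as order-of-magnitude estimates only.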

Genomes of pathogenic *E. coli* strains are generally larger, as the pathogenic strains need several special properties, so-called virulence factors. These are encoded by virulence-associated genes (VAGs), which are frequently clustered in DNA regions called pathogenicity islands (PAIs) [13]. Pathogenic strains often also possess extrachromosomal DNA elements, i.e., plasmids, that can carry additional VAGs [7]. Some examples of genomes of pathogenic strains in comparison with the K-12 MG1655 strain are given in **Table 1**.

Data in the table are based on data available in the genome database of the National Center for Biotechnology Information (Internet site: www.ncbi.nlm.nih.gov) [14].

The most famous *E. coli* plasmid is the plasmid F (**Figure 5**). It is the paradigm plasmid for plasmid-specified transfer systems, as bacterial conjugation was first identified as a function of the F plasmid. Further, this plasmid was used to develop many of the genetic techniques commonly used to dissect prokaryotic systems, and F product analysis has been central in elucidating the basic mechanisms of plasmid replication and transmission [15].

The F plasmid has two functional replication regions, RepFIA and RepFIB. The RepFIA region is believed to be primarily responsible for the typical replication properties of F [16]. The secondary replication region, RepFIB, is independently functional and can perform replication in the absence of RepFIA. The F plasmid also has remnants of a third replication region, RepFIC, whose function was abolished by transposition of Tn*1000* into this region [17]. Besides Tn*1000*, the F plasmid also carries the insertion sequences IS2 and IS3 [16]. The plasmid-specified transfer system is encoded in the *tra* region, starting with the origin of transfer (*oriT*) [15].


*Introductory Chapter: The Versatile Escherichia coli DOI: http://dx.doi.org/10.5772/intechopen.88882*

| *E. coli* strain | Associated with infection | Chromosome size (Mbp) | Number of genes in the chromosome | Plasmids | Plasmid size (bp) | Number of genes on the plasmid |
|---|---|---|---|---|---|---|
| K-12 MG1655 | / | 4.64 | 4,566 | / | / | / |
| O157:H7 Sakai | Hemorrhagic diarrhea | 5.5 | 5,329 | pO157 | 92,721 | 85 |
| | | | | pOSAK1 | 3,306 | 3 |
| O7:K1 IAI39 | Urinary tract infection | 5.13 | 5,092 | / | / | / |
| O83:H1 NRG 857C | Crohn's disease | 4.75 | 4,532 | pO83\_CORR | 147,060 | 154 |
| O104:H4 2011C-3493 ASM29945v1 | Hemolytic uremic syndrome | 5.27 | 5,081 | pG-EA11 | 1,549 | 1 |
| | | | | pAA-EA11 | 74,217 | 82 |
| | | | | pESBL-EA11 | 88,544 | 94 |
| UMN026 | Urinary tract infection | 5.2 | 5,096 | p1ESCUM | 122,301 | 156 |
| | | | | p2ESCUM | 33,809 | 49 |

**Table 1.** *Genomes of different E. coli strains.*


*The Universe of Escherichia coli*


**Figure 5.** *Map of the E. coli F plasmid. The map was drawn based on the complete nucleotide sequence of the F plasmid as deposited in GenBank [18].*

#### **3.3 The phylogenetic groups of** *E. coli*

The *E. coli* species has an extensive genetic substructure, and the methods used to assess the phylogenetic relationships among *E. coli* strains have evolved over time. In the pre-molecular era, *E. coli* diversity was studied by serotyping. Serotyping studies showed that the somatic (O) antigen, the flagellar (H) antigen, and to a lesser extent the capsular (K) antigen are useful in distinguishing *E. coli* strains [19]. *E. coli* serotyping is complex: 173 O antigens, 80 K antigens, and 56 H antigens are known, and the O, K, and H antigens can be found in nature in many of the possible combinations. The final number of *E. coli* serotypes is therefore very high, 50,000–100,000 or more [20].
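The antigen counts above set a simple combinatorial ceiling on the number of distinct O:K:H serotypes. The sketch below computes that ceiling under a deliberately naive assumption that is not made in the text: every strain carries exactly one O, one K, and one H antigen, and every combination is possible. The function name is illustrative.

```python
# Antigen counts quoted in the text.
O_ANTIGENS = 173
K_ANTIGENS = 80
H_ANTIGENS = 56


def max_serotype_combinations(n_o: int, n_k: int, n_h: int) -> int:
    """Upper bound on O:K:H serotypes if every combination could occur."""
    return n_o * n_k * n_h


if __name__ == "__main__":
    ceiling = max_serotype_combinations(O_ANTIGENS, K_ANTIGENS, H_ANTIGENS)
    print(f"theoretical O:K:H combinations: {ceiling:,}")
```

Under this assumption the ceiling is 775,040 combinations, so the 50,000–100,000 range quoted above corresponds to only a fraction of the theoretically possible combinations.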

Molecular studies of *E. coli* diversity began with the measurement of variations in the electrophoretic mobility of enzymes derived from different *E. coli* strains [21]. In the 1980s, multi-locus enzyme electrophoresis (MLEE) became the common technique for studying bacterial diversity. It was found that *E. coli* populations evolve in a clonal manner, with recombination playing a limited role, and it also became clear that genetically distant strains can have the same serotype and that closely related strains may have different serotypes [19]. Based on MLEE studies of 38 enzyme loci, four major phylogenetic groups were found among *E. coli*: A, B1, B2, and D [22]. Clermont et al. [23] established a rapid and simple method for determining the *E. coli* phylogenetic groups by a triplex PCR. This genotyping method is based on the amplification of a 279 bp fragment of the *chuA* gene, a 211 bp fragment of the *yjaA* gene, and a 152 bp fragment of TSPE4.C2, a noncoding region of the genome. The presence or absence of combinations of these three amplicons is used to assign an *E. coli* strain to one of the phylogenetic groups A, B1, B2, or D (**Figure 6**).
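The triplex assignment described above is a small decision tree over three presence/absence results, which can be sketched in code. The text does not spell out the individual branches, so the sketch below assumes the commonly cited layout of the Clermont triplex scheme (*chuA* is checked first, then *yjaA* for *chuA*-positive strains or TSPE4.C2 for *chuA*-negative strains); it should be checked against [23] before use. The type and function names are hypothetical.

```python
from typing import NamedTuple


class TriplexResult(NamedTuple):
    """Presence (True) or absence (False) of each triplex PCR amplicon."""
    chuA: bool      # 279 bp fragment
    yjaA: bool      # 211 bp fragment
    tspE4C2: bool   # 152 bp fragment


def assign_phylogroup(result: TriplexResult) -> str:
    """Assign A, B1, B2, or D, assuming the commonly cited triplex tree."""
    if result.chuA:
        # chuA-positive strains are split by yjaA.
        return "B2" if result.yjaA else "D"
    # chuA-negative strains are split by TSPE4.C2.
    return "B1" if result.tspE4C2 else "A"


if __name__ == "__main__":
    example = TriplexResult(chuA=True, yjaA=True, tspE4C2=False)
    print(assign_phylogroup(example))
```

Note that in this layout one marker is ignored on each branch (TSPE4.C2 for *chuA*-positive strains, *yjaA* for *chuA*-negative strains), which is part of why a four-group scheme cannot resolve the additional phylo-groups recognized later.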

Subsequently, however, additional *E. coli* phylogenetic groups were recognized on the basis of multi-locus sequence typing and complete genome data [24, 25]. The number of defined phylogenetic groups thus rose to eight: seven that belong to *E. coli* sensu stricto (A, B1, B2, C, D, E, and F) and an eighth, the *Escherichia* cryptic clade I. Clermont et al. [26] therefore revised their method to encompass the newly described phylogenetic groups. To enable identification of the F phylogenetic group, the extended PCR phylotyping method employs an additional gene target, *arpA*, which also serves as an internal control for DNA quality. The revised method is thus based on a quadruplex PCR; if required, additional single PCR reactions are employed to distinguish between E and clade I, A and C, and D and E phylo-groups [26] (**Figure 7**).

Two collections of human fecal isolates were screened using the quadruplex phylo-group assignment method, demonstrating that 12.8% of *E. coli* isolates belonged to the newly described phylo-groups C, E, F, and clade I and that strains assigned to phylo-groups A and D by the triplex method are worth retesting by the quadruplex method, as they are likely to be reclassified [26]. Logue et al. [27] performed a comparative analysis of phylogenetic assignment of human and avian extraintestinal pathogenic (ExPEC) and fecal commensal *E. coli* (FEC) strains and showed that a total of 13.05% of the studied human *E. coli* strains and 40.49% of the avian *E. coli* strains had to be reclassified. Another study, using human *E. coli* strains isolated from skin and soft-tissue infections, fecal *E. coli* strains from healthy humans, and avian and brown bear fecal strains, revealed that 27.60% of human, 23.33% of avian, and 70.93% of brown bear strains had to be reclassified. Moreover, a high number (12.22%) of reclassifications from the previous phylo-groups to the non-typeable (NT) group was observed among the avian fecal strains of this study. Further, a survey of other published data performed by Starčič Erjavec et al. [28] showed that a number of other studies also report the occurrence of NT strains by the quadruplex method; for example, a study including 140 uropathogenic *E. coli* strains from Iran reported 27.14% NT strains [29]. These data emphasize the need to search for more *E. coli* strains from novel environments (new hosts in not yet explored geographic regions) and to revise the PCR phylotyping method again in order to type these NT strains.

**Figure 6.** *Dichotomous decision tree to determine the phylogenetic group by the Clermont triplex PCR method [23].*

**Figure 7.** *Dichotomous decision tree to determine the phylogenetic group by the Clermont quadruplex PCR method [26].*

#### **3.4 The commensal** *E. coli*

As *E. coli* is a facultative anaerobe and among the first gut colonizers, these bacteria help to establish the anaerobic environment of the gut that enables its further colonization by anaerobic bacteria [30]. After colonization, the host and *E. coli* usually coexist in mutual benefit for decades [7]. *E. coli* gets "food and shelter," and the host benefits from vitamin K production by *E. coli* and from so-called colonization resistance. Colonization resistance is the phenomenon of protection against colonization by pathogenic bacteria, including pathogenic *E. coli* [31]. The niche of commensal *E. coli* is the mucous layer of the colon [7]. On average, five different commensal *E. coli* strains colonize a human host at any given time [32]. As both the host and *E. coli* profit from their association, these *E. coli* could also be designated as mutualistic *E. coli*.

#### **3.5 The pathogenic** *E. coli*

*E. coli* is also a medically important species, as it is involved in many different types of infections. Two major groups of pathogenic *E. coli* exist: the intestinal pathogenic *E. coli* (IPEC), associated with infections of the gastrointestinal tract, and the extraintestinal pathogenic *E. coli* (ExPEC), associated with infections of extraintestinal anatomic sites [7]. The medical diversity of this species is reflected in the classification of pathogenic *E. coli* into the so-called *E. coli* pathotypes (**Figure 8**).

The versatility of pathogenic *E. coli* strains depends on their genetic makeup, specifically on the presence of so-called virulence genes; possession of such genes distinguishes pathogenic from nonpathogenic bacteria [34]. Virulence factors help bacteria to (1) invade the host, (2) cause disease, and (3) evade host defenses [35].
