**1. Introduction**

Determining when and how genes are "turned on and off" is a challenge of the post-genomic era. Differences between two species are more closely related to gene expression and regulation than to gene structure (Howard & Benson, 2002). The first and key step in gene expression is promoter recognition by the RNA polymerase enzyme (RNAP). Promoter sequences can be defined as cis-acting elements located upstream of the transcription start site (TSS) of open reading frames (ORFs). By analogy, genes represent the "computer memory" and promoters represent the "computer program" that acts on that memory. The study of promoters can help provide new models of how this computer program is constituted and how it operates (Howard & Benson, 2002).

The proper regulation of transcription is crucial for a single-cell prokaryote, since its environment can change dramatically and instantly (Huffmann & Brennan, 2002). In view of this, detailing the principles and the organization of the transcriptional process is helpful for understanding the complexity of the biological systems involved in, for instance, cellular responses to environmental changes or the molecular bases of many diseases caused by microbes (Janga & Collado-Vides, 2007).

While several sequenced genomes have their protein-coding gene repertoire well described, the accurate identification and delineation of cis-regulatory elements remains elusive (Fauteux et al., 2008). At present, the challenges are to analyze the available sequences and to locate TSSs, promoters and other regulatory sequences (Askary et al., 2009). The purpose of this review is to provide a brief survey of the characteristics of promoter sequences and of the advances in computer algorithms for their analysis and prediction. This chapter is organized in two main sections. The established knowledge about the biological features of promoters is described in the first section, focusing on their genetic role and sequence content. This is an important topic for understanding the intrinsic difficulties of *in silico* promoter prediction approaches. The second section gives a reasonably concise background on the most widely used methodologies for *E. coli* promoter prediction and recognition, presenting their applications as well as their limitations.

© 2012 Silva and Echeverrigaray, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

sequences of the promoters can affect the level of expression of the gene(s) they control, without altering the gene products themselves (Lewin, 2008). The canonical consensus sequences and the number of intervening nucleotides recognized by the most important σ factors are presented in Table 1. Only for σ54 are the consensus regions located at nucleotides -12 and -24.

| σ factor | Gene | Cellular use | -35 region | Separation | -10 region | Reference |
|---|---|---|---|---|---|---|
| 70 | rpoD | Housekeeping | TTGACA | 16-18 bp | TATAAT | Lisser & Margalit, 1993 |
| 54 | rpoN | Nitrogen metabolism | CTGGNA | 6 bp | TTGCA | Barrios et al., 1999 |
| 38 | rpoS | Starvation response | TTGACA | 16-18 bp | CTATACT | Typas et al., 2007 |
| 32 | rpoH | Heat shock response | CCCTTGAA | 13-15 bp | CCCGATNT | Cowing et al., 1985 |
| 28 | fliA | Flagellar genes | CTAAA | 15 bp | GCCGATAA | Helmann & Chamberlin, 1987 |
| 24 | rpoE | Heat shock response | GGAACTT | 15 bp | GTCTAA | Rhodius et al., 2006 |

**Table 1.** *E. coli* σ factors and their promoter sequence binding sites (Lewin, 2008).

A comprehensive study of promoter information content was carried out by Schultzaberger et al. (2006). The authors used Claude Shannon's information theory and built a promoter model by aligning and refining 559 sequences upstream of TSSs. Among others, two results for the promoter motifs are of particular interest: *(i)* the difference between the information content of the prokaryotic TSS (0.39 ± 0.06 bits) and that of the eukaryotic TSS (~3 bits) and *(ii)* the notably high degree of conservation of the last nucleotide (T) in the -10 region. Another important discussion in the paper concerns the extended -10 region. According to Hook-Barnard et al. (2006), some promoters are functional without the -35 region, and this missing information is compensated by four nucleotides upstream of the -10 element. Its consensus sequence is TRTG (in the IUPAC code, the letter R represents A or G). On this issue, the authors suggest that in prokaryotes the extended -10 may be an evolutionary predecessor of the modern bipartite promoter, or vice versa. However, the second possibility does not explain the origin of the bipartite promoter.

As described so far, promoter motifs are not strictly conserved within a set of promoters recognized by a given σ factor, and they also differ according to the σ factor that recognizes them. The structure of bacterial promoters limits the efficacy of prediction by a global analysis approach. A limited analysis of a putative promoter sequence by comparison with the σ70 promoter consensus motif can lead to an unacceptable rate of false negatives and incorrect assignments (de Avila e Silva et al., 2011).
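To make the limitation of consensus comparison concrete, the following minimal sketch scans a sequence for the σ70 pattern listed in Table 1 (TTGACA, a 16-18 bp spacer, TATAAT). It is an illustration only, not a method taken from any of the cited works; the function names `hamming` and `scan_sigma70` and the `max_mismatches` threshold are hypothetical choices made for this example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq, minus35="TTGACA", minus10="TATAAT",
                 spacers=(16, 17, 18), max_mismatches=2):
    """Return (start, spacer, mismatches_35, mismatches_10) for each candidate.

    A candidate is a -35-like hexamer followed, after an allowed spacer,
    by a -10-like hexamer. `max_mismatches` is applied to each hexamer
    separately and is an arbitrary, illustrative threshold.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(minus35) + 1):
        mm35 = hamming(seq[i:i + len(minus35)], minus35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(minus35) + spacer
            box10 = seq[j:j + len(minus10)]
            if len(box10) < len(minus10):
                continue  # -10 window would run past the end of the sequence
            mm10 = hamming(box10, minus10)
            if mm10 <= max_mismatches:
                hits.append((i, spacer, mm35, mm10))
    return hits


# A synthetic sequence carrying a perfect consensus with a 17 bp spacer.
perfect = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
# The same sequence with one substitution in the -10 hexamer.
mutated = "GG" + "TTGACA" + "C" * 17 + "TACAAT" + "GG"

strict_perfect = scan_sigma70(perfect, max_mismatches=0)
strict_mutated = scan_sigma70(mutated, max_mismatches=0)
relaxed_mutated = scan_sigma70(mutated, max_mismatches=1)
```

With an exact-match threshold, the mutated sequence is missed entirely, which is the false-negative behaviour described above; relaxing the threshold recovers it, but on real genomic sequence a looser threshold would also be expected to admit more spurious matches.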
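The information-theoretic measure mentioned for the study of Schultzaberger et al. (2006) can be illustrated with a short sketch that computes the per-position information content of an alignment as 2 minus the Shannon entropy, in bits. This is a simplified version that omits any small-sample correction and is not the authors' pipeline; the function name `information_content` and the toy alignment are assumptions made for this example.

```python
import math
from collections import Counter


def information_content(aligned):
    """Per-position information (bits) for equal-length DNA sequences.

    For each column, R = 2 - H, where H is the Shannon entropy of the
    observed base frequencies and 2 bits is the maximum for four bases.
    No small-sample correction is applied in this sketch.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    n = len(aligned)
    info = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Toy alignment of -10-like hexamers (invented for illustration).
toy = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
bits = information_content(toy)
```

In this toy alignment the last column is an invariant T and therefore carries the full 2 bits, whereas a column with all four bases at equal frequency would carry 0 bits.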
