208 Next Generation Sequencing - Advances, Applications and Challenges

A surprising result was the detection of a large number of transcription start sites (TSS). This had not been achieved with any technology other than RNA-seq derivatives such as differential RNA-seq (dRNA-seq), which distinguishes primary transcripts, which carry 5'-triphosphate ends, from processed transcripts such as rRNAs and tRNAs, which carry 5'-monophosphate ends. In this case, to enrich mRNA, the strategy was to treat the RNA samples with an exonuclease that degrades transcripts with 5'-monophosphate ends. This strategy identified 5'UTR ends, operons and antisense transcription, thus providing a new view of the organization of the bacterial transcriptome and a new model for the analysis of individual genes [13].

The results obtained allow a role to be inferred for 5'UTR regions. The researchers proposed a correlation between 5'UTR size and cell function, finding that larger size is related to pathogenicity [13]. These results show how little is known about microorganisms, which are believed to be the simplest form of life yet prove to be more complex than previously anticipated. This leaves a lot to be discovered.

RNA-seq has been widely applied to bacterial genomes in studies focused on identifying small RNAs (sRNA). These elements regulate various biological processes and were initially studied primarily in *Escherichia coli* [21]. However, with advances in technology, it has become possible to identify and characterize small RNAs in a variety of bacterial species [13, 22, 23]. Yan et al. (2013) identified an sRNA expression profile in *Yersinia pestis*, both *in vitro* and *in vivo*. This allowed the identification of new sRNAs and the recognition of gene expression modulation during the infection process, thus improving the understanding of the transcription regulation mechanisms of this organism [24]. Studies of sRNA are also important for research on antibiotic therapies, a field that is still at an early stage and in which much knowledge remains to be exploited [25].

RNA-seq has been used in different areas and situations. Advanced studies using this technology can detect details in cell expression [26]. Despite the difficulty of separating eukaryotic and prokaryotic material, dual transcriptome studies made it possible to distinguish the simultaneous expression profiles of host and pathogen. This work revealed the host response to bacterial infection and the virulence factors involved, enabling the infectious process to be characterized [27]. These studies contribute to research on infection by examining diverse pathogens with different life cycles and methods of infection, and they provide crucial knowledge for studies of diagnostics and vaccines, such as metatranscriptomic studies.

After a relatively short time on the market, RNA-seq can accurately reveal structural and functional elements of bacteria. Mapping transcripts to the genome can refine the annotation or even identify new regions, improving the quality of the studied genome compared with regions previously annotated by predictors or assembled using an *ab initio* approach [28, 29], and can also measure the abundance of transcript expression.

Data coming from a quality genome tends to provide more promising results, responding to the biological question being investigated by researchers. In search of a quality genome, *ab initio* transcript assembly or even a hybrid approach, which uses both the reference genome

#### **3.2. tagRNA-seq**

Bacterial RNA can be divided into two groups: primary and processed transcripts. Primary transcripts carry a 5'-triphosphate (5'PPP) and include messenger RNA (mRNA) and small RNAs (sRNA). Processed transcripts carry a 5'-monophosphate (5'P) and include mature ribosomal RNA (rRNA) and transfer RNA (tRNA).

Processed transcripts represent approximately 95% of the total bacterial transcriptome [15]. A recently developed approach called dRNA-seq [13] revolutionized the study of primary transcripts by exploiting the 5' difference between the primary and processed groups, as mentioned previously (see Section 3.1).

RNAs are unstable, and during preparation in "wet-lab" experiments some transcripts are partially or totally degraded. The 5'PPP and 5'P groups help protect transcripts against exonucleases, and the 5' end is the first portion of a transcript to be degraded. During that process, information is lost and some primary transcripts end up with a 5'P and are treated as processed transcripts. Consequently, they are eliminated by the dRNA-seq technique. A new methodology was created to overcome this problem by tagging and clustering the two groups together in an RNA-seq-derived approach named tagRNA-seq [31]. This technique also exploits the difference between processed and primary transcripts, but instead of degrading the processed ones, it performs two ligation reactions with two different markers: the PSS-tag (processed start site) and the TSS-tag (transcription start site), which differ in their nucleotide sequence. Figure 1 briefly shows the methodology in three main steps: (1) the first ligation reaction attaches the PSS-tag to the processed transcripts; (2) treatment with tobacco acid pyrophosphatase (TAP) removes two phosphates from the 5'PPP, which allows the third step; (3) the second ligation reaction attaches the TSS-tag to the primary transcripts. After those steps are completed, the transcripts are sequenced and, thanks to the different markers, they can be distinguished and compared [31].
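After sequencing, the two markers can be told apart computationally. The following is a minimal sketch of that idea, assuming each read begins with its tag; the tag sequences `PSS_TAG` and `TSS_TAG` and the function names are hypothetical placeholders, not the sequences or software used in [31].

```python
from collections import Counter

# Hypothetical tag sequences, chosen only for illustration.
# The actual PSS-tag and TSS-tag sequences are described in [31].
PSS_TAG = "ACGTAC"
TSS_TAG = "TGCATG"


def classify_read(read):
    """Return (label, insert) for one read.

    label is "PSS" for a processed start site, "TSS" for a transcription
    start site, or "untagged" when neither tag is found at the 5' end.
    insert is the read with the tag removed.
    """
    if read.startswith(PSS_TAG):
        return "PSS", read[len(PSS_TAG):]
    if read.startswith(TSS_TAG):
        return "TSS", read[len(TSS_TAG):]
    return "untagged", read


def summarize(reads):
    """Count how many reads fall into each class."""
    return Counter(classify_read(read)[0] for read in reads)


if __name__ == "__main__":
    example_reads = ["ACGTACGGGATT", "TGCATGCCCTAA", "GGGGGGGGGGGG"]
    for read in example_reads:
        print(classify_read(read))
    print(summarize(example_reads))
```

A real pipeline would also have to tolerate sequencing errors in the tag and would map the trimmed insert to the genome; this sketch only shows the classification step.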

This methodology was first described for *Enterococcus faecalis* [31] and was based on another technique, 5'tagRACE [32], a 5'RACE-derived method. The results provided by tagRNA-seq improved the annotation of the *E. faecalis* genome by identifying or correcting several genome portions, including both non-coding and coding regions. The study also compared different libraries to demonstrate the effectiveness of the approach. In doing so, it provided a new method capable of differentiating primary and processed RNAs, suited to better understanding the genetic information of bacteria as well as other groups [31].

**Figure 1.** The three main steps of the tagRNA-seq approach. (1) The first ligation reaction, during which the PSS-tag (blue) is attached to the processed transcripts (5'P). (2) Treatment with tobacco acid pyrophosphatase (TAP), which converts triphosphate to monophosphate groups. (3) The second ligation, which attaches the TSS-tag (yellow) marker to the previously 5'PPP group (primary transcripts). The different markers allow the triphosphate and monophosphate groups to be differentiated after sequencing.

dRNA-seq and tagRNA-seq enable a new view of the transcriptome, either by selecting the primary transcripts for sequencing or by differentiating the primary from the processed transcripts, for a broader insight into the transcriptome. These state-of-the-art techniques promise a better understanding of transcript features such as TSS, 5'UTRs and promoters, among others, as well as the discovery of non-annotated genes and small RNAs.

#### **3.3. FRT-seq (flowcell reverse transcription sequencing)**

Flowcell reverse transcription sequencing (FRT-seq) is an improved methodology derived from RNA-seq that was created for Illumina sequencers. Unlike RNA-seq, FRT-seq does not require amplification by PCR, a step that usually introduces bias by distorting the apparent quantity of some RNA species [33]. Other important features of this Illumina-based methodology are the ability to generate strand-specific information, the use of paired-end libraries and the need for a considerable initial amount of RNA template. PCR-free library preparation is a major step towards a more comprehensive library that is closer to the original RNA population and free of intermolecular priming artefacts, among other errors. It will probably become a fairly useful technique in the near future [33, 34]. Third-generation sequencing platforms, such as Nanopore and PacBio, also use amplification-free approaches. However, neither is currently in broad use, since they still exhibit sequencing errors.

FRT-seq comprises fragmentation of the template (e.g., mRNA) followed by ligation of adapters to both the 3' and the 5' ends; the adapters allow the template to hybridize with oligonucleotides on the flowcell surface. The next steps are quantification, reverse transcription and then the sequencing reaction [33, 34].

This approach can be applied to both eukaryotes and prokaryotes, although more papers have been published on eukaryotes. From the bacterial world, we can cite papers on *Salmonella enterica* [23] and *Shigella flexneri* [35], in which FRT-seq was applied as a complementary approach to describe the transcriptional landscape of the species. In both cases, FRT-seq showed greater sensitivity and excellent concordance when compared to other approaches and replicates.

The *S. enterica* paper [23] shows that FRT-seq is as efficient as the RNA-seq and dRNA-seq techniques (Figure 2, Table 1). Figure 2 compares nine different RNA libraries: TEX (1, 2, 3), RNA-seq (1, 2, 3, \*) and FRT-seq (depleted and not depleted). TEX (libraries treated with terminator exonuclease) is a dRNA-seq methodology (see Sections 3.1 and 3.2). The TEX libraries and the first three RNA-seq biological replicates were sequenced using a 454 (1 and 2) or an Illumina GAII (3 and FRT-seq) sequencer, and RNA-seq\* (library enriched for small RNA species) was sequenced using Illumina HiSeq. The charts relate the percentages of different RNA species and show that the FRT-seq libraries provide similar or better results than the other approaches. The data presented in Table 2 also support this claim, especially considering both the total number of reads and the uniquely mapped reads achieved using the FRT-seq libraries.


**Figure 2.** Sequencing methodology comparison. Adapted from [23]. IGR – Intergenic region; TEX – libraries treated with terminator exonuclease; RNA-seq\* – library enriched for small RNA species (sRNA).


| Library | Sequencing technology | Description | Total number of reads | Number of reads (not mapped) | Number of reads (uniquely mapped) | Percent uniquely mapped reads [%] | Minimum fold coverage# |
|---|---|---|---|---|---|---|---|
| TEX\_1 | 454 | dRNA-seq library biological replicate 1 | 161,031 | 72,623 | 88,408 | 54.90 | 1.11 |
| RNA-seq\_1 | 454 | RNA-seq library biological replicate 1 | 248,993 | 83,030 | 165,963 | 66.65 | 2.03 |
| TEX\_2 | 454 | dRNA-seq library biological replicate 2 | 111,462 | 10,785 | 100,677 | 90.32 | 2.16 |
| RNA-seq\_2 | 454 | RNA-seq library biological replicate 2 | 93,337 | 38,577 | 54,760 | 58.67 | 0.61 |
| TEX\_3 | Illumina GAII | dRNA-seq library biological replicate 3 | 1,738,867 | 122,058 | 1,211,426 | 69.67 | 20.99 |
| RNA-seq\_3 | Illumina GAII | RNA-seq library biological replicate 3 | 2,148,563 | 136,871 | 1,360,113 | 63.30 | 21.16 |
| RNA-seq\* | Illumina HiSeq | RNA-seq library biological replicate 4 | 3,750,797 | 164,658 | 2,596,010 | 69.21 | 25.11 |
| **FRT-seq** | **Illumina GAII** | **FRT-seq library biological replicate 5** | **18,563,218** | **4,203,715** | **2,456,792** | **13.23** | **16.42** |
| **FRT-seq dep** | **Illumina GAII** | **FRT-seq library biological replicate 5 rRNA depleted** | **24,585,564** | **9,652,397** | **4,093,744** | **16.65** | **27.77** |

**Table 1.** Sequencing statistics. Adapted from [23].

The *S. flexneri* paper [35] also reports a favourable result concerning FRT-seq. In fact, this approach revealed a larger gene repertoire than RNA-seq (Table 2).

**Table 2.** Sequencing statistics. Adapted from [31].
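The percentage of uniquely mapped reads in the *S. enterica* sequencing statistics [23] is the ratio of uniquely mapped reads to the total number of reads. As a small worked check, the sketch below recomputes that percentage for three rows of the statistics above; the function name is a hypothetical helper, and the input figures are taken directly from the table.

```python
def percent_uniquely_mapped(total_reads, uniquely_mapped):
    """Return the share of uniquely mapped reads, in percent, to 2 decimals."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100.0 * uniquely_mapped / total_reads, 2)


if __name__ == "__main__":
    # (total number of reads, uniquely mapped reads) for three table rows
    rows = [
        (161_031, 88_408),
        (111_462, 100_677),
        (18_563_218, 2_456_792),
    ]
    for total, unique in rows:
        print(total, unique, percent_uniquely_mapped(total, unique))
```

The recomputed values agree with the reported percentages (54.90, 90.32 and 13.23), which supports reading the third numeric column as the uniquely mapped reads.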

The data presented in this topic demonstrate the quality of this recently published methodology and, according to the authors [33, 34], new updates are still being developed, which will probably provide an even better approach for users. The fact that this technique is only applicable to Illumina sequencers is a drawback; however, since this sequencing platform is available worldwide, the limitation is minor in practice. Perhaps, in the near future, it can be extended to work on other sequencing platforms. Another particularity of this technique is its efficiency with AT-rich genomes, which does not constrain its application to AT-poor genomes. This efficiency is due to the PCR-free approach, which raises the same question for other sequencers such as Nanopore and PacBio. Despite these issues, this technology has a bright future and is a great advance over conventional RNA-seq.

#### **3.4. Chromatin immunoprecipitation followed by sequencing (ChIP-seq)**

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for the genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes [36]. ChIP-seq has become an essential tool for studying gene regulation and epigenetic mechanisms. It offers higher resolution, less noise and greater coverage than its array-based predecessor, ChIP-chip [37, 38]. The approach has six main steps: (1) cell cultures are grown under defined conditions and, when they reach the desired stage of development, are treated with formaldehyde to cross-link proteins and DNA; (2) the chromatin is sheared by sonication into small fragments (200–600 bp); (3) an antibody specific to the protein of interest is used to immunoprecipitate the DNA–protein complex; (4) the cross-links are reversed by heating; (5) the released DNA is subjected to high-throughput sequencing; and (6) *in silico* analysis is carried out, in which the resulting sequencing reads are checked for quality and then trimmed accordingly [38–40]. The trimmed reads are then aligned to a reference genome. Afterwards, areas of enrichment in the ChIP-seq data are identified; those areas, usually called peaks, represent where the transcription factors (TF) bind throughout the genome. CisGenome, MOSAiCS and MACS are some of the known algorithms that have been used in bacterial ChIP-seq analysis [38, 41]. After peaks are associated with downstream genes, a number of bioinformatics analyses can be carried out, including identification and analysis of motifs, differential analysis and association with expression data, for a deeper understanding of bacterial regulons. This is shown in Figure 3 [36].

**Figure 3.** ChIP-seq sample preparation and analysis. Adapted from [36].
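The identification of enriched areas can be illustrated with a deliberately naive sliding-window comparison of ChIP and control read depth. This is a sketch under stated assumptions, not the statistical model used by CisGenome, MOSAiCS or MACS; the function name, the window size, the fold threshold and the pseudocount are all hypothetical choices.

```python
def call_peaks(chip, control, window=3, min_fold=2.0, pseudocount=1.0):
    """Return merged half-open (start, end) intervals enriched in the ChIP sample.

    chip and control are per-position read depths of equal length.
    A window is enriched when the ratio of summed ChIP depth to summed
    control depth (each with a pseudocount added) reaches min_fold.
    Overlapping or adjacent enriched windows are merged into one peak.
    """
    if len(chip) != len(control):
        raise ValueError("chip and control must have the same length")
    peaks = []
    for start in range(len(chip) - window + 1):
        end = start + window
        chip_sum = sum(chip[start:end])
        control_sum = sum(control[start:end])
        fold = (chip_sum + pseudocount) / (control_sum + pseudocount)
        if fold >= min_fold:
            if peaks and start <= peaks[-1][1]:
                # Extend the previous peak instead of opening a new one.
                peaks[-1] = (peaks[-1][0], end)
            else:
                peaks.append((start, end))
    return peaks


if __name__ == "__main__":
    chip_depth = [1, 1, 1, 10, 12, 11, 1, 1, 1, 1]
    control_depth = [1] * 10
    print(call_peaks(chip_depth, control_depth, window=3, min_fold=3.0))
```

Because every window that touches the enriched positions passes the threshold, the reported interval is wider than the true binding region; dedicated peak callers model the background and fragment size to sharpen this.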

As whole-genome transcription profiling cannot reveal whether the influence of a transcription factor (TF) on RNA levels is direct or indirect, TF binding within the appropriate promoter region must also be identified. ChIP-seq provides information about where TFs are bound. Thus, by integrating ChIP methods and transcription profiling, it is possible to identify all direct regulatory targets of a TF for a given condition. For example, work carried out by Stringer et al. (2014) on the *araC* gene of *Escherichia coli* and *Salmonella enterica* identified direct regulatory targets of AraC, including five novel target genes: *ytfQ*, *ydeN*, *ydeM*, *ygeA* and *polB* [42]. Although ChIP-seq has so far been applied to only a few bacterial species, such as *Vibrio harveyi*, *V. cholerae*, *Rhodobacter sphaeroides*, *Mycobacterium tuberculosis*, *S. enterica* and *Caulobacter crescentus* [36, 37, 43–45], it has identified novel regulatory interactions, even for well-studied proteins [46, 47].
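The integration of binding data with transcription profiling amounts to intersecting two gene sets. The sketch below illustrates this under simple assumptions: the gene identifiers are invented placeholders, and `classify_targets` is a hypothetical helper, not part of any published pipeline.

```python
def classify_targets(bound_genes, differential_genes):
    """Split genes into direct, indirect and bound-only groups.

    bound_genes: genes with a TF peak in their promoter region (ChIP-seq).
    differential_genes: genes whose RNA level changes when the TF is
    perturbed (transcription profiling).
    """
    bound = set(bound_genes)
    differential = set(differential_genes)
    return {
        # Bound and differentially expressed: candidate direct targets.
        "direct": sorted(bound & differential),
        # Expression changes without binding: likely indirect effects.
        "indirect": sorted(differential - bound),
        # Binding without an expression change under this condition.
        "bound_only": sorted(bound - differential),
    }


if __name__ == "__main__":
    # Placeholder gene identifiers, for illustration only.
    chip_bound = ["geneA", "geneB", "geneC"]
    expression_changed = ["geneB", "geneC", "geneD"]
    print(classify_targets(chip_bound, expression_changed))
```

In practice both input lists depend on thresholds (peak score, fold change, significance), so the resulting groups are condition-specific, as the text notes.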

ChIP-seq, in combination with RNA-seq, could be an efficient tool to get detailed information about bacterial transcription regulation and how bacteria respond to different external conditions.
