**Trinity**

transcript reconstruction. Sequence read length has been shown to be one of the key parameters in determining the *de novo* assembly strategy. While the overlap-layout-consensus (OLC) approach has been used to assemble long reads generated by third-generation sequencing instruments such as PacBio Sequel or Oxford Nanopore, the *de Bruijn* graph approach has been used in both *de novo* genome and transcriptome assembly because this computationally efficient algorithm can process billions of short reads to reconstruct the transcriptome as completely as possible. In *de Bruijn* methods, graphs are constructed from short reads, and paths in these graphs are then used to generate contigs. During graph construction, each read is broken into k-mer seeds (nodes) and an edge is added between consecutive k-mers (that is, the suffix of length k−1 of one node is the prefix of length k−1 of the next), so that the k-mers are arranged into a *de Bruijn* graph structure (**Figure 2**). Contigs are obtained by inversely transforming the optimal path in the *de Bruijn* graph into sequences [21]. However, the *de Bruijn* graph-based strategy is slightly modified for *de novo* transcriptome assembly compared with genome assembly, for the following reasons: (i) while DNA sequencing depth is expected to be uniform across the genome (except in repetitive regions), the sequencing depth of transcripts can vary considerably; (ii) the genome assembly graph is considered linear (theoretically one graph per chromosome), whereas, because of alternative splicing, transcriptome assembly is more complex and requires a graph that represents the multiple alternative transcripts per locus [1, 21]. To address these challenges, several *de novo* assembly tools such as Trinity [1], SOAPdenovo-Trans [22], Trans-ABySS [23], Oases [24], IDBA-Tran [25], BinPacker [26] and Bridger [27] have been developed (Box 1). 
Most of these tools, which were initially developed for *de novo* genome assembly (except for Trinity), use a *de Bruijn* graph-based assembly strategy and have their own pros and cons in transcript reconstruction.

58 Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health

**Figure 2.** The *de Bruijn* graph approach is instrumental for reference-free transcriptome assembly, and *de Bruijn* graphs are built from the short reads. These short reads are split into short k-mers (here, k = 5), and the k-mers are then connected by overlapping prefix and suffix (k−1)-mers. Once the *de Bruijn* graph is built from the reads, the optimal paths are found in the graph and the reconstructed transcripts (or contigs) are recovered by inversely transforming the optimal path in the *de Bruijn* graph.
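The construction summarized in Figure 2 can be illustrated with a minimal Python sketch. It is not taken from any of the assemblers listed above: the function names (`build_de_bruijn`, `walk_unbranched`, `path_to_sequence`) and the two toy reads are assumptions made for illustration, and the "optimal path" is reduced here to following an unbranched chain of nodes.

```python
def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Build a graph whose nodes are k-mers.

    An edge joins two k-mers that are consecutive in a read, so the
    (k-1)-suffix of the source equals the (k-1)-prefix of the target.
    """
    graph = {}
    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            graph.setdefault(km, set())  # register every node
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def walk_unbranched(graph, start):
    """Follow successors from `start` while exactly one choice exists."""
    path, seen, node = [start], {start}, start
    while len(graph[node]) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    return path


def path_to_sequence(path):
    """Inverse transform: first k-mer plus the last base of each later k-mer."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical toy reads that overlap by six bases, with k = 5 as in Figure 2.
reads = ["ATGGCGTG", "GGCGTGCA"]
graph = build_de_bruijn(reads, k=5)
contig = path_to_sequence(walk_unbranched(graph, "ATGGC"))
print(contig)
```

With these two reads the graph is a single chain of six k-mers, so the walk recovers one contig that spans both reads. Real transcriptome graphs branch at alternative splice junctions, which is why dedicated path-selection steps are needed.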

Trinity's main difference from other transcriptome assembly programs is that it was designed specifically for *de novo* RNA assembly. It uses parallel computation to reconstruct alternatively spliced isoforms and transcripts with the *de Bruijn* method [1]. Trinity has three functional modules, *Inchworm*, *Chrysalis* and *Butterfly*, which work in succession and perform different tasks [29]. *Inchworm* uses a greedy extension model based on k-mer overlap and reports full-length transcripts for a dominant isoform. Then, *Chrysalis* clusters overlapping contigs and constructs *de Bruijn* graphs. Finally, *Butterfly* processes these graphs in parallel and reconstructs full-length transcripts for each isoform. In addition to reconstructing accurate transcripts from RNA-Seq data, Trinity exhibits superior performance in recovering isoforms. Trinity requires extensive computational resources and running time, but it performs best on assembly quality measures such as N50 value, number of chimeras (fewer) and transcript coverage.
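The idea of greedy extension by k-mer overlap, as described for *Inchworm*, can be sketched as follows. This is a simplified illustration under stated assumptions, not Trinity's actual implementation: the function names, the alphabetical tie-breaking rule and the toy reads are hypothetical, and error filtering and reverse complements are ignored. The sketch assumes k ≥ 2.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def best_unused(candidates, counts, used):
    """Pick the most abundant candidate k-mer that is present and unused."""
    valid = [km for km in candidates if counts[km] > 0 and km not in used]
    if not valid:
        return None
    # Ties are broken alphabetically (an arbitrary choice for determinism).
    return max(valid, key=lambda km: (counts[km], km))


def greedy_extend(counts, k):
    """Seed with the most abundant k-mer, then extend right and left,
    always taking the most abundant k-mer that overlaps by k-1 bases."""
    if not counts:
        return ""
    seed = max(counts, key=lambda km: (counts[km], km))
    contig, used = seed, {seed}
    while True:  # extend to the right
        suffix = contig[-(k - 1):]
        nxt = best_unused([suffix + b for b in "ACGT"], counts, used)
        if nxt is None:
            break
        used.add(nxt)
        contig += nxt[-1]
    while True:  # extend to the left
        prefix = contig[:k - 1]
        prv = best_unused([b + prefix for b in "ACGT"], counts, used)
        if prv is None:
            break
        used.add(prv)
        contig = prv[0] + contig
    return contig


# Hypothetical reads: a dominant isoform seen twice and a minor variant once.
reads = ["ATGGCGTGCA", "ATGGCGTGCA", "ATGGCGTAGC"]
contig = greedy_extend(count_kmers(reads, k=5), k=5)
print(contig)
```

At the point where the two toy isoforms diverge, the extension follows the more abundant k-mer, so the returned contig corresponds to the dominant isoform. Resolving the minor isoform is the kind of task the text assigns to the later graph-based modules.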
