Recent Advancement on In-Silico Tools for Whole Transcriptome Analysis

*Vidya Niranjan, Lavanya Chandramouli, Pooja SureshKumar and Jitendra Kumar*

## **Abstract**

Delving into the intricate world of transcriptome analysis, this chapter unfolds the story of gene expression in organisms. The classic DNA microarray and RNA-seq methods have long been the pillars, with RNA-seq taking the spotlight for its superior resolution in understanding dynamic aspects. Yet, tools like HISAT2 and DESeq2, while effective, come with the drawback of being time-consuming and reliant on powerful GPUs. The need for quicker, less resource-intensive techniques has sparked a shift toward simpler R and Python-based tools that not only sidestep GPU dependence but also offer enhanced graphical representations. As we navigate through the content, the chapter draws a vivid comparison between the established tools and the emerging ones, highlighting the pressing need for innovative approaches in transcriptome analysis. The narrative guides readers through the fundamentals, from the Central Dogma's backstory to the pivotal role of RNA in gene expression and disease. It uncovers the nuances between RNA-Seq and microarray technologies, providing a comprehensive overview of tools for data collection and interpreting changes in gene expression. Our journey extends to the latest breakthroughs, such as the TACITuS platform and the TALON pipeline, tailored for in-depth analysis of transcriptomes using long-read data. The chapter concludes by emphasizing the ever-growing significance of transcriptomics in unraveling complex biological phenomena, with a spotlight on the promising applications of next-generation sequencing. A comprehensive summary ties it all together, detailing the step-by-step protocol of transcriptome analysis, along with insights into current tools, their advantages, and limitations, providing readers with a holistic understanding of their practical application and outcomes.

**Keywords:** transcriptome analysis, in-silico tools, current trends, run time, memory, protocol

### **1. Introduction**

In 1958, Francis Crick introduced the Central Dogma, outlining the directional flow of genetic information within cells [1]. This fundamental principle of molecular biology involves two key processes: the transcription of DNA into RNA and the translation of RNA into proteins [2]. Subsequent research revealed the existence of various RNA types, including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA), each playing a pivotal role in the synthesis of proteins [3]. During the 1980s, investigations into U-rich RNA, particularly in *Tetrahymena thermophila* and bacterial RNase P complexes, revealed that RNA can itself act as a catalyst, in the manner of an enzyme (a ribozyme). Subsequent findings highlighted the existence of micro-RNA (miRNA) and its regulatory properties. Both mRNA and non-coding RNA (ncRNA) help govern gene expression and participate in cellular development [4]. RNA stands out as a crucial macromolecule in biological cells, carrying essential messages from DNA to facilitate protein synthesis, thereby sustaining life. Minor alterations in transcription can disrupt the entire mechanism, sometimes leading to severe diseases.

The transcriptome, encompassing all mRNA molecules expressed by an organism, characterizes the intricate web of genetic information. The term "transcriptome" is also used to describe the mRNA transcripts within a specific cell or tissue type. The field of transcriptomics closely examines the regulation, variation, and mechanisms governing RNA molecules in cellular processes [5]. Comprehensive examination of the entire transcriptome has become increasingly vital for understanding the altered expression of genetic variants implicated in complex diseases such as cancer, diabetes, and cardiovascular conditions. Primarily, transcriptome analysis unveils fresh insights into biomarker exploration, establishing gene-centric benchmarks for personalized medicine and therapies [6]. Furthermore, transcriptome analysis plays a significant role in advancing research on the long-term effects of COVID-19 [7]. By scrutinizing genome-wide differential RNA expression, researchers can gain a deeper understanding of the biological processes and molecular mechanisms governing cell fate, development, and the progression of diseases.

The analysis of the transcriptome commonly involves a comparative evaluation between two groups, notably healthy and diseased conditions [8]. This approach proves valuable in elucidating the functions of genes and regulatory pathways. Advanced techniques in transcriptome analysis include microarrays, which provide a comprehensive quantitative and qualitative gene expression profile of the sample by scrutinizing the entire transcriptome. RNA-Seq, in turn, employs high-throughput sequencing to capture all sequences, offering an enhanced perspective on the complexity of the eukaryotic transcriptome [9]. **Table 1** illustrates distinctions between RNA-Seq and microarray technologies.

#### **Table 1.**

*Comparison of microarray and RNA-Seq technologies.*

*Recent Advancement on In-Silico Tools for Whole Transcriptome Analysis DOI: http://dx.doi.org/10.5772/intechopen.114077*
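The two-group comparison described above can be illustrated with a minimal Python sketch. It is not the method used by DESeq2 or any other tool named in this chapter: it simply scales raw counts to counts per million (CPM), averages each group, and reports a log2 fold change per gene. The gene names, the read counts, and the pseudocount of 1.0 are all hypothetical values chosen for illustration.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def mean_cpm(samples):
    """Average the CPM value of every gene across the samples of one group."""
    normalised = [counts_per_million(s) for s in samples]
    genes = normalised[0].keys()
    return {g: sum(s[g] for s in normalised) / len(normalised) for g in genes}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the groups.
    """
    ctrl = mean_cpm(control)
    trt = mean_cpm(treated)
    return {
        g: math.log2((trt[g] + pseudocount) / (ctrl[g] + pseudocount))
        for g in ctrl
    }


# Hypothetical toy data: two healthy and two diseased samples, three genes.
healthy = [
    {"geneA": 100, "geneB": 400, "geneC": 500},
    {"geneA": 120, "geneB": 380, "geneC": 500},
]
diseased = [
    {"geneA": 400, "geneB": 100, "geneC": 500},
    {"geneA": 380, "geneB": 120, "geneC": 500},
]

lfc = log2_fold_change(healthy, diseased)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.2f}")
```

A positive value indicates higher expression in the diseased group and a negative value lower expression. Dedicated differential-expression tools additionally model dispersion and test for statistical significance, which this sketch deliberately omits.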

Despite the drawbacks of RNA-Seq, such as high costs and the need for powerful computing systems, this technology offers the advantage of not depending on prior sequence information, which allows the exploration of both known and unknown transcripts. The RNA-Seq process entails preparing libraries and sequencing RNA samples on platforms such as Illumina, Nanopore, and PacBio. These platforms yield FASTQ files, which are then processed by a bioinformatics pipeline built from suitable tools. The following sections discuss the existing tools, emerging technologies, and the protocol involved in RNA-Seq.
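To make the starting point of such a pipeline concrete, the following minimal Python sketch reads FASTQ records and summarises their base quality. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; multi-line records are not handled. The two reads in the example are hypothetical, and the function names are illustrative rather than taken from any tool discussed in this chapter.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Hypothetical two-read FASTQ content held in memory for illustration.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\n!!!!\n"
)

records = list(read_fastq(example))
for read_id, sequence, quality in records:
    print(read_id, len(sequence), mean_phred(quality))
```

For real data, the same functions could be applied to a file opened with `open(path)` or, for compressed input, `gzip.open(path, "rt")`. Production pipelines normally rely on dedicated quality-control software instead of hand-written parsers.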
