**1. Introduction**

Genetic and epigenetic features encoded in the genome are the basic determinants of the fate and function of cells. At the human interface, qualitative and/or quantitative differences in transcripts are the first-level readout of these features in any specific context in which they are identified [1]. These contexts may be a diseased state or a response to stimulation, such as by intrinsic ligands or immunogens. The total set of transcripts is often referred to as the transcriptome, and the stage-specific or cell type-specific transcriptome of cells is valuable for evaluating the genetic and epigenetic features characteristic of them. RNA sequencing methods have improved considerably, from high-input to low-input RNA, making it possible to appreciate heterogeneity between and within cell populations. Not restricted to messenger RNA (mRNA), these technologies are also increasingly used to analyze other transcription products such as microRNAs and lncRNAs, extending to inputs on the order of 10–30 pg from a human cell or tissue [2]. Transcripts fall into two categories: protein-coding mRNAs, which are translated into protein, and non-coding RNAs, which are involved in regulating gene expression and maintaining cell structure. mRNA makes up only 6% of the total RNA content of a cell or tissue, and a number of methods and kits are available for extracting RNA from cells [2, 3].

© 2017 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Transcriptome Sequencing for Precise and Accurate Measurement of Transcripts and...

http://dx.doi.org/10.5772/intechopen.70026

Human genomes share more than 99.5% sequence identity with one another when analyzed in toto. Paradoxically, each is also personalized and subject to somatic variation. Cells can therefore be heterogeneous at the genome level even within an individual, and genomic sequence variation must be accounted for whenever cells are analyzed at the transcriptome level. To this end, the sequence obtained by RNA sequencing also reflects the corresponding coding sequence in the genome, RNA editing aside. Further, a plethora of other sequence determinants can be analyzed through sequence-based identification of transcripts, including isoforms, gene fusions, and transcripts from putative pseudogenes. Judging by the amount of data derived from RNA sequencing, human cancer cells and tissues of diverse origins and stages, from different populations, are unarguably the most explored differential genomes and transcriptomes to date [4]. The Cancer Genome Atlas (TCGA) is probably the most extensive resource providing access to cancer data, especially from next-generation sequencing (NGS) platforms. TCGA offers a number of options for analyzing cancer-related experimental data and stands as a major repository for cancer data.

**2.2. Applicability of transcriptome data**

The functions of each gene are not completely defined, but information about the involvement of genes in functional pathways is available from biological databases, which provide clues on how each gene behaves in different metabolic pathways. Estimating which genes are expressed in a particular biological condition allows comparison with the existing annotations. Only a small percentage of the genome is expressed in each cell, and a portion of the RNA synthesized in the cell is specific to that cell type [4]; identifying genes that are differentially expressed in similar tissue but in a different context therefore has therapeutic significance. Moreover, transcriptome sequencing allows the identification of transcript-level variations such as cassette exons, mutually exclusive exons, intron retention, indels, alternative splice junctions, alternative promoters (**Figure 1**), and isoform-specific expression profiles [7].

**Figure 1.** Alternative splicing. Exons are shown as boxes and introns as lines. Promoters are represented by arrows and polyadenylation sites by AAA.

**2.3. Requirements**

The number of biological/technical replicates, adequate sequencing depth, and, essentially, the sequencing quality are the major factors that should be accounted for in a sequencing-based
