**3.3. Mapping**

**Figure 3.** Snapshots from RNASeq results dashboard from SanGeniX for an experiment consisting of four groups (or samples). (A) Boxplot: It displays distribution of normalized expression values among different groups. Similar distri‐ bution of normalized expression values among the different groups of interest indicates that any technical biases due to difference in sequencing depth have been taken care of. (B) Heatmap is a convenient way to visualize cluster of genes based upon their expression. Here, log2 fold change of genes in three groups with respect to a reference group, Group1 has been plotted. The color-code helps to infer gene expression level. Scatter plot (C), MA plot (D), and Volca‐ no plot (E) present visual investigation of differentially expressed genes between two conditions, for example, here Group 4 and Group 1. Scatter plot helps to quickly compare the expression of a gene between the two conditions, while MA plot depicts trends of difference in expression over the average expression, and Volcano plot helps to spot

As described above, NGS-based transcriptomic data generation and analysis is a complex and

The process of library preparation is generating cDNA from the large RNA fragments, adding the adapters, and amplifying the cDNA for sequencing. Due to a series of experimental reactions, several biases can be introduced in the library preparation step. In majority of the

multistep process. Every step has some key challenges that hinder the data analysis.

genes by considering both fold change and test statistic.

130 Next Generation Sequencing - Advances, Applications and Challenges

**3.1. Library preparation**

**3. Challenges in RNASeq data generation and analysis**

Accurate mapping of RNASeq reads is a challenging issue because of large data volume, slow mapping speed, false-positive splicing events and incorrect estimation of exon–intron boun‐ daries, large genome size, repeat sequences in the genome, and annotation quality of the genome. Usually, aligners search for introns smaller than a fixed length to reduce the compu‐ tational power, which often leads to missing the splice reads spanning longer introns [66]. Multiple mapping of reads is another major problem that can be due to presence of repeat regions, similar sequences, and number of mismatches allowed in the mapping step. If such reads mapping to multiple regions are discarded, it will lead to gap in the regions that cannot be mapped uniquely, and if it is included, it can lead to false-positive transcription status. Reference-based assembly cannot efficiently detect trans-spliced genes that are formed from splicing and joining of two different precursor mRNAs and found in some disease conditions such as cancer [125, 126]. Additionally, aligners have to cope with sequencing errors, SNP, InDels, other genomics variations and parameters-based, suboptimal mapping outcome. In summary, mapping-based RNASeq analysis can be more effective and complete when reads are long, genome is well-annotated, and it can be combined with *de novo* genome assembly to identify novel transcripts.
