**1. Introduction**

© 2016 The Author(s). Licensee InTech. © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Next-generation sequencing (NGS) technologies provide a platform for producing high-throughput sequencing data in less time and at reduced cost. Tremendous improvements in recent years have allowed millions of DNA fragments to be sequenced in parallel, which has moved genomics to a new frontier by capturing the fine details of DNA fragments. Earlier, the Maxam and Gilbert [1] and Sanger [2] sequencing techniques were the leading approaches after the discovery of the DNA structure [3]. However, these techniques were time-consuming and limited to small-scale work, ranging from a few genes to the genomes of simple organisms. The need to sequence complex genomes in a short time and at reduced cost drove the technological advances in sequencing that evolved into NGS technologies.

NGS systems provide rapid, reproducible, and highly accurate sequencing, and are based on either short-read sequencing approaches or the more advanced single-molecule long-read sequencing [4]. The short-read approaches depend on sequencing by synthesis (SBS) and sequencing by ligation (SBL) methods. These methods require pre-processing of the DNA before the sequencing steps, according to the requirements of the different NGS platforms [4]. In the SBS approach, nucleotides are added by a polymerase to the elongating DNA strand, and a signal is received in the form of fluorescence or a change in ionic concentration for every nucleotide incorporated [5, 6]. In the SBL approach, by contrast, fluorophore-bound probes with one- or two-base matching are ligated to the adjacent oligonucleotide on the DNA fragments. The emitted fluorescence spectrum identifies the bases complementary to the probe at a specific position, and reset primers are used to decode the complete DNA sequence [5]. Most short-read sequencing approaches require clonal amplification of the DNA on a solid surface, for example bead-based or solid-state amplification, or the generation of DNA nanoballs [5]. 
In all of these methods, the DNA is first fragmented and then ligated to a common set of adaptors for amplification and subsequent sequencing [5]. The short-read sequencing approaches include the 454, Illumina, SOLiD, and Ion Torrent platforms. *In silico* approaches are then used to assemble the data generated by these techniques [6].

The limitations of short-read approaches in *de novo* sequencing and in resolving genomic variation led to the development of more advanced long-read sequencing approaches [6]. Long-read sequencing is used for complex genomes with many long repetitive elements, structural variations, and copy number alterations, which are significant in the occurrence of disease and in evolution and adaptation [7–9]. It produces reads of several kilobases and allows higher resolution of the genome. In contrast to short reads, a single long read can completely span a repetitive or complex region of the genome, thus reducing ambiguity in the size and position of genomic elements [6]. Pacific Biosciences and Oxford Nanopore are commercially available technologies that provide platforms for sequencing long reads with thousands of bases per read. Both are based on single-molecule sequencing but differ in their method of nucleotide detection: Oxford Nanopore detects nucleotides through nanopores, while Pacific Biosciences uses optical detection inside zero-mode waveguides [10]. In addition, in the synthetic approach, short-read sequencing data are combined with informatics and biochemical approaches to construct synthetic long reads. Long reads allow in-depth transcriptomic studies, such as allele-specific transcription and alternative splicing, and help identify the exact connectivity of exons and discern gene isoforms [6, 11, 12].
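The point that a single long read can span a repetitive region, while short reads leave its size ambiguous, can be illustrated with a toy simulation. The sketch below is not taken from any sequencing platform or assembler: the sequences `UNIT`, `LEFT`, and `RIGHT`, the read lengths, and all function names are hypothetical, and the reads are assumed to be error-free. It builds two small "genomes" that differ only in the copy number of a tandem repeat and compares what short and long reads reveal about them.

```python
# Toy model: two genomes that differ only in tandem-repeat copy number.
# All sequences and names are made up for illustration; reads are error-free.
UNIT = "ACGTTGCA"                 # 8-base repeat unit
LEFT = "GGATCCTTAGCATGCCGTAA"     # 20-base unique left flank
RIGHT = "TTCGGACCTAGGAATCGCAT"    # 20-base unique right flank


def build_genome(copies):
    """Return left flank + `copies` repeat units + right flank."""
    return LEFT + UNIT * copies + RIGHT


def all_reads(genome, length):
    """Return the set of every substring of the given length."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}


def copies_from_read(read, anchor=5):
    """Return the copy number if the read contains both flank anchors, else None."""
    left_anchor, right_anchor = LEFT[-anchor:], RIGHT[:anchor]
    start = read.find(left_anchor)
    end = read.find(right_anchor)
    if start == -1 or end == -1 or end < start:
        return None  # the read does not span the whole repeat
    repeat = read[start + anchor:end]
    return len(repeat) // len(UNIT)


def estimate(genome, read_length):
    """Collect the copy-number calls made by all reads of one length."""
    calls = {copies_from_read(read) for read in all_reads(genome, read_length)}
    calls.discard(None)
    return calls


genome_3 = build_genome(3)
genome_5 = build_genome(5)

# Short reads (10 bases): the two genomes yield exactly the same read set.
short_identical = all_reads(genome_3, 10) == all_reads(genome_5, 10)

# Long reads (60 bases): reads that span the repeat recover the copy number.
long_calls_3 = estimate(genome_3, 60)
long_calls_5 = estimate(genome_5, 60)

print(short_identical, long_calls_3, long_calls_5)
```

In this simplified setting the short reads cannot distinguish the two genomes by sequence content alone, because no 10-base read reaches from one flank to the other. Real data add sequencing errors and coverage depth, which this sketch deliberately ignores.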

**2. High-throughput RNA sequencing**

**3. Long non-coding RNA (lncRNA)**

**3.1. Discovery and identification of lncRNA**

The transcriptome consists of the whole set of transcripts present in a cell, together with their expression levels at a particular developmental stage and under particular cellular conditions. Detailed study of an organism at the transcriptome level is necessary to reveal the molecular constituents involved in that particular stage or condition of the tissue. High-throughput RNA sequencing (RNA-seq) has emerged as an important technique in transcriptomics for studying all aspects of gene expression at large scale, and it is one of the most commonly used techniques for quantifying and mapping transcriptomes. It involves the conversion of RNA into cDNA, followed by random sequencing of the cDNA fragments on NGS platforms [13]. The millions of short reads generated are assembled by various bioinformatics approaches. Subsequent mapping of these short reads to a reference genome or gene set reveals the position of the gene from which each RNA was transcribed [13].

High-throughput technologies also include direct RNA sequencing (DRS), in which native RNA is sequenced directly without a cDNA preparation step. The technique is successful in sequencing native polyA+ RNA where reverse transcription is undesirable. It is applicable in determining the precise sequence, in identifying alternative polyadenylation sites, and when only a small amount of nucleic acid is available [14]. In the cap analysis of gene expression (CAGE) technique, RNAs with a 5′ cap are targeted. Short sequence tags are generated from the 5′ ends of the targeted RNAs, with one tag per RNA molecule, which allows precise mapping of the 5′ ends [15]. Serial analysis of gene expression (SAGE) is another method for sequencing RNA molecules; it targets polyadenylated messages, and tags are generated near the 3′ ends, typically one internal tag per RNA molecule [16]. 
Similarly, paired-end tags (PET) also target polyadenylated RNA molecules, but the sequence tag is generated from the combined information on the 5′ and 3′ ends of the same RNA molecule [17]. Furthermore, rapid amplification of cDNA ends (RACE) is a PCR-based method used to identify unknown sequences adjoining a known region. Combined with NGS technologies, it can be used for deep transcriptome sequencing of a particular locus [18]. Targeted RNA sequencing is also aimed at a specific locus: RNAs are selected using tiling microarrays and then sequenced [19]. Unlike RNA profiling methods that measure steady-state RNA levels, GRO-seq combines NGS analysis with nuclear run-on experiments to generate information on transcriptionally competent RNA polymerase complexes [20]. High-throughput RNA-seq is thus helpful in identifying the transcripts (messenger RNAs, non-coding RNAs, and small RNAs) of a species in a short time, and in determining 5′ and 3′ splice sites, splicing patterns, and post-transcriptional modifications. Quantification of the transcripts reveals changes in gene expression under different conditions.
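The mapping and quantification steps described in this section can be sketched in a few lines. The example below is a deliberately simplified illustration and not the method of any particular RNA-seq tool: the gene names, sequences, and reads are made up, reads are assumed to be error-free, and mapping is done by exact string matching against a small reference gene set, whereas real aligners tolerate mismatches and handle spliced reads.

```python
from collections import Counter

# Hypothetical reference gene set (made-up sequences, for illustration only).
REFERENCE = {
    "geneA": "ATGGCGTACGTTAGCCTAGGATCCGATTACGGA",
    "geneB": "ATGTTTCCGGAAACCTTGGGCATCATCGTAGCT",
}


def map_read(read, reference):
    """Return (gene, position) of the first exact match, or None if unmapped."""
    for gene, sequence in reference.items():
        position = sequence.find(read)
        if position != -1:
            return gene, position
    return None


def count_reads(reads, reference):
    """Count how many reads map to each reference gene."""
    counts = Counter({gene: 0 for gene in reference})
    for read in reads:
        hit = map_read(read, reference)
        if hit is not None:
            counts[hit[0]] += 1
    return dict(counts)


gene_a, gene_b = REFERENCE["geneA"], REFERENCE["geneB"]

# Simulated 10-base reads from two conditions; "NNNNNNNNNN" stands in for
# a read that matches nothing in the reference.
control_reads = [gene_a[0:10], gene_a[12:22], gene_b[5:15], "NNNNNNNNNN"]
treated_reads = [gene_a[3:13], gene_b[0:10], gene_b[5:15], gene_b[18:28]]

control_counts = count_reads(control_reads, REFERENCE)
treated_counts = count_reads(treated_reads, REFERENCE)

print("control:", control_counts)
print("treated:", treated_counts)
```

Comparing the per-gene counts between the two simulated conditions gives a crude picture of differential expression. In practice, counts would also be normalized for sequencing depth and transcript length before any comparison.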

Role of Next-Generation RNA-Seq Data in Discovery and Characterization of Long Non-Coding…

http://dx.doi.org/10.5772/intechopen.72773

In the era of NGS, high-throughput RNA-seq data have highlighted the importance of the non-coding part of the genome in gene function. Non-coding RNAs (ncRNAs) are
