**8.2. Transcriptomics and RNA sequencing**

RNA-seq is an NGS method that sequences the transcriptome, that is, the complete set of RNA transcripts expressed from the genome in cells, tissues, and organs at different stages of an organism's life cycle [12, 18, 19, 20, 30]. High-throughput sequencing of cDNA fragments to profile RNA was first applied to mammalian cells [131] and yeast [132] and is now used for a wide range of organisms [133]. Without transcriptome data, the genome sequence alone is of limited use for understanding the intricacies of genome function. RNA-seq is technically reliable and sensitive, and it produces unambiguous maps of the transcribed regions of the genome. It enables accurate quantification of expression levels; identification of tissue-specific transcript variants (including SNPs and mutations) and isoforms; mapping of transcription boundaries and splicing events; and detection of transcripts encoding transcription factors, as well as small and large noncoding RNAs (ncRNAs) involved in the regulation of gene expression [131–137].

At least 90% of the mammalian genome is actively transcribed to produce different classes of ncRNAs [135, 136], including ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), small interfering RNA (siRNA), PIWI-interacting RNA (piRNA), and large intergenic noncoding RNA (lincRNA) [138–141], as well as transcripts derived from retrotransposons [142–146]. The known classes of functional ncRNAs consist largely of those supporting protein translation (rRNAs, tRNAs, and snoRNAs) and transcript splicing (snRNAs) [137, 138], together with miRNAs, which target conserved binding sites in mRNAs and reduce their stability [139]. The piRNAs, a more recently discovered class of small RNAs, interact with PIWI proteins and their target RNAs to silence transposons in the germ line and to regulate gene expression in the soma [140]. The lincRNAs are produced by a distinct class of actively transcribed RNA genes and have diverse roles in processes such as cell-cycle regulation, immune responses, brain function, and gametogenesis [147–150]. A substantial fraction of lincRNAs bind chromatin-modifying proteins and may modulate gene expression by bringing together protein complexes with specific functions [150].

Defective splicing and altered expression levels of transcripts are believed to contribute to at least 50% of inherited human diseases [151]. Altered expression of specific isoforms or alleles has been identified in ischemic stroke, type 2 diabetes, colorectal cancer, chronic lymphocytic leukemia, and many other diseases [30]. Dysregulation of gene expression, splicing, and RNA editing in specific cell types has also been associated with the pathogenesis of cardiovascular diseases, neurological disorders, and various cancers [137, 151–153]. Similarly, different classes of small and large ncRNAs have been associated with various diseases, including cancers [147–149]. The content of the transcriptome varies enormously between the cells of a multicellular organism and depends on the cell type and its functional and temporal state. At least two major projects, the Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx) (Table 3), have focused on mapping functional elements of the human genome at high resolution and on characterizing gene regulation and the transcriptome across human tissues. GTEx is one of the most recent projects to generate a large amount of RNA-seq data; its analysis of transcriptome variation covered 43 tissues and 1,641 samples from 175 postmortem donors [153]. The analysis included 20,110 protein-coding genes and 11,790 lncRNAs, of which 88% and 71%, respectively, were detected in at least one sample. In most tissues, a relatively small number of genes (a few hundred) showed distinct, modular profiles of tissue-preferential expression. In addition, 3,046 protein-coding genes were co-expressed with an adjacent repeat element such as Alu, L1, ERV, Tigger, or Charlie [153]. These findings provide a more systematic understanding of the heterogeneity among a diverse set of human tissues and of the enormous complexity and variation involved in the regulation of gene expression.
