**3.3. The splicing patterns for "Identical", "Alternative", and "Unmapped" reads**

As concluded above, a reference transcriptome mainly affects the mapping of junction reads. One interesting question is which kinds of junction reads tend to be mapped identically, mapped alternatively, or left unmapped. To characterize the splicing patterns, we focus only on two-exon junction reads that are uniquely mapped when the RefGene annotation is used. For every junction read, we calculate the number of nucleotide bases that overlap with its left exon (OL) and its right exon (OR), respectively. The minimum of OL and OR is then used for histogram analysis (Figure 5). Only the results for the lung, liver, kidney, and heart samples are shown in Figure 5; for the remaining 12 samples, the patterns were very similar (data not shown). Since the full read length is 75 bp, so that OL + OR = 75, the MOE (Minimum Overlap with an Exon, MOE = min(OL, OR)) ranges from 1 to 37 for any junction read.
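The MOE calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and the example overlap values are hypothetical, and the sketch assumes that the whole 75 bp read aligns across exactly two exons, so that OR = 75 - OL.

```python
from collections import Counter

READ_LENGTH = 75  # read length stated in the text


def minimum_overlap_with_exon(left_overlap, read_length=READ_LENGTH):
    """Return MOE = min(OL, OR) for a two-exon junction read.

    Assumption: the full read is aligned, so OR = read_length - OL.
    """
    if not 1 <= left_overlap <= read_length - 1:
        raise ValueError("a junction read must overlap each exon by at least 1 base")
    right_overlap = read_length - left_overlap
    return min(left_overlap, right_overlap)


def moe_histogram(left_overlaps, read_length=READ_LENGTH):
    """Count how many reads fall on each MOE value."""
    return Counter(minimum_overlap_with_exon(ol, read_length) for ol in left_overlaps)


# Hypothetical OL values for six junction reads (illustration only, not real data).
hypothetical_left_overlaps = [1, 74, 20, 55, 37, 38]
histogram = moe_histogram(hypothetical_left_overlaps)
print(sorted(histogram.items()))

# Every possible junction position in a 75 bp read gives an MOE between 1 and 37.
all_moe = [minimum_overlap_with_exon(ol) for ol in range(1, READ_LENGTH)]
print(min(all_moe), max(all_moe))
```

Enumerating every possible junction position confirms the stated range: the smallest MOE is 1 and the largest is 37, because the two overlaps of a 75 bp read cannot both exceed 37.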

For "Identical" junction reads, the typical MOE ranges from 15 to 37, and the frequency drops to nearly 0 when the MOE is less than 10 (left panels in Figure 5). For "Alternative" junction reads, the most dominant MOE is 1 (middle panels in Figure 5), representing on average one-third of cases. In general, "Alternative" reads have a very small MOE. For junction reads with an MOE of 1, 2, or 3, it is virtually impossible to map them 'correctly' without prior knowledge of the transcripts. The MOE for "Unmapped" reads has a much broader range, with peaks from 4 to 12 (right panels in Figure 5). To map a junction read without a reference transcriptome, the read must overlap sufficiently with the exons at both ends. The majority of "Identical" reads meet this requirement (left panels in Figure 5). However, if the overlap at one end is too short, say 1 or 2 nucleotide bases, the read is more likely to be mapped to a single exon only, with the remaining one or two bases mapped to the intron region adjacent to that exon (middle panels in Figure 5). If the overlap falls somewhere in between, such junction reads become either unmapped or mapped to different genomic regions as non-junction reads (right panels in Figure 5).

**Figure 4.** The impact of a gene model on RNA-Seq read mapping (read length = 75 bp). (A) Composition of mapped reads; (B) effect on mapping of non-junction reads; (C) effect on mapping of junction reads. (Note: The 16 tissue sample names are denoted as follows: a: adipose; b: adrenal; c: brain; d: breast; e: colon; f: heart; g: kidney; h: leukocyte; i: liver; j: lung; k: lymph node; l: ovary; m: prostate; n: skeletal muscle; o: testis; and p: thyroid.)

In Figure 4A, we divided uniquely mapped reads into two classes, i.e., non-junction reads and junction reads, and investigated the impact of a gene model on their mapping. According to Figure 4A, approximately 23% of mapped reads were junction reads, and the remaining 77% were non-junction reads. For non-junction reads (see Figure 4B), 95% remained mapped to exactly the same genomic location regardless of the use of a gene model. Without a gene model,

**Figure 5.** The splicing patterns and distribution of MOE (Minimum Overlap with an Exon) for junction reads. The typical MOE for "Identical" junction reads ranges from 15 to 37. For "Alternative" junction reads, the most dominant MOE is 1, representing on average one-third of cases. In contrast, the MOE for "Unmapped" reads has a much broader range, with peaks from 4 to 12. Note that the scale of the y-axis is not uniform.
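One way to see why reads with a very small MOE are so easily placed on a single exon is a back-of-the-envelope calculation. The sketch below rests on a simplifying assumption that is not part of the original analysis: the four bases are treated as independent and equally likely, and mismatches tolerated by an aligner are ignored. Under that assumption, the chance that the k overhanging bases happen to match the k intronic bases next to the exon is (1/4)^k. The function name is hypothetical.

```python
def chance_match_probability(overhang, base_prob=0.25):
    """Probability that `overhang` bases match the adjacent intron by chance.

    Assumption (for illustration only): bases are independent and each
    matches with probability `base_prob`; aligner mismatch allowances
    are not modelled.
    """
    if overhang < 0:
        raise ValueError("overhang must be non-negative")
    return base_prob ** overhang


for k in (1, 2, 3, 5, 10, 15):
    probability = chance_match_probability(k)
    print(f"MOE = {k:2d}: chance of an exact intron match = {probability:.2e}")
```

With a one-base overhang the chance match is 1 in 4, which is in line with the dominance of MOE = 1 among "Alternative" reads. For overhangs of 15 bases or more the probability is negligible, which matches the range in which "Identical" reads are typically found.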
