**Practical Data Processing Approach for RNA Sequencing of Microorganisms**


DOI: 10.5772/intechopen.69157

Toshitaka Kumagai and Masayuki Machida

Additional information is available at the end of the chapter

Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health

#### **Abstract**

The rapid evolution of sequencing technology, whose performance continues to accelerate, has generated huge amounts of DNA/RNA sequence data. Because of the wide variety of basic studies and applications arising from the vast number of species and the diversity of microorganisms, the range of sequencing targets is also expanding. The huge amounts of data generated by recent high-throughput sequencers require highly efficient data analysis algorithms running on modern high-performance computers. We have developed a highly accurate and cost-effective mapping strategy that excludes unreliable base calls and corrects the reference sequence through provisional mapping of RNA sequencing reads. Mapping tools such as HISAT and STAR precisely aligned RNA-Seq reads to the genome of a filamentous fungus while taking exon-intron boundaries into account. Expression analysis was made more accurate by refining gene models, combining mapped RNA-Seq reads with ab initio gene-finding tools based on generalized hidden Markov models (GHMMs). Visualizing the mapping results greatly helps in evaluating and improving the entire analysis, in terms of both wet-lab experiments and data processing. We believe that at least part of our approach is useful for, and applicable to, the analysis of any microorganism.

**Keywords:** RNA sequencing, computational analysis, microorganisms, gene modeling, alternative splicing
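The abstract names two preprocessing ideas: excluding unreliable base calls, and correcting the reference sequence through provisional mapping of reads. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline. The function names and thresholds (Phred 20, minimum depth 3, 80% agreement) are hypothetical choices. Qualities are assumed to use Phred+33 encoding. Reads are assumed to be ungapped alignments with known 0-based start positions, whereas real HISAT or STAR output contains spliced gaps that would require CIGAR parsing.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical default: Phred 20 corresponds to an estimated 1% base-call error rate.
MIN_QUALITY = 20


def mask_low_quality(seq: str, qual: str, min_q: int = MIN_QUALITY, offset: int = 33) -> str:
    """Replace base calls whose Phred score (Phred+33 encoding) is below min_q with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def correct_reference(
    reference: str,
    aligned_reads: List[Tuple[int, str]],
    min_depth: int = 3,
    min_fraction: float = 0.8,
) -> str:
    """Substitute a reference base when enough provisionally mapped reads agree on another base.

    aligned_reads holds (0-based start, read sequence) pairs for ungapped alignments
    (a simplifying assumption); masked 'N' bases are ignored when counting.
    """
    # One base counter per reference position (a simple pileup).
    columns = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for i, base in enumerate(read):
            pos = start + i
            if 0 <= pos < len(reference) and base != "N":
                columns[pos][base] += 1

    corrected = list(reference)
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence to override the reference
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            corrected[pos] = base
    return "".join(corrected)


if __name__ == "__main__":
    print(mask_low_quality("ACGT", "II#I"))
    print(correct_reference("ACGTA", [(0, "ACCTA"), (0, "ACCTA"), (1, "CCTA")]))
```

Running the script prints `ACNT` and then `ACCTA`. In the first case, the `#` quality character encodes Phred 2, so that base is masked. In the second, three reads agree on C where the reference has G. Masking bases rather than trimming them keeps read lengths and alignment coordinates unchanged. Because masked bases are ignored, they also cannot vote in the provisional-mapping correction step.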
