Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health

*2.2.4. Alignment*

After trimming and filtering, reads are ready for alignment or *de novo* construction. Alignment consists of mapping reads to a reference genome. Various alignment tools have been developed [42, 43] (https://omictools.com/read-alignment-category), including frequently used tools such as TopHat [44], STAR [45], Bowtie [46] and StringTie [47] (**Table 2**). Each software package has its own specifications, so it is important to understand the utility of each tool and the options it offers. The alignment tool used can have a great impact on the end results: the choice of aligner and of specific options has been observed to affect the results of differential gene expression analysis [48]. Aligners can be grouped into two types: gapped (also known as split, e.g. STAR and BWA) and ungapped (e.g. Bowtie). Bowtie can easily map reads to a genome but is less effective at finding splice junctions, whereas aligners in the gapped group are able to align reads and detect splice variants. In the absence of a reference genome, *de novo* assemblers (e.g. Trinity [49]) can be used. For lncRNA read alignment, gapped aligners are preferred because not all transcripts are annotated, and some portions of the reads from the same transcript may align to one position of the genome and the remainder to another position. Alignment is one of the longest steps in RNA-Seq sequence analysis, so selecting the right tool can have a significant impact on the outcome of the analysis. It is also important to perform mapping quality control following alignment. Quality checks include the percentages of mapped and unmapped reads, the location of the reads (intronic and exonic) and the 5′–3′ coverage.

*2.2.5. Transcript construction and quantification*

RNA-Seq transcript construction and the alignment steps can demand considerable computing time. Many transcript construction tools are available (https://omictools.com/transcript-quantification-category), including commonly used tools such as Cufflinks [63], iReckon [64] and StringTie [47]. This step requires paired-end data and high sequence coverage to reconstruct lowly expressed transcripts. On the assumption that transcripts are species specific, raw data or alignment files from all samples from the same population can be merged to increase coverage [65]. This modification helps clarify transcript boundaries in the case of *de novo* transcript assembly. Particular considerations for lncRNA transcript construction include sample pooling according to species and tissue type, since lncRNA expression is known to demonstrate tissue specificity [66–68].

*2.2.6. miRNA processing steps*

Overall, the procedures for miRNA identification and discovery are less time consuming and involve fewer steps than those for mRNA and lncRNA identification. The global process includes quality and adaptor trimming, with quality checkpoints before and after each step. A size selection to keep sequences between 17 and 30 nt (sometimes up to 35 nt) is often performed right after the quality and adaptor trimming step. This is followed by read mapping and filtering of other RNA sequences (rRNA, tRNA, snRNA, mRNA, lncRNA, etc.). The reads thought to represent miRNA are analyzed with miRNA prediction tools such as miRDeep2 [69].

**3. Tools for ncRNA identification**

#### **3.1. Tools for miRNA identification**

The identification of miRNAs can be either annotation of known miRNAs or discovery of novel miRNAs. A variety of algorithms and bioinformatics tools are applied to annotate known miRNAs as well as to discover new miRNAs from sequence data. These tools use features such as sequence conservation among species and structural features such as hairpin formation and minimal folding free energy [72]. Many tools are available for miRNA annotation (https://tools4mirs.org/software/known\_mirna\_identification/) [73], including frequently used tools such as miRDeep [74], miRanalyzer [75], mirTools 2.0 [71], UEA sRNA Workbench [76], sRNAtoolbox [77] and SeqBuster [78] (**Table 3**). Many more tools have been developed for novel miRNA discovery and miRNA precursor prediction (https://tools4mirs.org/software/precursor\_prediction/) [73], including frequently used tools such as MiPred [79], miRanalyzer [75], miR-Abela [80], MiReNA [81], UEA sRNA Workbench [76] and miRDeep [74] (**Table 3**). Major features of miRNA discovery tools have been reviewed [82–84]. In livestock species, the choice of methods for miRNA discovery and novel miRNA annotation varies among studies and species. For example, De Vliegher et al. [85] used miRBase [86] and UNAFold [87] for miRNA annotation and discovery in bovine mammary gland tissues, while Peng et al. [88] used miRBase [86] and RNAfold [89] for these purposes in porcine mammary glands. In our own studies, miRBase [86] and miRDeep2 [74] were used to identify miRNAs in various tissues including bovine mammary gland tissues [90], milk fat [90–92], milk whey and cells [90].
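As a rough illustration of the two routes described above, the sketch below first applies the 17–30 nt size selection mentioned for miRNA processing and then separates reads that exactly match a known mature miRNA from unmatched reads, which would be passed on to a discovery tool. It is not how miRDeep2 or any of the listed tools work internally. The function names and the toy sequences are assumptions made for this example; a real reference set would come from a database such as miRBase.

```python
from collections import Counter


def size_select(reads, min_len=17, max_len=30):
    """Keep reads whose length falls in the expected miRNA size range."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def annotate_known(reads, known_mirnas):
    """Count reads that exactly match a known mature miRNA sequence.

    Returns (counts per known miRNA name, counts per unmatched read).
    Unmatched reads are only candidates: novel miRNA discovery needs a
    dedicated tool that also evaluates the precursor hairpin.
    """
    name_by_sequence = {seq: name for name, seq in known_mirnas.items()}
    known_counts = Counter()
    unmatched_counts = Counter()
    for read in reads:
        name = name_by_sequence.get(read)
        if name is not None:
            known_counts[name] += 1
        else:
            unmatched_counts[read] += 1
    return known_counts, unmatched_counts


# Toy data: invented sequences and names, used only for illustration.
known = {"toy-miR-1": "AACCGGTTAACCGGTTAACCGG"}
reads = [
    "AACCGGTTAACCGGTTAACCGG",  # 22 nt, matches toy-miR-1
    "AACCGGTTAACCGGTTAACCGG",  # second copy of the same read
    "ACGTACGTACGTACGTACGT",    # 20 nt, no match
    "ACGTACGT",                # 8 nt, removed by size selection
]
selected = size_select(reads)
known_counts, unmatched_counts = annotate_known(selected, known)
```

Exact matching is the simplest possible annotation rule. Real pipelines usually align reads to precursors and tolerate mismatches or length variants, so this sketch would undercount isomiRs.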

#### **3.2. Tools for lncRNA identification**

To date, a large number of lncRNA genes have been identified in the genomes of human (141,353), cow (23,896) and chicken (13,085) (http://www.bioinfo.org/noncode/analysis.php, accessed on 24-03-2017). Several methods for distinguishing lncRNAs from mRNAs have been described and successfully applied to livestock species, including the coding potential calculator (CPC) [122], PhyloCSF [123], the coding-non-coding index (CNCI) [124], the coding potential assessment tool (CPAT) [125], the predictor of long non-coding RNAs and mRNAs based on an improved k-mer scheme (PLEK) [126] and Flexible Extraction of LncRNAs (FEELnc) [127]. The FEELnc program, developed by the Functional Annotation of Animal Genomes (FAANG) consortium [128], is recommended as a standardized protocol for lncRNA analyses in animal species. To distinguish lncRNAs from mRNAs, FEELnc uses a machine-learning method to estimate a protein-coding score from the RNA size, open reading frame coverage and multi k-mer usage [127]. FEELnc can also derive an automatically computed cut-off that maximizes the sensitivity and specificity of lncRNA prediction. An overview of tools for lncRNA identification/characterization is given in **Table 4**.
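The three kinds of features named above (RNA size, open reading frame coverage and k-mer usage) can be computed directly from a transcript sequence. The following is a minimal sketch of such feature extraction, not FEELnc's implementation: the function names are hypothetical, only forward-strand ATG-to-stop ORFs are considered, and the classifier that would turn these features into a coding score is left out.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest forward-strand ORF (ATG to stop, stop included).

    ORFs without an in-frame stop codon are ignored in this simplified version.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping k-mer in the sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def transcript_features(seq, k=3):
    """Collect size, ORF coverage and k-mer usage for one transcript."""
    length = len(seq)
    orf = longest_orf_length(seq)
    return {
        "length": length,
        "orf_coverage": orf / length if length else 0.0,
        "kmer_freq": kmer_frequencies(seq, k),
    }


features = transcript_features("CCATGAAATAGCC")
```

In this toy transcript the only ORF is `ATGAAATAG` (9 of 13 nt). A low ORF coverage on a long transcript is the kind of signal a coding-potential classifier would weigh towards a non-coding call, together with the k-mer profile.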

Transcriptome Analysis of Non‐Coding RNAs in Livestock Species: Elucidating the Ambiguity

http://dx.doi.org/10.5772/intechopen.69872

| Tools | Type | Major function/web link | References |
|---|---|---|---|
| ChIPBase | Database | Identifies binding motif matrices and their binding sites. Predicts transcriptional regulatory relationships between transcription factors and genes. http://rna.sysu.edu.cn/chipbase/ | [129] |
| LNCipedia | Database | Provides basic transcript information and structure for human lncRNA transcripts and genes. http://www.lncipedia.org/ | [130] |
| lncRNAdb | Database | Provides comprehensive annotation of eukaryotic lncRNAs. Offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. http://www.lncrnadb.org/ | [131] |
| LNCat | Database | Stores the information of 24 lncRNA annotation resources. Allows refined annotation of lncRNAs within a region of interest. http://biocc.hrbmu.edu.cn/LNCat/ | [132] |
| LncRNASNP | Database | Provides comprehensive resources of single nucleotide polymorphisms (SNPs) in human/mouse lncRNAs. bioinfo.life.hust.edu.cn/lncRNASNP/ | [133] |
| lncRNAWiki | Database | Provides open-content and publicly editable curation and collection of information on human lncRNAs. http://lncrna.big.ac.cn/index.php/Main\_Page | [134] |
| NONCODE | Database | Presents the most complete collection and annotation of non-coding RNAs (excluding tRNAs and rRNAs) for 18 species including human, mouse, cow, rat, chicken, pig, fruitfly, zebrafish, *Caenorhabditis elegans* and yeast. www.noncode.org/ | [135] |
| ALDB | Database | Enables the exploration and comparative analysis of lncRNAs in domestic animals. Offers information on genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. http://res.xaut.edu.cn/aldb/index.jsp | [136] |
| GENCODE | Database | Presents all gene features in the human genome. Contains annotation of lncRNA loci publicly available with the predominant transcript form consisting of two exons. https://www.gencodegenes.org | [137] |
| ncRDeathDB | Database | Presents a comprehensive bioinformatics resource for ncRNA-associated cell death interactions. www.rna-society.org/ncrdeathdb | [138] |
| LncVar | Database | Presents genetic variation associated with long noncoding genes. bioinfo.ibp.ac.cn/LncVar | [139] |
| IRNdb | Database | Combines microRNA, PIWI-interacting RNA, and lncRNA information with immunologically relevant target genes. http://irndb.org | [140] |
| AnnoLnc | Annotation | Presents an online portal for systematically annotating newly identified human lncRNAs. | [141] |
| LongTarget | Target prediction | Presents a computational method and program to predict lncRNA DNA-binding motifs and binding sites. lncrna.smu.edu.cn | [142] |
| LncRNA2Function | Functional inferences | Facilitates search for the functions of a specific lncRNA or the lncRNAs associated with a given functional term, or functional annotation of a set of human lncRNAs of interest. http://mlg.hit.edu.cn/lncrna2function | [143] |
| Co-LncRNA | Function inference | Presents a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of single or multiple lncRNAs. www.bio-bigdata.com/Co-LncRNA/ | [144] |
| LncReg | Function inference | Provides regulatory information about lncRNAs, such as targets, regulatory mechanisms, and experimental evidence for regulation and key molecules participating in regulation. bioinformatics.ustc.edu.cn/lncreg/ | [145] |
| Linc2GO | Function inference | Provides comprehensive functional annotations for human lincRNA. http://www.bioinfo.tsinghua.edu.cn/~liuke/Linc2GO | [146] |
| FARNA | Function annotation | Integrates ncRNA information related to expression, pathways and diseases in a large number of human tissues and primary cells. www.cbrc.kaust.edu.sa/farna/ | [147] |
| ViRBase | Database | Provides the scientific community with a resource for efficient browsing and visualization of virus-host ncRNA-associated interactions and interaction networks in viral infection. http://www.rna-society.org/virbase | [148] |


| Tools | Type | Major function/web link | References |
|---|---|---|---|
| LncRNA2Target | Database | Stores lncRNA-to-target genes. Provides a web interface for searching targets of a particular lncRNA or for the lncRNAs that target a particular gene. https://www.lncrna2target.org/ | [149] |
| Lncin | Function annotation | Identifies lncRNA-associated modules from protein interaction networks and predicts the function of lncRNAs based on the protein functions in the modules. lncin.ym.edu.tw | [150] |
| NPInter | Function annotation | Integrates experimentally verified functional interactions between noncoding RNAs (excluding tRNAs and rRNAs) and other biomolecules (proteins, RNA and genomic DNA). www.bioinfo.org.cn/NPInter | [151] |
| CPC | Coding potential assessment | Distinguishes between coding and noncoding RNA. Uses a Support Vector Machine-based classifier to assess the protein-coding potential of a transcript. cpc.cbi.pku.edu.cn/ | [122] |
| CNCI | Coding potential assessment | Distinguishes between protein-coding and non-coding sequences independent of known annotations. Applies to a variety of species without whole-genome sequence or with poorly annotated information. https://github.com/www-bioinfo-org/CNCI | [124] |
| CPAT | Coding potential assessment | Distinguishes between coding and noncoding RNA. Uses a logistic regression model to assess the protein coding potential. rna-cpat.sourceforge.net/ | [125] |
| FEELnc | lncRNA prediction | Derives an automatically computed cut-off that maximizes the lncRNA prediction sensitivity and specificity. https://github.com/tderrien/FEELnc | [127] |
| PLEK | lncRNA prediction | Uses a k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from mRNAs. http://www.ibiomedical.net/plek/ | [126] |

**Table 4.** Overview of tools for the analysis of lncRNA sequence data.

#### **3.3. Tools for identification of other non-coding RNA**

Currently, few tools have been developed for the identification of groups of ncRNAs other than miRNAs and lncRNAs. Popular tools for piRNA identification include proTRAC [152], piClust [153] and piRNAQuest [154] (**Table 5**). proTRAC detects piRNA clusters based on a probabilistic analysis that assumes a uniform distribution, while piClust uses a density-based clustering approach for the detection of piRNAs. piRNAQuest allows a search of the piRNome for silencers [154]. Another notable framework is SeqCluster [155], a Python pipeline for the annotation and classification of non-miRNA small ncRNAs. The pipeline permits highly versatile and user-friendly interaction with data in order to easily classify small RNA sequences with putative functional importance [155]. For other small RNAs, ncPRO-seq [156] allows the discovery of unknown ncRNA- or siRNA-coding regions from small RNA sequence data. DARIO [94] is a web tool that allows annotation and detection of ncRNAs from various species, but not livestock species. CoRAL [157] is a machine-learning method that classifies ncRNAs by relying on biologically interpretable features. Several tools have also been developed for predicting circRNAs, such as PredicircRNATool [158] and PredcircRNA [159], which apply a machine-learning approach to distinguish circRNAs from other ncRNAs (**Table 5**).

| Tools | Types | Main features/web link | References |
|---|---|---|---|
| proTRAC | piRNA prediction | Detects and analyses piRNA clusters based on quantifiable deviations from a hypothetical uniform distribution regarding the decisive piRNA cluster characteristics. https://sourceforge.net/projects/protrac/ | [152] |
| piClust | piRNA prediction | Finds piRNA clusters and transcripts from small RNA-seq data using a density-based clustering approach. http://epigenomics.snu.ac.kr/piclustweb | [153] |
| piRNAQuest | piRNA database | Provides annotation of piRNAs based on their genomic location in gene, intron, intergenic, CDS, UTR, repeat elements, pseudogenes and syntenic regions. bicresources.jcbose.ac.in/zhumur/pirnaquest | [154] |
| SeqCluster | ncRNA classification | A Python framework for the annotation and classification of the non-miRNA small RNA transcriptome. http://seqcluster.readthedocs.io/# | [155] |
| ncPRO-seq | ncRNA discovery | Allows the discovery of unknown ncRNA- or siRNA-coding regions from sRNA sequence data. http://ncpro.curie.fr/ | [156] |
| DARIO | ncRNA discovery | Allows annotation and detection of ncRNAs from various species but not livestock species. http://dario.bioinf.uni-leipzig.de/index.py | [94] |
| CoRAL | ncRNA classification | A machine-learning method that classifies ncRNA by relying on biologically interpretable features. http://wanglab.pcbi.upenn.edu/coral | [157] |
| DASHR | Database | Stores information on human small ncRNAs: miRNAs, piRNAs, snRNAs, snoRNAs, scRNAs (small cytoplasmic RNAs), tRNAs and rRNAs. lisanwanglab.org/DASHR | [160] |
| Sno/scaRNAbase | Database | A curated database for small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). gene.fudan.edu.cn/snoRNAbase.nsf | [161] |


**Table 5.** Overview of tools and databases for sequence analysis of other small ncRNAs.

**4. Tools for differential expression analysis of non-coding RNA**

Various tools allow for the detection of genes (mRNA or ncRNA) differentially expressed (DE) between two or more conditions or states from sequence data. The major differences among tools are their implemented statistical methods, input and output file formats, and filtering steps for DE analyses. Many tools, such as DESeq [169], edgeR [170], NBPSeq [171], TSPM [172], baySeq [173], EBSeq [174], NOISeq [175], SAMseq [176] and ShrinkSeq [177], use count data as input, while others such as limma [178] and Cufflinks use transformed data and BAM files (the binary version of sequence alignment data) as input, respectively. Tools that use count data can be divided into two groups: parametric methods (DESeq [169], edgeR [170], NBPSeq [171], TSPM [172], baySeq [173], EBSeq [174]) and non-parametric methods (NOISeq [175], SAMseq [176]). Among parametric methods, most software packages (baySeq [173], DESeq [169], NBPSeq [171], edgeR [170] and EBSeq [174]) use a negative binomial model to account for overdispersion, except ShrinkSeq, which offers two options for the distribution: negative binomial or zero-inflated negative binomial. These methods also implement different statistical testing approaches: DESeq, edgeR and NBPSeq use classical hypothesis testing, while baySeq, EBSeq and ShrinkSeq apply Bayesian methods. The methods and their performance have been compared and reviewed by many authors [29, 179–183]. In general, no single method performs well for all datasets. In a survey of the performance of DE analysis methods, Conesa et al. [29] observed that the limma package [178] performed well under many conditions. Many studies observed similar performance of DESeq and edgeR in ranking genes [29, 179–183]. However, DESeq is more conservative while edgeR is more liberal in controlling the false discovery rate (FDR) [29]. Among other tools, SAMseq is better at controlling FDR, while NOISeq is efficient at avoiding false positives [29].

**5. Bioinformatics tools for target prediction and functional inference of non‐coding RNA**

Following discovery and detection of important ncRNAs from RNA sequence data, the important next step is to understand their regulatory roles. Since ncRNAs commonly act by interacting with target genes (mostly by inhibiting expression), various tools have been developed to predict their target genes and to infer their functions (**Tables 3** and **4**). A simple workflow for inferring the functions of miRNAs is shown in **Figure 4**.

**5.1. Functional inference of miRNAs**

*5.1.1. Bioinformatics tools for target prediction and functional inference of miRNAs*

Inferring individual targets for a given miRNA can be done by either computational or experimental methods. Computational target prediction is coordinated in a sequence-specific manner and the target genes are normally predicted based on information derived from the
