**3. Long non-coding RNA (lncRNA)**

## **3.1. Discovery and identification of lncRNA**

In the era of next-generation sequencing (NGS), high-throughput RNA-seq data have highlighted the importance of the non-coding part of the genome in gene function. Non-coding RNAs (ncRNAs) are transcribed from non-coding DNA, earlier called junk DNA. An extensive study of transcriptomes from multiple species indicated that about 90% of the genome can be transcribed, whereas only a small portion of the transcribed regions potentially codes for proteins [21]. ncRNAs are categorized into housekeeping and regulatory ncRNAs on the basis of their expression and role in different cell types. Housekeeping ncRNAs (e.g., tRNA, rRNA, and snRNA) are prominently expressed and have a structural role in all cells [22]. Regulatory ncRNAs, in contrast, show temporal expression in specific cell types and include microRNAs (miRNAs), small interfering RNAs (siRNAs), enhancer RNAs (eRNAs), promoter-associated RNAs (PARs), Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs). A length criterion of >200 nt is used to identify lncRNAs in all organisms [23]. lncRNAs comprise a major group of ncRNAs and regulate various biological processes through different molecular mechanisms.

In plants, an lncRNA was first reported in *Glycine max* [24], where it is involved in changing the sub-cellular localization of a protein. In *Medicago truncatula* and *Oryza sativa*, the lncRNAs *MtENOD40* and *OsENOD40*, respectively, were discovered in connection with nodule formation, indicating that lncRNAs have biological roles [25, 26]. Likewise, in *Arabidopsis thaliana*, the lncRNAs *COLD-INDUCED LONG-ANTISENSE INTRAGENIC RNA* (*COOLAIR*) and *COLD-ASSISTED INTRONIC NONCODING RNA* (*COLDAIR*), both involved in the regulation of flowering, were identified and studied for their diverse functions in the plant system [27, 28]. Furthermore, the exponential rise in high-throughput RNA-seq data has contributed to the discovery of lncRNAs at the genome-wide level, although such studies in plants are limited to a few species. Combining experimental RNomics with computational approaches has contributed to the identification of lncRNAs and their functions in wide-ranging biological processes [6]. Accurate identification and functional annotation of lncRNAs from high-throughput RNA-seq data remains an ongoing challenge in bioinformatics. Identified plant lncRNAs are regularly deposited in various databases [29]. A pipeline with multiple filters for the assembly and identification of high-confidence lncRNAs is shown in **Figure 1** [30, 31]. The current status of the lncRNAs identified in different plant species is summarized in **Table 1**.

Role of Next-Generation RNA-Seq Data in Discovery and Characterization of Long Non-Coding…

http://dx.doi.org/10.5772/intechopen.72773

Next Generation Plant Breeding

**Figure 1.** Pipeline for identification of long non-coding RNA.

**Table 1.** Occurrence of lncRNA in various plant species.

| Sr. no. | Plant name | Number of lncRNAs | Tissues/organ/stress | Reference |
|---|---|---|---|---|
| 1 | *Amborella trichopoda* | 2569 | Tissue | [32] |
| 2 | *Arabidopsis thaliana* | ~6480 | Organ-specific and stress responsive | [22] |
| 3 | *Chlamydomonas reinhardtii* | 2214 | Cultured cells and synchronized vegetative cells | [32] |
| 4 | *Cicer arietinum* | 2248 | Three vegetative tissues and flower development | [30] |
| 5 | *Cucumis sativus* | 3274 | Fruit development and sex differentiation tissues | [33] |
| 6 | *Fragaria vesca* | 1556 | Floral, fruit tissue, and two vegetative tissues | [34] |
| 7 | *Medicago truncatula* | 23,324 | Control, osmotic, and salt stress in leaf and root tissues | [35] |
| 8 | *Oryza sativa* | 2224 | Development and reproductive organs | [36] |
| 9 | *Physcomitrella patens* | 2711 | Developmental stages | [32] |
| 10 | *Populus trichocarpa* | 2542 | Control and drought condition | [37] |
| 11 | *Setaria italica* | 584 | Drought stress | [38] |
| 12 | *Selaginella moellendorffii* | 4422 | Root, stem, and leaf | [32] |
| 13 | *Solanum lycopersicum* | 10,774 | Wild type and ripening mutant | [39] |
|  |  | 1565 | Tomato yellow-leaf curl virus stress | [40] |
| 14 | *Triticum aestivum* | 44,698 | Organ-specific and stress responsive | [31] |
|  |  | 283 | Fungal-responsive lincRNAs | [41] |
| 15 | *Vitis vinifera* | 4506 | Organ-specific | [32] |
| 16 | *Zea mays* | 1704 | Different tissues | [42] |
|  |  | 664 | Drought-stressed leaves | [43] |
|  |  | 7245 | Leaves (under conditions of nitrogen deficiency and sufficiency) | [44] |
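The idea of passing assembled transcripts through several successive filters can be illustrated with a minimal Python sketch. It is not the pipeline of **Figure 1**: only the >200 nt length criterion comes from the text, while the ORF cutoff (`MAX_ORF_AA`), the expression cutoff (`MIN_EXPRESSION`), the `Transcript` record, and all function names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

# Only the length criterion comes from the text; the other cutoffs are assumptions.
MIN_LENGTH_NT = 200     # lncRNAs are defined as >200 nt
MAX_ORF_AA = 100        # assumed cutoff for the longest ORF (amino acids)
MIN_EXPRESSION = 1.0    # assumed minimum abundance (e.g., FPKM)

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    name: str
    sequence: str      # transcript sequence, 5' to 3'
    expression: float  # abundance estimate from RNA-seq


def longest_orf_aa(sequence: str) -> int:
    """Longest ATG-to-stop ORF (in amino acids) over the three forward frames.

    ORFs that run off the end without a stop codon are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply the filters in sequence: length, coding potential, expression."""
    if len(t.sequence) <= MIN_LENGTH_NT:
        return False
    if longest_orf_aa(t.sequence) > MAX_ORF_AA:
        return False
    return t.expression >= MIN_EXPRESSION


def filter_lncrnas(transcripts):
    """Keep only the transcripts that pass every filter."""
    return [t for t in transcripts if is_candidate_lncrna(t)]
```

Calling `filter_lncrnas` on a list of `Transcript` records returns the subset that passes all three checks. In practice, the coding-potential step is usually handled by dedicated tools and homology searches rather than a simple ORF scan, so this sketch only shows how successive filters narrow the candidate set.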

**4.3. Precursor lncRNA**

lncRNAs constitute an important class of riboregulators that act as precursors in the synthesis of shorter ncRNAs, such as miRNAs and siRNAs. In this mechanism, some lncRNAs are processed into shorter ncRNAs or may act directly as precursors [55]. The genes encoding primary miRNA transcripts (pri-miRNAs) are transcribed by RNA polymerase II [56]. In plants, miRNAs constitute a modest portion of the small regulatory ncRNA pool because of the presence of other complex small regulatory ncRNAs. In addition, plants have the plant-specific RNA polymerases IV and V, which are involved in the transcription of siRNAs and endogenous siRNAs [57]. For example, in *Triticum aestivum*, 19 lncRNAs were predicted to be precursors of 28 miRNAs [31]. In *Arabidopsis*, the 24-nt sequences of several siRNAs matched five lncRNAs (npc34, npc351, npc375, npc520, and npc523), which were therefore considered potential precursor lncRNAs. The mapping of siRNAs to both strands of these lncRNAs also strengthened the findings [58].

**4.4. RNA-dependent DNA methylation**

Chromatin modification is facilitated by the recruitment of chromatin modifiers by lncRNAs and small RNAs (sRNAs) to specific locations in DNA. This RNA-dependent DNA methylation (RdDM) is a conserved process that recruits DNA methyltransferases and histone modifiers for DNA methylation and repressive histone modification, respectively [59].

**4.5. Chromosome looping**

This mechanism differs from RdDM and histone modification in that it involves only structural changes in chromatin, which in turn affect the binding potential of RNA polymerase and other transcription factors [60]. A persuasive example of chromosome looping in plants is the *APOLO* lncRNA, which influences auxin transport by regulating the expression of PID, a regulator of auxin transport. When the *APOLO* locus is transcribed by RNA Pol V and modified by RdDM, expression of the locus is suppressed and it loops to PID, which inhibits PID transcription. In contrast, when RNA Pol II transcribes *APOLO*, the looping to PID is restrained, resulting in the expression of PID [60].

**4.6. Protein re-localization**

The role of lncRNA in protein re-localization was first described in *G. max* and *Medicago sativa* [61, 62]. The symbiotic interactions between soil bacteria and leguminous plants are regulated by the *Enod40* gene (early nodulin gene), which is induced by nitrogen-fixing bacteria in the pericycle and dividing cortical cells of roots [24, 63]. The presence of *Enod40* lncRNA in non-leguminous plants, such as rice, suggests that it occurs widely [26, 64]. The secondary structure of *Enod40* lncRNA is highly stable and has five highly conserved domains. The ORF of *Enod40* is very short and encodes two short peptides. These short peptides regulate the biological activities of *Enod40* and consequently help in nodulation [65, 66]. In *M. truncatula*, *Enod40* has been reported to re-localize MtRBP1 (*Medicago truncatula* RNA binding protein 1): *Enod40* interacted directly with MtRBP1 and re-localized the protein from nuclear speckles into cytoplasmic granules during the nodulation process [25].
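Returning to the precursor mechanism of Section 4.3, the strand-mapping evidence described there (24-nt siRNAs matching both strands of a candidate lncRNA) can be sketched in Python. This is a simplified illustration under stated assumptions, not the method of the cited study: it uses exact substring matching with no mismatches, and the function names and the `read_length` parameter are hypothetical.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and write it in the DNA alphabet."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return normalize(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def strands_hit(lncrna: str, small_rnas: list[str]) -> dict[str, set[str]]:
    """For each small RNA, report the strands ('+', '-') it matches exactly."""
    plus = normalize(lncrna)             # sense strand of the lncRNA
    minus = reverse_complement(lncrna)   # antisense strand
    hits: dict[str, set[str]] = {}
    for srna in small_rnas:
        read = normalize(srna)
        strands = set()
        if read in plus:
            strands.add("+")
        if read in minus:
            strands.add("-")
        hits[srna] = strands
    return hits


def maps_to_both_strands(lncrna: str, small_rnas: list[str],
                         read_length: int = 24) -> bool:
    """True if reads of the given length cover both strands of the lncRNA."""
    reads = [s for s in small_rnas if len(s) == read_length]
    hits = strands_hit(lncrna, reads)
    covered = set().union(*hits.values())
    return covered == {"+", "-"}
```

A candidate for which `maps_to_both_strands` returns `True` has small RNA matches on both the sense and antisense strands. A real analysis would use a short-read aligner and tolerate mismatches; the exact-match test is used here only to keep the sketch self-contained.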
