**4. Long non-coding RNAs**

Long non-coding RNAs (lncRNAs) are a broad group of molecules identified in yeast, plants, and mammals. In general, lncRNAs can be defined as polyadenylated or nonpolyadenylated transcripts longer than 200 nt with low protein-coding potential (coding for fewer than 100 amino acids). The acceleration of transcriptome research, made possible by high-throughput technologies such as microarrays and next-generation sequencing, now allows us to appreciate the complicated interactions that lead to precise regulation of gene expression. It turned out that, apart from protein-coding genes, transposons, and genes for housekeeping RNAs (such as ribosomal RNAs), intergenic regions, which lie between protein-coding sequences, are also expressed. Intriguingly, like mRNAs, these transcripts, namely lncRNAs, are capped at the 5' end, and many of them are spliced [119]. However, in contrast to mRNAs, their expression level is very low and they lack long, evolutionarily conserved open reading frames [120]. LncRNAs also differ from protein-coding transcripts in ribosome occupancy [121]. Large-scale genomics projects, such as the Encyclopedia of DNA Elements (ENCODE), showed that lncRNAs are not merely transcriptome noise but transcripts with biological functions. The ENCODE project showed that 75% of the human genome is transcribed and that about 80% of those RNA molecules have some biochemical function [122]. As in human, most of the intergenic regions of Arabidopsis, rice, and corn are transcribed and constitute a source of polyadenylated lncRNAs. Such stable polyadenylated lncRNAs are transcribed by RNA polymerase II and can be divided into four groups based on their genomic origin and relationship with adjacent protein-coding genes:

**1.** intergenic lncRNAs (lincRNAs) that are transcribed from sequences between two genes

**2.** intronic ncRNAs (incRNAs) that overlap with intronic sequences within another transcript

**3.** natural antisense transcripts (NATs) derived from the complementary DNA strand of their associated genes

**4.** sense lncRNAs overlapping with one or more exon sequences of the transcript on the same strand.

Recently, another category of lncRNAs has been described in Arabidopsis and rice [123, 124]: nonpolyadenylated transcripts, 50-300 nt in length, with low coding potential and without any sequence similarity to known ncRNAs. This novel group is referred to as intermediate-sized ncRNAs (im-ncRNAs).

Despite 20 years of investigation [125], the role of lncRNAs is still neither fully described nor understood. Thus far, the functions of only a few such molecules have been characterized. We know that lncRNAs are engaged mainly in the transcriptional regulation of gene expression, acting as scaffolds for transcription factors and genetic modifiers, or as molecular signals, decoys, or guides. Moreover, lncRNAs can also encode miRNAs, target specific mRNAs for decay, or function as miRNA sponges. Most studies have been performed on animal systems, and although research on plants is limited, the emerging picture is that the regulatory functions of plant lncRNAs are similar to those of animal lncRNAs [126]. To date, almost 40,000 putative lncRNAs have been identified in *A. thaliana* [127] and thousands in *Oryza sativa* [128], *Zea mays* [129], *Medicago truncatula* [130], *Populus trichocarpa* [131], and other plant species [132, 133]. With the rapid development of bioinformatics tools and transcriptome analysis methodologies, genome-wide identifications of plant lncRNAs have been conducted. In maize, applying support vector machine (SVM) tools together with a Python pipeline to a cDNA dataset resulted in the identification of 2,492 potential ncRNAs, which represent 13.3% of the initial sequences. In total, 237 ncRNAs were classified as shRNA precursors and 1,225 as siRNA precursors, which constituted 59.4% of the predicted ncRNAs. The remaining 1,011 were considered to be potential long non-coding transcripts [134]. Recently, a new gold standard for studying the complexity of eukaryotic transcriptomes has emerged: RNA sequencing (RNA-seq). It allows accurate quantification of transcript expression levels and also reveals transcripts that are missing from, or incomplete in, the reference genome.

Computational prediction based on RNA-seq data from rice anthers, pistils, seeds, and shoots, together with 40 available rice RNA-seq libraries, led to the identification of 2,224 reliably expressed lncRNAs, including 1,624 lincRNAs and 600 long non-coding natural antisense transcripts (NATs). Further verification in rice insertional mutants made it possible to define a set of lncRNAs that are preferentially expressed at the reproductive stage. Several lncRNAs were identified as competing endogenous RNAs (ceRNAs), which sequester miR160 or miR164 in a type of target mimicry [135].

166 Abiotic and Biotic Stress in Plants - Recent Advances and Future Perspectives

Another feature that complicates the retrieval of true lncRNAs is their weak sequence conservation. It is estimated that only 2% to 5.5% of lncRNAs are conserved in their primary sequence and that only some of them are associated with short conserved elements. Most likely, this is a result of rapid evolution: lncRNAs are frequent targets of positive selection [136]. Some lncRNAs and their target genes can be distinguished by their conserved synteny across species; those lncRNAs act in cis [136]. Other lncRNAs may be recognized by conserved secondary structures, which allow them to interact with RNA-binding proteins [124]. Genome-wide analyses carried out so far have determined that the expression of different groups of lncRNAs is highly tissue-specific and that many of them are responsive to biotic and abiotic stress conditions.
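Because primary sequence conservation is of little help, candidate retrieval usually starts from the working definition given at the beginning of this section: a transcript longer than 200 nt whose longest open reading frame codes for fewer than 100 amino acids. The sketch below illustrates only that first filter. The function names, the restriction to the transcript's own strand, and the requirement of an ATG start and an in-frame stop codon are assumptions made for illustration; published pipelines add coding-potential classifiers and homology searches on top of such a filter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length (in amino acids) of the longest ATG...stop ORF on the given strand.

    Assumption: only complete ORFs (start codon followed by an in-frame
    stop codon) are counted, and only the three forward frames are scanned.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # codons from ATG up to (not including) the stop codon
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_aa: int = 100) -> bool:
    """Apply the length and ORF-size criteria from the working definition."""
    return len(seq) > min_len and longest_orf_aa(seq) < max_orf_aa
```

A transcript that passes this filter is only a candidate: low expression, tissue specificity, and the absence of similarity to known housekeeping ncRNAs still have to be checked separately.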

Because of ongoing climatic changes, drought has recently been studied extensively, and many drought-responsive lncRNAs have been identified. In *Populus trichocarpa*, a model tree species, RNA-seq experiments on control and drought-treated plants revealed 504 drought-responsive lincRNAs and allowed basic annotation of a set of 2,542 lincRNAs. Interactions between miRNAs and lncRNAs were also reported: 30 miRNAs were predicted to target the sense strand of lincRNAs, 21 were predicted to target the antisense strand, and 20 target mimicry events were predicted for known *Populus* miRNAs [130]. Foxtail millet (*Setaria italica*), a potential new model organism of the family Poaceae, was also subjected to water-deficit conditions. Deep transcriptome sequencing revealed 585 lncRNAs responding to PEG-induced drought stress. Those stress conditions induced the expression of 17 lincRNAs and 2 NATs at different expression levels. Qi et al. [138] identified one lncRNA whose sequence was shared with its counterpart in sorghum. In maize, one of the most important crop species, genome-wide identification of lncRNAs differentially expressed under drought led to the identification of 567 upregulated and 97 downregulated lncRNAs, 538 of which were considered novel. Moreover, 8 lncRNAs were homologous to miRNA precursors, 62 were classified as both shRNA and siRNA precursors, and 279 were classified as siRNA precursors [139].
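The lincRNA and NAT labels used in these surveys follow from the position of a transcript relative to annotated protein-coding genes, as in the four-group scheme introduced earlier in this section. A minimal sketch of such a positional classification is given below. The `Gene` record, the half-open coordinate convention, and the tie-breaking order (same-strand exon overlap first, then opposite strand, then intronic) are assumptions made for illustration and are not taken from any of the cited studies.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


@dataclass
class Gene:
    """Hypothetical minimal annotation record for a protein-coding gene."""
    start: int
    end: int
    strand: str            # "+" or "-"
    exons: List[Interval]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def classify_lncrna(tx_start: int, tx_end: int, tx_strand: str,
                    genes: List[Gene]) -> str:
    """Assign one of the four positional groups to a non-coding transcript."""
    hits = [g for g in genes if overlaps(tx_start, tx_end, g.start, g.end)]
    if not hits:
        return "lincRNA"  # lies between genes
    for g in hits:
        if g.strand == tx_strand and any(
            overlaps(tx_start, tx_end, e_start, e_end) for e_start, e_end in g.exons
        ):
            return "sense"  # overlaps an exon on the same strand
    if any(g.strand != tx_strand for g in hits):
        return "NAT"  # transcribed from the strand opposite to the gene
    return "intronic"  # same strand, inside the gene, no exon overlap


# Toy annotation: one gene on the plus strand with two exons.
toy_genes = [Gene(1000, 5000, "+", [(1000, 1500), (4000, 5000)])]
print(classify_lncrna(6000, 7000, "+", toy_genes))
```

Real annotations contain overlapping genes and multiple isoforms, so a transcript can legitimately fall into more than one group; the fixed priority order used here is only one way to resolve that.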

In the best-known model plant, *Arabidopsis thaliana*, genome-wide characterization of lncRNAs has been performed as well. The expression of lncRNAs in response to four stresses (heat, cold, drought, and salt) has been shown to correlate with their epigenetic and structural features [140]. The authors identified 245 polyadenylated and 58 nonpolyadenylated lncRNAs that are differentially expressed under stress stimuli, and most of the selected candidates were further validated by qRT-PCR. The best-studied cases of plant lncRNA function also come from experiments on Arabidopsis, for example *COLDAIR, COOLAIR, At4/IPS1, npc48,* and *npc536* [141–145].

One of the best-described mechanisms of lncRNA action is that of the IPS1 transcript (Induced by Phosphate Starvation 1). IPS1 interacts with a miRNA as a competitor and functions as a miRNA target mimic, which resembles the miRNA sponges of animal systems. Maintaining the phosphate balance is a complicated process in plants, regulated by, among others, miR399, as described in Paragraph 2.1. Low activity of PHO2, caused by miR399-mediated mRNA cleavage, elevates phosphate uptake by increasing the expression of two root phosphate transporters. Phosphate starvation also increases the level of the IPS1 transcript, which has a 23-nt conserved domain that is partially complementary to miR399, with a 3-nt mismatch overlapping the miR399-mediated cleavage site. Because it cannot be cleaved, IPS1 competes with PHO2 mRNA for miR399 and can therefore weaken the miR399-mediated repression of PHO2 [142]. The miRNA sponge strategy is used in the therapy of human diseases, and the analogous process in plants (target mimicry) can be a very useful tool in plant research as well as in agricultural applications. As mentioned before, about 20 putative target mimicry events have been predicted in Arabidopsis to date, which suggests a potential role for this mechanism in pathways other than the maintenance of phosphate homeostasis [146].
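The difference between a cleavable target and a target mimic can be made concrete with a toy model: a cleavable site pairs with the miRNA along its whole length, whereas a mimic site carries a short unpaired loop opposite the position where cleavage would occur, between the nucleotides paired with miRNA positions 10 and 11. The sketch below is a deliberate simplification. It requires perfect Watson-Crick pairing outside the loop, ignores G:U pairs and flanking mismatches, and uses an arbitrary made-up 21-nt sequence (`TOY_MIRNA`) rather than the real miR399 or IPS1 sequences; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Arbitrary illustrative sequence, NOT a real miRNA.
TOY_MIRNA = "AUGGCUACGUUAGCCGAUACU"


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def classify_site(mirna: str, site: str, bulge_len: int = 3) -> str:
    """Classify a target site as 'cleavable', 'mimic', or 'other'.

    'cleavable': the site is perfectly complementary to the miRNA.
    'mimic':     the site is perfectly complementary except for an
                 unpaired loop of `bulge_len` nt located between the
                 nucleotides that pair with miRNA positions 10 and 11.
    """
    mirna, site = mirna.upper(), site.upper()
    perfect = reverse_complement(mirna)
    if site == perfect:
        return "cleavable"
    # In the 5'->3' target, the nucleotide paired with miRNA position 11
    # has index len(mirna) - 11, so the loop starts one position later.
    k = len(mirna) - 10
    if (len(site) == len(mirna) + bulge_len
            and site[:k] + site[k + bulge_len:] == perfect):
        return "mimic"
    return "other"


perfect_site = reverse_complement(TOY_MIRNA)
mimic_site = perfect_site[:11] + "CUA" + perfect_site[11:]
print(classify_site(TOY_MIRNA, perfect_site), classify_site(TOY_MIRNA, mimic_site))
```

In this model the mimic still binds the miRNA on both sides of the loop, which is the property that lets a transcript such as IPS1 sequester miR399 without being cleaved.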

Another model organism, *Saccharomyces cerevisiae*, has allowed researchers to define and clarify a large number of new and unexpected roles of lncRNAs, such as promoting the timing of gene expression [147], cell cycle regulation during stress conditions [148], and local reduction of histone density and chromatin remodeling in response to glucose starvation [149]. Upon osmostress in yeast, hundreds of stress-responsive genes are induced by the stress-activated protein kinase (SAPK) p38/Hog1. Whole-genome tiling arrays were used to identify a set of Hog1-induced lncRNAs. One of the genes expressing a Hog1-dependent lncRNA in antisense orientation is CDC28, which encodes cyclin-dependent kinase 1 (CDK1), the kinase that controls the cell cycle in yeast. The CDC28 lncRNA mediates the establishment of gene looping and the relocalization of Hog1 and RSC from the 3' UTR to the +1 nucleosome to induce CDC28 expression. The increased level of Cdc28 enables cells to re-enter the cell cycle more efficiently after stress. This may represent a more general mechanism for priming the expression of genes needed after stress [148].
