The Function of lncRNAs as Epigenetic Regulators

*Ana Luisa Pedroso Ayub, Debora D'Angelo Papaiz, Roseli da Silva Soares and Miriam Galvonas Jasiulionis*

#### **Abstract**

Recently, non-coding RNAs (ncRNAs) have been classified into different categories, and their importance in regulating different cellular processes has been unravelled. Long non-coding RNAs (lncRNAs) can interact with DNA, other RNAs and proteins, including epigenetic modifiers. Some lncRNAs are related to genomic imprinting and are associated with chromatin-modifying complexes that can regulate gene transcription. It is well established that cancer cells carry different epigenetic alterations, and some of these modifications are associated with lncRNAs. Studies of cancer-associated lncRNAs have defined their function in tumorigenesis and their impact on cell proliferation, cellular signalling, angiogenesis and metastasis. Therefore, a better knowledge of their role might contribute to a better understanding of these diseases. In this chapter, we discuss lncRNA classification and functions, epigenetic marks and how they can guide transcription. We also discuss how these mechanisms can interact to guide gene expression, as well as recent findings on the dysregulation of lncRNAs in cancer.

**Keywords:** epigenetics, lncRNAs, DNA methylation, histone modifications, cancer

#### **1. An overview**

The patterns of gene expression of a cell are altered throughout its lifetime, and these changes occur as a response to different stimuli. For example, during the differentiation of an embryonic cell, a group of active genes dictates the cell fate, while after differentiation those genes are silenced since they are no longer needed for that task. Shifts in gene expression may occur through different mechanisms; however, the most important alterations occur at the epigenome level. The epigenome is dynamic, being constantly altered by different chemical modifications such as DNA methylation, histone modifications, nucleosome positioning and chromatin remodelling. These changes make DNA sequences more or less accessible to the transcriptional machinery, altering gene expression in a cell. The regulation of these mechanisms is complex and involves enzymes, proteins and RNA molecules. In recent years, it has been shown that long non-coding RNAs are also responsible for regulating transcription, and they can do so at three different levels: pre-transcriptional, transcriptional and post-transcriptional. Besides these regulatory functions, they can also alter gene expression by altering the epigenome. Epigenomic alterations are related to the onset of many diseases and have been reported to be crucial to cancer development. In cancer cells, tumour suppressor genes are silenced and oncogenes are overexpressed, and these alterations can be driven by epigenetic modifications regulated by lncRNAs. In this chapter, we discuss lncRNAs and epigenetic marks, how they can interact with each other to regulate gene expression, and their role in cancer.

## **2. Long non-coding RNA**

The discovery of ribonucleic acid (RNA) molecules that do not code for proteins has drastically altered our understanding of molecular biology. Until recent years, the central dogma of biology described DNA as the source of information from which an encoded gene was transcribed into an RNA strand that would then be translated into a protein. However, in the human genome, approximately 93% of the DNA can be transcribed into RNA, but only around 2% of that would be protein-coding messenger RNA (mRNA). The remaining transcripts were therefore classified as transcriptional noise. With the rapid advance in molecular biology techniques, including large-scale sequencing, it is now known that many thousands of non-coding transcripts are encoded by the genome. These transcripts represent more than 70% of the genome, and they are transcribed into non-coding RNA (ncRNA) molecules. This knowledge opens up a completely new universe, and more than 40 types of non-coding RNAs have already been described. Among the best-known ncRNAs are the transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), microRNAs (miRNAs) and, more recently, the long non-coding RNAs (lncRNAs), which are the focus of this chapter.

#### **2.1 Characteristics of lncRNAs**

The lncRNAs, as the name suggests, are long RNA transcripts of more than 200 nucleotides that are not translated into protein. The first long non-coding RNA was described in 1971, in a viroid plant pathogen; however, a regulatory role for a long non-coding RNA was first described only in the early 1990s, when transcripts involved in epigenetic mechanisms were discovered. One of the first identified lncRNAs was *H19* (imprinted maternally expressed transcript), first described in mouse [1]. Shortly after, the X-inactive specific transcript (*XIST*) was suggested to be a functional lncRNA, with a structural role in the cell nucleus. lncRNAs present relatively low levels of evolutionary conservation and originate from genes that are usually shorter than protein-coding genes, with fewer exons [2]. However, they share features with protein-coding transcripts, as they are typically transcribed by RNA polymerase II and can be capped, polyadenylated and spliced [3].

lncRNAs can be transcribed from both mitochondrial and nuclear genomes, in sense and antisense directions. Also, strong evidence suggests that lncRNAs can undergo post-transcriptional cleavage and act as precursors to smaller RNAs such as miRNAs, piRNAs, siRNAs and others.

One of the main characteristics of lncRNAs is their ability to fold into thermodynamically stable secondary or higher-order structures, which are highly conserved [4]. The longer the lncRNA, the higher the probability that it forms such structures. Because lncRNAs can form intramolecular bonds, they are able to fold into structures such as double helices, hairpins, loops, pseudoknots and more. Owing to these complex structures, they are able to bind more than one molecule at a time, regulating gene expression at different levels through RNA-protein, RNA-DNA and RNA-RNA complexes.

*The Function of lncRNAs as Epigenetic Regulators DOI: http://dx.doi.org/10.5772/intechopen.88071*

*Non-Coding RNAs*

lncRNAs can be expressed in different cell compartments, and their function is directly related to their location. A substantial proportion of lncRNAs are found exclusively in the nucleus. Nuclear lncRNAs often modulate gene expression by recruiting transcription factors, by remodelling or modifying chromatin, or by forming RNA-DNA triplexes [5]. Other lncRNAs must be transported to the cytoplasm, where they may interfere in post-translational modification, participating in protein localization, mRNA translation and stability [6]. In addition, lncRNAs may be transported to distant regions through extracellular vesicles, such as exosomes and microvesicles; however, the mechanisms that regulate the expression of these circulating lncRNAs are still not well understood [7, 8].

#### **2.2 The lncRNA classification**

Because lncRNAs are a very diverse class of molecules, there is still debate about the best way to classify them into categories, as a classification can convey information about their localization, regulatory function, biological function and so on. The simplest method of lncRNA classification is based on their size [9, 10]: small lncRNA (200–950 nt), medium lncRNA (950–4800 nt) and large lncRNA (>4800 nt). According to this classification, most human lncRNAs (58%) fall into the small-lncRNA group.
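The size-based scheme above can be illustrated with a minimal Python sketch. The function name `size_class` and the handling of the boundary values are assumptions made for illustration only: the text gives overlapping ranges (950 nt appears in both the small and medium groups), so here each boundary value is assigned to the lower class, and transcripts shorter than 200 nt are treated as not being lncRNAs.

```python
from typing import Optional

# Size thresholds in nucleotides, taken from the classification in the text.
LNCRNA_MIN_LENGTH = 200
SMALL_MAX_LENGTH = 950
MEDIUM_MAX_LENGTH = 4800


def size_class(length_nt: int) -> Optional[str]:
    """Return 'small', 'medium' or 'large' for a transcript length in nt.

    Returns None for transcripts shorter than 200 nt.
    Assumption: boundary values (950 and 4800 nt) go to the lower class,
    since the source ranges do not say which class owns them.
    """
    if length_nt < LNCRNA_MIN_LENGTH:
        return None
    if length_nt <= SMALL_MAX_LENGTH:
        return "small"
    if length_nt <= MEDIUM_MAX_LENGTH:
        return "medium"
    return "large"


# Hypothetical transcript lengths, chosen only to exercise each branch.
for length in (150, 600, 2300, 17000):
    print(length, size_class(length))
```

A different boundary convention (for example, assigning 950 nt to the medium group) would only require changing the comparison operators.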

Another classification, from the catalogue of human lncRNAs published in 2012, defines five biotypes of lncRNAs according to GENCODE (**Figure 1**):

1. *Antisense*: located on the opposite strand from protein-coding genes, containing an intersection with some exons or introns or published evidence of antisense gene regulation

2. Long intergenic non-coding RNA (*lincRNA*): transcripts originating from intergenic loci, that is, located between two protein-coding genes

3. *Sense overlapping*: transcripts containing protein-coding gene sequences in their introns, located on the same strand as those genes and not overlapping with any exon

4. *Sense intronic*: located within introns of a protein-coding gene and with no intersection with exons

5. *Processed transcripts*: loci where all transcripts have no open reading frame (ORF) and do not fit in any of the above biotypes, due to their complex structure

**Figure 1.**

*lncRNA biotype classification. The image shows the lncRNA biotypes:* antisense*,* lincRNA*,* sense overlapping *and* sense intronic*. Blue squares represent gene coding exons, and green squares represent lncRNA exons; directions of transcription are indicated by arrows.*

It is important to note that even though this classification is widely used, additional biotypes of lncRNAs are also described in GENCODE, such as *macro lncRNAs*, *pseudogenes*, *3 prime overlapping ncRNA* and *bidirectional promoter lncRNA*, among others. Alternatively, lncRNAs can be categorized into five archetypes according to the molecular mechanisms that may be involved in their functions:

1. *Signal archetype*: acts as a molecular signal or indicator of transcriptional activity

2. *Decoy archetype*: binds and captures other molecules, such as proteins and other regulatory RNAs, inhibiting their function

3. *Guide archetype*: binds and recruits ribonucleoprotein complexes to specific targets

4. *Scaffold archetype*: plays a structural role as a platform upon which other molecules can bind simultaneously, assembling a complex

5. *Enhancer archetype*: controls higher-order chromosomal looping

lncRNAs can also be classified based on the genomic region they affect. A lncRNA can influence a neighbouring gene on the same allele from which it is transcribed (cis) or act on more distant genomic regions and other chromosomes (trans):

1. *Cis-lncRNAs*: lncRNAs regulating the expression of genes in close genomic proximity. They may be transcribed from promoter regions and may interfere in the transcriptional activity of neighbouring genes. They may act by recruiting transcription factors, inducing chromatin remodelling or forming DNA-RNA triplex structures.

One of the best-known examples of a cis-acting lncRNA is *XIST*. In mammals, females have two copies of the X chromosome (XX), while males have only one (XY). This imbalance could result in a variety of problems associated with the expression of genes from the X chromosome. However, the lncRNA X-inactive specific transcript (*XIST*) is expressed from the X-inactivation centre (XIC) locus and acts in *cis* along the whole chromosome from which it is transcribed, resulting in the silencing of this chromosome (**Figure 2A**).

2. *Trans-lncRNAs*: lncRNAs may also function in trans by influencing distant gene loci. In this case, they may also act via chromatin modification complexes, as well as affect transcription by binding to transcription elongation factors or to RNA polymerases.

Another well-studied lncRNA, HOX transcript antisense RNA (*HOTAIR*), also recruits the polycomb repressive complex 2 (PRC2) to silence gene expression. In this case, however, *HOTAIR* is transcribed from the HoxC locus on chromosome 12 and represses the HoxD locus on chromosome 2, therefore acting in trans (**Figure 2B**).

**Figure 2.**

*Cis and trans regulation of gene expression by lncRNAs. (A) Inactivation of the X chromosome by the lncRNA* XIST*,* cis-*acting in the same chromosome; (B) lncRNA* HOTAIR *is transcribed in chromosome 12 and acts in chromosome 2 blocking gene expression.*

#### **2.3 Gene expression regulation mediated by lncRNAs**

Long non-coding RNAs are functionally very diverse and are involved in numerous biological roles, such as imprinting, epigenetic regulation, apoptosis and cell cycle control, transcriptional and translational regulation, splicing, cell development and differentiation and ageing. They have been described in almost every stage of gene regulation: pre-transcriptionally, guiding proteins to specific areas of the genome, acting as decoys that keep proteins away from chromatin, or driving epigenetic alterations through histone modifications or DNA methylation [3]; transcriptionally, modulating the transcriptional process; and post-transcriptionally, through RNA-RNA interactions.

*2.3.1 Pre-transcriptional regulation*

It is well understood that, in eukaryotic cells, DNA is packaged into chromatin and that the accessibility of this structure to the transcriptional machinery strongly influences gene expression, as transcription factors must have access to the chromatin in order for the encoded gene to be transcribed. lncRNAs can regulate gene expression in the nucleus by associating with and recruiting chromatin-remodelling factors. The examples of *XIST* and *HOTAIR* mentioned above illustrate this pre-transcriptional regulation: both lncRNAs recruit PRC2, which represses gene expression through trimethylation of lysine 27 on histone H3 (H3K27me3) [11].
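The cis/trans distinction illustrated by *XIST* and *HOTAIR* can be sketched in a few lines of Python. This is a deliberately simplified model and not a method from the chapter: the `Locus` class and `regulation_mode` function are hypothetical names, and the rule "same chromosome means cis, different chromosome means trans" is an assumption that ignores the allele-specific and distance-dependent aspects of the definition given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A named genomic locus with the chromosome it sits on."""
    name: str
    chromosome: str


def regulation_mode(source: Locus, target: Locus) -> str:
    """Classify a lncRNA-target pair as 'cis' or 'trans'.

    Simplifying assumption: only the chromosome is compared, so distant
    targets on the same chromosome are also reported as 'cis'.
    """
    return "cis" if source.chromosome == target.chromosome else "trans"


# Loci named in the text; the X-linked target is a hypothetical placeholder.
hotair_locus = Locus("HoxC (HOTAIR)", "12")
hoxd_locus = Locus("HoxD", "2")
xist_locus = Locus("XIC (XIST)", "X")
x_linked_target = Locus("X-linked gene (hypothetical)", "X")

print(regulation_mode(hotair_locus, hoxd_locus))     # trans
print(regulation_mode(xist_locus, x_linked_target))  # cis
```

A more faithful model would also track which allele the lncRNA is transcribed from and the distance to the target gene.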
