Junk DNA and the long non-coding RNA twist in cancer genetics
Abstract
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been mapped to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet the biological functions and molecular mechanisms of lncRNAs in human diseases in general, and cancer in particular, remain largely unknown. Data from the literature suggest that lncRNAs, often via interaction with proteins, function at specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, describe the underlying molecular mechanisms of cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive single nucleotide polymorphisms (SNPs), DNA regulatory elements and lncRNAs, as an example to illustrate how SNPs located within lncRNAs may be functionally associated with an individual’s susceptibility to cancer.
INTRODUCTION
The Encyclopedia of DNA Elements (ENCODE) project has revealed that at least 75% of the human genome is transcribed into RNAs, while protein-coding genes comprise only 3% of the human genome1. Because of a long-held protein-centered bias, many of the genomic regions that are transcribed into non-coding RNAs had been viewed as ‘junk’ in the genome, and the associated transcription had been regarded as transcriptional ‘noise’ lacking biological meaning2. The last decade has witnessed an explosive expansion in the understanding of biological function and clinical significance of non-coding RNA (ncRNA) transcripts, exemplified by the large number of published reports linking microRNAs (miRNAs) and various human diseases including cancer3. With the advancement of sequencing technology and bioinformatics, other types of short or long ncRNAs, such as endogenous small interfering RNAs (endo-siRNAs), PIWI-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), natural antisense transcripts (NATs), circular RNAs (circRNAs), long intergenic non-coding RNAs (lincRNAs), enhancer non-coding RNAs, and transcribed ultraconserved regions (T-UCRs) have been characterized and classified4, 5. Among these ncRNAs, long non-coding RNAs (lncRNAs), defined as being at least 200 nucleotides in length, have received much attention due to their abundant presence in the human genome, as well as their tissue-specific expression patterns and functional relevance in complex physiological and pathological processes6. Distinct from the short miRNAs, the length of lncRNAs allows them to fold into more complex three-dimensional structures, likely to determine specific interactions of lncRNA with biomolecule partners such as transcription factors, histones or other chromatin-modifying proteins. Consequently, alterations in lncRNA expression levels could affect a broad spectrum of genes via their protein partners, and as such cause profound phenotypic changes7. 
LncRNAs could also have sequence-specific interactions with DNA or RNA in the form of duplex or triplex structures8, 9, and create complex regulatory networks composed of DNA, RNA, and proteins.
The mapping of several lncRNAs to regulatory genomic regions such as promoters and enhancers1 indicates a possible involvement of these noncoding transcripts in gene regulation. In addition, genome-wide association studies (GWAS) revealed that less than 10% of the disease-related single nucleotide polymorphisms (SNPs) are in exons of protein-coding genes, whereas nearly half of disease-associated SNPs are outside protein-coding genes10. Although lncRNA function remains largely unknown, recent studies have clearly demonstrated the functional importance of lncRNAs in embryonic development11, cell differentiation12, and various human diseases including cancer5, 13, 14. Mechanistically, lncRNAs that are transcribed from regulatory elements or cancer-associated genomic regions may cooperate with their genomic DNA elements to fine-tune the complex biological activities necessary for precise regulation. This might be of particular relevance for complex biological activities that do not obey “binary switch” (on and off) regulation, but are instead regulated in a subtler, dosage-dependent fashion.
The topic of lncRNA has been covered in several excellent in-depth review papers13, 15–20. Here, we focus on the interplay between DNA and lncRNA in the human genome, and the relevance of these interactions in human cancer. We introduce various types of lncRNAs from regulatory genomic elements, summarize recently identified molecular mechanisms of DNA-RNA interaction in the context of cancer, and discuss the clinical relevance of the findings.
The missing culprit genes
In many non-hypothesis-driven studies, large-scale genotyping of population-based samples is used to evaluate disease-gene associations. Among these, GWAS have provided valuable information on the genetic variants involved in cancer risk, disease diagnosis, prognosis, and treatment response21. However, the molecular mechanisms underlying such links remain largely undefined, because many of these genetic variants (43%) are located in gene “desert” regions that lack protein-coding genes13 (summarized in Table 1).
Table 1
LncRNA | Cancer Type | LncRNA Function | Gene Locus | SNP | Risk Allele | GWAS |
---|---|---|---|---|---|---|
PCGEM1 | Prostate | Oncogenic118, 119 | 2q32 | rs6434568 | C > A | Prostate120 |
PCGEM1 | Prostate | Oncogenic118, 119 | 2q32 | rs16834898 | A > C | Prostate120 |
HULC | Hepatocellular | Oncogenic121 | 6p24 | rs7763881 | A > C | Hepatocellular79 |
CCAT2 | Colorectal | Oncogenic41 | 8q24 | rs6983267 | G > T | Prostate, colorectal95, 122, 123 |
PCAT1 | Prostate | Oncogenic88 | 8q24 | rs1026411 | C > T? | Breast90 |
PCAT1 | Prostate | Oncogenic88 | 8q24 | rs12543663 | A > C? | Prostate91 |
PRNCR1 | Prostate | Oncogenic36 | 8q24 | rs1456315 | ? | Prostate92 |
PRNCR1 | Prostate | Oncogenic36 | 8q24 | rs7463708 | ? | Prostate92 |
ANRIL | Neurofibromas, Esophageal | Oncogenic73, 124 | 9p21 | rs2151280 | T > C | Neurofibromas74 |
H19 | Hepatocellular | Oncogenic or suppressive75, 76 | 11p15 | rs2839698 | C > T | Bladder77 |
H19 | Hepatocellular | Oncogenic or suppressive75, 76 | 11p15 | rs2107425 | C > T | Breast cancer78 |
MALAT1 | Lung | Oncogenic125 | 11q13 | rs619586 | A > G | Hepatocellular79 |
Similarly, non-GWAS studies also point to the same observations that in many cases protein-coding genes are not the culprits responsible for disease phenotypes. This notion can be exemplified and supported by the role played by miRNA-15a/16-1 in chronic lymphocytic leukemia (CLL)22. A recurring pattern of 13q14.3 deletions was observed in CLL indicative of the presence of a tumor suppressor in this region. However, the protein-coding genes identified from this genomic region did not fulfill this tumor suppressing function22. Instead, two miRNAs were identified and subsequently proven by multiple studies to underlie the etiology of CLL22. Following this initial finding, several studies identified numerous miRNAs involved in a broad spectrum of human malignancies. Nevertheless, miRNAs and protein-coding genes are not the only determining factors of disease phenotype. Other DNA regulatory elements may play an important role in causing morbid phenotypes by altering gene transcription modalities. Moreover, other types of ncRNAs are transcribed from cancer-associated genomic regions and participate in cancer pathogenesis.
‘Junk DNA’ encodes lncRNAs
The protein-centered dogma had viewed genomic regions not coding for proteins as ‘junk’ DNA. We now understand that many lncRNAs are transcribed from ‘junk’ regions, and even those encompassing transposons, pseudogenes, and simple repeats represent important functional regulators with biological relevance23, 24. For the convenience of this review, we subdivided lncRNAs into several categories based on their genomic locus relative to protein-coding genes, or their unique structural features. However, these classifications are not exclusive, and this grouping does not have any bearing on their biological activity or functional mechanisms.
Promoter-associated lncRNAs
Gene promoters interact with transcription factors and RNA polymerases to activate transcription25. The recent identification of ncRNA transcripts located within the promoter region of several genes26, 27 has clearly indicated that more complex regulatory mechanisms should be envisaged. A tiling microarray designed to study ncRNAs mapping close to the transcription start site (TSS) of 56 cell cycle-related genes revealed extensive transcriptional activity without protein-coding features in gene promoter regions. Among these lncRNAs, the non-spliced 1.5 kb ncRNA PANDA, transcribed from 5 kb upstream of the CDKN1A TSS, was proven to function in the DNA damage response28. Interestingly, while CDKN1A mediates cell cycle arrest, PANDA promotes cell survival in response to DNA damage by preventing the transcription factor NF-YA from binding specific promoters of pro-apoptotic genes28. This indicates that, following the DNA damage response, both cell cycle arrest and anti-apoptotic genes (and possibly genes with other functions) can be induced from the same locus, and a complex network will determine the biological phenotypes. In another study, a promoter-associated lncRNA complementary to the rRNA gene promoter was shown to bind to the rRNA gene to form an lncRNA-DNA triplex. This RNA-DNA triplex prevents the binding of Transcription Termination Factor 1 (TTF1) to the rRNA gene, and recruits DNMT3b to silence it8.
Enhancer ncRNAs
Enhancers are defined as DNA elements which, independently of their proximity or orientation with respect to the gene transcription site, are able to enhance gene expression levels29. Notably, many active enhancer regions are transcribed into lncRNAs30. In mouse neurons, out of the 12,000 neuronal activity-regulated enhancers defined by p300/CBP occupation and histone H3-Lysine 4 mono-methylation (H3K4me1), 2,000 were found to bi-directionally express long ncRNAs, termed ‘enhancer RNAs’ or eRNAs, that are predominantly non-polyadenylated31. The positive association of eRNA expression at neuronal enhancers with the levels of nearby protein-coding genes suggests that eRNAs may regulate mRNA synthesis31. In addition to eRNAs, polyadenylated ‘enhancer-like ncRNAs’ were identified from genomic enhancer regions and shown by RNA interference to activate neighboring protein-coding genes in cis32. Furthermore, several T lymphocyte-specific enhancers are bound by RNA polymerase II and general transcription factors, and express both polyadenylated and non-polyadenylated lncRNAs33.
Evf2 lncRNA represents yet another example of enhancer RNA that regulates gene expression of the Dlx cluster through interaction with the transcription factor Dlx234. HOTTIP, an lncRNA expressed from the distal tip of the HoxA locus, drives expression of several HoxA genes35. Using an engineered reporter plasmid, it was elegantly shown that HOTTIP activates HoxA genes by cis regulation35. Recently, two lncRNAs highly expressed in aggressive prostate cancers, PRNCR1 and PCGEM1, were found to enhance transcription of approximately 2000 androgen receptor-responsive genes by binding to the androgen receptor36. This study expanded the functional mechanisms of enhancer RNAs by demonstrating a sophisticated underlying mechanism of trans regulation.
T-UCRs
Ultraconserved regions (UCRs) refer to a subset of conserved genome sequences longer than 200 bp that are conserved with 100% identity between orthologous regions of the human, rat, and mouse genomes. Although a high degree of genomic conservation usually indicates functional relevance, more than half of the 481 ultraconserved regions described by Bejerano et al. have no protein-coding potential37. Microarray analysis showed that 93% of the UCRs have transcriptional activity in at least one tissue, and consequently are referred to as T-UCRs. T-UCR profiling in a panel of 133 human leukemia and carcinoma samples and 40 corresponding normal tissues identified specific signatures associated with each cancer type. For instance, uc.349A and uc.352, both mapping to the familial CLL-associated fragile chromosomal region 13q21.33-q22.2, are differentially expressed in normal versus malignant B-CLL CD5-positive cells38. Following the initial report, several studies have reported the importance of the role played by T-UCRs in cancer. For instance, uc.338, a T-UCR whose expression is dramatically increased in human hepatocellular carcinoma compared with noncancerous adjacent tissues, promotes anchorage-dependent and anchorage-independent cell proliferation39. Studies from our group showed that uc.475, a hypoxia-induced noncoding ultraconserved transcript, enhances cell proliferation specifically under hypoxic conditions40. In addition, we identified a novel lncRNA, named CCAT2, transcribed from a highly conserved ‘gene-desert’ region, and encompassing the cancer-associated SNP rs6983267. We showed that CCAT2 is an oncogenic lncRNA promoting chromosomal instability and colorectal cancer metastasis41. More recently, the uc.283+A T-UCR was shown to interfere with miRNA processing by binding to primary miRNA-195 (pri-miR-195) via sequence complementarity42.
Despite these findings, the biological activities and functional mechanisms of the majority of T-UCRs still remain largely unexplored. It should be noted that for functional attribution of T-UCRs in human diseases, precise gene annotation is the key, and this requires rigorous analysis determining sense or antisense orientation of ncRNA, for instance, by northern blotting, strand-specific PCR, and deep sequencing.
NATs
Natural antisense transcripts (NATs) are endogenous RNA molecules that are partially or fully complementary to protein-coding transcripts. According to their genomic origin, NATs can be separated into cis-NATs, which are transcribed from the same genomic loci as their sense transcripts but from the opposite DNA strand, and trans-NATs, which are transcribed from genomic regions that are distinct from those encoding their sense counterparts43, 44. Although generally expressed at relatively low levels compared with the sense transcripts, NATs have been shown to effectively regulate the expression level of their protein-coding targets45. Systematic global transcriptome analysis suggested that approximately 70% of transcripts have antisense partners, and that perturbation of antisense RNA can alter the expression of the sense gene46. NATs activate or inactivate sense gene transcription by mechanisms including epigenetic modifications45. ANRIL is a NAT transcribed from the INK4A-INK4B gene-cluster locus, which encodes the tumor suppressor genes CDKN2A and CDKN2B47. Through interaction with CBX7, a component of polycomb repressive complex 1 (PRC1) able to recognize H3K27me3 repressive marks, ANRIL recruits the protein complex to its locus for sustained repression of the INK4A-INK4B gene cluster48. NATs also affect gene expression through posttranscriptional regulation such as splicing. During epithelial-mesenchymal transition, a NAT at the ZEB2 locus is transcriptionally activated. This ZEB2 NAT inhibits splicing of an internal ribosome entry site-containing intron, and positively regulates ZEB2 protein expression49. The regulation of sense transcripts by NATs thus provides a natural way of increasing or reducing protein expression.
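The cis/trans distinction for NATs depends only on genomic position and strand, so it can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not a published annotation pipeline: the `Transcript` record, the `overlaps` and `classify_nat` helpers, and any coordinates passed to them are hypothetical, and sequence complementarity between the two transcripts is assumed to have been established separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal record for a transcript's genomic placement."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if both transcripts share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_nat(antisense: Transcript, sense: Transcript) -> str:
    """Label a candidate antisense transcript relative to its sense partner.

    cis-NAT:   same locus (overlapping coordinates) but opposite strand.
    trans-NAT: transcribed from a genomic region distinct from the sense gene.
    """
    if not overlaps(antisense, sense):
        return "trans-NAT"
    if antisense.strand != sense.strand:
        return "cis-NAT"
    # Overlapping on the same strand means the pair is not antisense at this locus.
    return "same-strand overlap (not a NAT)"
```

Real annotation would also need to handle exon-level overlap and partial complementarity, which this sketch deliberately ignores.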
LincRNAs
Initially identified using histone marker signatures associated with RNA polymerase II, long intergenic noncoding RNAs (lincRNAs) have received much attention because they do not overlap with protein-coding genes. Their effects can therefore be characterized without ambiguity in the attribution of biological functions19. HOTAIR was among the first lincRNAs to be functionally and mechanistically elucidated50. Transcribed from the HOXC gene cluster, HOTAIR controls gene expression via a trans-effect, i.e., by affecting transcription on chromosomes other than the one from which it is transcribed50. This is achieved by interaction of HOTAIR with polycomb repressive complex 2 (PRC2) and LSD1, which promotes repressive histone marks (such as H3K27me3) to silence the HOXD locus51. LincRNA-p21, a polyadenylated RNA transcribed from the opposite strand upstream of p21, is induced by DNA damage and acts as a downstream regulator of the p53 transcriptional response52. LincRNA-p21 physically associates with hnRNP K through its 5’ end and represses p53-responsive apoptotic genes52.
The DNA-RNA twist in cancer genetics
The elucidation of the mechanisms underlying lncRNA function lags far behind the pace at which new lncRNAs are discovered. Although lncRNAs can be easily classified into different types according to their genomic locus or other features, this classification does not shed light on their mechanisms. Instead, lncRNAs from different classes might share similar molecular mechanisms. Generally, the mode of action of lncRNAs can be classified into cis and trans regulation, depending on whether the lncRNA regulates neighboring genes in the same chromosomal region where it is located or distant genes on other chromosomes, respectively (see Figure 1). In both cases, lncRNAs need to interact directly or indirectly with genomic DNA elements, in most cases with the assistance of proteins, to perform specific biological functions. Additionally, SNP variants inside an lncRNA sequence may affect not only the function of the DNA element, but also the primary sequence, possibly the higher-order structure, and consequently the activities of the lncRNA.
Cis regulation within the genomic context
LncRNAs have several unique properties as cis-acting molecules53. First, compared with proteins, lncRNAs remain in close proximity to their genomic locus during transcription and are thus able to direct locus- and allele-specific regulation. Second, the length of lncRNAs allows them to bind multiple epigenetic complexes and to work as initiators or mediators of the genomic looping events necessary for establishing chromatin that is active for gene transcription. Third, the length of lncRNAs makes it possible for them to function while still being transcribed, and degradation signals acting immediately after transcriptional termination might prevent diffuse action at other genomic sites. Many lncRNAs mediate local functions in cis, interacting with chromatin-modifying proteins to regulate their neighboring genes. These include several previously mentioned enhancer RNAs and NATs. For instance, HOTTIP recruits the WD repeat domain 5 (WDR5)/mixed lineage leukemia (MLL) complex to drive the H3K4me3 signature and gene transcription of HoxA distal genes35. Chromosomal looping brings HOTTIP close to its target genes35. This mechanism was elegantly demonstrated with a luciferase reporter artificially tethered to HOTTIP35. The lncRNA Mistral employs a similar mechanism, interacting with MLL to recruit it to, and activate, the Hoxa6 and Hoxa7 genes54. The lncRNA ecCEBPA uses a different mechanism, binding to DNMT1 to prevent methylation of the CEBPA gene55.
The cis regulation could also elicit broader epigenetic changes, as in the cases of Xist, an lncRNA silencing an entire female X chromosome, and of several other lncRNAs regulating gene imprinting. Xist is transcribed exclusively from the inactive X chromosome in females, and tethered to the X inactivation center by the transcription factor Yin Yang 1 (YY1)56. Xist RNA coats the X chromosome and serves as a scaffold for recruitment of silencing factors such as PRC257. Interestingly, a repeated motif named ‘Repeat A’ within the Xist RNA encompassing a stem-loop structure was shown to be responsible for the recruitment of the PRC2 complex to the inactive X chromosome58. As an example of regulating gene imprinting, the lncRNA Air, transcribed from the paternal allele, recruits G9a to methylate H3K9 residues over an adjacent 300-kb genomic region, thus silencing the expression of distantly located genes including Igf2r, Slc22a2 and Slc22a3 on the paternal chromosome59.
LncRNAs not only regulate protein-coding genes, but can also regulate neighboring lncRNAs. An example of this is the regulation of Xist by Tsix, an lncRNA transcribed in the antisense orientation relative to Xist from the active X chromosome60. Tsix recruits PRC2 and the methyltransferase DNMT3A to the Xist promoter, thus maintaining a repressive chromatin domain for long-term silencing of the Xist gene61. In addition, Tsix and Xist can form RNA duplex structures, which are subsequently processed by the RNA interference machinery into small regulatory RNAs62.
Although it has been shown that the in cis mechanism employs genomic looping to exert a regulatory effect, whether lncRNAs are necessary to maintain the loop still remains to be determined. Lai et al. demonstrated that knockdown of either lncRNAs or Mediator (coactivator complex bridging regulatory information from enhancers to the promoter) abolished the chromatin interactions, supporting a participation of both the Mediator and the lncRNA in looping enhancer-promoter interactions. Further, the lncRNA-Mediator interaction regulates the kinase activity of the Mediator protein, and subsequently promotes phosphorylation of serine 10 on histone H3, a chromatin mark for transcriptional activation63. However, the role of lncRNA in maintaining chromatin looping was not observed in other studies. For instance, depletion of HOTTIP did not disrupt looping chromatin architecture, as determined by high-throughput chromosome conformation capture35. A recent study similarly suggests that chromatin looping linking p53-binding sites and their targets does not depend on the lncRNAs transcribed from the p53-binding sites64.
Trans regulation at distant genomic loci
The property of interaction with proteins such as transcription factors or chromatin modifiers suggests the possibility of trans regulation by lncRNAs able to act outside the genomic locus they map to. About 20% of all lincRNAs have PRC2 as interaction partner to regulate gene expression, thus suggesting widespread trans-regulated chromatin remodeling, as previously characterized for HOTAIR65. Similarly, a cross-linking immunoprecipitation followed by sequencing (CLIP-seq) study of RNAs associated with the SFRS1 splicing factor identified more than 6000 spliced ncRNAs66. Although not yet experimentally proven, it can be envisioned that a single ncRNA could affect a wide range of genes regulated by SFRS1. A more recent study showing regulation of androgen receptor-responsive genes by PRNCR1 and PCGEM1 also represents a trans mechanism through which more than 2000 genes are regulated by lncRNAs36.
While it is clear that lncRNAs target proteins to exert their in trans effects, the factors determining the RNA-protein interaction are not well defined. Interestingly, several studies suggest that the secondary structure, rather than the primary lncRNA sequence, dictates a specific interaction. For instance, the tumor suppressor function of the MEG3 lncRNA was maintained when its secondary structure, though not its primary sequence, was conserved67, 68. In addition, repetitive sequences were found to contribute to the interaction with protein partners. Although Xist acts through a well-established cis regulatory mechanism, it still illustrates the importance of higher-order structures in RNA-protein interaction. A cluster of nine repetitive elements within Xist was found to form stem-loop structures essential for the interaction with PRC2 and for H3K27 trimethylation, while another region encompassing repetitive elements was shown to bind to YY1 through a stem-loop structure, tethering Xist onto the X chromosome56, 69. Studies on short interspersed elements that are derived from transposons have also shown that repetitive sequences are the recognition domains for RNA polymerase II binding, and that such interactions lead to repression of mammalian heat shock genes70, 71.
Another puzzling question regarding the mechanisms underlying in trans regulation is how lncRNAs recognize specific genomic loci. One possibility is that the primary or secondary structure of lncRNAs defines their preferred interaction with certain genomic regions. Using a technique named Chromatin Isolation by RNA Purification (ChIRP), in combination with deep sequencing of genomic binding sites, an enriched binding motif was identified for HOTAIR9. The exact structure responsible for such RNA-DNA interaction remains to be determined. Notably, a promoter-associated lncRNA forms a triplex with the transcription termination factor 1 (TTF1)-binding site, and subsequently recruits DNMT3b to silence the rRNA gene8. The specific recognition of genomic loci could also be achieved through the relay of protein partners, as illustrated by the activation of androgen receptor-responsive genes by PRNCR1 and PCGEM1 via interaction with the androgen receptor36.
Linking SNPs, lncRNAs and cancer
The fact that approximately 90% of disease-associated SNPs are in genomic regions not coding for proteins10 suggests that these ‘gene-poor’ regions may represent a ‘gold mine’ for the identification and characterization of novel lncRNAs. To facilitate such an effort, a lincSNP database has been established to link lncRNAs with disease-related SNPs72. Although the association does not necessarily imply a causal relationship between specific lncRNAs and disease phenotypes, the possibility of finding long-sought lncRNA culprits is a very attractive one. In addition, a disease predisposition SNP may flag the existence of a regulatory element of a gene whose function is only weakly affected by the SNP variant(s). These “disease predisposing” SNPs could be located upstream, within, or downstream of the lncRNAs. Here, we review only the cancer-related lncRNAs that also encompass cancer-risk SNPs (see Table 1). ANRIL was identified in GWAS as a hotspot of risk loci for gliomas and basal cell carcinomas73. The rs2151280 SNP variants located within the ANRIL gene were significantly associated with susceptibility to neurofibromas74. Moreover, the T allele of rs2151280 was correlated with lower ANRIL levels, suggesting that this SNP variant could affect ANRIL expression74. The rs2839698 and rs2107425 SNPs located within H19, an lncRNA with both oncogenic75 and tumor suppressive activity76, were reported to be associated with bladder cancer risk77. Rs2107425 was also found to confer increased breast cancer risk in a different study78. HULC, an lncRNA involved in hepatocellular carcinoma, encompasses the rs7763881 SNP, which is associated with susceptibility to hepatocellular carcinoma in HBV patients79. The same group also found that the rs619586 variants, located within the MALAT1 gene, were associated with hepatocellular carcinoma risk, though with marginal significance79.
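The statement that predisposing SNPs may lie upstream, within, or downstream of an lncRNA can be made concrete with a strand-aware position check. The following Python is a minimal sketch, not the method used by the lincSNP database or by any study cited here; the function name `snp_position` and its parameters are hypothetical, and coordinates are assumed to be 1-based and inclusive on a single chromosome.

```python
def snp_position(snp_pos: int, gene_start: int, gene_end: int, strand: str) -> str:
    """Classify a SNP as 'upstream', 'within', or 'downstream' of a gene.

    'Upstream' and 'downstream' follow the direction of transcription,
    so they swap sides for genes on the minus strand.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    if gene_start <= snp_pos <= gene_end:
        return "within"

    # True when the SNP has a smaller coordinate than the whole gene.
    left_of_gene = snp_pos < gene_start
    if strand == "+":
        return "upstream" if left_of_gene else "downstream"
    return "downstream" if left_of_gene else "upstream"
```

In practice such a check would be run against real annotation coordinates for each lncRNA and SNP, with a distance cut-off for how far upstream or downstream a SNP may lie and still be assigned to the gene; no such cut-off is specified in the text, so none is assumed here.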
A twisted 8q24 genomic region
The 8q24 genomic region is frequently altered by amplification, deletion, viral integration or translocation in many types of human cancers80. A large-scale study identified the 8q24 region as the most frequently (14%) amplified region among human cancers81. In addition, GWAS point to 8q24 as a hotspot for cancer-associated SNPs owing to the density, strength, and high allele frequency of these SNPs82. However, the 2 Mb SNP-rich 8q24 region has nevertheless been considered a ‘gene desert’, largely because of the absence of functionally annotated genes, with the only notable exception of the MYC proto-oncogene83. Several 8q24 loci have demonstrated enhancer activity, and it has been proposed that these enhancers might regulate MYC expression through looping with its promoter84. Recently, several reports revealed that lncRNAs including CCAT185, CCAT241, CARLo-586, PVT187, PCAT188, and PRNCR136 are transcribed from this region (Figure 2). Among these, CCAT2, PCAT1 and PRNCR1 encompass cancer predisposition SNPs (Table 1)41, 89–92. Several of these lncRNAs (e.g. CCAT1 and CCAT2) regulate MYC expression41, 85, while the rs6983267 SNP, which resides within the CCAT2 gene, shows allele-specific effects on the expression levels of the lncRNA CARLo-586. Recently, MYC copy number gains were found to depend on PVT1 in chromosome-engineered mice87.
The CCAT2 gene is located in a very special region. First, this genomic region has shown enhancer activity that is affected by the SNP variants93, 94. Second, the rs6983267 SNP it encompasses is one of the most consistently identified predisposition SNPs in multiple types of cancer, including colorectal cancer, prostate cancer, ovarian cancer, head and neck cancer, and inflammatory breast cancer95, 96. Third, its genomic sequence is highly conserved among mammals, supporting a functional role for this element41. Deletion of the 8q24 region encompassing rs6983267 was found to reduce intestinal tumor multiplicity in ApcMin/+ mice97. However, the genetic deletion removes not only the DNA enhancer elements, but also the CCAT2 gene, thus allowing for different explanations of the observed phenotypic changes. Our study showing MYC regulation via knockdown approaches suggests that CCAT2 could independently regulate MYC transcription. Analysis of colorectal cancer samples showed a correlation between MYC and CCAT2 at the transcriptional level, providing further experimental support for this relationship. Most interestingly, overexpression of CCAT2 transforms a chromosomally stable cell line with near-diploid status into a chromosomally unstable one, with a dramatic increase in polyploidy. This agrees well with the high CCAT2 expression levels found in microsatellite stable (MSS) tumors, which are often characterized by aneuploidy, when compared with the near-diploid MSI-High colon tumors41. Although we demonstrated the oncogenic nature of CCAT2 in promoting chromosomal instability and colorectal cancer, whether the rs6983267 SNP variants affect CCAT2 function remains to be elucidated. In this regard, we reported a significant positive correlation between CCAT2 and MYC expression in GG samples but not in TT samples of colorectal cancers (CRCs)41.
Since MYC and its regulatory networks have been proposed as one of the most important drivers of colon cancer development (as implicated by the large-scale TCGA project)98, we hypothesize that a complex regulatory network containing DNA elements (enhancers) and RNA transcripts (lncRNAs) for the MYC gene is active in the 8q24 region and acts to fine-tune the expression and function of this critical gene. The concept of super enhancers, defined as large clusters of transcriptional enhancers driving gene expression, has also recently surfaced, and points to MYC regulation in the 8q24 region as a typical example99.
It is also possible that lncRNAs may have fundamental biological effects, independent of MYC transcription, and that these factors together initiate or promote cancer pathogenesis. A genome-wide association approach identified that 75% of the disease-associated SNPs affect expression of lncRNA, but not that of neighboring protein-coding genes100. Additionally, such effects are tissue-dependent, reflecting regulation of a complex trait100. As we learned from PCAT1 and CCAT2, lncRNAs transcribed from the 8q24 locus may affect double-stranded DNA break repair101 and chromosome instability41, which consequently exert a broader biological effect in promoting cancer pathogenesis.
The clinical relevance of the DNA-RNA twist in cancer
Many lincRNAs such as ANRIL48, HOTAIR102, PCAT-188, PRNCR136, PCGEM136, CCAT241, and MALAT1103 have been shown to associate with human cancer. Recently, XIST, an lncRNA for X-chromosome inactivation, was also shown to suppress hematologic cancer104. The abnormal expression profile and functional importance of lncRNAs in cancer suggest that this knowledge has translational potential for clinical applications in cancer patients.
LncRNAs are generally more tissue-specific than protein-coding genes and thus may be more specifically associated with certain cancer subtypes6. This tissue-specific expression pattern may enhance the utility of lncRNAs as biomarkers for the early diagnosis of localized cancers from different body fluids, for the detection of cancer metastasis, for the prediction of clinical outcome, and/or for revealing the origin of metastatic cancers. For instance, increased MALAT1 expression levels predict metastasis and poor survival in early-stage NSCLC105. Likewise, elevated HOTAIR levels are associated with poor prognosis in several cancer types including breast102, liver106, colorectal107, gastrointestinal108, and pancreatic109 cancers. A mouse study demonstrated that HOTAIR initiates breast cancer metastases102, 103. Also, CCAT2 levels in primary tumors showed an inverse correlation with metastasis-free survival of breast cancer patients41, 110. Furthermore, a bioinformatics study identified 120 individual lncRNAs that are significantly associated with progression-free survival in prostate cancer111.
An ideal lncRNA biomarker requires robust detection in plasma and other biofluids such as urine. Although lncRNA stability in such environments remains largely unknown, several studies have suggested the potential of lncRNAs as biomarkers. MALAT1 fragment levels in patient plasma were found to significantly differentiate human subjects with or without prostate cancer112. The specific association of PCA3 with prostate cancer has been developed into an FDA-approved commercial test, the Progensa PCA3 assay, which aids decisions on whether to recommend repeat prostate biopsies113. The finding of lncRNA germline and somatic mutations in leukemia and colorectal cancer114 suggests that a combined strategy of genotyping the DNA sequence and measuring lncRNA expression levels may strengthen the disease connection.
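How well a plasma lncRNA level separates subjects with and without cancer is commonly summarized as the area under the ROC curve (AUC). The following minimal sketch computes the AUC as the fraction of case-control pairs in which the case has the higher value (ties count as one half). The function name `roc_auc` is hypothetical and the measurements are made-up toy values, not results from any cited study; it is an illustration of the metric, not the method used in the referenced work.

```python
def roc_auc(case_values, control_values):
    """AUC as the probability that a randomly chosen case has a higher
    biomarker level than a randomly chosen control (ties count 0.5)."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical plasma lncRNA levels in arbitrary units, NOT real data.
toy_cases = [5.2, 6.1, 4.8, 7.0]
toy_controls = [3.1, 4.9, 2.7, 3.8]
auc = roc_auc(toy_cases, toy_controls)
print(auc)  # 15 of 16 case-control pairs are correctly ordered: 0.9375
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation; the pairwise formulation is quadratic in sample size, which is acceptable for a small illustration.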
The DNA-RNA coordination in determining a specific activity indicates that disruption of either component could have functional consequences. LncRNAs may therefore represent ideal therapeutic targets. Another attractive feature of lncRNA therapeutics is the capacity to increase protein output in a more natural way, for instance, by targeting NATs. The effect of cis-acting NATs may be more focused on a local gene, and such therapy may potentially have fewer off-target effects. Here, a clear understanding of the mechanism of an lncRNA within its genomic context is key to such therapeutic development.
Conclusion
“One man’s junk is another man’s treasure”. The recent advances in lncRNA research have revealed transcriptional treasures in the once derided ‘junk’ DNA regions. Although only a small fraction of lncRNAs have been functionally characterized so far, we believe that the reservoir of functional lncRNAs will quickly expand as a result of the many emerging technologies for high-throughput screening and functional validation. For instance, studies on protein interaction coupled with transcriptome data can be greatly facilitated by photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP)115; genomic occupancy sites of lncRNAs can be profiled by ChIRP and subsequent DNA sequencing9; functional motifs within RNA can be detected by RNA–mechanically induced trapping of molecular interactions (RNA-MITOMI)116; and RNA movement can be traced by live imaging using engineered fluorescent RNAs117. However, because of the extremely large number of lncRNAs in the human genome, it may be more practical to focus first on the disease-associated lncRNAs suggested by other studies, such as expression analyses and GWAS findings. Disease-related SNPs can be useful markers to flag functional lncRNAs. Additionally, lncRNAs identified in such regions, whether functionally affected or altered in their expression levels by specific SNP variants, may be the culprits underlying the mechanisms of disease predisposition. Elucidating such mechanisms requires a detailed understanding of lncRNA structure and structure-function relationships, as well as a suitable experimental system to distinguish the subtle differences.
Due to the tissue-specific expression patterns and site-specific action of lncRNAs, drugs targeting lncRNAs could achieve a more selective therapeutic effect than conventional drugs. In addition, the allele-specific regulatory mechanisms of lncRNAs may be exploited for precise control of gene expression, presumably with fewer side effects. Synthetic oligonucleotides with high affinity and specificity, such as those with locked nucleic acid modifications, allow for targeted regulation of lncRNA expression. Small-molecule compounds showing specificity towards an lncRNA could also be tested as candidates to interrupt lncRNA-protein interactions or to interfere with the loading of the lncRNA onto its target genomic regions.
The regulatory scheme in human cells is complicated, and it is rare that a single molecule can explain an entire disease phenotype. It can be envisioned that in a specific genomic locus there are intertwined transcripts of many kinds, including protein-coding genes, overlapping intronic and noncoding RNAs in the sense or antisense orientation relative to the protein-coding genes, further complicated by the various isoforms caused by alternative splicing. Thus a loss or gain of a genomic region, as frequently seen in cancer, will not only affect DNA regulatory elements, but also affect the transcription landscape. This concept can be further expanded to include regulatory circuitry at several genomic loci containing both coding and non-coding genes with reciprocal interactions and feedback loops to determine a disease phenotype. Hence, it is of critical importance to consider the genetic context, including gene locus, neighboring genes, chromatin status, and target genomic regions, for a comprehensive functional annotation or therapeutic manipulations in the battle against cancer.
Acknowledgments
HL is an Odyssey Fellow, and his work is supported in part by the Odyssey Program at The University of Texas MD Anderson Cancer Center. GAC is The Alan M. Gewirtz Leukemia & Lymphoma Society Scholar. Work in Dr. Calin’s laboratory is supported in part by the NIH/NCI grants 1UH2TR00943-01 and 1 R01 CA182905-01, Developmental Research Awards in Prostate Cancer, Multiple Myeloma, Leukemia (P50 CA100632) and Head and Neck (P50 CA097007) SPOREs, a SINF MDACC_DKFZ grant in CLL, a SINF grant in colon cancer, a Kidney Cancer Pilot Project, The Blanton-Davis Ovarian Cancer – 2013 Sprint for Life Research Award, The Center for Cancer Epigenetics Pilot project, a 2014 Knowledge GAP MDACC grant, the Laura and John Arnold Foundation, the RGK Foundation and the Estate of C. G. Johnson, Jr and by the CLL Global Research Foundation. This work was supported also by a grant from The University of Texas MD Anderson Cancer Center Duncan Family Institute for Cancer Prevention and Risk Assessment. MP is supported by an Erwin-Schroedinger Scholarship of the Austrian Science Funds (project no. J3389-B23). FS was supported by NIH grants CA131301 and CA157749. We apologize to all colleagues whose work was not cited because of space restrictions.
Read article at publisher's site: https://doi.org/10.1038/onc.2014.456