Sequence and Function Relationships in Long Intervening Noncoding RNAs

Periodic Reporting for period 4 - lincSAFARI (Sequence and Function Relationships in Long Intervening Noncoding RNAs)

Reporting period: 2019-12-01 to 2020-05-31

The central dogma in molecular biology is that information stored in DNA is transcribed into RNA mainly to serve as a template for the production of proteins, which carry out most cellular functions. It is now clear that many regions in the human genome also give rise to a range of processed and regulated transcripts that do not appear to code for functional proteins. A subset of these are long (>200 nucleotides), processed RNAs collectively called long noncoding RNAs, or lncRNAs. Recent estimates are that the human genome encodes >10,000 distinct lncRNAs, many of which show tissue-specific activity and are frequently dysregulated in human disease, including neurodegeneration.

Given the growing number of lncRNAs implicated in human disease or required for proper development, fundamental questions that need to be addressed are: Which lincRNAs are functional? How is functional information encoded in the lncRNA sequence? Is this information interpreted in the context of the mature or the nascent RNA? What are the identities and functional roles of specific sequence domains within lncRNA genes?

Our main hypothesis is that many lncRNA loci play key roles in gene regulation during cell differentiation, both via functionally important transcription events and post-transcriptionally, through the combined action of multiple short sequence domains. We tested this hypothesis using three complementary approaches – comparative genomics, detailed perturbations in mammalian cells followed by quantitative molecular phenotyping, and high-throughput screens for sequences able to carry out specific functions.

We used an interdisciplinary approach combining computational, molecular and stem cell biology. Our methodology is scalable, allowing us to tackle completely uncharacterized long RNAs and eventually zoom in and study their individual bases. Understanding which functions lncRNAs carry out in key processes and, even more importantly, how those functions are carried out is crucial for the eventual use of these molecules as therapeutic targets or as drugs.
We have studied the functions and modes of action of long noncoding RNAs (lncRNAs) using both computational and experimental approaches. Over the course of the lincSAFARI project, we have achieved substantial progress on all three aims specified in our proposal. Specifically:

Aim 1: Throughout the project we have improved and expanded our PLAR pipeline for identification of lncRNAs from RNA-seq data in several directions and supported its use by the wider community studying lncRNAs in different organisms and systems (Ushakov et al., Scientific Reports 2017). Other components of the computational infrastructure we developed were used for broader applications, such as analysis of promoter usage (Tamarkin et al., eLife 2017). We focused on the evolutionary origins of lncRNAs and found that ~5% of the lncRNAs shared between mammals can be traced back to protein-coding genes that lost their coding potential before the rise of mammals. These lncRNAs have important functional characteristics, such as broader and higher expression, that set them apart from other lncRNAs, and we found that specific lncRNAs derived from protein-coding genes have inherited specific functional elements that contribute to their stability (Hezroni et al., Genome Biology 2017).

Aim 2: We established a system for differentiating mouse embryonic stem cells (mESCs) into neuronal progenitor cells (NPCs) and mature neurons. We attempted to knock down eight different lncRNAs, and for two of them we observed robust effects on differentiation. We then characterized in detail the function of one of these, which we named Reno, and found clues to its mechanism of action. Cells in which Reno is lost are unable to commit to a neuronal fate during the very early stages of mESC differentiation and instead undergo massive cell death. Loss of Bahcc1, a protein-coding gene located near Reno, leads to a similar effect, although, surprisingly, Reno does not regulate Bahcc1 expression (Hezroni et al., EMBO reports, in press). There is evidence that both Reno and Bahcc1 are required during mESC differentiation to maintain open chromatin at promoters that carry the H3K4me3 mark in mESCs, which was previously shown to be important in differentiation. In another line of experimental study of lncRNA functions, we studied NORAD, a trans-acting RNA that modulates the ability of Pumilio proteins, which are post-transcriptional regulators of mRNA expression, to repress their targets. We published two manuscripts on this topic (Tichon et al., Nature Communications 2016 and Tichon, Perry et al., Genes & Dev 2018). We also studied the functions of lncRNAs during neuroregeneration (Perry et al., Molecular Cell 2018) and found a lncRNA, Silc1, that is required for timely regeneration of neurons in the PNS and acts in cis to promote expression of Sox11, which lies close to it in the genome. We explored the theme of lncRNAs that assist the expression of genes in their proximity more broadly and found that Silc1-like lncRNAs are associated with stronger enhancer activity in the proximity of their sites of transcription (Gil & Ulitsky, Cell Systems 2018).

Aim 3: Identify lincRNA sequences capable of specific activities and determine their sequence-function landscape at single-base resolution.
We established a high-throughput system for testing which fragments of lncRNA sequence are capable of carrying out specific functions. Many lncRNAs function in the nucleus, so establishment of nuclear localization is a key aspect of their function, and one that was poorly understood. In our screening system, we placed thousands of short 'tiles' of lncRNA sequence in a reporter transcript that is usually localized to the cytoplasm rather than the nucleus. Using this approach, we identified a short element that is sufficient for increasing nuclear localization by ~2-fold and that is found in a substantial number of mammalian lncRNAs (Lubelsky and Ulitsky, Nature 2018). Ongoing work deals with further characterization of this element and with the broader use of this approach for linking sequence variation in lncRNA genes to their functionality.
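The tiling idea behind such a screen can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names, the tile length, the step size and the pseudocount below are all hypothetical choices made for illustration. The sketch cuts a sequence into overlapping tiles and computes a simple nuclear-to-cytoplasmic ratio for a tile from read counts in the two fractions.

```python
def tile_sequence(seq, tile_len=100, step=25):
    """Split seq into overlapping tiles; returns a list of (start, tile) pairs.

    tile_len and step are illustrative defaults, not values from the report.
    """
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    if len(seq) <= tile_len:
        return [(0, seq)]
    last_start = len(seq) - tile_len
    starts = list(range(0, last_start + 1, step))
    # Add a final tile flush with the 3' end if the regular grid misses it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + tile_len]) for s in starts]


def nuclear_enrichment(nuc_count, cyto_count, pseudocount=1.0):
    """Nuclear/cytoplasmic ratio for one tile, with a pseudocount
    (an assumed smoothing choice) to avoid division by zero."""
    return (nuc_count + pseudocount) / (cyto_count + pseudocount)


tiles = tile_sequence("ACGT" * 60, tile_len=100, step=25)
ratio = nuclear_enrichment(nuc_count=199, cyto_count=99)
```

In a real screen the counts would come from sequencing the reporter library in fractionated cells, and each tile would be compared against the reporter without an insert; those steps are outside the scope of this sketch.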
Aim 1: We developed methods for detailed comparison of sequences of orthologous lncRNAs across species, using novel frameworks both for discovery of lncRNAs from RNA-seq data and for alignment-free comparison of lncRNAs from different species. Given the sequence of a lncRNA from multiple species, our LncLOOM algorithm is capable of homing in on specific elements that are conserved and likely functional.
Aim 2: We obtained deep understanding about the functional roles and importance of several lncRNAs, including NORAD, Reno, and Silc1, and could place their functions in the broader context of cis- and trans-acting RNAs, providing important paradigms for lncRNA activities and how they are encoded in their loci and sequences.
Aim 3: We developed massively parallel reporter assays for identifying tiles of RNA sequence capable of carrying out specific aspects of lncRNA mechanisms, such as stability and activation/repression of promoters.
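The notion in Aim 1 of homing in on short elements shared among orthologous lncRNAs without a full alignment can be illustrated with a deliberately simplified sketch. This is not the LncLOOM algorithm: it only reports k-mers that occur in every input sequence, together with their first position in each. The function name, the value of k and the toy sequences are assumptions made for the example.

```python
def shared_kmers(seqs, k=6):
    """Return {kmer: [first position in each sequence]} for k-mers
    present in every sequence. A toy stand-in for alignment-free
    comparison, not the LncLOOM method."""
    if not seqs:
        return {}

    def first_positions(seq):
        # Map each k-mer to the index of its first occurrence.
        pos = {}
        for i in range(len(seq) - k + 1):
            pos.setdefault(seq[i:i + k], i)
        return pos

    tables = [first_positions(s.upper()) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    return {kmer: [t[kmer] for t in tables] for kmer in sorted(common)}


# Toy "orthologs" (hypothetical sequences) sharing one 8-mer.
hits = shared_kmers(["CCTGTATATAGG", "ATGTATATAC"], k=8)
```

A practical method would also need to handle repeated occurrences, require that shared elements appear in the same order across species, and weigh matches by evolutionary distance; the sketch leaves all of that out.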