We have studied the functions and modes of action of long noncoding RNAs (lncRNAs) using both computational and experimental approaches. Over the course of the lincSAFARI project, we have achieved substantial progress on all three aims specified in our proposal. Specifically:
Aim 1: Throughout the project we have improved and expanded our PLAR pipeline for identification of lncRNAs from RNA-seq data in several directions, and supported its use by the wider community studying lncRNAs in different organisms and systems (Ushakov et al., Scientific Reports 2017). Other components of the computational infrastructure we developed were used for broader applications, such as analysis of promoter usage (Tamarkin et al., eLife 2017). We focused on the evolutionary origins of lncRNAs, and found that ~5% of the lncRNAs shared between mammals can be traced back to protein-coding genes that lost their coding potential before the rise of mammals. These lncRNAs have important functional features, such as broader and higher expression, that set them apart from other lncRNAs. We also found that specific lncRNAs derived from protein-coding genes have inherited specific functional elements that contribute to their stability (Hezroni et al., Genome Biology 2017).
Aim 2: We established a system for differentiating mouse embryonic stem cells (mESCs) into neuronal progenitor cells (NPCs) and mature neurons. We attempted to knock down eight different lncRNAs, and for two of them we observed robust effects on differentiation. We then characterized in detail the function of one of these, which we named Reno, and found clues to its mechanism of action. Cells in which Reno is lost are unable to commit to a neuronal fate during the very early stages of mESC differentiation, and instead undergo massive cell death. Loss of Bahcc1, a protein-coding gene found near Reno, leads to a similar effect, although, surprisingly, Reno does not regulate Bahcc1 expression (Hezroni et al., EMBO reports, in press). There is evidence that both Reno and Bahcc1 are required during mESC differentiation for maintaining open chromatin at promoters that carry the H3K4me3 chromatin mark in mESCs, which was previously shown to be important in differentiation. In another line of experimental study of lncRNA functions, we studied NORAD, a trans-acting RNA that modulates the ability of Pumilio proteins, which are post-transcriptional regulators of mRNA expression, to repress their targets. We published two manuscripts on this topic (Tichon et al., Nature Communications 2016 and Tichon, Perry et al., Genes & Dev 2018). We also studied the functions of lncRNAs during neuroregeneration (Perry et al., Molecular Cell 2018), and found a lncRNA, Silc1, that is required for timely regeneration of neurons in the peripheral nervous system (PNS) and that acts in cis to promote expression of Sox11, which is located close to it in the genome. We explored more broadly the theme of lncRNAs that assist the expression of genes in their proximity, and found that Silc1-like lncRNAs are associated with stronger enhancer activity in the proximity of their sites of transcription (Gil & Ulitsky, Cell Systems 2018).
Aim 3: Identify lincRNA sequences capable of specific activities and determine their sequence-function landscape at single-base resolution.
We established a high-throughput system for testing which fragments of lncRNA sequence are capable of carrying out specific functions. Many lncRNAs function in the nucleus, so establishment of nuclear localization is a key aspect of their function, and one that was poorly understood. In our screening system, we placed thousands of short 'tiles' of lncRNA sequence in a reporter sequence that is usually localized to the cytoplasm rather than the nucleus. Using this approach, we identified a short element that is sufficient for increasing nuclear localization by ~2-fold and that is found in a substantial number of mammalian lncRNAs (Lubelsky and Ulitsky, Nature 2018). Ongoing work deals with further characterization of this element, and with the broader use of this approach for linking sequence variation in lncRNA genes to their functionality.