
Pervasive Upstream Non-Coding Transcription Underpinning Adaptation

Periodic Reporting for period 2 - PUNCTUATION (Pervasive Upstream Non-Coding Transcription Underpinning Adaptation)

Reporting period: 2019-08-01 to 2021-01-31

According to the convention in current textbooks, RNA acts as a passive carrier of information between DNA and protein. This view is highly over-simplified: it only applies to protein-coding messenger RNA (mRNA), which represents about 2% of the DNA in our cells. An astounding 80-90% of our genome produces RNA with unclear functional roles, since these RNAs do not encode proteins (non-coding RNA). Sequence changes in non-coding DNA reduce cellular health and cause human diseases. A key question of contemporary biology is therefore: how does non-coding DNA affect cellular processes at the molecular level? My proposal addresses this question by focusing on the hypothesis that non-coding DNA can affect biological processes through conversion into non-coding RNA by the process of transcription. The results of my proposal promise to highlight the purpose of mysterious non-coding DNA in genomes. In particular, we aim to characterize this phenomenon at the molecular level so that we can predict where in genomes non-coding DNA may function through this mechanism. We perform this work in plants, in the context of how plants sense and respond to changing temperature. Our results may thus also help to boost plant resilience to a changing climate.
My team succeeded in developing novel methods to study gene expression in plant genomes. We developed a method to precisely map the positions of transcription start sites (TSSs) by next-generation sequencing (TSS-seq). These data showed that transcription frequently starts at unexpected positions, for example in the middle of genes. We next developed a novel method, Transcript Isoform Sequencing (TIF-seq), to simultaneously map the connections of TSSs to the positions where the transcript ends, the poly-adenylation sites (PASs). TIF-seq showed that the DNA information of a plant gene is on average used to derive four RNA isoforms. We found a similar number of RNA isoforms per gene across a range of experimental conditions, including changing temperature. TIF-seq identified a novel species of non-coding RNA in plants: short promoter-proximal RNAs (sppRNAs). We identified sppRNAs fortuitously in the data, and this discovery advances our understanding of plant gene expression enormously. sppRNAs represent a truncated non-coding RNA isoform that starts at the same position as the corresponding mRNA, but stops shortly after it started. While many short non-coding RNAs appear to repress genes, sppRNA formation appears to be positively correlated with gene expression. In addition, TIF-seq clarified that many transcripts starting at the TSSs we identified in the middle of transcription units extend all the way to the predicted 3´-ends, so initiating transcription internally frequently generates RNA isoforms that would result in proteins lacking N-terminal protein domains. We performed TSS-seq and TIF-seq in mutants defective in nuclear RNA degradation, and this experimental trick helped us to identify many previously unknown transcription units that likely correspond to novel long non-coding RNAs.
We developed a method to capture the last base added during RNA polymerase II (RNAPII) transcription, that is, the nascent 3´-end. We refer to this method as plant Native Elongating Transcript sequencing (plaNET-seq). plaNET-seq allows us to study plant gene expression with unprecedented single-nucleotide resolution. We learned that plant gene expression is associated with promoter-proximal stalling at the first nucleosome that RNAPII encounters. Interestingly, we found that the position of promoter-proximal stalling matches the position of sppRNA termination, so plant gene expression is associated with promoter-proximal stalling, transcriptional termination, and formation of a short non-coding RNA isoform. While further research is needed to evaluate the full significance of sppRNA formation, our discovery illustrates how the data generated as part of this project tremendously increase our knowledge of plant gene expression.
Our study of the plant response to cold by plaNET-seq revealed a fascinating example of molecular adaptation to low temperature. We observed that plant genes (i.e. transcription units) initially “shrink” when they experience cold, but within a few hours transcription in the genome has adapted to the new environmental condition. We did not anticipate that we would be able to capture this remarkable period of environmental adaptation at the level of nascent transcription. Our findings will help to appreciate how temperature actually affects cellular processes at the molecular level, and to what extent organisms can compensate for these changes by novel genome-wide molecular adaptation mechanisms.
Importantly, plaNET-seq provides a parallel strategy to identify RNAPII transcription in the non-coding genome. We are working on an extensive annotation of the current transcript models based on our experimental data. Our experimental detection of massive non-coding transcription and a large variety of non-coding RNAs validates the central hypothesis of my proposal. We are using our novel genome annotation to work with selected loci to dissect the effect of non-coding transcription on overlapping protein-coding genes.
In this regard, we could show that a novel long non-coding RNA, SVALKA, represses the CBF1 gene, which is needed for plant cold acclimation. While CBF1 expression promotes freezing tolerance, constitutive activation of CBF1 results in fitness penalties. We found that SVALKA regulates CBF1 expression, so that CBF1 activity is limited to a short period after initial perception of cold. Mechanistically, SVALKA expression inhibits CBF1 expression shortly after CBF1 induction during cold perception. Our findings could inform biotechnological solutions to enhance plant resilience to temperature changes, for example by incorporating auto-inhibition through non-coding RNA expression to prevent fitness penalties that may arise from constitutive activation.
The genetic dissection outlined in my proposal shed light on chromatin-based pathways that shape the molecular decisions to activate or repress TSSs overlaid by another transcription unit. Our genomics methods illustrate that this is the “normal” situation during plant gene expression, as I had predicted in my proposal. Chromatin-based pathways, particularly those linked to RNAPII elongation, are central to repressing TSSs within genes (i.e. intragenic TSSs). So far, we could identify a key contribution of the evolutionarily conserved FACT complex. A computational project based on ChIP-seq data shed light on elongation-linked chromatin signaling. Interestingly, H3K4me1, which we identified as a particularly informative hallmark of RNAPII elongation in plants, participates in the repression of over 10,000 TSSs by the FACT complex. These novel insights into chromatin signaling during RNAPII transcription help us to identify and functionally characterize loci with evidence for pervasive upstream non-coding transcription underpinning adaptation, an effort that is still underway.
The results of my project pushed forward the boundaries of knowledge of molecular regulation principles. We made discoveries that advanced the state of the art in plant science and in the biological sciences generally.

We developed Transcript Isoform Sequencing (TIF-seq) to simultaneously map the connections of TSSs to the positions where the transcript ends, the poly-adenylation sites (PASs). We used plants as a model for complex eukaryotes to develop this technology further. TIF-seq showed that the DNA information of a plant gene is on average used to derive four RNA isoforms. We found a similar number of RNA isoforms per gene across a range of experimental conditions, including changing temperature. This has profound implications for understanding the effects of sequence polymorphisms underlying phenotypic diversity at the molecular level.
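The idea of counting RNA isoforms from paired TSS and PAS positions can be illustrated with a minimal Python sketch. This is not the project's analysis pipeline: the function name `isoforms_per_gene`, the gene identifiers, and the coordinates are all hypothetical, and the sketch assumes that each read has already been assigned to a gene and reduced to an exact (TSS, PAS) pair. A real analysis would most likely need to cluster nearby positions before counting.

```python
from collections import defaultdict
from statistics import mean


def isoforms_per_gene(read_pairs):
    """Count distinct (TSS, PAS) combinations observed for each gene.

    read_pairs: iterable of (gene_id, tss_position, pas_position) tuples.
    Assumption: positions are exact, so no clustering of nearby sites is done.
    """
    isoforms = defaultdict(set)
    for gene_id, tss, pas in read_pairs:
        # A set keeps each TSS-PAS combination once, however many reads support it.
        isoforms[gene_id].add((tss, pas))
    return {gene: len(pairs) for gene, pairs in isoforms.items()}


# Made-up toy data for illustration only (not from the project).
toy_pairs = [
    ("geneA", 100, 900),  # full-length isoform
    ("geneA", 100, 900),  # second read supporting the same isoform
    ("geneA", 100, 150),  # short isoform ending soon after the TSS
    ("geneA", 400, 900),  # isoform starting inside the gene
    ("geneB", 50, 700),
]

counts = isoforms_per_gene(toy_pairs)
mean_isoforms = mean(counts.values())
print(counts, mean_isoforms)
```

Using a set per gene means that read depth does not inflate the isoform count; only distinct start-end combinations are counted.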
TIF-seq data revealed a novel species of non-coding RNA in plants: short promoter-proximal RNAs (sppRNAs). We identified sppRNAs fortuitously, and this discovery advances our understanding of plant gene expression. sppRNAs represent a truncated non-coding RNA isoform that starts at the same position as the corresponding mRNA, but stops shortly after it started. While many short non-coding RNAs appear to repress genes, sppRNA formation appears to be positively correlated with gene expression. sppRNA formation coincides with the center of the +1 nucleosome, where we have located the position of promoter-proximal RNA polymerase II (RNAPII) stalling in plants. These findings were the first to show what we saw presented at the EMBL conference on chromatin and transcription for mammalian systems in which key mammalian RNAPII-pausing factors, which plants lack, were missing. So, sppRNAs and their co-localization with RNAPII stalling sites appear to highlight a novel and widely shared concept of gene expression, where RNAPII stalls at the +1 nucleosome and is then targeted for transcriptional termination. Our finding that plant gene expression is positively correlated with sppRNA formation suggests that biotechnology could benefit from sppRNA-based approaches to boost gene expression.

We developed a method to capture the last base added during RNA polymerase II (RNAPII) transcription, that is, the nascent 3´-end. We refer to this method as plant Native Elongating Transcript sequencing (plaNET-seq). plaNET-seq allows us to study plant gene expression with unprecedented single-nucleotide resolution. We learned that plant gene expression is associated with promoter-proximal stalling at the first nucleosome that RNAPII encounters. Moreover, we identified a stalling position in plant introns that likely contributes to the accuracy of plant gene expression. Importantly, plaNET-seq provides a parallel strategy to identify RNAPII transcription in the non-coding genome. We are working on an extensive annotation of the current transcript models based on our experimental data. Our experimental detection of massive non-coding transcription and a large variety of non-coding RNAs validates the central hypothesis of my proposal.

Our study of the plant response to cold by plaNET-seq revealed a fascinating example of molecular adaptation to low temperature. We observed that plant genes (i.e. transcription units) initially “shrink” when they experience cold, but within a few hours transcription in the genome has adapted to the new environmental condition. Our findings will help to appreciate how temperature actually affects cellular processes at the molecular level, and to what extent organisms can compensate for these changes by novel genome-wide molecular adaptation mechanisms.

We completed one study characterizing the long non-coding RNA SVALKA. SVALKA represses the CBF1 gene, which is needed for plant cold acclimation. While CBF1 expression promotes freezing tolerance, constitutive activation of CBF1 results in fitness penalties. We found that SVALKA regulates CBF1 expression, so that CBF1 activity is limited to a short period after initial perception of cold. Mechanistically, SVALKA expression inhibits CBF1 expression shortly after CBF1 induction during cold perception. Our findings could inform biotechnological solutions to enhance plant resilience to temperature changes, for example by incorporating auto-inhibition through non-coding RNA expression to prevent fitness penalties that may arise from constitutive activation. Feedback inhibition of a nearby mRNA by non-coding transcription appears to be an emerging concept in this area that our results nicely illustrate.
My team succeeded in developing novel methods to study gene expression in plant genomes. We developed a method to precisely map the positions of transcription start sites (TSSs) by next-generation sequencing (TSS-seq). These data showed that transcription frequently starts at unexpected positions, for example in the middle of genes. We next developed a novel method, TIF-seq, which clarified that many transcripts starting at the TSSs we identified in the middle of transcription units extend all the way to the predicted 3´-ends, so initiating transcription internally frequently generates RNA isoforms that would result in proteins lacking N-terminal protein domains. We verified this idea, suggested by TSS-seq data, with TIF-seq, which is ideally suited for this question. TSS-seq and TIF-seq in mutants defective in nuclear RNA degradation helped us to identify many previously unknown transcription units that likely correspond to novel long non-coding RNAs.
The genetic dissection outlined in my proposal shed light on chromatin-based pathways that shape the molecular decisions to activate or repress TSSs overlaid by another transcription unit. Our genomics methods illustrate that this is the “normal” situation during plant gene expression, as I had predicted in my proposal. Chromatin-based pathways, particularly those linked to RNAPII elongation, are central to repressing TSSs within genes (i.e. intragenic TSSs). We performed an extensive re-analysis of state-of-the-art Arabidopsis ChIP-seq data, which led us to propose the “Global positioning system during RNAPII transcription (RNAPII-GPS)”. We made these genomics data available for teachers to promote teaching in chromatin biology. In plants, we identified H3K4me1 as a key signature of RNAPII elongation.
So far, we could identify a key genetic contribution of the evolutionarily conserved FACT complex. Our genetic dissection is ongoing, and the dissemination of findings on several factors linked to chromatin signaling is planned for the future.
A computational project based on ChIP-seq data shed light on elongation-linked chromatin signaling. Interestingly, H3K4me1, which we identified as a particularly informative hallmark of RNAPII elongation in plants, participates in the repression of over 10,000 TSSs by the FACT complex. These novel insights into chromatin signaling during RNAPII transcription, in combination with our genomics data, will help us to identify and functionally characterize loci with evidence for pervasive upstream non-coding transcription underpinning adaptation.

We have generated key reagents based on CRISPR/dCas9 technology and TALs to purify selected loci for proteomics. I anticipate that we will advance beyond the state of the art in this area in the remaining project period.