Determinants of mammalian transcription start site selection and core promoter usage

Final Report Summary - DTSSCP (Determinants of mammalian transcription start site selection and core promoter usage)

The overall goal of this project was to understand the signals that contribute to transcription start site (TSS) selection and core promoter usage in mammalian genomes, ranging from local sequence signals to enhancer regions and epigenetics, using computational methods on novel data sets.
Our main achievements are summarized here:
1) A method to predict transcription initiation and RNA polymerase stalling from epigenetics data.
We have shown that chromatin data can accurately predict the locations of core promoters, the amount of RNA polymerase II (RNAPII) recruited at the promoter, the amount of elongating RNAPII in the gene body, the mRNA production originating from the promoter and, finally, the stalling characteristics of RNAPII, by considering both quantitative and spatial features of histone modifications around the transcription start site. Because the model framework can also pinpoint the signals that are most influential for prediction, it can be used to infer the underlying regulatory biology. For example, we show that the H3K4 di- and tri-methylation signals are strongly predictive of promoter location, while the acetylation marks at H3K9 and H3K27 are highly important in estimating promoter usage. All four of these marks are found to be necessary for recruitment of RNAPII but not sufficient for elongation. We also show that the spatial distributions of histone marks are almost as predictive as the signal strength, and that a set of histone marks immediately downstream of the TSS is highly predictive of RNAPII stalling (published in BMC Genomics and Trends in Genetics).
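To make the distinction between quantitative and spatial features concrete, the following minimal sketch derives both kinds of feature from a binned histone-mark signal profile around a TSS. The function name, the bin layout and the specific feature definitions (total signal, signal-weighted offset from the TSS, downstream fraction) are illustrative assumptions; the report does not specify how the published model encodes its features.

```python
from typing import Dict, List


def histone_features(profile: List[float], bin_size: int, tss_bin: int) -> Dict[str, float]:
    """Summarize one histone-mark profile around a TSS.

    profile  : signal per bin across a window around the TSS (hypothetical layout)
    bin_size : width of each bin in base pairs
    tss_bin  : index of the bin that contains the TSS
    """
    total = sum(profile)
    if total == 0:
        # No signal at all: every feature is zero by convention.
        return {"strength": 0.0, "offset_bp": 0.0, "downstream_fraction": 0.0}

    # Quantitative feature: overall signal strength in the window.
    strength = float(total)

    # Spatial feature: signal-weighted mean position relative to the TSS (bp).
    offset_bp = sum((i - tss_bin) * bin_size * s for i, s in enumerate(profile)) / total

    # Spatial feature: share of the signal located strictly downstream of the TSS bin.
    downstream_fraction = sum(s for i, s in enumerate(profile) if i > tss_bin) / total

    return {
        "strength": strength,
        "offset_bp": offset_bp,
        "downstream_fraction": downstream_fraction,
    }


if __name__ == "__main__":
    # Toy profile: five 100 bp bins, TSS in the middle bin.
    print(histone_features([0, 1, 2, 1, 4], bin_size=100, tss_bin=2))
```

Feature vectors of this kind, computed per mark and per promoter, could then be passed to any supervised learner that reports feature importance; which learner is used is left open here.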
2) The relation between promoters and small RNAs upstream, within and downstream of TSSs
Efforts to catalog eukaryotic transcripts have uncovered many small RNAs (sRNAs) derived from gene termini and splice sites. Their biogenesis pathways are largely unknown, but a mechanism based on backtracking of RNA polymerase II (RNAPII) has been suggested. By sequencing transcripts 12–100 nucleotides in length from cells depleted of major RNA degradation enzymes and RNAs associated with Argonaute (AGO1/2) effector proteins, we provide mechanistic models for sRNA production. We showed that neither splice site–associated (SSa) nor transcription start site–associated (TSSa) RNAs arise from RNAPII backtracking. Instead, SSa RNAs are largely degradation products of splicing intermediates, whereas TSSa RNAs probably derive from nascent RNAs protected by stalled RNAPII against nucleolysis. In a follow-up article, we characterized upstream, unstable transcription start site–associated RNAs (promoter-upstream transcripts, PROMPTs). Motif analyses around the RNA 3' ends revealed polyadenylation (pA)-like signals. Mutagenesis studies demonstrated that PROMPT pA signals are functional but linked to RNA degradation. Moreover, pA signals are under-represented in promoter-downstream versus promoter-upstream regions, thus allowing for more efficient RNAPII progress in the sense direction from gene promoters. We conclude that asymmetric sequence distribution around human gene promoters serves to provide a directional RNA output from an otherwise bidirectional transcription process (published in Nature Structural and Molecular Biology, 2011 and 2013). We can thereby link promoter sequence signals to abortive transcription or productive elongation.
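The asymmetry argument above can be illustrated with a simple motif count on either side of a promoter. The sketch below uses the canonical AATAAA hexamer as a stand-in for a "pA-like signal" and scans the upstream region on the reverse strand, on the assumption that upstream transcription runs antisense to the gene. Both choices, as well as the function names, are assumptions made for illustration and are not taken from the report.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def count_motif(seq: str, motif: str = "AATAAA") -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq = seq.upper()
    motif = motif.upper()
    count = 0
    start = 0
    while True:
        idx = seq.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # step by one so overlapping hits are counted


def pa_signal_counts(upstream: str, downstream: str, motif: str = "AATAAA") -> tuple:
    """Count pA-like motifs as seen by transcription leaving the promoter.

    upstream   : sense-strand sequence upstream of the TSS; scanned on the
                 reverse strand (assumed antisense transcription direction)
    downstream : sense-strand sequence downstream of the TSS; scanned as given
    """
    up = count_motif(reverse_complement(upstream), motif)
    down = count_motif(downstream, motif)
    return up, down


if __name__ == "__main__":
    # Toy sequences: the upstream region carries the motif on the reverse strand.
    print(pa_signal_counts("GGTTTATTCC", "CCGGCCGG"))
```

Applied to many promoters, a consistently higher upstream count than downstream count would correspond to the under-representation of pA signals in promoter-downstream regions described above.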

3) Inference of enhancer locations and their interaction with promoters using capped RNA signatures.
Enhancers are long-range regulatory elements that enhance transcription from target TSSs. They control the correct temporal and cell-type-specific activation of gene expression in higher eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understanding the regulation of differentiation and homeostasis. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. Combining this with FANTOM5 data, which cover a large variety of human cell types, we could produce an atlas of active, in vivo-transcribed enhancers. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify target TSSs for enhancers, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation. This work was recently accepted by Nature.
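One simple way to express the bidirectional capped RNA signature numerically is a strand-balance score computed from forward- and reverse-strand tag counts at a candidate locus. The sketch below is a hypothetical illustration: the score definition, the function names and the thresholds are assumptions chosen for clarity, not the criteria used to build the atlas.

```python
from typing import Optional


def directionality(forward: float, reverse: float) -> Optional[float]:
    """Strand balance in [-1, 1]: +1 is forward only, -1 is reverse only, 0 is balanced.

    Returns None when there is no signal on either strand.
    """
    total = forward + reverse
    if total == 0:
        return None
    return (forward - reverse) / total


def is_bidirectional(forward: float, reverse: float,
                     max_abs_score: float = 0.8, min_total: float = 2.0) -> bool:
    """Flag a locus as bidirectionally transcribed.

    max_abs_score and min_total are hypothetical thresholds for illustration.
    """
    score = directionality(forward, reverse)
    if score is None or (forward + reverse) < min_total:
        return False
    return abs(score) <= max_abs_score


if __name__ == "__main__":
    print(directionality(30, 10), is_bidirectional(30, 10))
    print(directionality(100, 0), is_bidirectional(100, 0))
```

Under these assumptions, a locus with tags on both strands is kept as a bidirectional candidate, whereas a locus with strongly one-sided signal looks more like a conventional unidirectional promoter.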