
Cis-regulatory variation: Using natural genetic variation to dissect cis-regulatory control of embryonic development

Final Report Summary - CISREGVAR (Cis-regulatory variation: Using natural genetic variation to dissect cis-regulatory control of embryonic development.)

Embryonic development is governed by tightly regulated spatial and temporal gene expression. Despite a considerable degree of genetic variation, embryos develop in a stereotypic manner. The CisRegVar project set out to address how genetic variation in cis-regulatory elements affects transcription factor binding and gene expression during embryonic development. To address this, we made use of a large panel of isogenic strains derived from wild isolates of Drosophila (the DGRP collection). Given that the polymorphism level in Drosophila is 10-fold greater than in humans and that linkage disequilibrium blocks are unusually short, the statistical power of this study was very high.

We quantified RNA levels in embryos collected from 82 genotypes (DGRP lines) at three stages of embryonic development and assessed the impact of genetic variation on gene expression. We identified thousands of stage-specific and shared Quantitative Trait Loci (QTLs) that affected transcriptional and post-transcriptional regulation. The near base-pair resolution obtained by 3’ Tag-seq allowed us to pinpoint causal mutations in enhancers, RNA motifs and transcription factor binding sites. Interestingly, many QTLs showed genetic epistasis within developmental enhancers. Our results uncover context-specific effects of genetic variation at specific embryonic stages and identify mechanisms that buffer these effects to ensure robustness within developmental programs.
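The idea behind testing a single variant for association with expression can be illustrated with a minimal sketch. It regresses the expression of one gene on the genotype at one variant across lines. The 0/1 genotype coding, the function name `association` and the toy values are assumptions made for illustration; they are not the project's pipeline or data, and a real analysis would also need covariates and multiple-testing correction.

```python
from math import sqrt


def association(genotypes, expression):
    """Return (slope, Pearson r) for expression regressed on genotype.

    genotypes:  0/1 allele codes, one per inbred line (hypothetical coding).
    expression: expression level of one gene in the same lines.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 lines")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression is constant across lines")
    slope = sxy / sxx            # expression change per alternate allele
    r = sxy / sqrt(sxx * syy)    # strength of the linear association
    return slope, r


# Toy data (invented): six lines, alternate allele linked to higher expression.
toy_genotypes = [0, 0, 0, 1, 1, 1]
toy_expression = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
slope, r = association(toy_genotypes, toy_expression)
```

In this toy case the slope is the difference between the mean expression of the two genotype groups, which is the quantity a QTL test asks about for each variant and gene pair.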

By performing 5’ CAGE on the same 82 genotypes and embryonic stages, we quantified variation in transcription start site (TSS) usage during development. We identified 5000 QTLs giving rise to two classes of phenotypes: (1) changes in the total level of TSS usage and (2) changes in the spatial distribution of TSS intensities (i.e. promoter shape). Our results indicate that promoter shape QTLs increase transcriptional noise, often without affecting transcript abundance. Promoter shape is therefore an independently evolvable trait that affects the evolutionary constraints on promoter variants.
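The distinction between the two phenotype classes can be made concrete with a small sketch. It separates the total signal in a promoter window from the way that signal is spread across positions. Shannon entropy is used here only as one plausible summary of shape; it is an assumption for illustration, not necessarily the measure used in the project, and the count vectors are invented.

```python
from math import log2


def total_usage(counts):
    """Total TSS signal in a promoter window (sum of per-position counts)."""
    return sum(counts)


def shape_entropy(counts):
    """Shannon entropy (bits) of the normalised TSS distribution.

    Near 0 for a narrow promoter with one dominant start site,
    higher for a broad promoter with dispersed start sites.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no TSS signal in this window")
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


# Two hypothetical alleles with identical total usage but different shapes.
narrow = [0, 0, 100, 0, 0]
broad = [20, 20, 20, 20, 20]

same_total = total_usage(narrow) == total_usage(broad)
narrow_h = shape_entropy(narrow)
broad_h = shape_entropy(broad)
```

Here both alleles give the same total usage, so a test on total level would see no difference, while the shape summary separates them clearly.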

Taking advantage of the large RNA-Seq datasets generated during the course of this project, we also identified long non-coding RNAs (lncRNAs) and enhancer RNAs (eRNAs) that are transcribed during these stages of embryogenesis. Hundreds of new stage- and tissue-specific lncRNAs were identified. Their expression is generally highly correlated with that of neighbouring protein-coding genes, and deletion of selected lncRNAs had no effect on development or viability. The expression of many lncRNAs is also strain-specific, highlighting intra-species variation in non-coding RNA transcription. Overall, our data indicate that genetic variants can easily generate non-coding transcription, often with very complex expression patterns. Such transcripts will be classified as lncRNAs, but generally reflect non-functional bystander expression.

To understand how eRNA expression relates to enhancer and promoter function, we developed a dual transgenic assay to simultaneously measure enhancer and promoter activities. Transgenic analysis revealed a relationship between the direction of transcription at a regulatory element and its enhancer or promoter activity. Bidirectionally transcribed enhancers can act as weak promoters in vivo, whereas bidirectionally transcribed promoters often act as strong enhancers. Strong promoters generally have no enhancer activity. These results suggest that the level of either promoter or enhancer activity from a regulatory element is reflected in the directionality and levels of eRNA transcription.

Taken together, these studies show that genetic variation (1) has a substantial impact on gene expression variation during embryonic development, (2) impacts not only the levels but also the specific transcript isoforms that are produced, creating a huge diversity at both the 5’ and 3’ ends of transcripts, and (3) is partially buffered by extensive genetic epistasis within both enhancer and promoter elements.