Model-based Data Analysis of Transcription and Splicing


Gene expression is the fundamental process by which all cells produce functional protein from a genomic DNA template via a messenger RNA (mRNA) intermediate. Eukaryotic gene expression involves transcription (the polymerization of mRNA) and splicing (the removal of non-coding regions from the mRNA). Recent evidence shows that nascent mRNAs are spliced while still being transcribed, rather than after transcription is complete, and that the splicing machinery regulates transcription. This cross-talk complicates our understanding of gene expression, because its mechanism and consequences are not yet understood. This project proposes using model-based data analysis, applied to multiple types of data, to study the kinetics of coupled transcription and splicing.

Model-based data analysis is a statistical framework in which models are formulated as probability distributions encoding the stochastic interactions between components, including observed data. Knowledge of the underlying mechanism (here, biological) is used to quantify both the phenomenon and the uncertainty resulting from partial knowledge and noisy observations. The need for such analysis is acute in modern biology: decades of molecular biology have yielded detailed information on specific molecules and pathways, and next-generation sequencing (NGS) now allows scientists to collect gigabytes of data on thousands of distinct molecules simultaneously. Yet integrating these approaches is challenging: biologists struggle to analyze NGS data in ways that give insight into biological mechanisms, whether known or previously unknown.

Here, the model-based data analysis paradigm will be used to interrogate the interplay of transcription and splicing, using state-of-the-art data, including time-resolved NGS measurements of RNA processing. Working with experimentalists, we will quantify the kinetics of splicing in constitutive genes by labeling nascent transcripts, and will estimate the effect of splicing on polymerase elongation genome-wide.
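
To make the idea of model-based analysis of time-resolved labeling data concrete, the following is a minimal illustrative sketch, not the project's actual model. It assumes first-order splicing kinetics, so that a labeled transcript remains unspliced at time t with probability exp(-k t), and it treats unspliced read counts as binomial draws. The read counts, time points, grid, and function names (`log_likelihood`, `grid_posterior`, `summarize`) are all hypothetical, chosen only for illustration.

```python
import math

def log_likelihood(k, times, unspliced, totals):
    """Binomial log-likelihood (up to a constant) of unspliced read counts,
    assuming first-order splicing with rate k (an illustrative assumption)."""
    ll = 0.0
    for t, u, n in zip(times, unspliced, totals):
        p = math.exp(-k * t)                # P(read still unspliced at time t)
        p = min(max(p, 1e-12), 1 - 1e-12)   # keep the logarithms finite
        ll += u * math.log(p) + (n - u) * math.log(1 - p)
    return ll

def grid_posterior(times, unspliced, totals, k_grid):
    """Posterior over k on a grid under a flat prior, normalized to sum to 1."""
    lls = [log_likelihood(k, times, unspliced, totals) for k in k_grid]
    top = max(lls)                          # subtract the max for numerical stability
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]

def summarize(k_grid, post, mass=0.95):
    """Posterior mean and an equal-tailed credible interval read off the grid."""
    mean = sum(k * p for k, p in zip(k_grid, post))
    tail = (1 - mass) / 2
    lower, upper = None, k_grid[-1]
    cumulative = 0.0
    for k, p in zip(k_grid, post):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = k
        if cumulative >= 1 - tail:
            upper = k
            break
    return mean, lower, upper

# Hypothetical data: minutes after labeling, unspliced reads, total labeled reads.
times = [1, 2, 5, 10]
unspliced = [148, 110, 45, 10]
totals = [200, 200, 200, 200]

k_grid = [0.005 * i for i in range(1, 401)]   # candidate rates, per minute
post = grid_posterior(times, unspliced, totals, k_grid)
mean, lower, upper = summarize(k_grid, post)
print(f"posterior mean k = {mean:.3f} per minute, "
      f"95% interval = [{lower:.3f}, {upper:.3f}]")
```

The point of the sketch is that the output is a distribution over the splicing rate rather than a single number, so the uncertainty from a finite number of noisy reads is reported alongside the estimate. A realistic analysis would need a richer model, for example one that accounts for labeling efficiency, transcript elongation, and gene-specific effects.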

Field of science

  • /natural sciences/biological sciences/molecular biology
  • /natural sciences/computer and information sciences/data science/data analysis

Funding Scheme

MSCA-IF-EF-ST - Standard EF


Old College, South Bridge
EH8 9YL Edinburgh
United Kingdom

Activity type

Higher or Secondary Education Establishments

EU contribution

€ 195 454,80