
Model-based Data Analysis of Transcription and Splicing

Periodic Reporting for period 1 - MoDATS (Model-based Data Analysis of Transcription and Splicing)

Reporting period: 2016-01-05 to 2018-01-04

Gene expression is the fundamental process that in all cells produces functional protein from a genomic DNA template using a messenger RNA intermediate. We used computational tools applied to large datasets of gene expression to understand how several steps of gene expression are co-ordinated.

This is important because understanding the basic workings of cells is crucial for treating disease and for bio-engineering. It is also important because we are developing computational approaches that help other biologists better understand their data.
We focused on two crucial steps in eukaryotic gene expression: transcription (the polymerization of mRNA from a DNA template) and splicing (the removal of non-coding "intron" regions from the mRNA). We applied our computational tools to the best available "next-generation sequencing" data from budding yeast. Budding yeast (Saccharomyces cerevisiae) is a popular "model organism" because it is easy to manipulate and key features of its biology, including the gene expression machinery, are shared with other fungi, plants, and humans. Using data science tools, including model-based data analysis, we showed that for most yeast genes splicing happens very quickly after transcription, but for some it does not. We found that splicing is particularly fast for most mRNAs coding for parts of the ribosome, the molecular machine that makes proteins from an mRNA template. This work was published in:

Wallace EWJ, Beggs JD. 2017. Extremely fast and incredibly close: cotranscriptional splicing in budding yeast. RNA 23: 601–610.
https://doi.org/10.1261/rna.060830.117

The accompanying image, taken from the paper, compares the different splicing measurements (SMIT, nascent RNA-seq, 4tU-seq) and other relevant features of yeast mRNAs (intron length, mRNA abundance). It highlights the differing features and splicing patterns of mRNAs encoding ribosomal proteins versus other mRNAs.


We also developed computational models and software to understand a later stage of gene expression, the production of protein from mRNA by ribosomes.

Carja O, Xing T, Wallace EWJ, Plotkin JB, Shah P. 2017. riboviz: analysis and visualization of ribosome profiling datasets. BMC Bioinformatics 18: 461.
https://doi.org/10.1186/s12859-017-1873-8
Our work included a novel meta-analysis of several different kinds of sequencing data. Dr. Edward Wallace is continuing this line of work as a Sir Henry Dale Fellow at the University of Edinburgh.

In addition to the research outputs, we presented the work to colleagues at university seminars and conferences in Europe and the USA. Dr. Wallace shared his knowledge with colleagues at Edinburgh and trained other researchers in the fundamentals of data processing, both in Data Carpentry workshops and at the Natural History Museum in London.