Modelling latent causes in molecular networks

Final Report Summary - LATENTCAUSES (Modelling latent causes in molecular networks)

In systems biology, we develop mathematical models to answer biological questions. Any model only approximates reality, leaving out details or other types of regulation, but in many cases such models can still explain biological observations sufficiently well. In this ERC Starting Grant we focused on cases where given models failed to predict a set of observations with acceptable accuracy. To that end, we aimed to identify latent factors that cause the deviation from the original model. We applied these methods to molecular data from stem cell biology, generating models for hematopoiesis as well as embryonic stem cell development.
We developed a statistical method to estimate the time course and impact of a latent component (e.g. transcription factors or microRNAs in embryonic stem cell differentiation) on a dynamical system without having to specify the biological function of the latent component in advance (Kondofersky et al., IET Syst. Biol. 2015). Specifically, we extended ordinary differential equation (ODE) models with latent variables, which can be estimated using tools from penalized spline approximation, maximum likelihood estimation and model selection. If the statistical method finds such a latent cause, it may be validated in collaboration with experimental partners. In a follow-up project (Kondofersky et al., IET Syst. Biol. 2016) we demonstrated how the method can be applied in a computationally efficient way to identify catalyzing components in otherwise known networks. In parallel, we aimed to identify, in an unsupervised manner, independent projections of interest in multivariate data obtained from structured models, in order to visualize the impact of latent variables on the observations. We thus developed a tool to analytically separate sources with respect to a given biological network (Illner et al., 2012, 2014a, 2014b, 2015). The Bayesian setting allowed us to include missing observations and parameter priors. We demonstrated that the new model could identify relevant biological interactions.
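The idea of estimating a latent time course inside an ODE model can be illustrated with a deliberately simplified sketch. It is not the published method: it assumes a single-species model dx/dt = -k*x + u(t) with known decay rate k, a forward-Euler discretization, and a second-difference roughness penalty on a pointwise representation of the latent input u(t) in place of a penalized spline basis. All names (simulate, estimate_latent_input, lam) and all numerical values are hypothetical.

```python
import numpy as np

def simulate(k, x0, u, dt):
    """Forward-Euler solution of dx/dt = -k*x + u(t) on a regular time grid."""
    x = np.empty(len(u) + 1)
    x[0] = x0
    for i, u_i in enumerate(u):
        x[i + 1] = x[i] + dt * (-k * x[i] + u_i)
    return x

def estimate_latent_input(y, k, x0, dt, lam):
    """Penalized least-squares estimate of the latent input u on the grid.

    The discretized model is linear in u, so x = base + A @ u, and we solve
    min_u ||y - base - A @ u||^2 + lam * ||D2 @ u||^2 in closed form.
    """
    n = len(y) - 1
    base = simulate(k, x0, np.zeros(n), dt)            # response without latent input
    A = np.column_stack([simulate(k, 0.0, e, dt)       # response to each unit impulse
                         for e in np.eye(n)])
    D2 = np.diff(np.eye(n), n=2, axis=0)               # second-difference operator
    return np.linalg.solve(A.T @ A + lam * D2.T @ D2, A.T @ (y - base))

# Synthetic example (assumed values): a smooth latent bump drives the system.
rng = np.random.default_rng(0)
dt, n, k, x0 = 0.1, 100, 0.5, 1.0
t = dt * np.arange(n)
u_true = np.exp(-0.5 * (t - 5.0) ** 2)
y = simulate(k, x0, u_true, dt) + rng.normal(0.0, 0.01, size=n + 1)

u_hat = estimate_latent_input(y, k, x0, dt, lam=1.0)

# Crude comparison of the original model and the model with a latent input.
rss_without = np.sum((y - simulate(k, x0, np.zeros(n), dt)) ** 2)
rss_with = np.sum((y - simulate(k, x0, u_hat, dt)) ** 2)
```

Comparing the two residual sums of squares is only a stand-in for the model selection step mentioned above; a proper criterion would also account for the effective number of parameters introduced by the latent input.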
A dominant cause of misfit in cellular models is single-cell heterogeneity, which cannot be seen in population averages but is present at the transcriptional level. We first asked how such heterogeneities could be detected and quantified. To keep technical noise to a minimum and to save costs, we initially analyzed measurements of small cell populations rather than single-cell measurements. The challenge was to extract the latent single-cell information from such data. We achieved this by identifying single-cell expression states and cell-type frequencies through maximum-likelihood inference, and extended the method to programs of heterogeneously co-expressed transcripts for systems-level applications (Bajikar et al., PNAS, 2014). This adds an important tool for stochastic profiling studies that seek to understand the heterogeneous regulation of molecular states and cell decisions.
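How single-cell information can be recovered from pooled measurements may be easier to see in a toy example. The sketch below is an illustration under strong simplifying assumptions, not the published model: each cell is either in a low or a high expression state, single-cell expression is Gaussian (rather than, for instance, log-normal), the low-state mean and the spread are treated as known, and the maximum-likelihood estimate is found by a plain grid search. All names and numbers are hypothetical.

```python
import numpy as np
from math import comb

def pooled_loglik(y, n_cells, p, mu0, mu1, sigma):
    """Log-likelihood of pooled measurements y, each the sum over n_cells cells.

    Each cell is in the high state with probability p (mean mu1) or in the
    low state otherwise (mean mu0); both states share the spread sigma.
    """
    k = np.arange(n_cells + 1)                          # number of high-state cells
    weights = np.array([comb(n_cells, int(j)) for j in k]) \
        * p ** k * (1 - p) ** (n_cells - k)             # binomial probabilities
    means = k * mu1 + (n_cells - k) * mu0               # pooled mean given k
    var = n_cells * sigma ** 2                          # pooled variance
    dens = np.exp(-(y[:, None] - means[None, :]) ** 2 / (2 * var)) \
        / np.sqrt(2 * np.pi * var)
    return np.sum(np.log(dens @ weights + 1e-300))      # small floor avoids log(0)

# Synthetic 10-cell pools (assumed values).
rng = np.random.default_rng(0)
n_cells, p_true, mu0, mu1_true, sigma = 10, 0.2, 0.0, 5.0, 0.5
k_high = rng.binomial(n_cells, p_true, size=200)
y = (k_high * mu1_true + (n_cells - k_high) * mu0
     + rng.normal(0.0, sigma * np.sqrt(n_cells), size=200))

# Maximum-likelihood estimate of the high-state frequency and mean by grid search.
p_grid = np.linspace(0.02, 0.98, 49)
mu1_grid = np.linspace(2.0, 8.0, 61)
ll = np.array([[pooled_loglik(y, n_cells, p, mu0, m, sigma) for m in mu1_grid]
               for p in p_grid])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
p_hat, mu1_hat = p_grid[i], mu1_grid[j]
```

The grid search keeps the example transparent; with more free parameters a numerical optimizer would be the more practical choice.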
For a more fine-grained picture of regulated proteins, we turned to high-dimensional gene expression patterns measured by single-cell qPCR or single-cell RNA-seq, techniques that became available during the course of the grant. We introduced a novel approach for visualizing and analyzing single-cell gene expression data with latent variable models and were able to identify distinct gene expression patterns in the context of cell fate decisions during development (Buettner and Theis, Bioinformatics, 2012 and Buettner et al., Bioinformatics, 2014). We applied our method to data from a few thousand single cells of the bone marrow in a collaboration with the Goettgens lab in Cambridge, UK. We identified characteristic gene expression patterns correlated with specific cell fates and uncovered a previously unknown link between certain transcription factors (Moignard et al., Nat Biotechnol., 2015). This is of particular interest as two of these factors have recently been reported to be linked with leukemia, but with opposite effects on survival times.
Furthermore, we devised models and tools for cellular differentiation based on time-lapse microscopy in collaboration with the experimentally oriented Schroeder lab, ETH Zürich. Specifically, we developed methods to normalize fluorescence images from time-lapse movies using machine-learning algorithms. Combining these with image processing methods allowed us to measure the fluorescence of single cells. We incorporated these methods in a software tool that provides an efficient workflow for the quantification and analysis of cellular fluorescence in time-lapse movies (Hilsenbeck et al., Nature Biotechnol. 2016, in press). Quantifications were validated against established flow cytometry data and applied to mouse embryonic stem cells and mouse blood stem cells (Hoppe et al., Nature, 2016, in press). Finally, we used this method to set up a model for the heterogeneity of a key pluripotency factor, Nanog, in mouse embryonic stem cells (Filipczyk et al., Nature Cell Biol., 2015).

Altogether, this showcases the successful application of computational methods and modeling to the interpretation of large-scale data in single-cell biology.