CORDIS - EU research results

Enhancing gene network inference from single-cell transcriptomics data through biophysical constraints

Periodic Reporting for period 1 - scNet (Enhancing gene network inference from single-cell transcriptomics data through biophysical constraints)

Reporting period: 2021-04-01 to 2023-03-31

Regulatory processes within living cells have long been a topic of research interest and are key to understanding various diseases. Decades of study have produced a large body of knowledge on molecular interactions and regulatory pathways in the cells of model organisms ranging from microorganisms to mammals. Nevertheless, accurately inferring gene network topology at the scale of a whole cell remained an intractable task until recently, mostly because of the large amount of single-cell data needed for such inference. In the last few years, single-cell RNA sequencing (scRNA-seq) technology has made it possible to measure the transcriptome of large numbers of individual cells, which allows a much greater share of the multidimensional parameter space of large gene networks to be observed and has given rise to multiple inference methods. However, none of the existing methods incorporates all relevant knowledge on biophysical constraints. This project aims to incorporate three elements into a Bayesian inference framework for identifying the topology of a gene network and the rate constants of its molecular interactions: prior knowledge of the system; a decomposition of measurement, extrinsic and intrinsic noise; and an accurate representation of stochastic gene expression and its regulation. The performance of the inference algorithm will be tested by evaluating its ability to predict the effects of transcription factor deletion perturbations. Enhancing gene network inference by accounting for the wealth of known biophysical constraints could provide insights into gene regulatory processes and thereby enable advances in developmental and evolutionary biology, biomedicine and bioengineering.
First, we worked on defining a knowledge graph whose edges carry attributes. For example, instead of “gene A activates gene B”, we aim to encode “gene A activates gene B with a certain rate that depends on the state of the local nodes or on external parameters”. These attributes are suitable for encoding mechanistic constraints of gene expression and regulation, such as “The rate of activation of B by A increases linearly until the concentration of B reaches a certain value. Above this value, the activation rate stays constant.”
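
As an illustration only, an edge attribute of this kind could be represented as a function attached to each edge. The sketch below is not the project's implementation: the class `RegulatoryEdge`, the helper `saturating_linear`, and the numerical values of `slope` and `threshold` are all hypothetical, and the choice of which species' concentration drives the rate is left as a parameter (here B, following the example above).

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a network state maps species names to concentrations.
State = Dict[str, float]


@dataclass
class RegulatoryEdge:
    """An edge of the knowledge graph carrying a mechanistic rate attribute."""
    source: str                      # regulator, e.g. "A"
    target: str                      # regulated gene, e.g. "B"
    effect: str                      # "activation" or "repression"
    rate: Callable[[State], float]   # rate as a function of the network state


def saturating_linear(species: str, slope: float, threshold: float) -> Callable[[State], float]:
    """Rate that grows linearly with `species` and stays constant above `threshold`."""
    def rate(state: State) -> float:
        return slope * min(state[species], threshold)
    return rate


# Illustrative values only (assumptions, not taken from the project).
edge = RegulatoryEdge(
    source="A",
    target="B",
    effect="activation",
    rate=saturating_linear(species="B", slope=0.5, threshold=10.0),
)

# The knowledge graph itself can be a mapping from (source, target) to the edge.
knowledge_graph = {(edge.source, edge.target): edge}

low = knowledge_graph[("A", "B")].rate({"A": 3.0, "B": 4.0})    # linear regime
high = knowledge_graph[("A", "B")].rate({"A": 3.0, "B": 25.0})  # saturated regime
```

Storing the rate as a callable keeps the graph topology separate from the kinetics, so a different mechanistic constraint can be swapped in for an edge without changing the structure of the graph.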

Further, to gain insight into the kinetics and mechanisms of existing gene interactions (i.e. the network structure), we worked on combining this static information with the temporal information from scRNA-seq data. Using a hierarchical Bayesian model with delays, we represent a gene network as a system of stochastic chemical reactions, in which biological constraints and molecular interaction kinetics, such as the cell size and the regulatory mechanisms of gene expression, are incorporated implicitly in the reaction rates and delays of mRNA and protein production. To evaluate the accuracy of our modelling framework, we simulate in silico scRNA-seq data of a sparse and modular gene network using a delayed Gillespie algorithm. We then reconstruct pseudo-time from these data using existing software (Cyclum and MONOCLE). The next step is, given the ground truth on which genes interact with each other, to infer the kinetic parameters and the mechanisms of these interactions.
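
To make the simulation step more concrete, the following is a minimal sketch of a delayed stochastic simulation for a single gene, not the network model used in the project. It assumes a toy system with two reactions: transcription initiation at a constant rate `k_init`, whose mRNA product appears only after a fixed delay `tau`, and first-order mRNA degradation at rate `gamma` per molecule. It uses one common variant of the delayed algorithm, in which the waiting time is redrawn whenever a scheduled delayed completion falls before the next sampled reaction. All names and parameter values are hypothetical.

```python
import heapq
import random


def delayed_gillespie(k_init, gamma, tau, t_end, seed=0):
    """Simulate delayed mRNA production and immediate degradation.

    Returns the event times and the mRNA count after each event.
    """
    rng = random.Random(seed)
    t, m = 0.0, 0
    pending = []                 # min-heap of completion times of delayed products
    times, counts = [0.0], [0]

    while True:
        a_init = k_init          # propensity of (delayed) transcription initiation
        a_deg = gamma * m        # propensity of mRNA degradation
        a0 = a_init + a_deg
        t_next = t + rng.expovariate(a0)   # candidate time of the next reaction

        if pending and pending[0] <= t_next:
            # A delayed product completes first: jump to that time, release the
            # mRNA and discard the candidate reaction (exponential waiting times
            # are memoryless, so a fresh draw next iteration is valid).
            t = heapq.heappop(pending)
            if t > t_end:
                break
            m += 1
        else:
            t = t_next
            if t > t_end:
                break
            if rng.random() * a0 < a_init:
                # Initiation fires now; its product appears after the delay.
                heapq.heappush(pending, t + tau)
            else:
                m -= 1           # degradation takes effect immediately

        times.append(t)
        counts.append(m)

    return times, counts


# Illustrative parameters only (assumptions, not values from the project).
times, counts = delayed_gillespie(k_init=10.0, gamma=1.0, tau=0.5, t_end=50.0, seed=1)
```

In this toy system the delay shifts the trajectory in time without changing the long-run mean, which is `k_init / gamma`. Sampling the mRNA counts of many independent runs at random time points would give snapshot data loosely analogous to scRNA-seq measurements.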

While conducting the work described above, we encountered difficulties in obtaining an accurate pseudo-time reconstruction for large networks. To mitigate this, we developed a strategy in which pseudo-time is reconstructed from gene clusters, which often correspond to functional gene groups in cellular gene networks, and the trajectories of the individual gene clusters are then connected via the genes that belong to multiple clusters simultaneously. We are now preparing for publication a manuscript that presents this pseudo-time reconstruction strategy.

Finally, we developed a new modelling framework that can capture the fingerprint of temporal heterogeneity in the gene expression environment from snapshot data such as scRNA-seq. In particular, this heterogeneity can affect even the mean expression levels, as a consequence of bimolecular interactions with low-abundance environment species. We also described several examples in which incorporating this heterogeneity makes it possible to explain the mean gene expression levels observed in the data. This work is now a manuscript in preparation.
The main results that go beyond the state of the art are the introduction of a modular approach to pseudo-time reconstruction from scRNA-seq data and the development of a model that describes the effects of temporal heterogeneity in the gene expression environment on the mean levels of gene expression. Although this project has already been terminated (due to the beneficiary switching to a different, longer funding source), we plan to keep working towards gene network inference from single-cell omics data augmented with data on mechanistic constraints and known gene interactions. During the year that we worked on this project, we discovered that it is not possible to reliably reconstruct pseudo-time from scRNA-seq data without approaching the task modularly. We did not anticipate this when planning the project. Resolving this difficulty will help with the development of the network inference framework and will also be of use to the wider scientific community.
Modular pseudo-time reconstruction: Cell cycle pseudo-time is reconstructed wrongly from the whole s