CORDIS - EU research results

Decoding the Multi-facets of Cellular Identity from Single-cell Data

Periodic Reporting for period 1 - DecodeSC (Decoding the Multi-facets of Cellular Identity from Single-cell Data)

Reporting period: 2022-10-01 to 2025-03-31

Advances in technologies that measure gene expression at single-cell resolution have revolutionized our understanding of the heterogeneity, structure and dynamics of tissues and whole organisms in health and disease. Yet, in most single-cell experiments, tissue structure, temporal trajectories, and their underlying mechanisms are lost or not directly accessible. Despite experimental advances, major gaps remain in understanding how tissues orchestrate multicellular functions. In recent years, we and others have focused on computationally recovering single facets of single-cell data, such as tissue structure or differentiation trajectories. However, each cell encodes multiple layers of information about its type, location, and various biological processes. Disentangling these signals from large-scale, high-dimensional single-cell data is a major challenge. The goal of this project is to address this challenge by developing computational methodologies that combine machine learning and dynamical systems approaches to: (i) tease apart multiple cellular facets encoded in single-cell data; (ii) infer interactions between these facets and the mechanisms shaping spatiotemporal expression across them; and (iii) derive generative models to sample and predict unobserved cell states and design optimal perturbations, providing an interpretable platform to study conditions leading to a physiological disruption and therapies aimed at reversing it. The research program spanned by this project tackles the core challenge in the single-cell era - transforming this exponentially growing, complex data into insights and principles for the underlying biology of multicellular systems. It aims to advance our understanding and control of collective tissue behavior and to uncover the multiple facets of cellular identity in health and disease, and is thus expected to be valuable for both basic and translational research.
We have made progress on each of the project's aims, and have published corresponding open-access manuscripts and software packages.

Signal Filtering. We developed two methods, one probabilistic and one spectral. Specifically, we developed a kernel-based method, SiFT, for probabilistically filtering known (measured or computationally reconstructed) signals from single-cell data to expose overshadowed signals that are otherwise harder to detect (Piran and Nitzan, Nature Communications, 2024). We demonstrated that SiFT can enhance the circadian rhythm signal in single-cell data of liver lobules by filtering out the spatial zonation signal. In a different setting, SiFT filtered the healthy background signal from single-cell data of COVID-19 patients to expose disease-related cells and dynamic signatures.
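The idea of kernel-based filtering of a known signal can be illustrated with a minimal sketch. This is not the SiFT implementation: it assumes a simple Gaussian kernel over a single known per-cell covariate, and the function names (`rbf_kernel`, `filter_known_signal`), the bandwidth, and the toy data are all hypothetical choices made for illustration. Expression is smoothed over cells that are similar in the known covariate, and that smoothed projection is subtracted, leaving the residual signal.

```python
import numpy as np

def rbf_kernel(covariate, bandwidth):
    """Row-normalized Gaussian kernel between cells, based on a known covariate."""
    c = np.asarray(covariate, dtype=float).reshape(len(covariate), -1)
    sq_dists = ((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return k / k.sum(axis=1, keepdims=True)

def filter_known_signal(expression, covariate, bandwidth=0.1):
    """Subtract the part of expression explained by the known covariate.

    expression: (n_cells, n_genes) array
    covariate:  (n_cells,) known signal, e.g. a spatial coordinate (assumption)
    """
    kernel = rbf_kernel(covariate, bandwidth)
    projection = kernel @ expression  # kernel-smoothed expression per cell
    return expression - projection

# Toy data (illustrative only): a strong "zonation" gradient hides a weaker rhythm.
rng = np.random.default_rng(0)
n_cells = 300
zonation = rng.uniform(0.0, 1.0, n_cells)        # known signal
phase = rng.uniform(0.0, 2.0 * np.pi, n_cells)   # hidden signal
rhythm = np.sin(phase)
expression = np.column_stack([
    5.0 * zonation + rhythm + 0.1 * rng.normal(size=n_cells),
    2.0 * zonation + np.cos(phase) + 0.1 * rng.normal(size=n_cells),
])

filtered = filter_known_signal(expression, zonation)
```

In this toy setting, the first gene of `filtered` should track `rhythm` more closely than the raw expression does, because the dominant gradient along `zonation` has been removed.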
Additionally, we developed a spectral computational method that uses topological priors to decouple, enhance and filter different classes of biological processes (e.g. periodic or linear) in single-cell data using spectral template matching (Karin, Bornfeld and Nitzan, Nature Biotechnology, 2023). We demonstrated the use of our method, scPrisma, for the analysis of diverse processes, such as the circadian rhythm and spatial zonation, diurnal cycle in Chlamydomonas, and circadian rhythm in the suprachiasmatic nucleus in the brain.

Signal disentanglement. We developed a generative deep learning approach, biolord, to simultaneously disentangle multiple biological signals from single-cell data (Piran et al., Nature Biotechnology, 2024). The computational framework is tailored for multiple types of biological data, including single-cell RNA-sequencing. Beyond disentanglement, biolord allows for counterfactual predictions – effectively virtually shifting cells across space, time and conditions to predict cellular responses to perturbations and sample unseen cellular states. The framework was shown to reveal gene expression programs associated with liver infection, and in a different context, to predict cellular responses to unseen drugs and genetic perturbations.

For both filtering and disentanglement of signals in single-cell data, having integrated, high-quality annotations of measured cells is key. To advance this goal, we developed an optimal transport-based method for annotation transfer across biological datasets (Mages, Moriel, Avraham-Davidi, et al., Nature Biotechnology, 2023), and proposed an algorithm, based on iterative gene weight updates, to improve the robustness of computational reconstruction of single-cell data (Sheng, Barak and Nitzan, Bioinformatics, 2023).
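As a rough illustration of optimal transport-based annotation transfer, the sketch below couples reference and query cells with a generic entropic (Sinkhorn) transport plan and lets each query cell inherit the label that sends it the most mass. It is not the published method: uniform marginals, the squared Euclidean cost, the regularization strength, and the function names (`sinkhorn_plan`, `transfer_labels`) are assumptions made for this example.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon=0.05, n_iters=200):
    """Entropic optimal transport plan between uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    gibbs = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (gibbs @ v)      # match row marginals
        v = b / (gibbs.T @ u)    # match column marginals
    return u[:, None] * gibbs * v[None, :]

def transfer_labels(ref_features, ref_labels, query_features, epsilon=0.05):
    """Assign each query cell the reference label that transports the most mass to it."""
    diffs = ref_features[:, None, :] - query_features[None, :, :]
    cost = (diffs ** 2).sum(axis=-1)
    cost = cost / cost.max()     # scale the cost to [0, 1] for numerical stability
    plan = sinkhorn_plan(cost, epsilon)              # shape (n_ref, n_query)
    classes = np.unique(ref_labels)
    one_hot = (ref_labels[:, None] == classes[None, :]).astype(float)
    scores = plan.T @ one_hot                        # shape (n_query, n_classes)
    return classes[scores.argmax(axis=1)]

# Toy data (illustrative only): two well-separated cell types in a 2D feature space.
rng = np.random.default_rng(1)
centers = {"A": np.array([0.0, 0.0]), "B": np.array([5.0, 5.0])}
ref_labels = np.array(["A"] * 50 + ["B"] * 50)
ref_features = np.vstack([centers[l] + 0.5 * rng.normal(size=2) for l in ref_labels])
true_query_labels = np.array(["A"] * 40 + ["B"] * 40)
query_features = np.vstack([centers[l] + 0.5 * rng.normal(size=2) for l in true_query_labels])

predicted = transfer_labels(ref_features, ref_labels, query_features)
accuracy = (predicted == true_query_labels).mean()
```

Uniform marginals implicitly assume similar cell-type proportions in the reference and query datasets; when proportions differ, an unbalanced or otherwise relaxed transport formulation would be a more suitable starting point.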

Towards the goal of deriving biological universality classes for processes occurring in various cellular populations under different conditions, we have developed physics-informed machine learning frameworks for identifying and classifying different dynamical regimes and bifurcations from snapshot measurements, as well as for recovering the underlying equations or mechanisms driving such dynamics (Ricci et al., ICLR, 2023; Moriel, Ricci and Nitzan, ICLR, 2024). These works allowed us, for example, to recover distinct dynamical regimes corresponding to proliferation vs. differentiation in pancreas development from single-cell RNA-sequencing data, without the use of prior (gene-based) knowledge.

Towards identifying design principles guiding the low-dimensional structure and distributions of cells in expression space, we demonstrated how division of labor theory predicts distinct, mechanism-dependent spatial patterns. By optimizing collective cellular task performance under trade-offs, we found that distinguishable expression patterns can emerge from cell-cell interactions versus instructive signals, and demonstrated these results across multiple tissues and biological contexts (Adler et al., Cell Reports, 2023).
1) We have advanced the conceptual understanding and developed effective computational methodologies for biological signal filtering (Piran and Nitzan, Nature Communications, 2024), manipulation (Karin, Bornfeld and Nitzan, Nature Biotechnology, 2023), and disentanglement (Piran et al., Nature Biotechnology, 2024) in single-cell data.

2) We have developed physics-informed machine learning frameworks for the embedding, classification and derivation of underlying equations for dynamical systems, with applications to the characterization of different dynamical regimes, and therefore different classes of biological processes, in single-cell data (Ricci et al., ICLR, 2023; Moriel, Ricci and Nitzan, ICLR, 2024).

3) We have suggested a framework and associated computational tools for characterizing mechanisms contributing to division of labor of interacting cellular populations derived from single-cell and spatial transcriptomics data (Adler et al., Cell Reports, 2023).

Additionally, while working on the project, we realized that filtering and disentangling signals in single-cell data require more robust reconstruction, annotation transfer and characterization of single-cell data. This gave rise to two corresponding projects that address these challenges (Sheng, Barak and Nitzan, Bioinformatics, 2023; Mages, Moriel, Avraham-Davidi, et al., Nature Biotechnology, 2023).