Periodic Reporting for period 1 - DeepCell (Learning and modeling the molecular response of single cells to drug perturbations)
Reporting period: 2023-01-01 to 2025-06-30
Building on a pilot study that successfully predicted gene expression changes in response to stimuli, DeepCell extends this approach into a multi-condition, multi-modal deep-learning framework for both conventional (dissociated) and spatially resolved single-cell genomic data. Unlike classical small-scale systems biology models, the DeepCell model is more flexible: it enables the interrogation of complex drug interactions and, through interpretation of the trained network, the characterization of gene regulatory landscapes.
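The report does not spell out how the pilot study predicted stimulus responses. One common approach, assumed here purely for illustration, is latent-space arithmetic: encode cells into a low-dimensional representation, shift unseen control cells by the average control-to-stimulated displacement, and decode back to gene space. To keep the sketch short, a linear (PCA) encoder/decoder stands in for the deep network; `fit_linear_autoencoder` and `predict_response` are hypothetical names, not part of DeepCell.

```python
import numpy as np


def fit_linear_autoencoder(x: np.ndarray, n_latent: int):
    """Fit a PCA-style linear encoder/decoder (a stand-in for a trained deep autoencoder)."""
    mean = x.mean(axis=0)
    # The top right singular vectors of the centered data span the principal subspace.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_latent]  # shape (n_latent, n_genes)

    def encode(data: np.ndarray) -> np.ndarray:
        return (data - mean) @ components.T

    def decode(z: np.ndarray) -> np.ndarray:
        return z @ components + mean

    return encode, decode


def predict_response(control_train, stim_train, control_query, n_latent=2):
    """Predict stimulated expression for unseen control cells via a latent-space shift."""
    encode, decode = fit_linear_autoencoder(np.vstack([control_train, stim_train]), n_latent)
    # Average control-to-stimulated displacement in latent space.
    delta = encode(stim_train).mean(axis=0) - encode(control_train).mean(axis=0)
    return decode(encode(control_query) + delta)
```

With a linear encoder, the prediction reduces to adding the mean expression shift (projected onto the latent subspace) to each query cell. A nonlinear encoder is what would allow the predicted shift to depend on each cell's state.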
DeepCell provides a unique opportunity to leverage cell-based drug screens for fundamental questions in gene regulation and treatment outcome prediction. As a proof of concept, the project focuses on identifying key regulators of enteroendocrine lineage selection in the intestine. To achieve this, we have designed a 500-compound single-cell organoid RNA-seq screen, incorporating compounds from a spatial imaging screen of 200,000 intestinal organoids. These data will be modeled using DeepCell to predict optimal treatment strategies for obese mice, laying the groundwork for future in silico drug screening. This approach has the potential to accelerate drug discovery and transform clinical decision-making by enabling rapid computational predictions of drug effects.
To formalize the challenge of predicting transcriptomic responses to small molecules, DeepCell introduces a comprehensive benchmark that provides a standardized framework (data, models, and evaluation metrics) for systematically assessing machine learning methods for drug response prediction. The benchmark is based on a high-quality single-cell dataset profiling 146 chemical compounds in peripheral blood mononuclear cells (PBMCs) from three donors, capturing transcriptomic signatures before and after drug exposure. In an open competition with over 1,300 participants, state-of-the-art machine learning models, including neural networks and transformer-based approaches, accurately predicted drug-induced gene expression changes in conditions not seen during training.
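The report does not list the benchmark's evaluation metrics. To illustrate how predicted and observed differential expression can be compared, the sketch below computes a mean row-wise root-mean-squared error. It assumes each row is one held-out condition (for example, a compound in a given cell type) and each column is one gene. `mean_rowwise_rmse` is a hypothetical helper, not the benchmark's code.

```python
import numpy as np


def mean_rowwise_rmse(y_true, y_pred) -> float:
    """Mean over rows (conditions) of the root-mean-squared error across columns (genes)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # One RMSE per condition, then an unweighted average over conditions.
    per_row = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    return float(per_row.mean())
```

For example, `mean_rowwise_rmse([[0, 0], [1, 1]], [[0, 0], [0, 0]])` returns 0.5. The first condition is predicted perfectly, and the second has an RMSE of 1.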
To complement these efforts, we developed moscot, a scalable framework for mapping cellular states using optimal transport (OT). Single-cell sequencing destroys the cells it measures, so the same cell cannot be profiled at several time points, and only a limited number of modalities can be captured from one cell. Cells must therefore be related by efficiently aligning their distributions. Using moscot, we tracked hundreds of thousands of single cells across multiple time points of mouse development, which led to the discovery of the first epsilon-cell-specific transcription factor in pancreatic development. We also applied moscot to align spatial transcriptomics slices, an essential step toward a comprehensive tissue-level view that integrates gene expression, surface proteins, and chromatin accessibility. Furthermore, we introduced a novel method for spatiotemporal trajectory inference, which maps spatial transcriptomic data over time. These advances open new possibilities for modeling cellular state transitions, improving our ability to predict cell fate decisions and optimize therapeutic interventions.
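The sketch below makes the idea of aligning cell distributions concrete. It computes an entropy-regularized optimal transport coupling between cells from two time points using the Sinkhorn algorithm. It is a plain NumPy illustration under stated assumptions (uniform cell weights, squared Euclidean cost in a shared embedding, illustrative `epsilon`), not moscot's API or implementation. `sinkhorn` and `squared_distances` are hypothetical helpers.

```python
import numpy as np


def squared_distances(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, e.g. between cells in a shared embedding."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def sinkhorn(a: np.ndarray, b: np.ndarray, cost: np.ndarray,
             epsilon: float = 0.1, n_iters: int = 1000) -> np.ndarray:
    """Entropy-regularized OT coupling between weights a (source) and b (target)."""
    # Gibbs kernel: smaller epsilon approaches unregularized OT but converges more slowly.
    kernel = np.exp(-cost / epsilon)
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows to match the source marginal a
        v = b / (kernel.T @ u)  # rescale columns to match the target marginal b
    return u[:, None] * kernel * v[None, :]
```

Row-normalizing the coupling gives, for each early cell, a probability distribution over later cells, which can be read as putative descendant relationships. Smaller `epsilon` values yield sharper couplings but need more iterations to converge.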
By addressing the critical need for predictive modeling in single-cell genomics, DeepCell is set to transform drug discovery and precision medicine. The project’s innovative machine learning approaches, combined with high-throughput experimental validation, will accelerate the identification of effective drug treatments and provide a computational foundation for simulating complex biological processes. Through its integration of machine learning, multi-omics data, and large-scale perturbation screens, DeepCell establishes a scalable and interpretable framework that advances both fundamental biology and clinical applications.
2. We introduced CellRank 2, a scalable framework that analyzes multiview single-cell data to predict cellular fates and trajectories. It effectively identifies terminal states and fate probabilities, integrates data across time points, and estimates transcription and degradation rates, enhancing understanding of cellular dynamics in development. This work was published in Nature Methods (https://doi.org/10.1038/s41592-024-02303-9).
3. We developed an open-source Python framework for analyzing heterogeneous electronic health records (EHRs). It streamlines data extraction, quality control, and statistical analysis, and supports advanced applications such as patient stratification and survival analysis. This work was published in Nature Medicine (https://doi.org/10.1038/s41591-024-03214-0).
4. We introduced scPoli, an open-world learner that integrates single-cell atlases by learning representations that handle heterogeneous data. It supports data integration, label transfer, and reference mapping, effectively managing sample-level variation and enhancing the utility of atlases for biological insight. This work was published in Nature Methods (https://doi.org/10.1038/s41592-023-02035-2).
5. We developed an experimental pipeline that uses combinatorial indexing and automation technology for massively parallelized scRNA-seq profiling of post-perturbation organoids.
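Item 2 refers to fate probabilities toward terminal states. CellRank frames these as absorption probabilities of a Markov chain on cells. The sketch below computes them for a toy transition matrix by solving the linear system (I - Q) B = R. Here Q holds transitions among transient cells and R holds transitions from transient to terminal cells. The function `fate_probabilities` and the toy matrix are hypothetical, and CellRank's own API and its construction of the transition matrix from data are not reproduced here.

```python
import numpy as np


def fate_probabilities(transition: np.ndarray, terminal: list[int]) -> np.ndarray:
    """Probability that a random walk from each cell is absorbed in each terminal cell."""
    n = transition.shape[0]
    terminal_set = set(terminal)
    transient = [i for i in range(n) if i not in terminal_set]
    q = transition[np.ix_(transient, transient)]  # transient -> transient
    r = transition[np.ix_(transient, terminal)]   # transient -> terminal
    # Absorption probabilities solve (I - Q) B = R; avoid forming the inverse explicitly.
    b = np.linalg.solve(np.eye(len(transient)) - q, r)
    probs = np.zeros((n, len(terminal)))
    probs[transient] = b
    # A terminal cell is already absorbed in itself.
    probs[terminal, np.arange(len(terminal))] = 1.0
    return probs
```

Consider a four-cell chain with terminal cells at both ends, in which each middle cell moves left or right with probability 0.5. The cell next to the first terminal state reaches it with probability 2/3, matching the classic gambler's-ruin result.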
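Item 3 mentions survival analysis on EHR data. The sketch below is a minimal illustration of the standard Kaplan-Meier product-limit estimator. It takes follow-up durations and event indicators (1 = event observed, 0 = censored) and returns the estimated survival probability at each event time. The function name and example data are hypothetical, and this is not the framework's API.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Return (event_times, survival) for the Kaplan-Meier product-limit estimator."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])  # distinct times with an observed event
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)          # still under observation just before t
        died = np.sum((durations == t) & events)  # events occurring exactly at t
        s *= 1.0 - died / at_risk
        survival.append(s)
    return times, np.array(survival)
```

For durations `[1, 2, 2, 3, 4]` with events `[1, 1, 0, 1, 0]`, the estimate drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3. Censored patients still count as at risk until they leave follow-up.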
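Item 4 mentions label transfer. The sketch below assumes a prototype-based scheme: each reference cell type is summarized by a prototype in a shared latent space, and each query cell takes the label of its nearest prototype. It shows that step on a precomputed embedding. Prototypes are computed here simply as class means, and how scPoli learns its embeddings and prototypes is not reproduced. `build_prototypes` and `transfer_labels` are hypothetical helpers.

```python
import numpy as np


def build_prototypes(latent: np.ndarray, labels):
    """Cell-type prototypes as the mean latent embedding of each labeled reference type."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    protos = np.stack([latent[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def transfer_labels(query_latent: np.ndarray, classes: np.ndarray, protos: np.ndarray):
    """Label each query cell by its nearest prototype; also return that distance."""
    dists = np.linalg.norm(query_latent[:, None, :] - protos[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return classes[nearest], dists[np.arange(len(query_latent)), nearest]
```

The returned distance to the nearest prototype can serve as a rough uncertainty score. Query cells far from every prototype may belong to a cell type absent from the reference.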
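Item 5 relies on combinatorial indexing. Cells are barcoded over several split-and-pool rounds, so each cell is identified by its combination of barcodes rather than by a single well. A key design quantity is how often two cells receive the same combination. The sketch below estimates the expected fraction of such collisions, assuming cells are assigned to combinations uniformly at random. The function name and the 96-barcode rounds are illustrative assumptions, not details of the project's pipeline.

```python
def collision_fraction(n_cells: int, barcodes_per_round: list[int]) -> float:
    """Expected fraction of cells sharing their barcode combination with at least one other cell."""
    combos = 1
    for b in barcodes_per_round:
        combos *= b  # total number of distinct barcode combinations
    # Each other cell independently lands in the same combination with probability 1/combos.
    return 1.0 - (1.0 - 1.0 / combos) ** (n_cells - 1)
```

With 10,000 cells, two rounds of 96 barcodes (9,216 combinations) leave roughly 66% of cells sharing a combination. A third round (884,736 combinations) lowers this to about 1.1%, which illustrates why the number of cells processed must stay well below the number of available combinations.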