Periodic Reporting for period 4 - scAssembly (Algorithms and experimental tools for integrating very large-scale single cell genomics data)
Reporting period: 2022-06-01 to 2023-05-31
This gap made efforts to translate advances in biology into medical impact difficult: most common diseases do not arise from one faulty gene or from a global malfunction across all the cells of an individual. Instead, diseases emerge when some of our cells develop abnormal behavior, either internally or through miscommunication with other cells. The emerging technology of single cell genomics is revolutionizing our ability to address precisely these types of problems: we can profile genomes or gene activity across thousands of cells from normal or diseased tissue, identify which cells are malfunctioning by comparing them to newly established atlases of cellular behaviors, and examine the impact of different therapies at high resolution, without assuming that all cells respond similarly or collectively. The objective of the scAssembly project is to develop new computational methodology for making sense of single cell genomics experiments in health and disease.
Our research in scAssembly made significant advances on these challenges. scAssembly thereby contributed to the dramatic improvement in single cell genomics and its impact, as observed in almost all fields of biology. Highlights of our key contributions include:
1. We developed algorithms for partitioning scRNA-seq datasets into metacells: disjoint and homogeneous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, our algorithm specializes in obtaining granular as opposed to maximal groups (Baran et al 2019). We worked on a highly scalable implementation of Metacell, capable of handling many millions of profiles (Ben-Kiki et al 2022).
2. Using metacells to represent cell atlases quantitatively, we developed algorithms that allow interpretation of new data (“query”) as projections over an existing atlas (Ben-Kiki et al 2023).
3. We showed how to model dynamics over quantitative transcriptional maps. Applying this to models of cancer immunotherapy, we described how the killing potential of T cells declines as part of the complex dynamics of proliferation, differentiation and turnover (Li et al 2019; Barboy et al 2023, in revision).
4. Since single cells always function in the context of tissues and within ensembles of thousands of other cells, we developed strategies for understanding the dynamics of groups of cells. This was first used to link variation of transcriptional states in single cell populations to the emergent properties of cancer cell populations, in particular given epigenetic deterioration (Meir et al 2020). Even more ambitiously, we developed models describing embryonic development as the collective dynamics of cells over a metacell model (Mittnenzweig et al 2021). Our new “differentiation flow” models became the basis for all our work on embryonic development.
5. We showcased how to use the new tools to understand the function of epigenetic regulation during embryonic development. This was approached using in vivo analysis of embryos lacking a functional de-methylation machinery (Cheng et al 2022), through analysis of chromosomal conformation in embryos (Rappaport et al 2023), and by developing meso-endo embryoid models and characterizing the impact of single, double and triple de novo methyltransferase knockouts (Mukamel et al 2023).
These works provide us with tools that go significantly beyond the original aims set for scAssembly. They give us opportunities for translating the single cell genomics technology revolution into quantitative, robust and highly interpretable models and discovery platforms. They also show how to apply new tools to address some of the most fascinating questions in modern biology, from the harmonious emergence of tissues and organs in mammalian embryos to the deterioration of homeostasis and emergence of disease in ageing tissues.