Algorithms and experimental tools for integrating very large-scale single cell genomics data

Periodic Reporting for period 2 - scAssembly (Algorithms and experimental tools for integrating very large-scale single cell genomics data)

Reporting period: 2019-06-01 to 2020-11-30

Much of the biology we know, and in particular our own human biology, is achieved by the collective and coordinated behavior of individual cells. This is true when studying developing embryos and healthy tissues, and becomes even more apparent when looking at different diseases. Major advances in the biological sciences during the last 70 years have provided us with an in-depth understanding of some of the most important mechanisms that allow cells to achieve diverse functions, including the remarkable flexibility with which cells control which genes are active in which tissue and in response to multiple environmental cues or inter-cellular communication. The genomics revolution of the last 30 years complemented these discoveries with the possibility of fully reading the genes present in each individual organism or human patient. But until recently it was difficult to link the detailed and fine-tuned behavior of cells with the rich genomic instruction set and to devise practical models that can decipher biological function from the joint activities of millions of single cells. This gap made it difficult to translate advances in biology into medical impact: most common diseases do not emerge from one faulty gene or from some global malfunction across all the cells in an individual. Instead, diseases emerge when some of our cells develop abnormal behavior, either internally or through miscommunication with other cells. The emerging technology of single cell genomics is revolutionizing our ability to address precisely these types of problems: we can profile genomes or gene activity across thousands of cells from normal or diseased tissue, understand which cells are malfunctioning by comparing them to newly established atlases of cellular behaviors, and examine the impact of different therapies at high resolution and without assuming that all cells respond similarly or collectively.
The objective of the scAssembly project is to develop new computational methodology for making sense of single cell genomics experiments in health and disease, and to test its utility by studying classical questions on how cells acquire and lose identity during embryonic development or in disease.
To meet the scAssembly goals we developed methodologies for modelling cellular states and dynamics from single cell RNA-seq data. Our overall approach aims at quantitative models of the way cells regulate genes within tissues (static maps), so that we can use additional experimental layers to infer the dynamics of cells over such maps, their interactions with each other, and the gene regulatory mechanisms that drive such dynamics and interactions. Our publications from the first funding period already provide significant advances on these challenges. As described further in the next section, these are combined with further experimental developments to go beyond the state of the art in the coming funding period, in practice exceeding our initial aims thanks to the dramatic improvement in technology that the field (with our participation) has achieved over the last few years.

1. We developed algorithms for partitioning scRNA-seq datasets into metacells: disjoint and homogeneous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, our algorithm specializes in obtaining granular, as opposed to maximal, groups (Baran et al 2019). We have recently expanded the capability of our framework to handle many millions of cells (Ben-Kiki et al, in preparation).

2. We model dynamics over transcriptional maps. In one published project we used them to understand the complex regulation of T-cells in tumors, describing how the killing potential of T-cells declines as part of a complex dynamics of proliferation, differentiation and turnover (Li et al 2019, and in preparation).

3. We study the dynamics of cellular memory in an experimental framework that allows tracking of hundreds of single cell clones in several cancer cell lines. This provides a link between the variation of transcriptional states in single cell populations and the emergent properties of cancer cell populations, in particular given epigenetic deterioration (Meir et al 2020).

4. We co-developed new experimental approaches for capturing physical cell-cell interactions, modelling interactions between immune cells using a generalization of the same statistical framework developed for capturing maps of single cells (Giladi et al 2020).

5. We study embryonic development using a combination of single cell genomics tools. In particular, we developed new ways of analyzing single cell chromosome conformation maps and of inferring the first embryonic chromosomal differentiation in mouse (Rappaport et al, submitted).

6. We compare transcriptional states and regulation between genomes: we model whole-organism transcriptional maps from non-model organisms to evaluate the potential of genomes with significantly less complexity than human or mouse to create and maintain complex cell identities (Sebe-Pedros et al, 2018).
In the coming 30 months, we expect many exciting discoveries toward the three main aims of scAssembly.

Toward aim 1, we are finishing the development of a powerful all-inclusive model for capturing transcriptional state in millions of single cells from multiple, complex and heterogeneous tissues. Our approach handles key aspects of the data through a native manifold modelling approach. Sparsity of the single cell profiles is addressed by the metacell algorithm. Non-linearity and the very common pleiotropy of many genes over the global manifold are addressed by covering the transcriptional manifold with overlapping neighborhoods, simultaneously inferring local models and the stitching functions that link them. This ensures the model is locally correct and optimal, but still scales globally. Our models are designed to allow “assembly” of single cell maps from multiple sources into one coherent and interpretable manifold model.
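To illustrate why grouping cells helps with sparsity, the sketch below pools simulated UMI counts over disjoint groups of cells and compares the fraction of zero entries before and after pooling. It is a toy example and not the metacell algorithm itself: the group assignments are taken as given rather than inferred, and the function name `pool_profiles`, the simulated states, the sequencing depth and the group sizes are all assumptions made for illustration.

```python
import numpy as np


def pool_profiles(counts, groups):
    """Sum UMI counts over disjoint groups of cells.

    counts: (n_cells, n_genes) array of non-negative integer counts.
    groups: (n_cells,) array assigning each cell to exactly one group.
    Returns the group labels, the pooled counts per group, and the
    per-group gene fractions (pooled counts divided by group totals).
    """
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    pooled = np.zeros((labels.size, counts.shape[1]), dtype=float)
    for i, label in enumerate(labels):
        pooled[i] = counts[groups == label].sum(axis=0)
    totals = pooled.sum(axis=1, keepdims=True)
    # Guard against empty groups to avoid division by zero.
    fractions = pooled / np.maximum(totals, 1.0)
    return labels, pooled, fractions


# Simulated data (hypothetical): two transcriptional states, shallow sampling.
rng = np.random.default_rng(0)
n_genes = 50
depth = 20  # UMIs per cell, deliberately low to make profiles sparse
state_a = rng.dirichlet(np.ones(n_genes))
state_b = rng.dirichlet(np.ones(n_genes))
cells_a = rng.multinomial(depth, state_a, size=40)
cells_b = rng.multinomial(depth, state_b, size=40)
counts = np.vstack([cells_a, cells_b])

# Assumed partition: 8 groups of 10 cells, each drawn from a single state.
groups = np.repeat(np.arange(8), 10)

labels, pooled, fractions = pool_profiles(counts, groups)
zero_frac_cells = float((counts == 0).mean())
zero_frac_pooled = float((pooled == 0).mean())
print(f"zero entries per cell:  {zero_frac_cells:.2f}")
print(f"zero entries per group: {zero_frac_pooled:.2f}")
```

Because every group here contains cells from a single simulated state, the pooled profile behaves like a deeper sample of that state. In real data the hard part is finding groups that are homogeneous enough for this to hold, which is the step this sketch leaves out.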

Toward aim 2, we went far beyond the original plans and are currently finishing the characterization of embryonic differentiation at unprecedented temporal resolution. We combine powerful in-vitro systems that allow easy manipulation and perturbation with in-vivo studies at single cell and single embryo resolution. This gives rise to surprising results on the dynamics of cellular differentiation and commitment in the mouse embryo, showing for example that many key developmental decisions are made when one progenitor state gives rise to multiple fates simultaneously, rather than through the classical “Tree” model, which suggests that decisions are made sequentially by progressive refinement.

Toward aim 3, we also expanded our plans significantly and are working on single cell analysis of extensive cancer cohorts and the epigenetic basis for their heterogeneity (in breast cancer, multiple myeloma and acute myeloid leukemia). In parallel, we are pursuing in-depth mechanistic and phenomenological analysis of colon carcinogenesis, and of the interaction of CD8+ and CD4+ T-cells with different cells in the melanoma and lung tumor microenvironments.

Together, we believe that the field, and our group in particular, is in a unique position to deliver real breakthroughs that go beyond the initial excitement of a new technology and into some of the most fundamental questions of biology. We are confident that scAssembly will deliver sustainable and long-lasting impact, both in advancing basic research and in applying the dramatic progress in single cell genomics to important medical challenges.