CORDIS - EU research results

Decoding animal genomes into cell types

Periodic Reporting for period 1 - Genome2Cells (Decoding animal genomes into cell types)

Reporting period: 2023-06-01 to 2025-11-30

The genome of an animal encodes a large set of regulatory programs that give rise to the thousands of cell types that make up its tissues and organs. Despite recent progress in single-cell omics, our knowledge of the regulatory programs that control the establishment and maintenance of cell type identity remains limited, and methods are lacking to infer regulatory programs directly from the genome sequence. In this project, which lies at the interface between the genome and single-cell atlases, we ask how the genome sequence “translates” into cell types. We start with Drosophila as a model organism. Its compactness allows sampling of all its cell types and developmental trajectories from egg to adult, using whole-organism single-cell multi-omics, thus capturing the spectrum of “activation states” that emerge from the regulatory genome. Deep learning models will be trained on regulatory sequences to predict and explain gene regulatory networks (GRNs) and GRN transitions between cell states, encoded by enhancers, promoters, transcription factors (TFs), effector genes, and feedback loops. Based on a better mechanistic understanding, we will translate this framework to other animals, including octopus, birds, and mammals, and ask how regulatory programs evolve, with a focus on neuronal diversity in the brain. Using new algorithms for cross-species deep learning and combinatorial optimization, we will study how combinations of expressed TFs co-evolve with genomic enhancer logic. Our approach is unique in that we will develop and use new technological assays, deep learning, and massively parallel reporter assays, and combine these with perturbation experiments and synthetic biology to test our hypotheses. After iteratively improving our regulatory models, we ultimately aim to predict which regulatory programs, and thus which cell types, are encoded in an animal’s genome, and how changes in these programs underlie changes in cell types during evolution.
* We generated a 700K-cell scATAC-seq atlas of the entire adult fly, which we combined with our previously published scRNA-seq adult Fly Cell Atlas (Can Eksi et al., in preparation).
* To optimize the cost and efficiency of sampling enough cells over 10 days of development, we combined commercial platforms (10x Genomics) with a newly developed HyDrop-v2 scATAC protocol. A first bioRxiv preprint describes HyDrop-v2, alongside a 600K-cell atlas of the last four hours of embryo development (Dickmancken et al., bioRxiv 2025).
* We developed a new software framework to train deep learning enhancer models, called CREsted, and released the codebase on GitHub (Kempynck & De Winter et al., bioRxiv 2025).
* We trained and validated a series of CREsted models on diverse systems and shared them through a model repository. This includes a zebrafish development model that we also used to design synthetic enhancers, which we tested in vivo in zebrafish (Kempynck & De Winter et al., bioRxiv 2025).
* We trained CREsted models on the entire adult fly scATAC-seq atlas. We also trained the first ‘foundation model’ of the fruit fly, from scratch, using the Borzoi framework, on 11,000 genomic tracks that we curated from public databases, combined with our full scRNA-seq and scATAC-seq adult atlases (Can Eksi et al., in preparation).
* We discovered and published similarities of enhancer codes between all cell types in the vertebrate pallium, particularly in the human and mouse cortex, and the chicken telencephalon. For the latter, we created the first sc-multiome and spatial atlas. We trained CREsted models for all species and devised three new metrics to compare enhancer logic. We validated several chicken enhancers in the mouse brain using AAV enhancer-reporter assays (Hecker & Kempynck, Science 2025).
* We participated in and won a computational challenge, led by the Allen Institute, to predict cell-type-specific enhancers in the mouse brain, and co-published the results (Johansen & Kempynck, Cell Genomics 2025).
* We analyzed a human embryonic brain sc-multiome atlas and compared it to human neural tube organoids, generating a compendium of enhancer codes. We compared these to the zebrafish neural tube, and validated enhancers in chicken and zebrafish embryos (De Winter, in preparation).
* To compare gene loci between species, we generated a mouse sc-multiome atlas during development and curated a human counterpart from public data sets, compared trajectories using gene warping, and trained CREsted models on human and mouse to study changes in enhancer-promoter interactions between species (Abaffyova et al., in preparation).
* We developed computational models and tools to design synthetic enhancers, including generative adversarial networks, and tested the designed enhancers in Drosophila brains and human melanoma cells (Taskiran et al., Nature 2024).
* We published a review paper on enhancer design (De Winter, Nat Rev Bioengineering 2025).
* We developed HyDrop-v2, a new version of our custom droplet microfluidics technique for scATAC-seq that achieves higher sensitivity and scale. We generated a large Drosophila embryo atlas and a mouse brain atlas with HyDrop-v2, and benchmarked it against commercial methods (10x Genomics) using sequence-to-function models. We use this technique to improve our atlasing of the entire fruit fly (Dickmancken et al., bioRxiv 2025).
* We developed Nova-ST, a new spatial transcriptomics technique using Illumina chips. We use it to localize and validate cell types in other species’ brains (Poovathingal et al., Cell Reports Methods 2024).
We discovered that cell-type-specific enhancers for different brain cell types can be generated with very high success rates using sequence-to-function models. This work, published in Nature (Taskiran et al., 2024), is the first report that synthetic enhancer design is feasible. It solved a decade-old problem and opened an entirely new field, as seen in the dozens of subsequent publications that build on it to improve enhancer design (e.g. alternative search space algorithms). It also provided unprecedented insight into how enhancers are built, because our approach allows precise tracking of all the building blocks of a genomic enhancer.
We are currently following up on these results to design synthetic enhancers in the mouse brain and to translate these techniques to industry in the context of gene therapy.
Summary figure of goals and selected results