Periodic Reporting for period 1 - EvoDCode (Deep representational learning of the evolutionary DNA code in the vertebrate pallium)
Reporting period: 2025-04-01 to 2027-03-31
The rapidly advancing field of artificial intelligence holds great promise for comparative genomics. Convolutional Neural Networks (CNNs) and DNA language models have already been used successfully to model gene regulatory logic in an interpretable way, revealing the transcription factor binding sites within cis-regulatory elements and their co-regulatory relationships. These models require large amounts of training data. The single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) now enables us to collect such data at scale, stratified by cell type, as required to train these artificial intelligence models.
The overarching goal of this proposal is to better understand how the genome sequence underlies cell identity in the pallium across vertebrate species. I hypothesize that gene regulatory logic can be learned directly from the genomic sequence and used to model and predict cell types. I focus specifically on the pallium, a part of the brain that is strongly tied to species-specific behaviour and has undergone strong divergent evolution. In humans, the dorsal pallium is expanded into the cerebral cortex, which supports most of our expanded cortical abilities, whereas the avian dorsal pallium consists of only a single cortical layer (Wulst). Our understanding of how these large differences arose is still limited. Using modern single-cell epigenomic methods, we can study how evolutionary changes affect gene regulation by sampling a wide set of vertebrate species and using these data to model cell type evolution.
WP2 is ongoing, and some of its tasks have been completed. Notably, tasks 2.1 and 2.2 are partially completed: 27 species-specific sequence-to-function models have been trained (task 2.1), and co-embeddings of the different species' scRNA-seq libraries have been generated using SATURN (task 2.2). However, the proposed cross-species model architecture (tasks 2.1/2.2) is not yet finished. Work on tasks 2.3 and 2.4 has not been carried out yet. Similarly, WP3 was planned for later in the timeline and has not yet been conducted. Relative to the timeline in the original grant proposal, we are on track with respect to deliverables.