Periodic Reporting for period 1 - EPIC (Unravelling the eukaryotic post-transcriptional regulatory code)
Reporting period: 2024-07-01 to 2025-12-31
EPIC aims to derive the first comprehensive sequence-based model of eukaryotic gene regulation by exploiting the advantages of the model eukaryote Saccharomyces cerevisiae and other species, covering a broad evolutionary range. EPIC will accomplish this by integrating the complementary expertise of 3 teams: it will combine (i) innovative high-throughput omics assays to probe post-transcriptional regulation across a large evolutionary scale and multiple conditions with (ii) synthetic biology to massively test and quantify the effects of regulatory sequences through iterative designs of reporter assays and (iii) deep learning on these rich datasets. This will allow EPIC to build novel computational models that will help us to predict and understand complex regulatory instructions.
Ultimately, EPIC will enable us to decipher the actual language of gene regulation and facilitate (re)writing genomes. In doing so, EPIC will make it possible to understand and predict regulation, and eventually phenotype, from DNA. This closes a major gap in basic biology and opens exciting avenues for applications in biotechnology and medicine, from pinpointing disease-causing mutations to the rational design of genes, RNAs and cells.
Second, we determined mRNA boundaries in the eukaryotic model organism Saccharomyces cerevisiae in 10 conditions. We developed a high-throughput experimental protocol to determine the transcription start site (which defines the 5’ UTR) and the polyadenylation site (which defines the 3’ UTR) of mRNAs. This is important because it allows us to capture the boundaries of each mRNA molecule.
Third, these data will be used to design and set up dedicated MPRA (massively parallel reporter assay) experiments. MPRAs are synthetic reporter assays that test the effect of specific sequences in a fixed context, for millions of sequences in parallel. Specifically, we have already designed MPRAs to investigate the effect of natural and randomized 5’ and 3’ UTRs of S. cerevisiae.
Fourth, using publicly available data, we have developed and optimized several models and frameworks to better model the regulatory code. Concretely, we have (1) implemented nucleotide dependency maps in genomic language models. These maps describe how nucleotide substitutions at one genomic position affect the predicted probabilities of nucleotides at other positions, which, for example, allows the identification of regulatory elements such as RNA-binding protein binding sites. We have also (2) developed and improved models that predict gene expression and chromatin accessibility from sequence (the Flashzoi and Scooby frameworks).
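To illustrate the idea behind nucleotide dependency maps, the following is a minimal sketch and not the project's implementation. It assumes a model that returns per-position nucleotide probabilities for a sequence; here the hypothetical `toy_model` stands in for a genomic language model and simply couples the first and last positions. The scoring choice (maximum absolute change in log2 odds over all substitutions and target nucleotides) is an assumption made for illustration.

```python
import numpy as np

NUCS = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def toy_model(seq):
    """Hypothetical stand-in for a genomic language model.

    Returns an (L, 4) array of nucleotide probabilities per position.
    Only the first and last positions are coupled: each favors the
    complement of the other. All other positions are uniform.
    """
    length = len(seq)
    probs = np.full((length, 4), 0.25)
    for pos, partner in ((0, length - 1), (length - 1, 0)):
        probs[pos] = 0.1
        probs[pos, NUCS.index(COMPLEMENT[seq[partner]])] = 0.7
    return probs


def dependency_map(seq, model, eps=1e-9):
    """Score how strongly a substitution at position i changes the
    predicted nucleotide probabilities at every other position j.

    Assumed score: max |change in log2 odds| over the three alternative
    nucleotides at i and the four target nucleotides at j.
    """
    length = len(seq)
    ref = np.clip(model(seq), eps, 1 - eps)
    ref_logodds = np.log2(ref / (1 - ref))
    dep = np.zeros((length, length))
    for i in range(length):
        for alt in NUCS:
            if alt == seq[i]:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            p = np.clip(model(mutated), eps, 1 - eps)
            delta = np.abs(np.log2(p / (1 - p)) - ref_logodds)
            dep[i] = np.maximum(dep[i], delta.max(axis=1))
        dep[i, i] = 0.0  # self-dependency is not of interest
    return dep


if __name__ == "__main__":
    dep = dependency_map("ACGTACGT", toy_model)
    print(np.round(dep, 2))
```

With the toy model, only the entries linking the first and last positions are non-zero. With a real model, blocks of mutually dependent positions would be candidate regulatory elements.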
Examples of impact include:
1. new insights into post-transcriptional regulation, including key insights on the role of UTRs for gene expression regulation.
2. a unique resource for the scientific community. Our extremely large datasets will provide an unmatched basis for studying the evolution of gene regulation, including how fast gene expression evolves and at which level (transcription, degradation, translation).
3. new high-throughput experimental methods to investigate different aspects of post-transcriptional regulation.
Ultimately, the results of this project will allow us to move from reading (sequencing) and (re)writing (synthetic biology) genomes to deciphering the actual language of gene regulation, a long-standing goal in biology and a door towards optimized cellular engineering strategies.