CORDIS - EU research results

Unravelling the eukaryotic post-transcriptional regulatory code

Periodic Reporting for period 1 - EPIC (Unravelling the eukaryotic post-transcriptional regulatory code)

Reporting period: 2024-07-01 to 2025-12-31

Genomes encode instructions for cells to regulate gene activity in response to their environment. However, despite its importance for biology, medicine, and biotechnology, the underlying regulatory code remains undeciphered. Gene regulation consists of two major steps. First, genes are transcribed into mRNA. Second, post-transcriptional mechanisms regulate mRNA stability and the rate at which mRNA is translated into protein. This second step is still poorly understood because relevant parameters such as mRNA half-life, mRNA-protein binding, and subcellular localization are difficult to assay. As a result, we still do not have a complete picture of the regulatory code and therefore cannot accurately predict phenotype from genotype.

EPIC aims to derive the first comprehensive sequence-based model of eukaryotic gene regulation by exploiting the advantages of the model eukaryote Saccharomyces cerevisiae and other species covering a broad evolutionary range. EPIC will accomplish this by integrating the complementary expertise of three teams. It will combine (i) innovative high-throughput omics assays that probe post-transcriptional regulation across a large evolutionary scale and multiple conditions, (ii) synthetic biology to test and quantify the effects of regulatory sequences at massive scale through iterative designs of reporter assays, and (iii) deep learning on these rich datasets. This will allow EPIC to build novel computational models that help us predict and understand complex regulatory instructions.

Ultimately, EPIC will enable us to decipher the actual language of gene regulation and facilitate (re)writing genomes. In doing so, EPIC will make it possible to understand and predict regulation, and in turn phenotype, from DNA. This closes a major gap in basic biology and opens avenues for applications in biotechnology and medicine, from pinpointing disease-causing mutations to the rational design of genes, RNAs and cells.

First, we collected a diverse set of 100 Ascomycota isolates, including 25 filamentous fungi. The collection includes a range of biotechnologically and medically relevant yeasts. All 100 selected strains were sequenced with Illumina short reads at a minimum coverage of 10X, and we obtained long-read sequencing data for 80 strains (65 yeasts and 15 filamentous fungi) on PacBio Revio instruments. These 80 strains are being assembled with the hifiasm and LJA assemblers. gDNA extraction is being repeated for the remaining 20 strains. We have collected samples of all isolates in two different conditions for proteomics, transcriptomics, and the determination of mRNA boundaries and mRNA stability. Proteomics data are being collected, and we are currently optimizing protocols for determining mRNA stability in these yeast species.

Second, we determined mRNA boundaries in the eukaryotic model organism Saccharomyces cerevisiae in 10 conditions. We developed a high-throughput experimental protocol to determine the transcription start site (which defines the 5’ UTR) and the polyadenylation site (which defines the 3’ UTR) of each mRNA. This is important because it allows us to capture the boundaries of each mRNA molecule.
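
To illustrate how a transcription start site and a polyadenylation site delimit the two UTRs, here is a minimal sketch. It is not the project's pipeline: the `Transcript` record, its field names, and the 1-based inclusive coordinate convention are assumptions made for this example, and introns are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; all coordinates are 1-based and inclusive."""
    strand: str     # "+" or "-"
    tss: int        # transcription start site
    cds_start: int  # lowest genomic coordinate of the coding sequence
    cds_end: int    # highest genomic coordinate of the coding sequence
    pas: int        # polyadenylation site


def _interval(start, end):
    # Return None for an empty interval (no UTR on that side).
    return (start, end) if start <= end else None


def utr_intervals(t):
    """Return (five_prime_utr, three_prime_utr) as genomic (start, end) pairs."""
    if t.strand == "+":
        # Forward strand: the 5' UTR lies left of the CDS, the 3' UTR right of it.
        return _interval(t.tss, t.cds_start - 1), _interval(t.cds_end + 1, t.pas)
    if t.strand == "-":
        # Reverse strand: the transcript runs right to left, so the roles swap.
        return _interval(t.cds_end + 1, t.tss), _interval(t.pas, t.cds_start - 1)
    raise ValueError("strand must be '+' or '-'")


def interval_length(interval):
    return 0 if interval is None else interval[1] - interval[0] + 1


forward = Transcript(strand="+", tss=100, cds_start=150, cds_end=449, pas=600)
reverse = Transcript(strand="-", tss=600, cds_start=150, cds_end=449, pas=50)
five_fwd, three_fwd = utr_intervals(forward)
five_rev, three_rev = utr_intervals(reverse)
```

Because a gene can use different start and polyadenylation sites in different conditions, the same function could be applied once per condition to compare UTR lengths.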

Third, these data will be used to design and set up dedicated massively parallel reporter assay (MPRA) experiments. MPRAs are synthetic reporter assays that test the effect of specific sequences in a fixed context, for millions of sequences in parallel. Specifically, we have already designed MPRAs to investigate the effect of natural and randomized 5’ and 3’ UTRs of S. cerevisiae.
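
The idea of testing natural and randomized UTRs in a fixed context can be sketched as follows. This is an illustrative toy, not the actual EPIC library design: the function names, the flanking sequences, and the library sizes are all placeholders chosen for the example.

```python
import random

NUCLEOTIDES = "ACGT"


def random_utr(length, rng):
    """Draw one uniformly random sequence of the given length."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))


def build_library(natural_utrs, n_random, length, upstream, downstream, seed=0):
    """Embed natural and randomized UTR variants in one fixed reporter context.

    `upstream` and `downstream` are hypothetical constant flanks; only the
    insert between them varies from construct to construct.
    """
    rng = random.Random(seed)  # seeded so the design is reproducible
    inserts = list(dict.fromkeys(natural_utrs))  # drop duplicates, keep order
    seen = set(inserts)
    # Guard against asking for more distinct random inserts than can exist.
    available = 4 ** length - sum(1 for u in seen if len(u) == length)
    if n_random > available:
        raise ValueError("not enough distinct sequences of this length")
    target = len(inserts) + n_random
    while len(inserts) < target:
        candidate = random_utr(length, rng)
        if candidate not in seen:
            seen.add(candidate)
            inserts.append(candidate)
    return [upstream + insert + downstream for insert in inserts]


library = build_library(
    natural_utrs=["ACGT", "ACGT", "TTTT"],  # placeholder "natural" sequences
    n_random=5,
    length=10,
    upstream="GG",    # placeholder for the constant 5' context
    downstream="CC",  # placeholder for the constant 3' context
    seed=1,
)
```

Holding the flanks constant means that any difference in reporter output between two constructs can be attributed to the variable insert.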

Fourth, using publicly available data, we have developed and optimized several models and frameworks to better model the regulatory code. Concretely, we have (1) implemented nucleotide dependency maps in genomic language models, which capture how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions and allow, for example, the identification of regulatory elements such as RNA-binding protein binding sites; and (2) developed and improved models that predict gene expression and chromatin accessibility from sequence (the Flashzoi and Scooby frameworks).
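
The notion of a nucleotide dependency map can be made concrete with a small sketch. Everything here is an assumption for illustration: `toy_model` is a hypothetical stand-in for a genomic language model, and scoring a dependency as the largest absolute change in log-odds is one plausible choice, not necessarily the exact score used in the project.

```python
import math

NUCS = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def toy_model(seq, pairs=((1, 6),)):
    """Hypothetical stand-in for a genomic language model.

    Returns, for each position, probabilities over A, C, G, T. Positions
    listed in `pairs` favour the complement of their partner; every other
    position is uniform. Probabilities never reach 0 or 1.
    """
    probs = [[0.25] * 4 for _ in seq]
    for i, j in pairs:
        for a, b in ((i, j), (j, i)):
            favoured = COMPLEMENT[seq[b]]
            probs[a] = [0.85 if n == favoured else 0.05 for n in NUCS]
    return probs


def log_odds(p):
    return math.log2(p / (1.0 - p))


def dependency_map(seq, model):
    """dep[i][j]: largest absolute log-odds change at j over substitutions at i."""
    length = len(seq)
    reference = model(seq)
    dep = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for alt in NUCS:
            if alt == seq[i]:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            altered = model(mutated)
            for j in range(length):
                if j == i:
                    continue  # self-dependencies are left at zero
                change = max(
                    abs(log_odds(altered[j][k]) - log_odds(reference[j][k]))
                    for k in range(4)
                )
                dep[i][j] = max(dep[i][j], change)
    return dep


sequence = "ACGTACGTAC"
dep = dependency_map(sequence, toy_model)
```

In this toy setting only the coupled positions 1 and 6 show a non-zero dependency. With a real model, blocks of high dependency would be the candidates for regulatory elements such as binding sites.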

Our project will yield the first comprehensive model of the eukaryotic regulatory code, from DNA sequence to protein abundance. This represents a major step in our basic understanding of how organisms work. We expect that the computational analyses of the very large datasets generated in this project, combined with AI to integrate all data, will drive the discovery of many novel insights.

Examples of impact include:
1. New insights into post-transcriptional regulation, including the role of UTRs in regulating gene expression.
2. A unique resource for the scientific community. The very large datasets will support studies of how gene regulation evolves, including how fast gene expression evolves and at which level (transcription, degradation, translation).
3. New high-throughput experimental methods to investigate different aspects of post-transcriptional regulation.

Ultimately, the results of this project will allow us to move from reading (sequencing) and (re)writing (synthetic biology) genomes to deciphering the actual language of gene regulation, a long-standing goal in biology and a door towards optimized cellular engineering strategies.
Overview of how EPIC aims to unravel post-transcriptional regulation