Genomes encode instructions for cells to regulate gene activity in response to their environment. Despite its importance for biology, medicine, and biotechnology, the underlying regulatory code remains undeciphered. Gene regulation consists of two major steps. First, genes are transcribed into mRNA. Second, post-transcriptional mechanisms regulate mRNA stability and the rate at which mRNA is translated into protein. This second step is still poorly understood because relevant parameters such as mRNA half-life, mRNA-protein binding, and subcellular localization are difficult to assay. Without an understanding of post-transcriptional regulation, our picture of the regulatory code remains incomplete, and we therefore cannot accurately predict phenotype from genotype.
EPIC aims to derive the first comprehensive sequence-based model of eukaryotic gene regulation by exploiting the advantages of the model eukaryote Saccharomyces cerevisiae and other species covering a broad evolutionary range. EPIC will accomplish this by integrating the complementary expertise of three teams, combining (i) innovative high-throughput omics assays to probe post-transcriptional regulation across a large evolutionary scale and multiple conditions, (ii) synthetic biology to test and quantify the effects of regulatory sequences at large scale through iterative designs of reporter assays, and (iii) deep learning on these rich datasets. This will allow EPIC to build novel computational models that help us predict and understand complex regulatory instructions.
Ultimately, EPIC will enable us to decipher the language of gene regulation and facilitate (re)writing genomes. In doing so, EPIC will enable understanding and predicting regulation, and in turn phenotype, from DNA, closing a major gap in basic biology while also opening avenues for applications in biotechnology and medicine, from pinpointing disease-causing mutations to the rational design of genes, RNAs, and cells.