Community Research and Development Information Service - CORDIS

Final Report Summary - CISREGLOGIC (Identifying and understanding the cis-regulatory modules that control the spatio-temporal transcription of genes in chordates.)

Executive Summary:

The understanding of transcription control and its importance in disease has changed dramatically over the past decade, fueled by major advances in comparative genomics, epigenetics and genome-wide association studies (GWAS). Yet we still do not have a functional understanding of why certain regions of the genome act as cis-regulatory modules (CRMs) and others do not. This work aimed to leverage genomic and epigenomic data to help uncover fundamental mechanisms of eukaryotic transcription, which bear considerable importance for human health. Indeed, simply locating CRMs helps to elucidate how they operate, for example by identifying in which cells and at which developmental stages they are active. This work contributes to the understanding of transcriptional regulation. Deciphering this regulatory code will lead to entirely new pathways for drug targets with potentially large impacts on human health.

Cis-regulatory modules (CRMs) are regions of non-coding DNA that act as docking stations for transcription factors and regulate gene expression levels. CRMs are known to be less well conserved than coding regions, possibly because the flexibility of CRM architecture allows major sequence changes without significantly changing the binding landscape of transcription factors. This flexibility is a major obstacle to the computational identification of CRMs in genome sequences.

In this project, we developed computational tools to identify orthologous CRMs between two ascidian species with a highly conserved embryonic developmental programme but very divergent genome sequences. We reasoned that orthologous CRMs in these species should respond to the same regulatory logic (that is, harbor binding sites for the same transcription factors), although their sequences are too divergent to be aligned. The procedure we developed in this project comprises three consecutive steps.

First, we selected two distantly related ascidian genera, Ciona and Phallusia. In each genus, we selected two species at a suitable evolutionary distance to identify CRMs on the basis of non-coding sequence alignment (phylogenetic footprinting).

Second, we developed software to identify, for a given gene locus, candidate CRMs controlling the expression of the gene of interest within each genus. This algorithm integrated the presence of clusters of candidate transcription factor (TF) binding sites, non-coding sequence conservation within the genus, and the computational prediction of nucleosome occupancy. To identify candidate TF binding sites, we used a novel affinity scoring system that leverages the depth of information obtained from SELEX-seq experiments (high-throughput Systematic Evolution of Ligands by EXponential enrichment). We also developed a novel method to solve a persistent problem in predicting affinities from SELEX-seq data: the determination of scoring thresholds below which TF binding may not be significant. The tools developed in this step of the procedure were validated using known CRMs in Ciona. In addition, we showed experimentally that this method could be used to predict and refine the location of CRMs for early regulatory genes.
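As an illustration only, the integration of the three signals described above (binding-site clusters, within-genus conservation, and predicted nucleosome occupancy) could be sketched as a sliding-window score. The function name `score_windows`, the multiplicative combination, the window size and the `min_affinity` cutoff are all assumptions made for this sketch; the summary does not specify how the project's software combines these signals or how its thresholds are derived.

```python
def score_windows(sites, conservation, occupancy, window=4, min_affinity=0.5):
    """Score every window of a locus as a candidate CRM (hypothetical scheme).

    sites        -- list of (position, affinity) for predicted TF binding sites
    conservation -- per-base conservation within the genus, values in [0, 1]
    occupancy    -- per-base predicted nucleosome occupancy, values in [0, 1]
    """
    if len(conservation) != len(occupancy):
        raise ValueError("conservation and occupancy must have the same length")

    # Discard sites whose predicted affinity falls below the cutoff.
    # (Assumed fixed cutoff; the project derived its thresholds differently.)
    kept = [(pos, aff) for pos, aff in sites if aff >= min_affinity]

    scores = []
    for start in range(len(conservation) - window + 1):
        end = start + window
        # Cluster signal: summed affinity of retained sites in the window.
        site_sum = sum(aff for pos, aff in kept if start <= pos < end)
        mean_cons = sum(conservation[start:end]) / window
        mean_occ = sum(occupancy[start:end]) / window
        # Favor conserved, nucleosome-depleted windows rich in binding sites.
        scores.append((start, site_sum * mean_cons * (1.0 - mean_occ)))
    return scores


# Toy data, invented for the example.
sites = [(1, 0.9), (2, 0.8), (3, 0.2), (6, 0.9)]
conservation = [1.0] * 8
occupancy = [0.0] * 4 + [1.0] * 4

scores = score_windows(sites, conservation, occupancy)
best_start, best_score = max(scores, key=lambda s: s[1])
print(best_start, round(best_score, 3))
```

In this toy locus the best window starts at position 0, where two retained sites fall in a conserved, nucleosome-free stretch; the site at position 6 contributes little because it lies under predicted nucleosomes.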

Third, we developed tools to compare in a pairwise manner, for each gene locus, the TF binding site composition of predicted CRMs in Ciona and Phallusia. We then integrated these data to find pairs of Ciona/Phallusia CRMs that maximize the overlap in binding site composition, irrespective of the arrangement of the binding sites. These pairs are considered orthologous CRMs, and the shared TF binding sites are considered functional. We showed that clusters of similar binding sites were indeed shared between Phallusia and Ciona in the vicinity of early regulatory genes, and that clusters of similar binding site composition were associated with genes sharing similar functional annotation. Although we could only begin to apply this pipeline at the genome scale and to combine it with functional genomics data such as ChIP-seq, the results obtained during the project provide a strong foundation on which the host lab will base further work.
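The pairwise comparison described above can be illustrated with a minimal sketch that treats each predicted CRM as a multiset of TF binding sites, so that order and spacing are ignored. The overlap measure used here (a weighted Jaccard index), the function names and the TF and CRM labels are assumptions chosen for the example, not the project's actual scoring.

```python
from collections import Counter
from itertools import product


def composition(sites):
    """Count binding sites per TF, discarding position and order."""
    return Counter(sites)


def overlap(a, b):
    """Weighted Jaccard overlap between two site compositions (0 to 1)."""
    shared = sum((a & b).values())  # per-TF minimum counts
    total = sum((a | b).values())   # per-TF maximum counts
    return shared / total if total else 0.0


def best_pair(crms_a, crms_b):
    """Return (name_a, name_b, score) for the most similar CRM pair."""
    scored = (
        (name_a, name_b, overlap(composition(sa), composition(sb)))
        for (name_a, sa), (name_b, sb) in product(crms_a.items(), crms_b.items())
    )
    return max(scored, key=lambda t: t[2])


# Hypothetical predicted CRMs at one gene locus; TF labels are placeholders.
ciona = {
    "ci_crm1": ["TF_A", "TF_B", "TF_B", "TF_C"],
    "ci_crm2": ["TF_D", "TF_D"],
}
phallusia = {
    "ph_crm1": ["TF_C", "TF_B", "TF_A", "TF_B"],  # same sites, different order
    "ph_crm2": ["TF_E", "TF_D"],
}

print(best_pair(ciona, phallusia))
```

Because the comparison uses counts only, `ci_crm1` and `ph_crm1` score as a perfect match even though their sites appear in a different order, which mirrors the idea of matching CRMs whose sequences cannot be aligned.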

Please see attached document for more details on the results obtained during the project.
