A major challenge in deciphering the genome is to understand the genomic sequences that control the spatio-temporal expression of genes. Despite much progress, the rules by which these regions, the cis-regulatory modules (CRMs), control the transcription of a gene remain unclear. The objectives of this proposal are to develop an algorithm to detect novel CRMs, to analyze the identified CRMs, and to elucidate the logic by which CRMs control when and in which cells a gene is transcribed. The first step is to develop an algorithm that integrates several recent breakthroughs in understanding CRM mechanisms, with a mid-throughput chordate model organism (Ciona intestinalis) used for validation. Recent work has shown that, in addition to the well-known clustering of transcription factor binding sites, several other factors contribute to CRM activation: chromatin openness, the 3D structure of the DNA, nucleosome positioning, and co-factor binding. Combining this understanding with previous prediction methods (evolutionary sequence conservation, binding-site clustering) will improve predictive power, which I will confirm experimentally. The predicted CRMs will be refined to their essential sequences by comparison with orthologs from distant species that drive similar expression patterns. In a second step, the identified CRMs will be analyzed and classified. Finally, combining the identified CRMs with known patterns of expression will provide a means to elucidate the rules by which CRMs operate. This work will impact fundamental biology and will also contribute to identifying and understanding mutations with disease implications, as SNPs within CRMs have been shown to play a role in genetic diseases such as cancer.
Field of science
- /natural sciences/biological sciences/genetics and heredity/genome