The influence of DNA sequence on the epigenome

Final Report Summary - ATRUN (The influence of DNA sequence on the epigenome)

Every cell in a multicellular organism carries, in the form of DNA, the blueprint for the many proteins needed to form any of its constituent cell types. The identity of a cell is defined by the genes that it expresses. To generate any specialized tissue, such as liver, blood, or brain, distinct sets of genes need to be activated or repressed during development. Failure of gene expression programming can give rise to a variety of developmental disorders and cancer. Regulation is orchestrated by the interplay of two systems: transcription factors and epigenetic mechanisms.
Transcription factors are sequence-specific DNA-binding proteins that can activate or repress transcription of genes. The particular set of transcription factors present in a specific cell type is a major determinant of gene expression patterns. Epigenetic mechanisms include chemical modifications of DNA itself or of histones, the DNA packaging proteins, which modulate the accessibility of DNA and optimize either silencing or expression of genes. DNA methylation is a long-known epigenetic mark that results from the addition of a methyl group to the DNA base cytosine and is broadly associated with repression of gene transcription. In addition to this chemical mark on DNA itself, histones can also carry modifications, including methylation, acetylation and phosphorylation, that can have repressive or activating effects on transcription.

Epigenetic marks have been extensively studied in the last decade and the emerging data support the view that they have a key role in refining gene expression programmes. In line with this idea there are increasing reports of developmental diseases and cancers that involve a de-regulation of epigenetic marks and/or mutations of readers, writers or erasers of epigenetic information. The evidence suggests that epigenetic marks optimize and stabilize gene expression programmes. This would fit with the finding that cellular re-programming of differentiated cells back to pluripotent stem cells is a very inefficient process, presumably due to buffering of the differentiated state by epigenetic mechanisms.
Given the central importance of the epigenome, it is essential to understand the mechanisms responsible for laying down epigenetic patterns. At present, our understanding of this issue is primitive. Distinct epigenomic patterns of DNA methylation and histone modifications arise during cell differentiation and are thought to be influenced by developmental history, disease or environmental influences. Our hypothesis is that DNA sequence is also a potential determinant of the epigenome.

DNA is built from a sugar-phosphate backbone and the four bases adenine (A), cytosine (C), guanine (G) and thymine (T), and the sequence of these bases encodes genetic information. On a much larger scale, the genome is segmented into domains of relatively uniform base composition that differ in their content of the bases G/C and A/T. These domains are detected by several technologies and they correlate with epigenetic marks, but whether and how base composition influences the setup of domains with distinct function and specific epigenetic marks remains unclear. In this project I have investigated the hypothesis that domains of relatively uniform DNA base composition may modulate the epigenome through cell type-specific proteins that recognize short, frequent sequence motifs in G/C-rich or A/T-rich DNA (Fig. 1). This model represents a new concept of gene expression control in which genes are regulated in multigene blocks by differential recruitment of epigenomic modifiers, as an alternative to tuning the activity of each gene separately, thus simplifying gene expression programming.
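To make the notion of base-composition domains concrete, the following is a minimal illustrative sketch in Python. It is not the method used in this project. It computes the G/C fraction in fixed, non-overlapping windows and merges adjacent windows of the same class into domains. The function names, the window size and the 0.5 threshold are assumptions chosen purely for illustration; real analyses use much larger windows and more robust segmentation methods.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return None
    return (seq.count("G") + seq.count("C")) / counted


def label_domains(seq, window, threshold=0.5):
    """Split seq into non-overlapping windows and merge neighbours of the same class.

    The window size and threshold are illustrative assumptions,
    not values taken from the report.
    Returns a list of (start, end, label) tuples with end exclusive.
    A trailing partial window is ignored.
    """
    domains = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_fraction(seq[start:start + window])
        if gc is None:
            continue  # window contains only ambiguous bases such as N
        label = "GC-rich" if gc >= threshold else "AT-rich"
        end = start + window
        # Extend the previous domain if it is adjacent and of the same class.
        if domains and domains[-1][2] == label and domains[-1][1] == start:
            domains[-1] = (domains[-1][0], end, label)
        else:
            domains.append((start, end, label))
    return domains


if __name__ == "__main__":
    toy_sequence = "ATATATATGCGCGCGCATTA"  # hypothetical toy sequence
    for start, end, label in label_domains(toy_sequence, window=4):
        print(start, end, label)
```

On the toy sequence, this yields an A/T-rich block, a G/C-rich block and a second A/T-rich block, mirroring in miniature the segmentation of the genome described above.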

As part of this project I have co-authored a recent study (Wachter et al. 2014) that provides direct evidence that base composition influences the setup of epigenetic marks. By integrating DNA sequences of different base composition into mouse embryonic stem cells (mESCs), we have shown that DNA rich in the bases A/T attracts DNA methylation, while G/C-rich DNA stays free of DNA methylation and carries distinct histone modifications. These data provide a direct link between the observed segmentation of the genome and the setup of epigenetic domains and hint at a genome-wide regulatory mechanism.

To further investigate the mechanistic basis of this new concept of genome organization, I set out to identify proteins that bind to A/T-rich DNA. In cooperation with the laboratory of Prof. Michiel Vermeulen (formerly Utrecht, now Nijmegen), I performed state-of-the-art DNA “pulldown” experiments coupled to mass spectrometry analysis. Using this method in mESCs, I found a set of proteins that bind to A/T-rich DNA fragments. The concept of A/T-rich DNA binding proteins regulating the genome-wide setup of epigenetic marks is new, and none of the identified candidates have been implicated in this process. Of particular note, the list of identified candidates comprises several proteins that have been shown to bind to A/T-rich DNA, demonstrating the validity of the assay. From the candidate A/T binders I have subsequently identified specific proteins that have been implicated as regulators of DNA methylation. I have identified an A/T-rich binding motif for the top candidate protein, and global analysis confirms genome-wide binding of this candidate to A/T-rich regions that are enriched in the binding motif. Moreover, my preliminary evidence indicates that DNA methylation and gene expression are preferentially disturbed in regions with an A/T-rich base composition if this candidate protein is removed from mESCs. To uncover the underlying molecular mechanisms, I am currently investigating which properties of this and other candidate proteins are needed to regulate DNA methylation and gene expression in A/T-rich DNA. Using new state-of-the-art CRISPR/Cas9 technology that enables rapid generation of mutant mESC lines, I am disrupting DNA binding and protein-protein interaction domains of candidates to identify essential domains.

Although our candidate proteins are mostly known and well-studied in cellular and animal models and in cancer, the molecular details of their function are poorly characterized. By demonstrating genome-wide binding to A/T-rich DNA, I have linked their role to the global setup of epigenetic marks and regulation of gene expression in these domains. The work provides a new mechanistic concept of how A/T-binders may help to define gene expression patterns in specific cell types and maintain cell identity. It may also explain how to prevent global de-regulation of expression patterns that can give rise to developmental diseases and cancer. We have recently published our model of genome organization and transcriptional control by proteins that bind to regions of different base composition (Quante and Bird 2016), and a manuscript describing the role of candidate proteins is in preparation.

In summary, I have provided evidence for the new concept that base composition instructs the genome-wide patterning of epigenetic marks and influences gene expression through proteins that bind A/T-rich DNA. I have identified candidate A/T-binders that are tightly linked to pluripotency and de-regulated in cancer, indicating that this regulatory mechanism plays an important role in specifying cell identity. Ongoing experiments will help to define the underlying molecular mechanisms at work and hold the promise of uncovering drug targets for developmental diseases and cancer.