Periodic Reporting for period 1 - CACOHET (“Causes and consequences of pluripotency gene regulatory network member heterogeneity”)
Reporting period: 2016-08-01 to 2018-07-31
Pluripotent stem cells harbor great potential for disease modelling, regenerative medicine and future cell-based therapies, as they can self-renew and differentiate into virtually all cell types of the human body. To fully exploit the potential of these cells, a thorough understanding of the mechanisms regulating the pluripotent cell state is required. How stem cell populations balance the opposing forces of self-renewal and differentiation to maintain a functional population is a question that strikes at the heart of what it means to be a stem cell. Undifferentiated embryonic stem cell (ESC) identity is maintained by transcription factors (TFs) of the pluripotency gene regulatory network (PGRN), centred on the TFs Oct4, Sox2 and Nanog. ESCs with high levels of Nanog self-renew efficiently, while ESCs with low Nanog levels are prone to differentiate. Therefore, the observed heterogeneous expression of some PGRN components, in particular Nanog, is likely to be important for simultaneously maintaining self-renewal and facilitating differentiation, thereby sustaining functional pluripotency. Although considerable progress has been made in the last decade in understanding the regulatory mechanisms underlying pluripotency, it remains unclear how the PGRN signals down to the DNA sequences and how this affects the expression of genes crucial for maintaining the pluripotent state.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far
In CACOHET, we have further addressed the mechanisms underlying the regulation of pluripotency. We first characterized genome-wide binding of the TFs Nanog and Oct4 and the distribution of the histone modifications H3K4me1 and H3K27ac in various ESC lines and related types of pluripotent stem cells, such as epiblast stem cells, using chromatin immunoprecipitation followed by sequencing (ChIP-seq). This has provided us with in-depth genome-wide binding profiles of these factors. Using cell lines in which the expression levels of Nanog and Oct4 can be modulated, we assessed the dependency of these TFs on each other, and found examples where binding of one factor no longer occurs in the absence of the other. Whereas genome-wide occupancy maps for TFs and histone modifications are useful for identifying regions that are likely to harbor regulatory activity, the presence of these features does not guarantee that the marked regions are actively involved in the regulation of pluripotency. Nor can these marks be used to predict the activity of such candidate regulatory regions. One particular type of regulatory element within the non-coding genome is the enhancer, a non-coding element that can control gene expression in a spatially and temporally correct manner. Enhancers are believed to be bound by key TFs and marked by several histone modifications (such as H3K4me1 and H3K27ac), yet these features cannot fully predict the activity of enhancer sequences. Therefore, in CACOHET, we have developed a new experimental approach that enables the genome-wide identification of active enhancer sequences based on their functionality. To this end, we have combined chromatin immunoprecipitation (ChIP) with a form of massively parallel reporter assay. In this approach, ChIP is performed using antibodies against TFs or histone modifications, thereby pulling down genomic regions enriched for potential enhancers.
The co-immunoprecipitated DNA is then cloned into large reporter plasmid libraries, consisting of many millions of individual plasmids which together tile virtually all regions that were bound by the TFs and histone modifications in ESCs. In the reporter plasmid, a green fluorescent protein (GFP) gene driven by a minimal promoter is located upstream of a cloning site for ChIPed DNA elements, which is followed by a polyA sequence. Upon transfection of the reporter plasmid libraries into target cells, only those cloned sequences that have enhancer activity will be able to activate the minimal promoter, resulting in GFP expression. As the inserted enhancer is located upstream of the polyA sequence, the enhancer itself will also be transcribed. Therefore, by transfecting reporter plasmid libraries, sorting GFP-expressing cells and sequencing their mRNAs (RNA-seq), both the identity and the activity of each enhancer sequence can be determined. We have used this ChIP-STARR-seq approach to generate highly accurate, comprehensive genome-wide enhancer activity maps for various stem cell populations. This has provided us with the largest collection of functionally validated active enhancers in this cell state presently available, which we have made publicly available (http://hesc-enhancers.computational-epigenetics.org/). Using these data, we gained novel insights into gene regulation in ESCs. We found that only a small subset of the genomic sequences occupied by Nanog, Oct4, H3K4me1 and H3K27ac has measurable enhancer activity. Highly active sites show a distinct protein binding profile compared to lowly active or inactive sequences, and are enriched for certain TF binding motifs and for sequences derived from transposable elements. Upon transition between cell states, the enhancer landscape changes dramatically, and only smaller constituents of so-called super-enhancers are responsible for enhancer activity.
Using CRISPR-Cas9-mediated genome engineering, we further validated our findings at the endogenous genomic loci.
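As a rough illustration of the read-out described above, in which each cloned sequence is transcribed in proportion to its own enhancer activity, the activity of a region can be sketched as the ratio of its abundance among the sequenced mRNAs to its abundance in the plasmid library. This is a minimal sketch under stated assumptions: the reads-per-million normalization, the minimum-coverage filter and all function and parameter names are hypothetical and are not taken from this report or from the project's actual analysis pipeline.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts per region to reads per million (RPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty library")
    return {region: 1e6 * n / total for region, n in counts.items()}


def enhancer_activity(
    rna_counts: Dict[str, int],
    plasmid_counts: Dict[str, int],
    min_plasmid_reads: int = 1,
) -> Dict[str, float]:
    """Estimate activity per region as RNA RPM divided by plasmid RPM.

    Assumption (hypothetical): regions with fewer than `min_plasmid_reads`
    reads in the plasmid library are skipped, because a ratio over a
    poorly covered region would be unreliable.
    """
    rna_rpm = reads_per_million(rna_counts)
    plasmid_rpm = reads_per_million(plasmid_counts)
    activity = {}
    for region, n in plasmid_counts.items():
        if n < min_plasmid_reads:
            continue
        # A region absent from the RNA library is treated as inactive (0.0).
        activity[region] = rna_rpm.get(region, 0.0) / plasmid_rpm[region]
    return activity


# Toy input: two regions equally represented in the plasmid library,
# one of which is strongly over-represented among the sequenced mRNAs.
plasmid = {"region_a": 50, "region_b": 50}
rna = {"region_a": 90, "region_b": 10}
scores = enhancer_activity(rna, plasmid)
```

In this toy input, a ratio above 1 marks a region that is over-represented in the RNA relative to the plasmid library, and a ratio below 1 marks one that is under-represented. A real analysis would also need replicates and a statistical threshold for calling a region active.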
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)
Together, CACOHET has generated a valuable set of data that will allow further deciphering of the regulatory grammar of the non-coding genome. These data are useful to a broad variety of research fields, thereby further increasing the impact of this project. The experimental approach developed in CACOHET will also enable us to identify regulatory elements that might be involved in the pathogenesis of genetic disorders, and we are currently following up on these possibilities.