Linking Transcription Factor Binding Dynamics to Promoter Output

Periodic Reporting for period 4 - DynaMech (Linking Transcription Factor Binding Dynamics to Promoter Output)

Reporting period: 2020-08-01 to 2021-01-31

Human DNA contains approximately 20 000 protein-coding genes. The proteins encoded by these genes perform a plethora of different cellular tasks and need to be produced ("expressed") at appropriate levels. These levels differ between proteins and depend on cell type, developmental stage and other circumstances in which a cell may find itself (e.g. damage, viral attack, or particular nutritional or hormonal conditions). Appropriate control of gene expression is therefore essential for all living organisms to function properly. Disruption of gene expression is associated with many different types of disease, including cancer, immune disorders and metabolic disorders. In order to understand how life works in health and in disease, it is essential to understand how control of gene expression is achieved. This has been the subject of intense research ever since the discovery that DNA is the carrier of genetic information. It is well established that control is exerted by a range of different types of transcription factors: specific proteins that bind to DNA and thereby exert (local) control over which genes are actively transcribed.
An important aspect of the mechanism of gene expression control is how transcription factors bind to DNA. The aim of this project is to develop technologies that allow the DNA binding dynamics of DNA binding factors to be analysed across entire genomes in living organisms. Such methods would allow us to determine how the different DNA binding dynamics of different transcription factors on different genes affect control of gene expression. This will be fundamental to understanding how life works in health and in disease. An additional aspect that this project has examined is gene expression at the single cell level. This too is a dynamic process, and heterogeneity in gene expression across different cells can now be studied, also in relation to disease.
The overall objective of this project is a better understanding of gene expression and of how aspects of gene expression relate to disease. This has been pursued by developing methods for studying transcription factor binding dynamics at an unprecedented level, that is, across an entire genome, and in parallel by developing and applying methods for analysing gene expression at the level of single cells. (Paediatric) cancer was chosen as the disease model for the latter. The overall objectives have been achieved: a method has been set up to determine genome-wide off-rates of transcription factors, and a series of analyses has been carried out to determine gene expression at the single cell level.
The work has focused on two facets. The first is the development of technologies for measuring transcription factor binding dynamics across entire genomes in living organisms. Determination of transcription factor off-rates across genomes has worked well. The second is single cell gene expression analysis, on which more focus was placed during the course of the project. This shift was based in part on review and in part on the fact that it was a logical step the field has been taking. It has resulted in a variety of advances, as detailed below.

For the off-rate technique, the work has included optimisation of a plethora of steps, resulting in optimised protocols for each of the substeps. The sensitivity of our measurements has improved vastly due to these efforts, in some cases with a more than 100-fold improvement in signal to noise. As a spin-off we have published a highly improved protocol for chromatin immunoprecipitation. We have also learnt that not all transcription factors perform equally well with these techniques. The off-rate procedure is now publicly available along with extended protocols. The analyses show that protein–DNA interactions are indeed dynamic, and that these dynamics are an important aspect of chromatin-associated processes such as transcription and replication. The results indicate a large range of residence times for individual transcription factors, for example varying between 4.2 and 33 min depending on the binding site in the genome. Sites with different off-rates are associated with different functional characteristics. These include their transcriptional dependency, nucleosome positioning and the size of the nucleosome-free region, as well as the ability to roadblock RNA polymerase II for termination. The results show how off-rates contribute to transcription factor function and that DIVORSEQ (Determining In Vivo Off-Rates by SEQuencing) is a meaningful way of investigating protein–DNA binding dynamics genome-wide.
The on-rate method is a genomic adaptation of a protocol published by another research group, and its adaptation was therefore started later. We ran into several unforeseen technical hurdles. On the one hand, this was surprising given that the starting point was a protocol published in a reputable scientific journal. On the other hand, the project was high risk/high gain. Although we have worked extensively on the on-rate method, a shift in focus towards single cell approaches was an excellent alternative given the direction that the field was taking, also taking into account previous review remarks.
The various analyses of single cell gene expression resulted in technical as well as conceptual advances. The technical advances included methods for disrupting tumor biopsies without excessive loss of cell viability (and without cell type selection), selection methods for purifying viable cells after tissue disruption, data analysis methods for identifying known and unknown cell types from single cell gene expression data, and a pipeline for processing the data. Applying this expertise to acute lymphatic leukaemia in infants (iALL) has resulted in the conceptual advance that it is possible to predict disease outcome based on the ratio of different tumor cell types found in different patients. This finding has been written up, deposited in medRxiv and is currently under review at a scientific journal. A concise overview of results is presented in the next section.
All of the methods described in the previous section go beyond the state of the art.

Overview of results and dissemination
Genome-wide off-rate determination - published in de Jonge et al., Molecular Systems Biology 2020
Vastly improved ChIP - published in de Jonge et al., STAR Protocols 2020 and on bioRxiv 835926
Cell type identification method - published in de Kanter et al., NAR 2019
Pipeline for single cell gene expression data processing - publicly available through Candelli et al., bioRxiv 250811
Protocols for single cell analyses - disruption of tumors, selection of viable cells - available to all collaborators and institute members, published in the Methods sections of the relevant scientific papers (see below)
Analysis of tumor cell heterogeneity in infant ALL leading to prediction of disease/treatment outcome - publicly available through medRxiv 2020.04.14.20056580 and is currently under final review for publication in a scientific journal.
Analyses of cell heterogeneity in other paediatric tumors and organoid models thereof - published in Calandrini et al., Nature Communications 2020, Kildisiute et al., Science Advances 2021, Hanemaaijer et al., PNAS 2021, Ineveld et al., Developmental Dynamics 2021.
DIVORSEQ: Determining In Vivo Off-Rates by SEQuencing, genome-wide protein-DNA binding dynamics
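
As an illustration of how an off-rate relates to the residence times quoted in this report (4.2 to 33 min), the following is a minimal sketch and not the DIVORSEQ analysis itself. It assumes simple first-order dissociation, so that the bound signal decays as exp(-k_off * t) and the mean residence time is 1/k_off. The function names and the noise-free synthetic data are hypothetical and chosen for illustration only.

```python
import math


def fit_off_rate(times_min, occupancy):
    """Estimate k_off (per minute) by least squares on ln(occupancy) versus time.

    Assumes first-order decay: occupancy(t) = A * exp(-k_off * t).
    """
    logs = [math.log(y) for y in occupancy]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_min, logs))
    slope = sxy / sxx  # slope of the log-linear fit equals -k_off
    return -slope


def residence_time(k_off):
    """Mean residence time (minutes) under the first-order assumption."""
    return 1.0 / k_off


def half_life(k_off):
    """Time (minutes) for the bound signal to fall to half its starting value."""
    return math.log(2) / k_off


# Synthetic, noise-free time course for a hypothetical site with a 33 min residence time.
true_k_off = 1.0 / 33.0
times = [0, 10, 20, 40, 80]
signal = [math.exp(-true_k_off * t) for t in times]

k_est = fit_off_rate(times, signal)
print(f"estimated k_off = {k_est:.4f} per min")
print(f"residence time  = {residence_time(k_est):.1f} min")
print(f"half-life       = {half_life(k_est):.1f} min")
```

Under this assumption a short-lived site (residence time 4.2 min) and a long-lived site (33 min) differ roughly eightfold in off-rate. Real measurements are noisy and may not follow a single exponential, in which case a more careful fitting procedure would be needed.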