Periodic Reporting for period 4 - Gen-Epix (Genetic Determinants of the Epigenome)
Reporting period: 2020-12-01 to 2022-06-30
Relevance: SALL4 is of broad biomedical interest for several reasons:
• It is mutated in the human skeletal disorder Okihiro syndrome.
• It is a primary target for the drug thalidomide, which leads to its unscheduled degradation.
• It is an essential inhibitor of differentiation that safeguards pluripotency of stem cells.
• It is over-expressed in many cancers and is a potential target for anti-cancer therapeutics.
High-level outcome: The existence of long domains in the mammalian genome with distinct, evolutionarily conserved base compositions correlated with gene expression levels has been known for decades, but our SALL4 study is to our knowledge the first to provide evidence for a specific biological function. This research programme helped us to build a comprehensive mechanistic picture of the ways in which DNA sequence-mediated transcriptional modulation establishes, defines, and stabilises cell states, and how defects in this system lead to disease.
We determined the DNA sequence binding preferences of ZFC4 using a SELEX protocol. This technique involves identification of enriched DNA motifs after repeated cycles of protein-DNA binding and PCR amplification. In the course of this work we introduced improvements to the current protocol that were recently published as a methods paper.
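The motif-identification step described above can be illustrated with a minimal sketch. This is not the published protocol or its analysis pipeline: it assumes that sequencing reads from the input library and from a selected round are available as plain strings, and it ranks k-mers by how much their relative frequency rises after selection. The function names, the choice of k and the pseudocount are hypothetical.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the relative frequency of every k-mer found in a list of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_enrichment(selected_reads, input_reads, k, pseudocount=1e-6):
    """Ratio of k-mer frequency after selection to frequency in the input library.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that are absent from the input library.
    """
    selected = kmer_frequencies(selected_reads, k)
    background = kmer_frequencies(input_reads, k)
    return {
        kmer: (freq + pseudocount) / (background.get(kmer, 0.0) + pseudocount)
        for kmer, freq in selected.items()
    }


def top_kmers(enrichment, n=5):
    """Return the n most enriched k-mers as (k-mer, ratio) pairs."""
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

In practice such a ratio would be computed for each SELEX cycle, so that motifs whose enrichment grows from round to round can be distinguished from amplification noise.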
We confirmed the importance of an N-terminal domain of SALL4 which recruits the NuRD corepressor complex. Mutation of this short region so that it no longer interacts with the NuRD complex results in a severe embryonic lethal phenotype in mice, resembling complete absence of SALL4. We speculated that NuRD might be substituted by a different corepressor complex that also depends upon histone deacetylases for its activity. However, replacement of the NuRD binding domain by a SIN3A recruitment domain did not rescue this defect. We conclude that only the NuRD corepressor will do, for reasons that are unclear at the molecular level.
We precisely defined a domain that leads to multimerization of SALL4 with itself and with other SALL protein family members. We now wish to determine whether multimerization is essential or dispensable for SALL4 function by further molecular genetic experiments. We are in any case following up the functional relevance of SALL1, which co-exists with SALL4 in embryonic stem cells.
A major output of this project was the publication by Pantier et al (Molecular Cell 2021). The enhanced SELEX protocol was recently published in STAR Protocols (Pantier et al, 2022). A manuscript describing the structure of the ZFC4 domain complexed with AT-rich DNA is under revision at the journal Life Science Alliance. The analysis of multimerization by SALL4 is in progress and publication is expected within the next 1-2 years.
To identify novel AT-binding proteins, we utilized a DNA pulldown-mass spectrometry screen in mouse embryonic stem cells (ESCs), which are pluripotent and can be differentiated in culture. Our top hit was SALL4, a multi-zinc-finger protein that restrains differentiation of ESCs and participates in several physiological processes, including neuronal development, limb formation and gametogenesis. In humans, failure of SALL4 function is the cause of two severe developmental disorders: the autosomal dominant genetic disorder Okihiro syndrome and embryopathies due to treatment during pregnancy with the drug thalidomide. Despite its biomedical importance, the molecular functions of SALL4 were poorly understood. This project demonstrated that many of the defects seen in Sall4-null ESCs, including precocious differentiation, are mimicked by inactivation of its AT-binding domain. A major function of SALL4 is to sense DNA base composition and thereby restrain transcription of genes that promote differentiation.
Our results uncover a novel regulatory mechanism that uses base composition to regulate gene expression programmes. Vertebrate genomes are on average relatively AT-rich (60% A/T) and therefore the short A/T motifs to which SALL4 binds occur throughout the genome with frequencies that vary probabilistically according to local base composition. As base composition is a constant feature of the genome, regulation is achieved by varying the availability of the base composition reader itself. Accordingly, as cells enter differentiation, expression of SALL4 drops, suggesting that differentiation is triggered by loss of SALL4-mediated inhibition of key developmental genes. Global regulation of this kind confers the ability to modulate expression of multi-gene blocks using relatively few base composition readers and is potentially more economical than controlling each gene by a separate mechanism. Our findings demonstrate that base compositional domains are not merely a biologically irrelevant by-product of genome evolution but constitute a signal that impacts gene expression.
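The statement that short A/T motifs occur at frequencies set by local base composition can be made concrete with a simple calculation. The sketch below is an illustration only, not an analysis from the project: it assumes that bases are drawn independently, with A and T equally likely and G and C equally likely, and the example motif is hypothetical. Under that assumption, the chance of finding a given motif at any one position is the product of its per-base probabilities.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def motif_probability(motif, at):
    """Probability of a motif at one position under an independent-base model.

    Assumes P(A) = P(T) = at / 2 and P(G) = P(C) = (1 - at) / 2.
    """
    p = 1.0
    for base in motif.upper():
        p *= at / 2 if base in "AT" else (1 - at) / 2
    return p


def expected_occurrences(motif, at, region_length):
    """Expected number of motif matches on one strand of a region."""
    positions = max(region_length - len(motif) + 1, 0)
    return positions * motif_probability(motif, at)


if __name__ == "__main__":
    motif = "ATATT"  # hypothetical A/T-only motif, for illustration
    for at in (0.4, 0.5, 0.6, 0.7):
        n = expected_occurrences(motif, at, 100_000)
        print(f"AT fraction {at:.1f}: about {n:.0f} expected sites per 100 kb")
```

Because the probability scales with the k-th power of the A/T fraction, modest differences in base composition between domains translate into several-fold differences in the expected density of binding sites, which is the property a base composition reader could exploit.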