Periodic Reporting for period 1 - COMBICODE (Decoding the Combinatorial Epigenetic Information of the Mammalian Genome with Engineered DNA Duplex Readers)
Reporting period: 2022-11-01 to 2024-04-30
Cytosine modifications occur in the DNA sequence CG. This “CpG dyad” is palindromic, that is, the complementary bases of the opposite DNA strand also have the sequence CG, and the C in both strands can be modified. With five cytosine forms (C, mC, hmC, fC and caC), this gives rise to a total of 15 theoretical combinations of cytosine bases in the two strands of a CpG dyad (hereafter: “CpG duplex modifications”). Most regulatory, DNA-binding proteins recognize both strands of DNA, and each CpG duplex modification thus represents a unique signal with the potential to uniquely influence protein interactions and gene expression.
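The count of 15 can be checked with a short enumeration. The sketch below assumes the five cytosine forms are C, mC, hmC, fC and caC (the report itself only states that there are five) and treats a dyad as an unordered pair, since the palindromic CpG makes top/bottom strand assignments interchangeable. The function name is illustrative only.

```python
from itertools import combinations_with_replacement

# Assumption: the five cytosine forms referred to in the report.
CYTOSINE_FORMS = ["caC", "fC", "hmC", "mC", "C"]


def cpg_duplex_modifications(forms):
    """Return all unordered strand pairs, e.g. 'hmC/mC', for a CpG dyad."""
    return [f"{a}/{b}" for a, b in combinations_with_replacement(forms, 2)]


dyads = cpg_duplex_modifications(CYTOSINE_FORMS)
print(len(dyads))  # n * (n + 1) / 2 pairs for n = 5 forms
print(dyads)
```

Symmetric dyads such as mC/mC and asymmetric ones such as hmC/mC each appear exactly once in the list.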
The ability to selectively detect and map user-defined CpG duplex modifications in genomes is essential for studying their roles in stem cell differentiation and cancer. It is thus key for cancer biomarker discovery and the development of diagnostic assays (e.g. for liquid biopsy). However, current detection/mapping methods for mC and hmC cannot selectively reveal CpG duplex modifications. Approaches based on bisulfite or deaminase treatment (e.g. BS-/ox-BS-Seq or EM-Seq) rely on a conversion step that causes each modification to be read as either C or T in DNA sequencing. This provides only two information units for five C nucleobases, so that CpG duplex modifications cannot be analyzed selectively. Alternative approaches based on affinity enrichment rely on the capture of mC- or hmC-containing DNA fragments via antibodies or related probes/tags. They offer simple protocols and enable cost-effective sequencing of only the relevant, modified genomic regions. However, there are no probes capable of enriching user-defined combinations of modifications in CpG dyads. This lack of mapping methods for novel CpG duplex modifications is a key roadblock for understanding their functions in cell differentiation and cancer, and effectively prevents innovation in the epigenetics research and cancer diagnostics (e.g. liquid biopsy) markets.
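The information loss in conversion-based sequencing can be illustrated with a small sketch. It assumes the standard bisulfite readout (C, fC and caC read as T; mC and hmC read as C) and, as a simplification, treats both the dyad and its readout as unordered strand pairs. The mapping and function name are illustrative and are not taken from the project.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Assumption: standard bisulfite sequencing readout of each cytosine form.
BISULFITE_READOUT = {"caC": "T", "fC": "T", "hmC": "C", "mC": "C", "C": "T"}


def group_dyads_by_readout(readout):
    """Group all unordered CpG dyads by the C/T pair they are read as."""
    groups = defaultdict(list)
    for top, bottom in combinations_with_replacement(list(readout), 2):
        key = tuple(sorted((readout[top], readout[bottom])))
        groups[key].append(f"{top}/{bottom}")
    return dict(groups)


groups = group_dyads_by_readout(BISULFITE_READOUT)
for key, members in sorted(groups.items()):
    print("/".join(key), "<-", ", ".join(members))
```

Under these assumptions, hmC/mC falls into the same readout class as mC/mC and hmC/hmC, which is why a conversion step alone cannot single it out.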
We have engineered the first affinity enrichment probes for selectively enriching novel CpG duplex modifications. We have set up an effective directed evolution platform to reengineer MBD proteins (the natural readers of symmetric mC/mC dyads) for the analysis of novel CpG duplex modifications. Most importantly, we engineered the probe MBD[hmC/mC] to bind the CpG duplex modification hmC/mC, which is expected to be particularly abundant in genomes and has a high potential to serve as a cancer biomarker. This probe binds hmC/mC-containing DNA with low nanomolar affinity, discriminates against all other CpG duplex modifications, and rivals the selectivity of its wild-type MBD progenitor.
Within the project, this probe was planned to be integrated into user-friendly and cost-effective kits to enable the routine mapping of hmC/mC marks. The plan was to validate the kit with mESC DNA samples, establish the bioinformatics pipeline and benchmark the kit against three commercially available affinity enrichment kits. The kit performance was then to be optimised and the kit applied to first clinical samples. Market and IP landscape analyses were planned, and licensing was envisaged.
1.) Characterized off-target affinities and potential sequence-dependencies of the MBD[hmC/mC] mutant relevant for the enrichment kit, showing desirable properties of the mutant.
2.) Conducted broad benchmark studies using mESC gDNA and involving commercial MBD-Cap-Seq, MeDIP and hMeDIP kits in comparison with our MBD[hmC/mC]-CapSeq kit.
3.) Ran directed evolution campaigns to improve the selectivity of our MBD[hmC/mC] mutant.
4.) Established all steps of a workaround strategy with a high likelihood of obtaining our hmC/mC maps independently of optimizations of the mutant or assay conditions (established that MBD[hmC/mC] is fully blocked by hmC glucosylation).
5.) Conducted the enrichment and sequencing with glucosylated gDNA and MBD[hmC/mC]-CapSeq as a fifth data set, which we will use to obtain hmC/mC maps by comparing +/- glucosylation samples. All other reference maps generated in our studies will serve as valuable datasets for benchmarking our approach against the most widely used enrichment assays for mC and hmC.
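The +/- glucosylation comparison in steps 4 and 5 can be sketched as a per-bin ratio: because hmC glucosylation blocks MBD[hmC/mC] binding, bins whose enrichment signal drops after glucosylation are candidates for hmC/mC-dependent capture. The following is only a minimal illustration and not the project's bioinformatics pipeline. The function names, the pseudocount and the threshold are hypothetical, and it assumes read counts per genomic bin that are already normalized for library size.

```python
import math


def log2_ratios(untreated, glucosylated, pseudocount=1.0):
    """Per-bin log2 ratio of untreated over glucosylated read counts."""
    if len(untreated) != len(glucosylated):
        raise ValueError("both samples must cover the same bins")
    return [
        math.log2((u + pseudocount) / (g + pseudocount))
        for u, g in zip(untreated, glucosylated)
    ]


def candidate_bins(untreated, glucosylated, min_log2_ratio=1.0):
    """Indices of bins whose signal is lost upon glucosylation."""
    ratios = log2_ratios(untreated, glucosylated)
    return [i for i, r in enumerate(ratios) if r >= min_log2_ratio]


# Hypothetical toy counts for three bins (not project data).
untreated_counts = [10, 50, 3]
glucosylated_counts = [9, 5, 3]
hits = candidate_bins(untreated_counts, glucosylated_counts)
print(hits)
```

A real analysis would also need replicates, input controls and a statistical test rather than a fixed threshold.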
As can be seen from Part B, two aspects of the project plan had to be adjusted to secure a successful outcome. The data obtained with the new strategy are currently being analysed; if they are promising (as expected), the next steps will be taken.
Since our analyses will provide a well-characterized performance profile of our kit for hmC/mC mapping in mESC genomes, the kit will be at a stage that should attract potential partners (including Diagenode) for commercialization in the epigenetics research market, even without end-user testing studies. We will pursue the transfer to human samples and liquid biopsy in any case, but the opportunities for such studies should increase significantly once the kit is commercialized for the research market.
Because of the delay in the technical developments and the very dynamic environment with respect to the IP landscape and competing commercial and academic developments, we decided to coordinate the timing of the market and IP landscape analyses with our technical work, i.e. to postpone the IP/market analyses until we have an overview of the +/-ghmC mapping data.