CORDIS - EU research results

A TRIP through chromatin and DNA double strand break repair

Final Report Summary - TRIP (A TRIP through chromatin and DNA double strand break repair)

Mammalian genomes are more than a random juxtaposition of genes. They have a domain architecture in which gene deserts alternate with gene-dense regions. One of the most puzzling observations is that the frequency of G+C nucleotides varies in megabase-scale regional waves called isochores. Biases of the DNA repair system can raise or lower the G+C content of the whole genome, but this does not explain why the level varies from one region to another. The two main hypotheses for this phenomenon are that it results from natural selection, or that it is due to G+C-biased gene conversion, i.e. the replacement of A+T alleles by G+C alleles during meiotic recombination.
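Isochores are usually detected by measuring G+C content in large windows along a chromosome. The sketch below is not taken from the project; it computes such a profile for a sequence given as a string, and the function names and the window and step sizes are illustrative assumptions.

```python
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (N and other codes are skipped)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # window made only of ambiguous bases
    return (seq.count("G") + seq.count("C")) / acgt


def gc_profile(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """G+C fraction in windows of `window` bp, moved along the sequence by `step` bp.

    Returns (window_start, gc_fraction) pairs. Because isochores are
    megabase-scale, on a real chromosome `window` would be of the order
    of 10^5 to 10^6 bp.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        profile.append((start, gc_fraction(seq[start:start + window])))
    return profile


# Toy example: an A-rich stretch followed by a G+C-rich stretch.
print(gc_profile("A" * 10 + "GC" * 5, window=10, step=5))
```

On the toy sequence the profile rises from 0.0 through 0.5 to 1.0, mimicking, on a tiny scale, the transition between an A+T-rich and a G+C-rich region.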
It is now clear that chromatin is a heterogeneous environment. The influence of chromatin on DNA damage and repair is well established, but accurate descriptions are difficult to obtain for lack of proper technologies to study DNA repair in specific chromatin contexts. How a given chromatin context shapes the evolution of the local sequence is still unknown. It is possible that the DNA repair system has different properties in different microenvironments, in other words that the repair biases themselves vary along the genome. Small but systematic biases would accumulate over evolutionary time scales, thereby creating the isochores and possibly other structures in the genome.
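The idea that small biases accumulate can be made concrete with a textbook two-state substitution model, which is a simplification and not a model used in this project. If u is the per-generation rate at which an A/T site becomes G/C and v the rate of the reverse change, the G+C fraction p evolves as p(t+1) = p(t) + u(1 - p(t)) - v p(t) and settles at p* = u / (u + v), whatever its starting value. The rates u and v are hypothetical parameters.

```python
from __future__ import annotations


def gc_equilibrium(u: float, v: float) -> float:
    """Stationary G+C fraction of the two-state model: p* = u / (u + v)."""
    return u / (u + v)


def gc_trajectory(p0: float, u: float, v: float, generations: int) -> list[float]:
    """Iterate p(t+1) = p(t) + u * (1 - p(t)) - v * p(t), starting from p0.

    u: hypothetical per-site rate at which A/T becomes G/C
    v: hypothetical per-site rate at which G/C becomes A/T
    Returns the G+C fraction for generations 0..generations.
    """
    ps = [p0]
    for _ in range(generations):
        p = ps[-1]
        ps.append(p + u * (1 - p) - v * p)
    return ps


# Two regions that differ only slightly in their A/T -> G/C rate.
print(gc_equilibrium(0.010, 0.011), gc_equilibrium(0.012, 0.011))
```

With v = 0.011, raising u from 0.010 to 0.012 moves the equilibrium from about 0.48 to about 0.52: in this toy model, a modest regional difference in repair bias is enough to produce a lasting regional difference in G+C content. The numbers are illustrative, not measured.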
We have tested this hypothesis experimentally. For this, we have developed a technology called TRIP, for Thousands of Reporters Integrated in Parallel. Briefly, the principle of TRIP is to integrate reporter constructs that detect biased DNA repair. The reporter is inserted in the genome using Sleeping Beauty, a high-efficiency transposon. It contains a restriction site for the meganuclease I-SceI flanked by two nearly identical sequences. The reporter can be cut in vivo by expressing I-SceI, and the outcome of the repair process can be followed by high-throughput sequencing. The break can be repaired either through non-homologous end joining, whereby the ends are ligated together, or through single-strand annealing, whereby the two nearly identical sequences anneal to each other and the nicks are sealed. Since the sequences are not exactly identical, repair via the second mechanism forms a mismatch, so this method effectively allows us to create the same mismatch at different locations of the genome. Finally, each transposon molecule is tagged by a unique barcode that allows us to identify the genomic location associated with a particular repair outcome.
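Once sequencing reads have been parsed, linking repair outcomes to integration sites amounts to a join on the barcode. The minimal sketch below assumes two hypothetical inputs: a barcode-to-site table produced by the mapping step, and a list of (barcode, outcome) pairs with outcome labels NHEJ, SSA_GC (G/C allele kept) and SSA_AT (A/T allele kept). Neither the labels nor the data layout come from the project itself.

```python
from collections import Counter, defaultdict


def outcomes_by_site(barcode_to_site, reads):
    """Tally repair outcomes per mapped integration site.

    barcode_to_site: dict barcode -> (chromosome, position), from the mapping step
    reads: iterable of (barcode, outcome) pairs from the repaired reporters
    Returns (per_site, unmapped): per_site maps each site to a Counter of
    outcomes; unmapped counts reads whose barcode has no known location.
    """
    per_site = defaultdict(Counter)
    unmapped = 0
    for barcode, outcome in reads:
        site = barcode_to_site.get(barcode)
        if site is None:
            unmapped += 1  # barcode seen after repair but never mapped
            continue
        per_site[site][outcome] += 1
    return dict(per_site), unmapped
```

Keeping unmapped reads as a separate count, rather than silently dropping them, makes it easy to check how much of the sequencing data could be assigned to a genomic location.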
We have chosen mouse embryonic stem cells as a model system because, unlike cancer cells, they have an intact DNA repair system. We have generated four types of reporters, producing C:A, C:T, G:A and G:T mismatches. The reporters were cloned into Sleeping Beauty, an efficient transposon with little insertion bias. We have inserted and mapped around 55,000 DNA repair reporters in the genome of mouse embryonic stem cells and have followed the process of DNA repair after transfecting an I-SceI expression plasmid.
Overall, the results show that repair of the construct in a chromatin context is A+T-biased. This conflicts with earlier experiments carried out on plasmids, which showed that the repair of a single mismatch is G+C-biased. We are currently investigating whether the discrepancy is due to the fact that our reporter system is integrated in the genome. Alternative explanations include the cell type, the sequence of the reporter, or the fact that the mismatch is generated by single-strand annealing.
We observed that the mismatches have different intrinsic biases. Purine-pyrimidine mismatches (G:T and C:A) were biased towards A+T, whereas purine-purine and pyrimidine-pyrimidine mismatches (G:A and C:T) were unbiased. Interestingly, none of the mismatches was biased towards G+C, indicating that, on average, DNA repair tends to decrease the G+C content of the genome. These numbers agree with the observation that the mouse genome is overall A+T-rich.
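The classification of mismatches and a test for bias can be sketched as follows. The sketch assumes that, for each mismatch type, single-strand annealing products have been counted as retaining either the G/C or the A/T allele (each of the four mismatches has exactly one G/C and one A/T outcome). The two-sided exact binomial test against an unbiased 50:50 outcome is one reasonable way to call a mismatch biased, not necessarily the analysis used in the project; all function names are hypothetical.

```python
from __future__ import annotations

import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mismatch_class(mismatch: str) -> str:
    """Classify a mismatch written as 'X:Y'.

    Returns 'purine-pyrimidine' (e.g. G:T, C:A) or 'same-class' for
    purine-purine and pyrimidine-pyrimidine mismatches (e.g. G:A, C:T).
    """
    a, b = mismatch.upper().split(":")
    if not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected bases in {mismatch!r}")
    return "purine-pyrimidine" if (a in PURINES) != (b in PURINES) else "same-class"


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    # Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the probability of the smaller tail (capped at 1).
    lo = min(k, n - k)
    tail = sum(math.comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def repair_bias(n_gc: int, n_at: int) -> tuple[float, float]:
    """Fraction of products keeping the G/C allele, and p-value against no bias."""
    n = n_gc + n_at
    if n == 0:
        raise ValueError("no repair products counted")
    return n_gc / n, binom_two_sided_p(n_gc, n)
```

A G/C fraction significantly below 0.5 would correspond to an A+T-biased mismatch, and a fraction compatible with 0.5 to an unbiased one.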
More importantly, we found that the repair biases were not constant throughout the genome. The G:A and C:T mismatches were more G+C-biased in G+C-rich regions. Somewhat surprisingly, the G:T and C:A mismatches did not show this trend, indicating that the DNA repair system is both context and sequence-dependent. These results suggest that variations of the G+C content in the mouse genome can be explained to some extent by the activity of the DNA repair system. In some regions, the G:A and C:T mismatches are more prone to be repaired as G:C or C:G, which tends to increase the local G+C content over evolutionary time scales.
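One simple way to look for the context dependence described above is to group reporters by the G+C content around their integration site and compare the fraction of G/C outcomes between groups. The sketch assumes per-reporter counts of G/C and A/T outcomes and a precomputed local G+C value for each site; the bin edges and the function name are arbitrary illustrative choices, not the project's analysis.

```python
from bisect import bisect_right


def bias_by_gc_bin(reporters, edges):
    """Pool repair outcomes of reporters into bins of local G+C content.

    reporters: iterable of (local_gc, n_gc, n_at), where local_gc is the G+C
               fraction of a window around the integration site
    edges: sorted inner bin boundaries, e.g. [0.40, 0.45] defines three bins
    Returns the pooled G/C-outcome fraction per bin (None for empty bins).
    """
    gc_counts = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for local_gc, n_gc, n_at in reporters:
        b = bisect_right(edges, local_gc)  # index of the bin containing local_gc
        gc_counts[b] += n_gc
        totals[b] += n_gc + n_at
    return [g / t if t else None for g, t in zip(gc_counts, totals)]
```

Pooling counts within a bin weights each reporter by its number of reads; averaging per-reporter fractions instead would give every integration site equal weight. An increasing trend across bins would correspond to a mismatch that is more G+C-biased in G+C-rich regions.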
Our technology opens several research avenues. It can be used to follow other kinds of damage, whether induced by double-strand breaks or by other agents such as radiation. It is also possible to dissect the biases at the molecular level by performing the same experiments in cell lines deficient for a given component of the DNA repair machinery. Such approaches will eventually allow us to identify the molecular actors at work.
Our results have important implications for evolutionary biology. They lay the foundations for future theoretical work aimed at better understanding the role of the DNA repair system in the appearance of genomic patterns such as the variations of G+C content. More generally, the connection between DNA repair and the local chromatin context means that our genomes bear the imprint of past events. Given that chromatin is influenced by the surrounding environment, it may be possible to recover some information about the habitat, diet or culture of our ancestors. Such a Lamarckian interpretation of the evolution of the G+C content also has profound implications for epistemology and for the history of scientific concepts.
Mutations are responsible for many human illnesses, from cancer to hereditary diseases. They cannot be prevented, but a better understanding of their occurrence would allow us to design better screening methods. For instance, knowing that a genomic region is more prone to mutations could lead to prenatal tests for rare diseases. Also, a hallmark of cancer is a malfunction of the DNA repair system. By defining more precisely what a normal mutation pattern is, our research opens the prospect of detection methods based on mutation patterns that deviate from the expected one.