In recent years, several biological systems have been identified that can be programmed to target almost any desired genetic locus within an organism. These tools, most prominently the CRISPR-Cas technology, have revolutionized the field of genome engineering by enabling facile deletion, modification or introduction of genes in living organisms. This has in turn given rise to a wide range of new applications in science, green biotechnology and medicine. A drawback of these highly promising technologies, in particular for medicine, is that the proteins involved exhibit a significant degree of promiscuity in recognizing their genetic targets. This can lead to the editing of unintended genes, so-called off-targets. To reduce potentially detrimental off-targeting, this project aims to decipher the mechanisms by which CRISPR-Cas and other protein tools recognize their DNA targets. To this end, the project uses cutting-edge single-molecule observation for a detailed and quantitative characterization of target recognition. The data obtained are then used to develop quantitative, mechanism-based models and predictors of potential off-target recognition. These in turn are to be used to select target sequences with minimal off-targeting propensity in order to avoid detrimental side effects. Furthermore, novel high-throughput methodology is being developed to parametrize and test the predictions on many different targets in parallel.
Over the course of the project, single-molecule investigations of a large number of DNA-interacting proteins revealed a wealth of different mechanisms by which these molecular machines process nucleic acids. Focusing on CRISPR-Cas complexes, the dynamics with which these enzymes recognize their target genes could be mapped and resolved in unprecedented detail. Based on this, a first fully validated model for the (off-)target recognition dynamics and propensity was developed that correctly predicted general rules of target site selection by these complexes. Using DNA-based nanotools, we could furthermore derive the full free energy landscape of the target recognition process, a key ingredient for successful modeling. Combining the new mechanism-based models with high-throughput data will in the future provide improved off-target predictors and thus more reliable genome engineering results.