Periodic Reporting for period 1 - GuideArtifEvol (Tracking and guiding artificial enzyme evolution via landscape inference)
Reporting period: 2019-09-01 to 2021-08-31
Directed evolution has proven to be an effective method for enzyme engineering: it has successfully increased the efficiency of some enzymes, adapted others to work in different conditions, and even changed their substrates. The process mimics natural evolution: it generates a set of diverse variants of the gene coding for the enzyme and subjects this set to functional screening or selection in order to extract the best-performing variants. These two steps are applied iteratively, leading to optimized variants of the enzyme. While the strategy is effective, what underlies the process remains unknown and the overall protocol is time-consuming. Previous work has focused on increasing diversity and on smarter strategies for selection and screening. In this project I focus on how the sequence space is explored during directed evolution experiments. I use an experimental platform that tests millions of variants of an enzyme simultaneously, and I incorporate next-generation DNA sequencing into the overall protocol. This gives insight into the effect of DNA mutations on the activity of the enzyme, and it will help focus experimental efforts on the mutations that have a stronger impact.
I chose nanopore sequencing because it can read long strands of DNA (in this project, reads between 1000 and 2000 bases are studied) and it has a high throughput (millions of reads for an accessible amount of time and money). On the other hand, the results are noisy compared to other technologies, with an error rate of 2 to 10%, which is comparable to the mutation rate of the libraries I am targeting. To overcome this I adopted a consensus sequence strategy, where each variant is read several times and the consensus of those reads is taken as the true sequence. This strategy can consume a large part of the sequencing throughput, as each variant needs to be read 20-100 times. I developed SINGLe, a machine-learning-based method to reduce the number of noisy reads required to obtain the true sequence. Using SINGLe, as few as 5 reads are enough to obtain the true sequence by consensus. This development is available on GitHub (https://github.com/rocioespci/single) and is currently being evaluated for inclusion in Bioconductor.org. A scientific publication has also been submitted to a peer-reviewed journal, and its pre-print is available at https://www.biorxiv.org/content/10.1101/2020.03.25.007146v2.
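The consensus idea described above can be illustrated with a minimal sketch. It uses a plain per-position majority vote over reads that are assumed to be already aligned and of equal length. This is not the SINGLe method, which reweights noisy reads before the consensus is taken. The function name `majority_consensus` and the toy reads are hypothetical and serve only as illustration.

```python
from collections import Counter


def majority_consensus(reads):
    """Return the per-position majority base of pre-aligned, equal-length reads.

    Illustrative only: a plain majority vote, not the SINGLe method.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    # zip(*reads) yields one column (tuple of bases) per position.
    for column in zip(*reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five toy reads of the same variant, three of them with one error each.
toy_reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAA", "TCGTAC"]
print(majority_consensus(toy_reads))  # ACGTAC
```

With independent errors at different positions, a handful of reads already recovers the toy sequence. Real nanopore errors are often systematic, which is one reason a simple vote may need many more reads per variant.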
I have successfully carried out one round of selection on the protein Bst.NBI, a DNA nicking enzyme, and sequenced the input and output DNA libraries. I observed enrichment of some variants and I am currently studying which DNA-level properties characterize them. I expect that this will make it possible to design a more efficient exploration of the fitness landscape.
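One common way to quantify enrichment between an input and an output library is a log2 ratio of variant frequencies. The sketch below assumes per-variant read counts are already available. The report does not state how enrichment was measured in this project, so the function `log2_enrichment`, the pseudocount, and the variant names are all assumptions made for illustration.

```python
import math


def log2_enrichment(input_counts, output_counts, pseudocount=1.0):
    """Log2 ratio of output to input frequency for each variant.

    Hypothetical helper: a pseudocount is added to every count so that
    variants absent from one library do not cause a division by zero.
    """
    variants = set(input_counts) | set(output_counts)
    total_in = sum(input_counts.values()) + pseudocount * len(variants)
    total_out = sum(output_counts.values()) + pseudocount * len(variants)

    scores = {}
    for variant in variants:
        freq_in = (input_counts.get(variant, 0) + pseudocount) / total_in
        freq_out = (output_counts.get(variant, 0) + pseudocount) / total_out
        scores[variant] = math.log2(freq_out / freq_in)
    return scores


# Made-up counts: "variant_A" becomes more frequent after selection.
counts_before = {"variant_A": 50, "variant_B": 50}
counts_after = {"variant_A": 90, "variant_B": 10}
for name, score in sorted(log2_enrichment(counts_before, counts_after).items()):
    print(name, round(score, 2))
```

A positive score means the variant gained frequency during selection, and a negative score means it lost frequency.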