CORDIS - EU research results

Tracking and guiding artificial enzyme evolution via landscape inference

Periodic Reporting for period 1 - GuideArtifEvol (Tracking and guiding artificial enzyme evolution via landscape inference)

Reporting period: 2019-09-01 to 2021-08-31

Enzymes are naturally occurring molecules that catalyze chemical reactions. They bind their substrates with high specificity and are very efficient catalysts. This makes them excellent candidates for applications that require chemical reactions, and many are already used in industries ranging from pharmaceuticals and diagnostics to food and textiles. However, enzymes often cannot be used directly outside their natural context, or the desired outcome of the reaction differs from the natural one. There is therefore great interest in engineering enzymes to work under different conditions, to increase their efficiency, to change their substrates, or to remove secondary activities.

Directed evolution has proven to be an effective method for enzyme engineering: it has successfully increased the efficiency of some enzymes, adapted others to work under different conditions, and even changed their substrates. The process mimics natural evolution: it generates a diverse set of variants of the gene encoding the enzyme and subjects this set to functional screening or selection in order to extract the best-performing variants. These two steps are applied iteratively, leading to optimized variants of the enzyme. While this is an effective strategy, what underlies the process remains unknown and the overall protocol is time-consuming. Previous work has focused on increasing diversity and on smarter strategies for selection and screening. In this project I focus on how sequence space is explored during directed evolution experiments. I use an experimental platform that tests millions of variants of an enzyme simultaneously, and I incorporate next-generation DNA sequencers into the overall protocol. This gives insight into the effect of DNA mutations on the activity of the enzyme, and it will help to focus experimental efforts on the mutations that have the strongest impact.
I worked with a directed evolution platform called PEN-CSR, which can test millions of variants of an enzyme simultaneously. A preprint of the scientific article describing the platform is available at https://www.biorxiv.org/content/10.1101/2021.04.22.440993v1.full and the article is also under review at a peer-reviewed journal. Both the input and the output of this system are gene libraries, i.e. millions of DNA fragments coding for the actual enzymes. Before selection for the desired activity, the library contains variants with random mutations; after selection, it is enriched in the genes coding for the most efficient enzymes. In this project I added nanopore sequencing to read out the full gene libraries before and after selection. This makes it possible to study, at the DNA level, which mutations and combinations of mutations improve the enzymes. In other words, it allows us to reconstruct the fitness landscape.
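
As an illustration of how enrichment between the input and output libraries could be quantified, the sketch below computes a log2 ratio of variant frequencies after versus before selection. It is only a minimal example under stated assumptions and not the analysis used in the project: the function name `log2_enrichment`, the pseudocount, and the toy variant labels are all hypothetical.

```python
import math
from collections import Counter


def log2_enrichment(before, after, pseudocount=0.5):
    """Log2 ratio of each variant's frequency after vs. before selection.

    `before` and `after` are lists of variant labels, one per sequenced read.
    The pseudocount (an assumed value) avoids division by zero for variants
    that are absent from one of the two libraries.
    """
    before_counts = Counter(before)
    after_counts = Counter(after)
    variants = set(before_counts) | set(after_counts)

    # Normalise by library size so libraries of different depth are comparable.
    n_before = sum(before_counts.values()) + pseudocount * len(variants)
    n_after = sum(after_counts.values()) + pseudocount * len(variants)

    scores = {}
    for variant in variants:
        f_before = (before_counts[variant] + pseudocount) / n_before
        f_after = (after_counts[variant] + pseudocount) / n_after
        scores[variant] = math.log2(f_after / f_before)
    return scores


# Toy libraries (hypothetical labels): "A12T" becomes more frequent after selection.
before = ["WT"] * 6 + ["A12T"] * 2 + ["G45S"] * 2
after = ["WT"] * 2 + ["A12T"] * 7 + ["G45S"] * 1
scores = log2_enrichment(before, after)
```

A positive score marks a variant that selection favoured and a negative score one that it depleted. In a real library the counts would come from the sequenced reads, after error correction.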

I chose nanopore sequencing because it can read long strands of DNA (this project studies reads of between 1000 and 2000 bases) and because it has high throughput (millions of reads for an accessible amount of time and money). On the other hand, its results are noisy compared to other technologies: the error rate ranges from 2 to 10%, which is comparable to the mutation rate of the libraries I am targeting. To overcome this I adopted a consensus sequence strategy, in which each variant is read several times and the consensus of those reads is taken as the true sequence. This strategy can consume a large part of the sequencing throughput, as each variant needs to be read 20-100 times. I developed SINGLe, a machine-learning-based method that reduces the number of noisy reads required to obtain the true sequence. With SINGLe, as few as 5 reads are enough to obtain the true sequence by consensus. The tool is available on GitHub (https://github.com/rocioespci/single) and is currently being evaluated for inclusion in Bioconductor.org. A scientific publication has also been submitted to a peer-reviewed journal, and its preprint is available at https://www.biorxiv.org/content/10.1101/2020.03.25.007146v2.
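
The basic consensus idea can be sketched as a per-position majority vote over several reads of the same variant. This is a simplified illustration and not the SINGLe method itself: it assumes the reads are already aligned to a common length, and the function name `majority_consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(reads):
    """Return the most frequent base at each position of pre-aligned reads.

    Assumes all reads cover the same variant and have equal length
    (i.e. alignment has already been done elsewhere).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*reads):  # iterate position by position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five noisy reads of the same toy variant; three of them carry one error each,
# at different positions.
reads = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC", "AGGTAC"]
consensus = majority_consensus(reads)
```

Because the errors fall at different positions, the majority vote recovers `ACGTAC` here. With higher error rates or correlated errors, a plain vote needs many more reads, which is the cost the paragraph above describes.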

I have successfully carried out one round of selection on the protein Bst.NBI, a DNA nicking enzyme, and sequenced the input and output DNA libraries. I observed enrichment of some variants, and I am currently studying which properties at the DNA level characterize them. I expect that this will make it possible to design a more efficient exploration of the fitness landscape.
In this project, next-generation DNA sequencers were successfully incorporated into high-throughput directed evolution experiments. To do this, we developed computational tools that make better use of the sequencing capacity while keeping an accurate read of the mutations in the enzyme’s gene. We now have preliminary results on the selection process at the DNA sequence level. These will provide some insight into the fitness landscape of the protein as probed by this experiment. We expect that the final results will contribute both to a deeper understanding of the fitness landscape of enzymes and to the design of more effective directed evolution experiments.