Periodic Reporting for period 1 - deCYPher (Decipher cytochrome P450 enzymes (CYPs) by digital tools to produce flavonoids and terpenoids)
Reporting period: 2023-09-01 to 2025-02-28
At the same time, artificial intelligence (AI) and machine learning (ML) are advancing rapidly, yet their potential in engineering biology remains underused. The deCYPher project bridges this gap by developing a standardized AI/ML platform to unlock the microbial production of oxygenated plant metabolites (OPMs), making these high-value molecules more accessible for industrial and societal applications.
deCYPher builds an AI/ML pipeline that integrates across all stages of the Design-Build-Test-Learn (DBTL) cycle, and across the biotech value chain. This modular platform is combined with smart databases and synthetic biology tools to support innovation throughout the sector. Project activities begin at TRL 2–3, with a target of reaching TRL 5.
For OPM production, deCYPher focuses on three main objectives:
(1) discovering and selecting the right CYP enzymes;
(2) ensuring their proper expression and localization in microbial hosts; and
(3) optimizing the microbial chassis and overall bioprocess.
Alongside its technical work, deCYPher actively involves stakeholders, including industry, regulators, NGOs, and citizens, in reflecting on the societal and ethical implications of integrating AI and SynBio.
The project’s pathway to impact covers four key areas:
(1) Scientific and technical impact – A generic, modular AI/ML pipeline is developed for every step of the DBTL cycle. This foundation supports the efficient and sustainable development of bioprocesses for flavonoids and terpenoids, and is adaptable across the life sciences.
(2) Economic impact – By using an integrated approach across the bioprocess development chain, the project enables scalable and competitive production of OPMs with broad applications, supporting economic valorization.
(3) Sustainability impact – The use of metabolically versatile microbial hosts supports improved resource use and the shift to non-fossil, local feedstocks, contributing to cost-effective and sustainable bioproduction.
(4) Societal impact – Through open and citizen science approaches, and by applying anticipatory science methods, deCYPher supports a deeper understanding of the societal implications of AI/ML in SynBio, ensuring alignment with stakeholder needs and the safe development of emerging biotechnologies.
(1) AI/ML platform:
Progress has been made in designing and building a standardized AI/ML platform to support the Design-Build-Test-Learn (DBTL) cycle. A generic, modular active learning pipeline has been developed and is currently being validated in the context of the first use case, focusing on protein engineering. This platform will be further expanded to accommodate additional use cases, including microbial host optimization and bioprocess development. In parallel, several computational tools have been developed and implemented to support the selection and expression optimization of cytochrome P450 (CYP) enzymes, further enhancing the platform’s capabilities.
(2) CYP expression and activity:
Progress has been made in CYP bioprospecting, selection, and functional expression in microbial systems. Semi-high-throughput screening platforms have been established to evaluate CYP specificity toward target flavonoids and terpenoids. These platforms provide an efficient means of identifying CYPs with the desired substrate specificity, and initial screening of a first CYP set has already demonstrated successful microbial production of several plant-derived metabolites.
To ensure functional expression in microbial hosts, two alternative strategies have been developed to achieve correct localization and activity of CYPs in bacterial systems. These approaches are now being evaluated and optimized for selected CYP/host combinations, supported by computational tools. Initial experiments indicate successful enzyme targeting while preserving activity.
In parallel, efforts have focused on improving host robustness during CYP expression. Two dynamic genetic circuits have been designed to mitigate the cellular stress typically induced by CYP expression. These circuits will be further evaluated and refined to support more stable and efficient microbial production platforms.
(3) Microbial production of plant metabolites:
Advances have been made in building microbial cell factories for the production of oxygenated plant metabolites. Three host/target pairs have been selected. A baseline is being established for each pair, which will then be evaluated and optimized using the tools developed to improve CYP expression.
(4) Data management:
A harmonized metadata framework for data management has been developed to ensure consistency and interoperability across the consortium. It is built on a shared glossary and a “Codebook”, created and adopted by the partners, that together define common metadata standards. These standards will support seamless data exchange between data-producing and data-consuming partners, while also facilitating the publication of data and digital research outputs in open-access repositories and registries. The framework lays the foundation for FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the project.
One notable example is the modular design of the AI/ML pipeline, which enables its application across a wide range of data types, models, and problem domains throughout the entire biotechnology value chain. This flexible, cross-cutting approach represents a significant advancement beyond the current state of the art in the use of AI for engineering biology.