
Bioinformatics Services for Data-Driven Design of Cell Factories and Communities

Periodic Reporting for period 3 - DD-DeCaF (Bioinformatics Services for Data-Driven Design of Cell Factories and Communities)

Reporting period: 2019-03-01 to 2020-02-29

Biotechnology aims to utilize biological systems to solve societal challenges ranging from sustainable production of fuels, chemicals and agricultural products to improving human health through diagnostic tools and therapeutics. Despite its great potential, biotechnological product development is challenging and time-consuming because of our limited understanding of the biological systems that are employed, for example, in renewable chemical production. The past 20 years have seen rapid development of so-called omics technologies, which allow comprehensive characterization of biological systems at different levels, from genes to transcripts to proteins and metabolites. These omics technologies make it possible to build complete catalogues of the parts of a biological system, the interactions between the parts, and the states of the system in any given condition. These catalogues can in turn be used to build predictive models of how biological systems function, and the models can be used to design modifications that enhance the performance of the system. As an example, a model of cellular metabolism can be used to design how the gene complement of a microbial strain should be manipulated to optimize the flow of material from a feedstock such as a sugar to a specific chemical such as a vitamin.

Omics data is rapidly accumulating in public-domain and proprietary databases and has the potential to significantly accelerate the development of biotechnology-based products. However, omics data is not leveraged effectively in the biotechnology industry because of a lack of tools to rapidly access public and private data and to design cellular manipulations or interventions based on the data. With this project we aim to make a broad spectrum of omics data useful to the biotechnology industry, covering application areas ranging from industrial biotechnology to human health. Our approach is based on using omics data to develop, parameterize and constrain large-scale models of biological systems. We will develop novel model-based approaches for omics data analysis to enable 1) identification of novel enzymes and pathways by mining genomic data collected from environments such as oceans and soils, 2) data-driven design of cell factories for the production of chemicals and proteins, and 3) analysis and design of microbial communities relevant to human health, industrial biotechnology and agriculture.

All research efforts will be integrated in an interactive web-based DD-DeCaF platform that will be available to academic research and teaching as well as industrial product development communities. The DD-DeCaF platform will incorporate tools to analyze and visualize diverse omics datasets, and to use these datasets for designing cell factories and communities. This platform can be leveraged by biotechnology SMEs to increase their competitiveness by economizing resources and reducing time-to-market within their respective focus areas. The platform will be composed of standardized and interoperable components that the service-oriented bioinformatics SMEs involved in the project can reuse in their own products. Two end-user companies will be involved in practical testing of the platform built within the project, using proprietary omics data generated at the companies. These companies are involved in the renewable production of nutrition-related products (vitamins and other dietary supplements), and the project is expected to accelerate the development of new products within the companies.
The project is focused on developing methods and software infrastructure for data-driven design of cell factories and microbial communities. The work carried out during the reporting period has included significant methodological developments in all areas, ranging from metagenomic data analysis to algorithms for cell factory design (WP1-WP6). Methodological developments in this period include both new or improved models of biological systems and new algorithms for integrating omics data with the models in order to make phenotypic predictions and design improved cell factories and communities. On the model development side, two major results have been achieved so far: 1) the development of the first proteome-constrained eukaryotic genome-scale metabolic model, which can be used to integrate proteomics data into modeling workflows (WP5), and 2) the development of large-scale kinetic models for both E. coli and yeast, which can be used to integrate metabolomics data into modeling workflows (WP4). On the algorithm development side, the project has so far delivered an improved pipeline for large-scale metagenomic data processing from raw reads to full functional annotation (WP2), a pipeline for automated reconstruction of metabolic models from genomic or metagenomic data (WP3), pipelines for building large-scale kinetic models (WP4) and proteome-constrained models (WP5), and algorithms for identifying optimal cell factory designs based on kinetic models (WP6).
On the technological side (WP7, WP8), the project focuses on building software packages, reusable software services, and visual user interfaces to exploit the models and algorithms developed in WP2-WP6. During this reporting period the project has delivered a comprehensive software library for cell factory design (Cameo) that can be used as a foundation for developing user-friendly software tools within the project, software service APIs that allow manipulating and simulating models, and a prototype implementation of a web-based platform for cell factory design and pathway-based visualization of data and simulation results. This platform includes facilities for uploading and storing data relevant to cell factory design, tools for integrating data with models, and tools for performing advanced computations using the models. The prototype platform is under constant testing by the end-user partners (WP9), who are providing valuable input for further development.
The project is fully on track with respect to the original Description of Action (DoA), and we expect to reach the milestones specified in the DoA by the end of the project. The project will deliver a series of advanced models of commonly used cell factory host organisms, advanced algorithms for using these models to analyze omics data and to design cell factories from data and models, and software tools that make it easy to use the models with both public-domain and newly generated omics data. These software tools will include both well-documented packages and services that can be used by bioinformatics SMEs, and a comprehensive web-based platform that can be used by end-users in industry and academia. The provision of packages and services will allow bioinformatics SMEs both within and outside the consortium to build additional software tools that exploit the methodological developments achieved during the project. The wider impact of the project will come primarily through the use of the software tools in the biotechnology industry. Within the consortium, the end-user partners (Biosyntia and DSM) will benefit through the acceleration of their internal R&D programs to build improved cell factories for chemical production.