From Data to Models: New Bioinformatics Methods and Tools for Data-Driven Predictive Dynamic Modelling in Biotechnological Applications

Final Report Summary - BIOPREDYN (From Data to Models: New Bioinformatics Methods and Tools for Data-Driven Predictive Dynamic Modelling in Biotechnological Applications)

Executive Summary:
Currently, biologists are collecting enormous amounts of ‘omics’ data in a vast number of different databases. Predictive, data-driven computational models are needed to understand the complex, multi-scale biological networks underlying these high-throughput datasets. Such models are non-linear and contain many parameters, which are difficult (or impossible) to measure directly. Instead, parameters need to be inferred from data. This approach is called reverse-engineering. It has tremendous potential for several areas, such as biotechnology and systems biology, since it allows us to develop models with unprecedented accuracy and predictive power. This is achieved by iteratively refining our models against quantitative ‘omics’ data, a process called the “systems-biology modelling cycle”. Many methods have been developed that deal with specific steps in this cycle, but we still lack an over-arching, easy-to-use software framework that supports the modelling cycle in its entirety and allows its widespread application in academia and industry.
The BioPreDyn project aims at improving accessibility of the data, and developing novel algorithms and tools implemented in such a general framework, which can enable the efficient transfer of cutting-edge modelling and optimisation methods from an academic research setting to private biotechnology partners.
To develop user-friendly software to support the model-building cycle, the project relied on a highly collaborative network of academic partners, who developed the algorithms; the end-users Insilico Biotechnology (INSIL) and Evolva SA (EV), who applied the software in a commercial biotechnological setting; and a Complex Systems Modelling company (CSM), which wrote the integrated code framework. Users with different levels of expertise (experimental biologists, engineers, and bioinformaticians) tested the emerging BioPreDyn software framework and provided feedback on user-friendliness and functionality to the developers. The end-users INSIL and EV both succeeded in incorporating the software into their production strategies, and we are confident that these tools can become standard items in their toolbox for optimizing production processes.
A main goal was to go beyond the use of steady-state genome-scale models. These models do not yet include any regulatory or dynamic information, which quickly limits their simulation capabilities. BioPreDyn efforts led to dynamic models that yield more accurate simulation results, which should lead to the increasingly widespread use of such models in the design of biotechnological production processes.
We expect the mutually beneficial partnerships between the eight academic labs and the three SMEs to continue beyond the end of the project. The academic groups have benefited from establishing proofs of concept for the technical and economic exploitation of the know-how and infrastructure generated by the project. The SMEs, in turn, benefited by broadening their product and technology portfolio, such as software and computational tools, and they profited from model-based optimization for their modeling tools and production bio-processes. Overall, synergies between academic partners and SMEs catalyzed by BioPreDyn can be expected to facilitate the development and application of microorganisms in industrial and medical biotechnology, and contribute to shortening the time-to-market of ideas.

Project Context and Objectives:
BioPreDyn scientists mine high-throughput datasets by employing data-driven computational models to elucidate biological networks. For this, we use reverse engineering, i.e. inferring parameters from the data, to build non-linear models with numerous parameters. This is achieved through the systems-biology modeling cycle, which is an iterative refinement of our models against quantitative ‘omics’ data. While many methods have been developed that deal with specific steps in this cycle (data analysis, model building/discrimination, parameter estimation/identifiability analysis, uncertainty quantification, and optimal experimental design), we still lack an over-arching, easy-to-use software framework that supports the modeling cycle in its entirety and allows its widespread application.
The aim of this project is to improve accessibility of the data and to develop novel algorithms and tools, for the efficient transfer of cutting-edge modeling and optimization methods from an academic research setting to private biotechnology partners. We will use representative biological and biotechnological applications as benchmark problems to develop robust and generally applicable methodology. The availability of such tools to the biotechnology sector (and other industries) will greatly enhance our ability to design and optimize complex production processes, especially those of nutraceuticals, biopharmaceuticals, or fine chemicals based on engineered organisms such as bacteria, yeast or plants.
We plan to develop methods and tools to handle time- and space-dependent biological data, incorporating visualization methods for data analysis and model development. To support such dynamic modeling and reverse-engineering of biological systems, we will develop integrated software tools and workflows for i) multi-scale model identification and building; ii) measure design; iii) parameter estimation by global non-linear optimization; iv) parameter identifiability analysis; v) model comparison; and vi) optimal experimental design. Finally, this will be applied to biotechnological and biological problems, such as: (a) large-scale dynamic modeling of metabolism and gene regulation in microorganisms (Escherichia coli, Saccharomyces cerevisiae) and eukaryotic cell lines, (b) cellular signaling networks used for biotechnological production processes, (c) inference of developmental gene regulatory networks in fruit flies (Drosophila) and cnidarians (Nematostella), and (d) mechanistic and comprehensive modeling of biotechnological production processes based on transgenic microorganisms.
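As a rough illustration of step iii), the sketch below fits two parameters of a toy dynamic model to time-course data by multi-start local search in log-parameter space. Everything in it is an assumption made for illustration: the one-variable model dx/dt = k_syn - k_deg * x, the function names, the search bounds and the pattern-search optimizer are not taken from the BioPreDyn software, and multi-start search is only a simple stand-in for the global optimization methods the project refers to.

```python
import random

LOG_MIN, LOG_MAX = -2.0, 1.0  # assumed search bounds on log10(parameter)


def simulate(k_syn, k_deg, times, x0=0.0, dt=0.05):
    """Integrate dx/dt = k_syn - k_deg * x with fixed-step RK4.

    `times` must be increasing; returns x at each requested time.
    """
    def rate(x):
        return k_syn - k_deg * x

    out, x, t = [], x0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)  # never step past the requested time
            k1 = rate(x)
            k2 = rate(x + 0.5 * h * k1)
            k3 = rate(x + 0.5 * h * k2)
            k4 = rate(x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        out.append(x)
    return out


def cost(log_params, times, data):
    """Sum of squared residuals between simulation and data."""
    k_syn, k_deg = (10 ** p for p in log_params)
    sim = simulate(k_syn, k_deg, times)
    return sum((s - d) ** 2 for s, d in zip(sim, data))


def pattern_search(start, objective, step=0.5, min_step=1e-4):
    """Derivative-free local search: try +/- step on each coordinate,
    accept any improvement, halve the step when nothing improves."""
    best, best_cost = list(start), objective(start)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if not LOG_MIN <= trial[i] <= LOG_MAX:
                    continue  # stay inside the assumed bounds
                trial_cost = objective(trial)
                if trial_cost < best_cost:
                    best, best_cost, improved = trial, trial_cost, True
        if not improved:
            step *= 0.5
    return best, best_cost


def estimate(times, data, n_starts=5, seed=0):
    """Multi-start: run the local search from random points, keep the best."""
    rng = random.Random(seed)

    def objective(log_params):
        return cost(log_params, times, data)

    best = None
    for _ in range(n_starts):
        start = [rng.uniform(LOG_MIN, LOG_MAX) for _ in range(2)]
        candidate = pattern_search(start, objective)
        if best is None or candidate[1] < best[1]:
            best = candidate
    log_params, best_cost = best
    return [10 ** p for p in log_params], best_cost
```

For instance, generating synthetic data with data = simulate(2.0, 0.5, times) for times = [0.5, 1.0, 2.0, 4.0, 8.0] and then calling estimate(times, data) should return values close to the generating parameters. Searching in log space is a common choice when parameters may span several orders of magnitude; realistic models would also need noise handling and an identifiability check (step iv).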

Project Results:
The BioPreDyn project achieved its main goals of developing new bioinformatics methods and tools for data-driven and predictive dynamic modeling, which researchers can use to better understand specific biological questions and datasets and to develop new biotechnological applications. The main achievements are listed here.
Software tools have been developed and are freely accessible. They include Path2Models, for the large-scale generation of computational models from biochemical pathway maps; SBML qualitative models, a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools; BioServices, a common Python package to access biological web services programmatically; Cyrface, an interface from Cytoscape to R that provides a user interface to R packages; CySBGN, a Cytoscape plug-in to integrate SBGN maps; CellNOptR, a flexible toolkit to train protein signaling networks to data using multiple logic formalisms; SBGN-ML and LibSBGN, which provide support for SBGN maps; and MIDER, a network inference method based on mutual information distance and entropy reduction.
BioPreDyn-bench is a collection of large-scale benchmark problems for parameter estimation, and represents a highly collaborative effort within the BioPreDyn consortium (with CSIC, CRG, EMBL-EBI, UNIMAN, and INSIL). It addresses the current lack of realistic, large-scale, dynamic, ready-to-run benchmarks for parameter estimation. This shortage hampers progress in computational biology, since it makes it difficult to perform a fair and systematic evaluation and comparison of model calibration methods. We thus created a suite of challenging problems that are representative of the current state of the art in dynamic modeling in systems biology. This collection goes beyond parameter estimation, as the models provided here can also be used for benchmarking methods for optimal experimental design, identifiability analysis, sensitivity analysis, model reduction, and (in the case of metabolic models) metabolic engineering. The software is publicly available at http://www.iim.csic.es/~gingproc/biopredynbench (see also the publication, DOI 10.1186/s12918-015-0144-4).
The source code of the BioPreDyn software suite is available online on the GitHub platform (https://github.com/bmoreau/biopredyn) under the BSD 3-Clause license, along with documentation on how to install and use it.
Numerous other methods and tools were developed, including:
1) a multi-objective optimization approach to the problem of regulation in metabolic networks, which can be applied to optimize the parameters of allosteric enzyme regulation in a model of a metabolic substrate cycle;
2) a method for assessing the confidence in the predictions made by dynamic models, together with a study of the identifiability of a model of protein production from mRNA in the gap-gene circuit of Drosophila melanogaster;
3) structural identifiability analysis and global optimization methods, as well as dynamic modeling of biological networks using a logic-based formalism;
4) OMEM (Optimal Metabolic Engineering of Microorganisms), which uses constraint-based models and combinatorial optimization methods to find optimal gene deletions that maximize the production of a target compound of interest;
5) a Matlab tool that lets a user search for known direct and functional interactions among a query list of biological entities. It implements a set of Web Service APIs that programmatically access public databases in order to search pathways, interactions, and multi-layer networks. The tool returns ranked lists of entities, interactions, neighbors, and pathways containing the searched keywords, saved in formats that can be easily imported into popular network analysis tools;
6) a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI detects changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information;
7) an innovative computational workflow to model the alteration of metabolism caused by loss- or gain-of-function (LoF or GoF) of an enzyme (or transporter) and to predict the metabolites and reactions that are most affected. The workflow has been applied to a genome-scale metabolic network model of human hepatocytes;
8) standardised metabolic reconstructions of E. coli (http://ecoli.sf.net/) and CHO cell (http://cho.sf.net/) metabolism, as well as the consensus yeast metabolic network, whose development we lead (e.g. http://yeast.sf.net/). These reconstructions receive regular feedback from the community and are thus in a constant state of development and improvement.
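The idea behind item 6 can be sketched in a few lines. Multi-information (also known as total correlation) is the sum of the marginal entropies of a set of variables minus their joint entropy, so it grows when the variables become more strongly co-regulated. The sketch below assumes that target-gene profiles have already been discretized, splits the samples at the median expression of a candidate modulator, and reports the difference in multi-information among the targets between the two groups. The function names, the median split and the plug-in entropy estimator are illustrative assumptions and are not taken from the published DMI implementation.

```python
import math
from collections import Counter


def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def multi_information(profiles):
    """Sum of marginal entropies minus joint entropy.

    `profiles` is a list of equal-length sequences, one per gene,
    each entry being a discrete expression state for one sample.
    """
    joint = list(zip(*profiles))  # one tuple of states per sample
    return sum(entropy(p) for p in profiles) - entropy(joint)


def differential_multi_information(targets, modulator):
    """Multi-information of the targets in samples where the modulator is
    high, minus the same quantity in samples where it is low.

    Assumption: "high" means at or above the (upper) median of `modulator`.
    """
    median = sorted(modulator)[len(modulator) // 2]
    high = [i for i, m in enumerate(modulator) if m >= median]
    low = [i for i, m in enumerate(modulator) if m < median]
    if not high or not low:
        raise ValueError("modulator does not split the samples into two groups")

    def subset(indices):
        return [[profile[i] for i in indices] for profile in targets]

    return multi_information(subset(high)) - multi_information(subset(low))
```

A large positive or negative value suggests that co-regulation of the targets depends on the modulator's level. With few samples the plug-in entropy estimate is biased, so a real analysis would need a significance test (for example against shuffled modulator labels) before ranking candidates.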

Potential Impact:
The application of data-driven mathematical models in industry for the improvement of biotechnology production processes has only just begun, and the regular use of modeling and optimization software in the private biotechnology sector needs to be promoted. The use of such methods is often hampered by the absence of user-friendly, flexible and reliable software. Existing code often needs significant expert knowledge (both computational and scientific). In other words, end-users without extensive expertise in how to handle and compile code, and in modeling and optimization, are excluded from using advanced algorithms and models, or simply need too much time to use and understand their functionality. We have therefore met this strong demand for user-friendly software solutions that implement the iterative modelling cycle described above, which can be used by non-experts and guide the design of efficient production processes.
To develop user-friendly software to support the model-building cycle, we worked in a highly collaborative network of academic partners, who developed the algorithms; the end-users Insilico Biotechnology (INSIL) and Evolva SA (EV), who applied the software in a commercial biotechnological setting; and a Complex Systems Modelling company (CSM), which wrote our integrated code framework. Users with different levels of expertise (experimental biologists, engineers, and bioinformaticians) tested our emerging software framework and provided feedback on user-friendliness and functionality to the developers. The end-users INSIL and EV both succeeded in incorporating the software into their production strategies, and we are confident that these tools can become standard items in the toolbox for optimizing production processes.
A main goal was to go beyond the use of steady-state genome-scale models. These models do not yet include any regulatory or dynamic information, which quickly limits their simulation capabilities. Our efforts led to dynamic models that yield more accurate simulation results, which should lead to the increasingly widespread use of such models in the design of biotechnological production processes.
We expect the mutually beneficial partnerships between the eight academic labs and the three SMEs to continue beyond the end of the project. The academic groups have benefited from establishing proofs of concept for the technical and economic exploitation of the know-how and infrastructure generated by the project. The SMEs, in turn, benefited by broadening their product and technology portfolio, such as software and computational tools, and they profited from model-based optimization for their modeling tools and production bio-processes. Overall, synergies between academic partners and SMEs catalyzed by BioPreDyn can be expected to facilitate the development and application of microorganisms in industrial and medical biotechnology, and contribute to shortening the time-to-market of ideas.
Since the three SME partners work in close collaboration with other industries, the beneficial impact of BioPreDyn on biotechnology is expected to be further amplified. Evolva, for example, develops production processes for food additives and nutraceutical ingredients, and delivers its protocols to larger companies that implement production on a large scale. Evolva sells its products to the dietary supplement industry, in which there is an ever-increasing demand for high-quality products at low prices. Insilico Biotechnology predicts and optimizes microbial processes for the food, agricultural, and healthcare industries, collaborating with major players in the field, such as Bayer Technology Services, Boehringer Ingelheim, and DSM Food Specialties.
Finally, the project invested a great deal of scientific resources in organizing international workshops and conferences for scientists external to BioPreDyn. We believe that these events have helped to train the next generation of scientists in these areas, and spread the knowledge about the new tools and methods developed within the project.

List of Websites:

www.biopredyn.eu

