Periodic Reporting for period 1 - DEBBIE (A database of experimental biomaterials and their biological effect)
Reporting period: 2018-09-01 to 2020-08-31
This problem has wide implications for society: the absence of structured data impedes the use of computational tools and the development of data-driven implants. It also slows down the process of learning from past failures and complications.
Text mining systems offer an attractive method to speed up and facilitate extraction, organization and synthesis of information, making the process more robust and systematic.
Project DEBBIE (the Database of Experimental Biomaterials and their Biological Effect) aims to address this fundamental problem. The project’s main objective is the design and automated population of the first open-access biomaterials database, in order to facilitate more efficient access to the large body of literature in the field, generate a comprehensive map of research activity and findings, and enable evidence-based selection of materials for medical applications. Beyond the development of the database, the project is dedicated to laying the foundation for text mining in the biomaterials domain, through the creation, adaptation and optimization of open-source assets and tools.
Project DEBBIE has resulted in four significant tools, described in more detail below:
1.1 The DEB Ontology
The Devices, Experimental scaffolds and Biomaterials Ontology (DEB) is an open resource for organizing information about biomaterials, their design, manufacture and biological testing. It was developed using text analysis and systematically curated to represent the domain’s lexicon.
DEB may be used for searching terms, performing annotations for machine learning applications, standardized meta-data indexing and other cross-disciplinary data exploitation. Biomaterials scientists have participated in its validation, and can continue to contribute by flagging new terms, definitions and errors. Like all of DEBBIE’s generated assets, DEB is open source, and can be downloaded from: bioportal.bioontology.org/ontologies/DEB
1.2 The Biomaterials Annotator
The Biomaterials Annotator is the first biomaterials-specific annotation system, designed to recognize named entities from fifteen different categories pre-defined in the DEB ontology. The Biomaterials Annotator was developed iteratively, and once sufficient quality was achieved, a team of experts validated the output of the system.
Developed in Java, it uses the General Architecture for Text Engineering (GATE) software, and can be parametrized with dictionaries and specific handmade JAPE (Java Annotation Patterns Engine) rules. To cover entities from the different domains, multiple nomenclatures, vocabularies, and ontologies were identified and combined. The resulting dictionaries as well as the system itself are openly available: github.com/ProjectDebbie/Biomaterials_annotator.
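The dictionary-lookup idea behind such an annotator can be illustrated with a minimal Python sketch. This is not the GATE/JAPE implementation: the terms, the category labels and the `annotate` function are hypothetical placeholders chosen for illustration, and the real dictionaries in the repository are far larger.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-dictionary mapping lowercase terms to category labels.
# Neither the terms nor the labels are taken from the DEB ontology.
DICTIONARY: Dict[str, str] = {
    "hydroxyapatite": "Biomaterial",
    "polycaprolactone": "Biomaterial",
    "osteoblast": "Cell",
    "electrospinning": "ManufacturingProcess",
}


def annotate(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, category) for each dictionary hit."""
    if not dictionary:
        return []
    # Try longer terms first so a long entry wins over a shorter one
    # that starts at the same position.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Dictionary keys are lowercase, so the matched text is lowercased for lookup.
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Whole-word matching means that inflected forms such as plurals are missed unless they are listed explicitly; handling that kind of variation is one reason a rule layer on top of plain dictionaries is useful.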
1.3 The DEBBIE pipeline
DEBBIE is a novel automated text-mining pipeline that retrieves published records from MEDLINE, classifies them by their relevance to the biomaterials domain, performs annotations of domain-specific concepts, and stores the results within a MongoDB database.
The pipeline runs in Docker software containers and is orchestrated by Nextflow, a workflow manager.
The DEBBIE pipeline is openly available here: github.com/ProjectDebbie/DEBBIE_pipeline
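The stages described above (retrieve, classify, annotate, store) can be sketched as a short Python function. This is only an illustration under stated assumptions: the `Record` shape, the keyword-based relevance check and the toy annotator are hypothetical stand-ins, and a plain dictionary takes the place of MongoDB. The actual pipeline runs its steps as containerized processes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Record:
    """One retrieved abstract (hypothetical shape, not the DEBBIE schema)."""
    pmid: str
    abstract: str
    annotations: List[str] = field(default_factory=list)


def run_pipeline(
    records: Iterable[Record],
    is_relevant: Callable[[Record], bool],
    annotate: Callable[[str], List[str]],
    store: Dict[str, Record],
) -> int:
    """Classify, annotate and store records; return how many were stored."""
    stored = 0
    for record in records:
        if not is_relevant(record):  # classification step
            continue
        record.annotations = annotate(record.abstract)  # annotation step
        store[record.pmid] = record  # storage step (a dict stands in for MongoDB)
        stored += 1
    return stored


# Toy stand-ins for the real components (assumptions for illustration only).
def keyword_classifier(record: Record) -> bool:
    return "scaffold" in record.abstract.lower()


def toy_annotator(text: str) -> List[str]:
    return [w for w in ("chitosan", "collagen") if w in text.lower()]
```

Passing the classifier, annotator and store in as arguments keeps each stage replaceable, which loosely mirrors how a workflow manager chains independent steps.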
1.4 The DEBBIE database
The physical DEBBIE database is a MongoDB implementation hosted by Barcelona Supercomputing Centre’s Starlife infrastructure. The contents of DEBBIE are accessible through a web app (https://debbie.bsc.es/search/), and also programmatically through the RESTful API located at http://debbie.bsc.es/search/rest.
In its current version, developed with user feedback, the web app provides a search bar where users can submit a query and receive a quick summary. Users can also select categories of interest and identify the top concepts associated with their query.
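As a rough illustration of what "top concepts associated with a query" could mean, the following Python sketch counts, for one category, how many matching records mention each concept. The record layout (`text` and `concepts` keys), the substring match and the `top_concepts` function are assumptions made for this example, not a description of how the web app or its API computes results.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_concepts(
    records: List[Dict],
    query: str,
    category: str,
    n: int = 3,
) -> List[Tuple[str, int]]:
    """Rank concepts of one category by the number of matching records."""
    counts: Counter = Counter()
    needle = query.lower()
    for record in records:
        if needle not in record["text"].lower():
            continue  # record does not match the query
        # Count each concept at most once per record.
        counts.update(set(record["concepts"].get(category, [])))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```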
In addition to these four tools, the DEBBIE project has developed multiple corpora for biomaterials text classification.
2. Exploitation and dissemination
Peer-reviewed articles published to date:
1. An article explaining the urgency of, and obstacles to, text mining in the biomaterials domain:
O. Hakimi; M. Krallinger; M.P. Ginebra. 2020. Time to kick-start text mining for biomaterials. Nature Reviews Materials 5(8), pp. 553-556.
www.nature.com/articles/s41578-020-0215-z
2. An article detailing the development of the DEB ontology:
O. Hakimi; J.L. Gelpi; M. Krallinger; F. Curi; D. Repchevsky; M.P. Ginebra. 2020. The Devices, Experimental Scaffolds, and Biomaterials Ontology (DEB): A Tool for Mapping, Annotation, and Analysis of Biomaterials' Data. Advanced Functional Materials 30(16).
onlinelibrary.wiley.com/doi/abs/10.1002/adfm.201909910
We also published a story in the blog Towards Data Science, explaining the potential impact of mining biomaterials data:
https://towardsdatascience.com/could-data-analysis-prevent-unnecessary-suffering-lessons-to-learn-from-the-vaginal-mesh-scandal-793086d6bbcc
All assets and tools developed during the DEBBIE project are openly available in the project’s GitHub: github.com/ProjectDebbie
Information about the project with relevant links can be found at: projectdebbie.github.io/
Documentation of the DEBBIE system can be found here: projectdebbie.github.io/documentation.html
Presentations at international conferences included talks at ESB2019, the Annual Conference of the European Society for Biomaterials (Dresden, Germany, September 2019), and at Automated Knowledge Base Construction (AKBC) (Amherst, US, May 2019).
The impact of Project DEBBIE is twofold. It has started the process of mapping the territory: identifying obstacles and missing resources, developing key enabling assets (corpora, ontology) and finding a path to exploit unstructured data. Importantly, it has also started investigating methods to handle the specific challenges of data extraction in the field, leading the first effort to address them.
In the long term, gathering structured biomaterials data is expected to significantly advance implant and medical device design, testing and evaluation, enabling data-driven approaches, intelligent materials selection and inference.