Periodic Reporting for period 1 - DEBBIE (A database of experimental biomaterials and their biological effect)
Reporting period: 2018-09-01 to 2020-08-31

Summary of the context and overall objectives of the project

Efficient access to organized data on the biocompatibility, performance and safety of biomaterials is a pressing need shared by researchers, clinicians and policy makers. As in many other biomedical fields, scientific advances are shared in the form of published peer-reviewed articles, with additional technical data in filed patents. Recent years have seen a dramatic rise in the number of scientific publications in the biomaterials domain. In February 2021, the search query ‘biomaterials OR cell scaffolds’ returned over 240k records from PubMed, 350k from Web of Science, 850k from Scopus and 1,000k from Google Scholar. The rapid expansion of the knowledge pool and the heterogeneous nature of the data make the systematic synthesis of biomaterials knowledge a daunting manual task. This problem has wide implications for society: the absence of structured data impedes the use of computational tools and the development of data-driven implants. It also slows down the process of learning from past failures and complications. Text mining systems offer an attractive way to speed up and facilitate the extraction, organization and synthesis of information, making the process more robust and systematic.

Project DEBBIE, the Database of Experimental Biomaterials and their Biological Effect, aims to address this fundamental problem. The project’s main objective is the design and automated population of the first open-access biomaterials database, in order to facilitate more efficient access to the extensive literature in the field, generate a comprehensive map of research activity and findings, and enable evidence-based selection of materials for medical applications.
Beyond the development of the database, the project is dedicated to laying the foundation for text mining in the biomaterials domain through the creation, adaptation and optimization of open-source assets and tools.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

1. Results overview
Project DEBBIE has resulted in four main tools, described in more detail below.

1.1 The DEB Ontology
The Devices, Experimental scaffolds and Biomaterials Ontology (DEB) is an open resource for organizing information about biomaterials, their design, manufacture and biological testing. It was developed using text analysis and systematically curated to represent the domain’s lexicon. DEB may be used for searching terms, annotating text for machine learning applications, standardized metadata indexing and other cross-disciplinary data exploitation. Biomaterials scientists have participated in its validation and can continue to contribute by flagging new terms, definitions and errors. Like all of DEBBIE’s generated assets, DEB is open source and can be downloaded from: bioportal.bioontology.org/ontologies/DEB

1.2 The Biomaterials Annotator
The Biomaterials Annotator is the first biomaterials-specific annotation system, designed to recognize named entities from fifteen different categories predefined in the DEB ontology. It was developed iteratively, and once sufficient quality was achieved, a team of experts validated the system's output. Developed in Java, it uses the General Architecture for Text Engineering (GATE) software and can be parameterized with dictionaries and specific handmade JAPE (Java Annotation Patterns Engine) rules. To cover entities from the different domains, multiple nomenclatures, vocabularies and ontologies were identified and combined.
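The Annotator itself is a Java/GATE application driven by dictionaries and JAPE rules. As a rough, non-authoritative illustration of the dictionary-lookup idea only, the Python sketch below tags the longest dictionary term that starts at each token. The category labels, terms and function names (`build_lookup`, `annotate`) are hypothetical: they are not the DEB ontology's fifteen categories or the Annotator's code, and JAPE-style contextual rules are not modelled.

```python
import re
from typing import Dict, List, NamedTuple

# Word tokens; hyphenated forms such as "poly-L-lactide" stay one token.
TOKEN = re.compile(r"\w+(?:-\w+)*")


class Annotation(NamedTuple):
    start: int     # character offset where the match begins
    end: int       # character offset just past the match
    text: str      # surface string as it appears in the input
    category: str  # dictionary category of the matched term


def normalize(phrase: str) -> str:
    """Lowercase and re-tokenize so dictionary keys match text spans."""
    return " ".join(TOKEN.findall(phrase.lower()))


def build_lookup(dictionaries: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten {category: [terms]} into {normalized term: category}."""
    lookup: Dict[str, str] = {}
    for category, terms in dictionaries.items():
        for term in terms:
            key = normalize(term)
            if key:
                lookup[key] = category
    return lookup


def annotate(text: str, lookup: Dict[str, str]) -> List[Annotation]:
    """Greedy, case-insensitive longest-match dictionary lookup."""
    tokens = list(TOKEN.finditer(text))
    max_len = max((len(key.split(" ")) for key in lookup), default=0)
    found: List[Annotation] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first, so "calcium phosphate cement"
        # wins over the shorter entry "calcium phosphate".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(t.group(0).lower() for t in tokens[i:i + n])
            if key in lookup:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                found.append(Annotation(start, end, text[start:end], lookup[key]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


# Hypothetical categories and terms, for illustration only.
DICTIONARIES = {
    "MATERIAL": ["hydroxyapatite", "calcium phosphate", "calcium phosphate cement"],
    "CELL": ["mesenchymal stem cells"],
    "TECHNIQUE": ["3D printing"],
}
TEXT = ("Mesenchymal stem cells were seeded on calcium phosphate cement "
        "scaffolds made by 3D printing.")

if __name__ == "__main__":
    for a in annotate(TEXT, build_lookup(DICTIONARIES)):
        print(a.category, repr(a.text), a.start, a.end)
```

Preferring the longest match is one common way to resolve overlapping dictionary entries; combining several vocabularies, as described above, makes such overlaps frequent.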
The resulting dictionaries, as well as the system itself, are openly available: github.com/ProjectDebbie/Biomaterials_annotator

1.3 The DEBBIE pipeline
DEBBIE is a novel automated text-mining pipeline that retrieves published records from MEDLINE, classifies them by their relevance to the biomaterials domain, annotates domain-specific concepts, and stores the results in a MongoDB database. The pipeline uses software container technology (Docker) and is orchestrated by the workflow manager Nextflow. The DEBBIE pipeline is openly available here: github.com/ProjectDebbie/DEBBIE_pipeline

1.4 The DEBBIE database
The physical DEBBIE database is a MongoDB implementation hosted on the Barcelona Supercomputing Centre’s Starlife infrastructure. The contents of DEBBIE are accessible through a web app (https://debbie.bsc.es/search/) and also programmatically through the RESTful API located at http://debbie.bsc.es/search/rest. In its current version, developed with user feedback, the web app provides a search bar where users can submit a query and receive a quick summary, select categories of interest, and identify the top concepts associated with their query.

In addition to these four tools, the DEBBIE project has developed multiple corpora for biomaterials text classification.

2. Exploitation and dissemination
Peer-reviewed articles published to date:
1. An article explaining the urgency of, and obstacles to, text mining in the biomaterials domain:
O. Hakimi; M. Krallinger; M.P. Ginebra. 2020. Time to kick-start text mining for biomaterials. NATURE REVIEWS MATERIALS 5(8), pp. 553-556. www.nature.com/articles/s41578-020-0215-z
2. An article detailing the development of the DEB ontology:
O. Hakimi; J.L. Gelpi; M. Krallinger; F. Curi; D. Repchevsky; M.P. Ginebra. 2020. The Devices, Experimental Scaffolds, and Biomaterials Ontology (DEB): A Tool for Mapping, Annotation, and Analysis of Biomaterials' Data. ADVANCED FUNCTIONAL MATERIALS 30(16). onlinelibrary.wiley.com/doi/abs/10.1002/adfm.201909910

We also published a story on the Towards Data Science blog explaining the potential impact of mining biomaterials data: https://towardsdatascience.com/could-data-analysis-prevent-unnecessary-suffering-lessons-to-learn-from-the-vaginal-mesh-scandal-793086d6bbcc

All assets and tools developed during the DEBBIE project are openly available in the project’s GitHub: github.com/ProjectDebbie
Information about the project with relevant links can be found at: projectdebbie.github.io/
Documentation of the DEBBIE system can be found here: projectdebbie.github.io/documentation.html

Presentations at international conferences included a talk at ESB2019, the Annual Conference of the European Society for Biomaterials (Dresden, Germany, September 2019), and a presentation at Automated Knowledge Base Construction (AKBC, Amherst, US, May 2019).

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

In terms of progress beyond the state of the art, the Biomaterials Annotator and the DEBBIE extraction pipeline are the first systems to perform biomaterials named entity recognition, biomaterials text classification and database deposition of information about scaffolds and implants. DEBBIE successfully pioneered the application of natural language processing (NLP) techniques in the field of biomaterials. The impact of Project DEBBIE is twofold. It has started the process of mapping the territory: identifying obstacles and missing resources, developing key enabling assets (corpora, ontology) and finding a path to exploit unstructured data. Importantly, it has also begun investigating methods to handle the specific challenges of data extraction in the field, leading the first effort to address them.
In the long term, gathering structured biomaterials data is expected to significantly advance the design, testing and evaluation of implants and medical devices, enabling data-driven approaches, intelligent materials selection and inference.
[Figure: The DEBBIE pipeline design]
Future research will address the retrieval and annotation of full-text articles.
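To make the flow of the DEBBIE pipeline (section 1.3) and the "top concepts" view of the web app (section 1.4) more concrete, here is a minimal, self-contained Python sketch. It is not the project's code: the real pipeline runs in Docker containers under Nextflow, uses its own relevance classifier and annotation step, and writes to MongoDB. Here, under stated assumptions, a keyword score stands in for the classifier, naive substring matching stands in for the annotator, and a Python list stands in for the MongoDB collection. The record fields, term lists and function names (`run_pipeline`, `top_concepts`, etc.) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Record:
    """A retrieved bibliographic record (hypothetical minimal fields)."""
    pmid: str
    title: str
    abstract: str


# Stand-in for the relevance classifier: a record counts as relevant if it
# mentions at least `threshold` of these (hypothetical) indicator terms.
RELEVANCE_TERMS = ["scaffold", "biomaterial", "implant", "hydrogel", "biocompatib"]

# Stand-in for the annotation step: hypothetical {term: category} entries.
CONCEPTS = {
    "hydroxyapatite": "MATERIAL",
    "collagen": "MATERIAL",
    "mesenchymal stem cells": "CELL",
    "bone": "TISSUE",
    "cartilage": "TISSUE",
}


def full_text(record: Record) -> str:
    return f"{record.title} {record.abstract}".lower()


def is_relevant(record: Record, threshold: int = 1) -> bool:
    text = full_text(record)
    return sum(term in text for term in RELEVANCE_TERMS) >= threshold


def annotate_concepts(record: Record) -> List[Dict[str, str]]:
    """Naive substring matching; each concept is recorded once per document."""
    text = full_text(record)
    return [{"text": term, "category": category}
            for term, category in CONCEPTS.items() if term in text]


def run_pipeline(records: Iterable[Record], store: List[dict]) -> int:
    """Classify -> annotate -> store; returns the number of stored documents."""
    stored = 0
    for record in records:
        if not is_relevant(record):
            continue  # irrelevant records are dropped before annotation
        document = {
            "pmid": record.pmid,
            "title": record.title,
            "abstract": record.abstract,
            "annotations": annotate_concepts(record),
        }
        # With pymongo, this step would be collection.insert_one(document).
        store.append(document)
        stored += 1
    return stored


def top_concepts(store: List[dict], query: str, k: int = 5) -> List[Tuple[str, int]]:
    """Rank concepts by how many stored documents matching `query` mention them."""
    query = query.lower()
    counts: Counter = Counter()
    for doc in store:
        if query in f"{doc['title']} {doc['abstract']}".lower():
            counts.update(a["text"] for a in doc["annotations"])
    return counts.most_common(k)


RECORDS = [
    Record("1", "Hydroxyapatite scaffolds for bone repair",
           "Porous hydroxyapatite scaffolds were seeded with mesenchymal stem cells."),
    Record("2", "Collagen hydrogel implants",
           "A collagen hydrogel supported cartilage repair in vivo."),
    Record("3", "Stock market volatility", "We model daily returns of equities."),
    Record("4", "Hydroxyapatite coatings on titanium implants",
           "Coatings improved bone integration."),
]

if __name__ == "__main__":
    store: List[dict] = []
    print(run_pipeline(RECORDS, store))  # 3 (record "3" is filtered out)
    print(top_concepts(store, "bone", k=2))
```

Counting each concept once per document, rather than once per mention, makes the ranking reflect how many matching records mention a concept; whether the DEBBIE web app ranks concepts this way is not stated in this report.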