The volume of data in biomedicine is constantly increasing. Despite the widespread adoption of English in science, a significant quantity of these data is in French. Biomedical data integration and semantic interoperability are necessary to enable new scientific discoveries that could be made by merging different available data. A key approach to addressing these issues is the use of terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable. Researchers have called for automated annotation methods and for leveraging natural language processing tools in the curation process. However, while the issue is currently being addressed for English, the situation is different for French: little readily available technology (i.e., “off-the-shelf” technology) allows ontologies to be used uniformly in various annotation and curation pipelines with minimal effort.
The Semantic Indexing of French Biomedical Data Resources (SIFR/SIFRm, www.lirmm.fr/sifr) project investigates the scientific and technical challenges of building ontology-based services that leverage biomedical ontologies and terminologies for indexing, mining and retrieving biomedical data. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on biological and medical challenges.
Within SIFR, we are building an ontology-based indexing workflow (the SIFR Annotator) similar to what exists for English resources but specialized for other EU languages, starting with French. This service is available within a portal of ~30 French biomedical ontologies/terminologies that reuses the NCBO BioPortal technology developed at Stanford University. The SIFR BioPortal was released in June 2015 (http://bioportal.lirmm.fr) and has been actively used and improved since then.
Recently, the SIFR Annotator has been enriched to process clinical data and to contextualize annotations (negation, temporality, experiencer). We now offer, for both English and French, a unique open ontology-based annotation service that both recognizes ontology concepts and contextualizes them, allowing users who are not natural language processing experts to annotate and contextualize medical conditions in clinical notes.
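The contextualization step mentioned above can be illustrated with a deliberately simplified Python sketch of trigger-based negation detection, in the spirit of the NegEx/ConText family of algorithms. It is not the SIFR Annotator's implementation and does not use its API. The lexicon, the HYPO: concept identifiers, the French trigger list, the scope-breaking tokens and the five-word window are all hypothetical choices made for illustration. Temporality and experiencer are left out.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon: surface form -> placeholder concept ID.
# A real annotator would draw labels and synonyms from ontologies.
LEXICON = {
    "diabète": "HYPO:0001",
    "fièvre": "HYPO:0002",
    "toux": "HYPO:0003",
}

# Hypothetical, tiny set of French negation triggers.
NEGATION_TRIGGERS = ("pas de", "absence de", "sans")

# Scope terminators: a negation does not carry past a period,
# a semicolon or the conjunction "mais".
SCOPE_BREAK = re.compile(r"[.;]|\bmais\b")

# Hypothetical window: a trigger must fall within the last
# WINDOW words preceding the term.
WINDOW = 5


@dataclass
class Annotation:
    term: str        # matched text as it appears in the input
    concept_id: str  # placeholder concept identifier
    start: int       # character offset of the match
    negated: bool    # True if a trigger scopes over the term


def is_negated(prefix: str) -> bool:
    """Check whether a negation trigger precedes the term in its clause."""
    clause = SCOPE_BREAK.split(prefix)[-1]       # keep the current clause only
    window = " ".join(clause.split()[-WINDOW:])  # last WINDOW words
    return any(re.search(rf"\b{re.escape(t)}\b", window)
               for t in NEGATION_TRIGGERS)


def annotate(text: str) -> List[Annotation]:
    """Dictionary lookup of lexicon terms, then negation contextualization."""
    lowered = text.lower()  # assumes lower() keeps offsets (true for French text)
    found = []
    for term, concept_id in LEXICON.items():
        for m in re.finditer(rf"\b{re.escape(term)}\b", lowered):
            found.append(Annotation(
                term=text[m.start():m.end()],
                concept_id=concept_id,
                start=m.start(),
                negated=is_negated(lowered[:m.start()]),
            ))
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    note = "Pas de fièvre mais une toux persistante. Le patient est sans diabète."
    for a in annotate(note):
        print(a.term, a.concept_id, "negated" if a.negated else "affirmed")
```

On the sample note, the sketch reports "fièvre" as negated and "toux" as affirmed, because the negation scope stops at "mais". It reports "diabète" as negated. Production annotators differ in two main ways. They match against full ontology vocabularies, using synonyms and longest-match rules, and they use far richer trigger lists. The sketch only shows how a trigger plus a bounded scope turns a plain concept match into a contextualized one.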
In addition, we are abstracting and generalizing our results to agronomy by offering a repository for agronomical ontologies called AgroPortal. The AgroPortal project is a community effort started by the Montpellier scientific community (LIRMM, IRD, CIRAD, INRA, Bioversity International) to build an ontology repository for agronomy and related domains (food, plant sciences and biodiversity). Our goal is to encourage the adoption of metadata and semantics to facilitate open science. By enabling straightforward use of ontologies, we expect data managers and researchers to focus on their own tasks without having to deal with the complex engineering work needed for ontology management.
SIFR/SIFRm (2013-2019) is a collaborative action between LIRMM and BMIR, previously funded by the French ANR Young Researcher program and currently by the EU H2020 Marie Sklodowska-Curie Program (2016-2019). Dr. Clement Jonquet, SIFR’s principal investigator, is an assistant professor at the University of Montpellier and LIRMM, and was previously a visiting scholar at Stanford BMIR in Prof. Mark Musen’s team.