CORDIS - EU research results

Semantic integrAtion and reasoning Framework for pharmacovigilancE signals Research

Final Report Summary - SAFER (Semantic integrAtion and reasoning Framework for pharmacovigilancE signals Research)

One of the most important aspects of marketed-drug safety monitoring is the identification of new or incompletely documented, potentially causal associations between drugs and adverse effects (called “signals”) that are judged sufficiently likely to justify verificatory actions. Because such safety risks must be identified as early as possible, various data sources are employed for signal detection in the post-market setting, ranging from spontaneous reporting systems, electronic health records and administrative claims databases to the scientific literature and even social media. Each of these sources has advantages and limitations that affect its signal detection capacity, such as quality, reliability, coverage and bias. Computational analysis methods constitute an important tool for signal detection. Nevertheless, computationally extracted signals do not establish a causal relationship between drugs and adverse effects and, given the typically large number of generated indications, filtering and prioritization are necessary.

Despite continuous advances in computational signal detection, a number of recent comparative studies of detection methods illustrated: (a) shortcomings in detection accuracy/efficiency, i.e. a high rate of false-positive indications and difficulties in detecting rare adverse drug reactions (ADRs), with some events remaining undetectable despite the variety of methods employed; (b) performance variation, i.e. event-based differential performance of methods, as well as differential performance with respect to the data used for analysis; and (c) complementarity in the methods’ outcomes. These findings reinforce the arguments for exploiting all possible information sources for drug safety, and also for using multiple signal detection methods in parallel. Although only a few studies have elaborated on combinatorial signal detection, and with a rather limited number of methods and data sources, they have shown promising outcomes. Nevertheless, realizing this approach at large scale requires systematic frameworks that address the challenges of the concurrent analysis setting.

SAFER focused on such challenges by exploiting and complementing evidence obtained from diverse signal sources and relevant computational signal detection methods. In particular, SAFER elaborated on “integrated signal detection”, i.e. the systematic, joint exploitation of multiple heterogeneous signal detection methods, data and other drug-related resources under a common framework. In this context, method invocation and the aggregation, filtering, ranking and potential interpretation of outcomes are central. Since the focus in signal detection is on new ADRs, noisy indications (e.g. known ADRs or associations linking the drug with its indicated use) have to be filtered out by consulting reference knowledge sources. In addition, given that computational signal detection methods typically generate many indications of potential ADRs, the filtered outcomes should be ranked (e.g. based on measures of their significance) in order to prioritize their subsequent assessment by drug safety experts. Supportive information on these findings would aid their comprehension and potential interpretation, beyond just providing statistical measures for each drug-event pair. Such information may include, for example, recently published studies, as well as information about potentially relevant clinical trials.
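The filtering and ranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SAFER's implementation: the `Indication` record, its `score` field, the `filter_and_rank` function and the toy drug-event pairs are all hypothetical stand-ins for method outcomes and reference knowledge sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    """A drug-event pair produced by some detection method (hypothetical record)."""
    drug: str
    event: str
    score: float  # assumption: a higher score means a statistically stronger indication


def filter_and_rank(indications, known_adrs, indicated_uses):
    """Drop noisy indications, then rank the remainder by descending score."""
    noise = known_adrs | indicated_uses
    novel = [i for i in indications if (i.drug, i.event) not in noise]
    return sorted(novel, key=lambda i: i.score, reverse=True)


# Toy data, invented purely for illustration.
indications = [
    Indication("drug_a", "headache", 2.1),
    Indication("drug_a", "hypertension", 5.0),  # the drug's indicated use
    Indication("drug_a", "rash", 3.4),
    Indication("drug_b", "nausea", 1.2),        # an already documented ADR
]
known_adrs = {("drug_b", "nausea")}
indicated_uses = {("drug_a", "hypertension")}

ranked = filter_and_rank(indications, known_adrs, indicated_uses)
for item in ranked:
    print(item.drug, item.event, item.score)
```

In this sketch the highest-scoring pair is discarded because it merely reflects the indicated use, which is the kind of noise the reference knowledge sources are meant to remove before experts review the ranked list.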

To this end, SAFER employed semantic technologies to perform the integrated signal detection workflow. A central part of SAFER was the PharmacoVigilance Signal Detectors Ontology (PV-SDO), which defined various concepts related to: (a) the domain of signal detection (e.g. drug, health outcome, pharmacovigilance signal, signal source, signal detection method, etc.), and (b) the construction of an integrated platform for signal detection (e.g. analysis experiment, analysis parameter, ranking criterion, etc.). These concepts were linked via object properties and further specified via datatype properties, in order to support the application logic that SAFER should provide. PV-SDO has been populated with a significant number of individuals using data from existing, open-source signal detection methods and analysis experiments. Its evaluation was both “data-driven”, i.e. assessing whether its model is sufficient to describe signal detection methods that were not part of the source knowledge employed for its design, and “experts-based”, i.e. experts in knowledge engineering and signal detection were asked to assess it through an online survey. Refinements (e.g. to address ambiguous domain and range definitions, and misleading class labels and property names) were applied to the model based on the outcomes of this evaluation. For filtering and evaluating method outcomes, SAFER employed reference drug information sources available as RDF (Resource Description Framework) datasets.

The prototype implementation of the integrated signal detection workflow followed an agent-based approach, comprising a mediation scheme and a collaborative agent interaction protocol. Agents provided a way of structuring the system around autonomous, communicating elements, through which the mechanisms for automating and improving signal detection under the integrated setting were developed. The development relied on publicly available resources. In particular, SAFER elaborated on programmatic access to the following raw data sources for signal detection: (a) the FAERS (FDA Adverse Event Reporting System) spontaneous reporting system, (b) PubMed, the reference bibliographic database in the life sciences, and (c) Twitter, a popular micro-blogging platform. The reference sources for filtering the outcomes of signal detection methods included SIDER (the Side Effect Resource), which contains information on marketed medicines and their recorded ADRs extracted from public documents and package inserts, and DrugBank, a rich information resource on drug-drug interactions. The drug information sources for supporting evidence on novel findings included ChEMBL, which contains 2D structures, calculated properties and abstracted bioactivities of drugs; a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world; and, again, DrugBank, as it combines drug data with comprehensive drug target and biointeraction information. These resources were accessed via SPARQL queries against the respective RDF datasets available through bio2RDF. Finally, implementations of well-known signal detection methods contained in open-source packages, as well as an in-house detection method appropriate for unstructured text, were integrated in the framework.
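The summary does not name the detection methods that were integrated. As one illustration of the kind of well-known disproportionality measure commonly applied to spontaneous reports, the proportional reporting ratio (PRR) can be computed from a 2x2 table of report counts. The function name and the counts below are hypothetical, and this sketch should not be read as the method SAFER used.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Compute PRR = [a / (a + b)] / [c / (c + d)] from report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


# Invented counts: the event appears in 20 of 200 reports for the drug,
# versus 100 of 9800 reports for all other drugs.
prr = proportional_reporting_ratio(a=20, b=180, c=100, d=9700)
print(round(prr, 2))
```

A PRR well above 1 only indicates that the event is reported disproportionately often with the drug; as noted earlier in this summary, it does not establish causality.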

In order to assess the integrated signal detection approach, a representative set of test cases was elaborated, e.g. new oral anticoagulants and the risk of cerebral hemorrhage. The test cases were defined in collaboration with experts from the Centre Régional de Pharmacovigilance (CRPV) de Paris at the Hôpital Européen Georges Pompidou (HEGP). The selection criteria were importance in the field of drug safety, coverage by the employed data sources, and diversity. For each test case, a comprehensive visualization of the data acquired across the considered sources was constructed under a common timeline, aiming to illustrate potential associations among sources over time. Signal detection was performed using the detection methods integrated in the framework, exploring both exact and partial matching (based on synonyms and semantically relevant terms) of the drugs and health outcomes of interest. The analysis of the test cases revealed “echoing” findings across the data sources explored, which may offer a means of strengthening the outcomes of signal detection methods applied separately. The joint visualization of data across a common timeline facilitated a rigorous visual inspection, showing the evolution of data over time and highlighting potential associations in the production of data from diverse sources. Another evaluation activity concerned the capacity of the considered reference information sources for filtering. Using two reference datasets containing both positive and negative signals, the analysis revealed that each resource has its own strengths and shortcomings in accurately identifying the signals across datasets, thus suggesting their complementary use.
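The evaluation of reference sources against datasets of positive and negative signals can be sketched as below. The summary does not state which metrics were used, so sensitivity and specificity are an assumption here, as are the function name, the two toy sources and all drug-event pairs.

```python
def sensitivity_specificity(listed_pairs, positives, negatives):
    """Treat 'pair is listed in the reference source' as a positive call.

    Sensitivity: share of positive signals that the source lists.
    Specificity: share of negative signals that the source does not list.
    """
    if not positives or not negatives:
        raise ValueError("both reference sets must be non-empty")
    true_positives = len(positives & listed_pairs)
    true_negatives = len(negatives - listed_pairs)
    return true_positives / len(positives), true_negatives / len(negatives)


# Toy reference dataset, invented for illustration.
positives = {("d1", "e1"), ("d2", "e2"), ("d3", "e3"), ("d4", "e4")}
negatives = {("d1", "e9"), ("d2", "e9")}

# Two hypothetical reference sources with different coverage.
source_a = {("d1", "e1"), ("d2", "e2"), ("d3", "e3"), ("d1", "e9")}
source_b = {("d4", "e4")}

result_a = sensitivity_specificity(source_a, positives, negatives)
result_b = sensitivity_specificity(source_b, positives, negatives)
result_union = sensitivity_specificity(source_a | source_b, positives, negatives)
print(result_a, result_b, result_union)
```

In this toy setting each source misses positive signals that the other lists, so their union recovers all of them, while inheriting the one false listing from the first source. That trade-off is the sort of pattern that would motivate using reference sources in a complementary way.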

Overall, SAFER had the following major outcomes: (a) it identified gaps in current computational signal detection approaches, originating from both data sources and detection methods, and proposed a semantic framework to pursue integrated signal detection; (b) it designed and developed the PV-SDO ontology to support the integrated signal detection approach; (c) it investigated publicly available data sources for signal detection, and developed/adapted means to programmatically access, extract, verify and transform data from these sources for analysis; and (d) using a set of signal detection methods and drug safety data/information resources, it illustrated a proof-of-concept implementation for integrated signal detection, which was assessed via a representative set of test cases of particular interest in drug safety.

SAFER disseminated its research activities widely, aiming to raise awareness in both the scientific community (through scientific publications and presentations) and the general public (through media interviews and presentations). Drug safety is a public health issue and an important priority worldwide, with serious economic implications as well. The potential market for the foreseen research outcomes involves primarily the pharmaceutical industry, as well as drug regulatory authorities and safety organizations. Under the integrative perspective proposed by SAFER, the assessment of existing signal detection methods and data sources can be facilitated, which could lead to the introduction of new and more efficient approaches for signal detection. Optimization of signal detection methods could impact the use of drugs over their lifecycle, while data obtained from diverse sources can facilitate the construction of a more complete safety profile of drugs. Overall, SAFER brings a new perspective on combinatorial, knowledge-intensive signal detection, and aspires to improve accuracy, timeliness of decisions, efficiency, automation and support for drug safety stakeholders.

The activities of the project are presented in the project Web site: