AI powered Data Curation & Publishing Virtual Assistant

Project Information

AIDAVA

Grant agreement ID: 101057062

DOI: 10.3030/101057062

EC signature date: 19 May 2022

Start date: 1 September 2022

End date: 31 August 2026

Funded under: Health

Total cost: € 7 720 615,23

EU contribution: € 7 720 615,00

Coordinated by: UNIVERSITEIT MAASTRICHT, Netherlands

Periodic Reporting for period 2 - AIDAVA (AI powered Data Curation & Publishing Virtual Assistant)

Reporting period: 2024-03-01 to 2025-08-31

The core objective of AIDAVA is to develop and test an AI-powered Virtual Assistant (VA) that can automate the curation and publishing of unstructured and structured, heterogeneous health data. During this second reporting period, the project validated the first prototype of the VA across 4 hospitals with breast cancer (BC) and cardiovascular disease (CVD) patients, demonstrating (1) the potential of automating curation by orchestrating tools that address different data interoperability issues, (2) the value of data quality checks on individual records and (3) the importance of an adaptive and context-rich human-in-the-loop (HITL) approach for explainability.
Following the evaluation of this first generation of the prototype, the project focused on improving the AI curation tools and the HITL to deliver the second generation of the prototype.

From Mar 2024 to Aug 2025, AIDAVA focused on deploying, evaluating and improving the first-generation (G1) of the prototype toward delivery of the second generation (G2). The prototype was tested in four hospitals in three countries and one Health Data Intermediary (HDI), following approval of the study protocol by local ethics committees. Despite delays due to recruitment and the withdrawal of one HDI, testing concluded in Dec 2024 with valuable technical and user feedback.
In WP1, updated personas and revised business requirements informed the design of G2.
WP2 finalised the semantic and architectural framework, including development of the Simplified Upper Level Ontology (SULO) as the semantic backbone of AIDAVA. The solution design (D2.4) integrates lessons learned from G1 and Sustainability Advisory Board (SAB) recommendations on market potential.
Under WP3, G1 was finalised and deployed, with continuous support during evaluation. Improvements for G2 included a structured approach to integrating and testing the backend micro-services (i.e. imaging quality enhancement and the tools developed in WP5) as well as enhancement of the human-AI dialogue.
WP4 ensured compliance with the GDPR. A regulatory conformance analysis produced recommendations for trustworthy AI, complementing ongoing Data Protection Impact Assessments and oversight from the Ethical Advisory Board. The Data Quality (DQ) Framework was validated: DQ checks were aligned with the ontology, and metrics were defined for completeness and provenance.
WP5 delivered several novel AI tools. The multilingual AIDAVA-mBERT model for Named-Entity Recognition and Temporal Extraction was developed on AIDAVA’s annotated datasets in Estonian, Dutch and German. The FAIRification tools (automated mapping, Entity Linking, Entity Deduplication, MutateAndTransform and publishing tools to generate IPS, CVD score and BC registry) that were deployed in G1 were substantially improved for G2. Explainability modules were extended.
WP6 expanded communication and exploitation activities. The project website and social media channels continued to grow, and AIDAVA partners presented results at scientific and policy events. The SAB identified important areas related to the European Health Data Space (EHDS) where AIDAVA could bring value. During Exploitation Workshops, the consortium discussed the ownership and exploitation of its key exploitable results.
WP7 coordinated technical and financial reporting, managed risks and integrated the new beneficiary (DFP) into the consortium. All deviations were addressed through the approved amendment, keeping the project on track for its objectives.

In its second period, AIDAVA demonstrated a new level of maturity and a different paradigm in health data management and data interoperability, based on INDIVIDUAL health records, by validating its innovative combination of AI-based curation and data quality enhancement tools. The project’s architecture integrates AI-based curation workflows with ontology-driven FAIRification tools and a knowledge-graph approach, demonstrating that heterogeneous health data can be automatically curated and reused while preserving context and meaning.

The AIDAVA prototype demonstrated that Personal Health Knowledge Graphs (PHKGs) can successfully be generated from heterogeneous (narrative and structured) data sources coming from hospitals, general practitioners, devices and patient-reported outcome measures (PROMs), supporting data interoperability and high data quality on the way to a truly FAIR personal health record. Knowledge graphs can be embedded, supporting faster and more accurate reasoning and retrieval, including retrieval-augmented generation (RAG). Knowledge graphs also support scalable data quality checks at the individual patient level, whereas existing data quality frameworks focus on the population level. The full value of a PHKG can only be realised if the supporting ontology is easily aligned with other standards; consequently, the project is developing and validating a Simplified Upper Level Ontology (SULO) that provides a solid semantic backbone across standards.

In terms of curation and quality enhancement, the project developed several novel machine-learning tools for data extraction, mapping, harmonisation and quality enhancement of structured data. Two are particularly noteworthy:
1. The multilingual AIDAVA-mBERT model represents a major step forward in clinical natural language processing for under-resourced European languages. Trained on newly annotated corpora in Estonian, Dutch and German, it enables accurate information extraction and entity linking to international terminologies and standards such as SNOMED CT and FHIR, producing data that can be integrated into the PHKG.
2. AIDAVA implemented automated SHACL-based validation and patient-level quality labelling, enabling continuous measurement of completeness and consistency across datasets.
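
The patient-level validation in point 2 can be illustrated with a minimal sketch, assuming a PHKG held as plain (subject, predicate, object) triples. The code re-creates in plain Python the behaviour of a SHACL property shape built from `sh:targetClass`, `sh:path` and `sh:minCount`; it is not a SHACL engine, and a real deployment would instead run a SHACL processor (for example pySHACL) over an RDF graph. The class and property names (`ex:Patient`, `ex:Observation`, `ex:birthDate` and so on), the averaging into one completeness score and the 0.9/0.6 label thresholds are hypothetical and are not taken from AIDAVA's Data Quality Framework.

```python
from dataclasses import dataclass

# A triple is (subject, predicate, object); a patient's PHKG is a set of triples.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class MinCountConstraint:
    """Plain-Python analogue of a SHACL property shape using sh:minCount.

    Hypothetical: fields mirror sh:targetClass / sh:path / sh:minCount.
    """
    target_class: str
    path: str
    min_count: int = 1


def focus_nodes(graph: set[Triple], cls: str) -> list[str]:
    """Return the nodes typed with `cls` (SHACL calls these focus nodes)."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


def validate(graph: set[Triple],
             constraints: list[MinCountConstraint]) -> list[tuple[str, str, bool]]:
    """Check every constraint on every focus node; return (node, path, satisfied)."""
    results = []
    for c in constraints:
        for node in focus_nodes(graph, c.target_class):
            count = sum(1 for s, p, _ in graph if s == node and p == c.path)
            results.append((node, c.path, count >= c.min_count))
    return results


def quality_label(results: list[tuple[str, str, bool]],
                  good: float = 0.9, fair: float = 0.6) -> tuple[float, str]:
    """Aggregate check results into a completeness score and a coarse label.

    The thresholds are illustrative assumptions, not AIDAVA values.
    """
    if not results:
        return 0.0, "unassessed"
    score = sum(ok for _, _, ok in results) / len(results)
    if score >= good:
        return score, "good"
    if score >= fair:
        return score, "fair"
    return score, "poor"


# Hypothetical single-patient graph: one patient, two observations.
phkg: set[Triple] = {
    ("ex:p1", RDF_TYPE, "ex:Patient"),
    ("ex:p1", "ex:birthDate", "1970-01-01"),
    ("ex:obs1", RDF_TYPE, "ex:Observation"),
    ("ex:obs1", "ex:code", "ex:SystolicBP"),
    ("ex:obs1", "ex:value", "120"),
    ("ex:obs1", "ex:effectiveDate", "2024-05-01"),
    ("ex:obs2", RDF_TYPE, "ex:Observation"),
    ("ex:obs2", "ex:code", "ex:SystolicBP"),
}

constraints = [
    MinCountConstraint("ex:Patient", "ex:birthDate"),
    MinCountConstraint("ex:Patient", "ex:sex"),
    MinCountConstraint("ex:Observation", "ex:code"),
    MinCountConstraint("ex:Observation", "ex:value"),
    MinCountConstraint("ex:Observation", "ex:effectiveDate"),
]

results = validate(phkg, constraints)
score, label = quality_label(results)
for node, path, ok in results:
    if not ok:
        print(f"missing {path} on {node}")
print(f"completeness={score:.3f} label={label}")  # completeness=0.625 label=fair
```

Each failed check names the node and property that is missing, which is what makes a record-level label actionable for a human curator. Averaging all checks into a single completeness score is only one possible aggregation choice; weighting constraints by clinical importance would be an equally plausible design.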

These combined innovations push the state of the art from isolated FAIRification efforts in specific domains toward FAIRification of an individual's complete health record, enabling the “curate once, use many times” paradigm. By embedding trust, transparency and multilingual capability into the process, AIDAVA establishes a replicable blueprint for interoperable, high-quality and ethically governed personal health data across Europe.