Periodic Reporting for period 2 - AIDAVA (AI powered Data Curation & Publishing Virtual Assistant)
Reporting period: 2024-03-01 to 2025-08-31
Following the evaluation of the first generation of the prototype (G1), the project focused on improving the AI curation tools and the human-in-the-loop (HITL) process to deliver the second generation of the prototype (G2).
In WP1, updated personas and revised business requirements informed the design of G2.
WP2 finalised the semantic and architectural framework, including development of the Simplified Upper Level Ontology (SULO) as the semantic backbone of AIDAVA. The solution design (D2.4) integrates lessons learned from G1 and Sustainability Advisory Board (SAB) recommendations on market potential.
Under WP3, G1 was finalised and deployed, with continuous support during evaluation. Improvements for G2 included a structured approach to integration and testing of the backend micro-services (i.e. imaging quality enhancement and the tools developed in WP5) as well as enhancement of the human-AI dialogue.
WP4 ensured compliance with GDPR. A regulatory conformance analysis produced recommendations for trustworthy AI, complementing ongoing Data-Protection Impact Assessments and oversight from the Ethical Advisory Board. The Data Quality (DQ) Framework was validated, with DQ checks aligned with the ontology while defining metrics for completeness and provenance.
WP5 delivered several novel AI tools. The multilingual AIDAVA-mBERT model for Named-Entity Recognition and Temporal Extraction was developed on AIDAVA’s annotated datasets in Estonian, Dutch and German. The FAIRification tools (automated mapping, Entity Linking, Entity Deduplication, MutateAndTransform and publishing tools to generate IPS, CVD score and BC registry) that were deployed in G1 were substantially improved for G2. Explainability modules were extended.
WP6 expanded communication and exploitation activities. The project website and social media channels continued to grow; AIDAVA partners presented results at scientific and policy events. The SAB identified important areas related to the EHDS where AIDAVA could bring value. During Exploitation Workshops the consortium discussed the key exploitable results, their ownership and their exploitation.
WP7 coordinated technical and financial reporting, managed risks and integrated the new beneficiary (DFP) into the consortium. All deviations were addressed through the approved amendment, keeping the project on track for its objectives.
The AIDAVA prototype demonstrated that Personal Health Knowledge Graphs (PHKG) can successfully be generated from heterogeneous (narrative and structured) data sources coming from hospitals, general practitioners, devices and PROMs, supporting data interoperability and high data quality on the way toward a truly FAIR personal health record. Knowledge graphs enable embedding, which supports faster and more accurate reasoning and retrieval (including RAG); they also support scalable data quality checks at the level of the individual patient, whereas existing data quality frameworks focus on the population level. The full value of a PHKG can only be realised if the supporting ontology is easily aligned with other standards; consequently, the project is developing and validating a Simplified Upper Level Ontology (SULO) that provides a solid semantic backbone across standards.
In terms of curation and quality enhancement, the project developed several novel machine-learning tools for data extraction, mapping, harmonisation and quality enhancement of structured data. Two are particularly noteworthy:
1. The multilingual AIDAVA-mBERT model represents a major step forward in clinical natural-language processing for under-resourced European languages. Trained on newly annotated corpora in Estonian, Dutch and German, it enables accurate information extraction and entity linking to international terminologies and standards such as SNOMED CT and FHIR, producing output that can be integrated into the PHKG.
2. AIDAVA implemented automated SHACL-based validation and patient-level quality labelling, enabling continuous measurement of completeness and consistency across datasets.
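The idea of patient-level quality labelling can be illustrated with a minimal sketch. It is not the AIDAVA implementation and does not use SHACL itself: it mimics a minimum-cardinality constraint (analogous to sh:minCount) in plain Python over a handful of triples. All identifiers, property names, data values and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples for two patients.
TRIPLES = [
    ("patient:1", "ex:birthDate", "1970-01-01"),
    ("patient:1", "ex:sex", "female"),
    ("patient:1", "ex:hasDiagnosis", "ex:conditionA"),
    ("patient:2", "ex:birthDate", "1985-06-15"),
]

# Hypothetical minimum-cardinality constraints, analogous to sh:minCount.
REQUIRED = {"ex:birthDate": 1, "ex:sex": 1, "ex:hasDiagnosis": 1}


def completeness_by_patient(triples, required):
    """Return {patient: fraction of required properties that are satisfied}."""
    counts = defaultdict(lambda: defaultdict(int))
    for subject, predicate, _ in triples:
        counts[subject][predicate] += 1
    scores = {}
    for patient, per_property in counts.items():
        satisfied = sum(
            1
            for prop, minimum in required.items()
            if per_property.get(prop, 0) >= minimum
        )
        scores[patient] = satisfied / len(required)
    return scores


def quality_label(score, threshold=0.8):
    """Map a completeness score to a coarse label (illustrative threshold)."""
    return "complete" if score >= threshold else "incomplete"


scores = completeness_by_patient(TRIPLES, REQUIRED)
labels = {patient: quality_label(score) for patient, score in scores.items()}
```

In this sketch each patient receives its own score and label, which is what distinguishes a patient-level check from a population-level statistic. A production setup would more plausibly express the constraints as SHACL shapes and run them with a SHACL validator over the RDF graph.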
These combined innovations push the state of the art from isolated FAIRification efforts in specific domains toward FAIRification of the complete individual health record, enabling the “curate once, use many times” paradigm. By embedding trust, transparency and multilingual capability into the process, AIDAVA establishes a replicable blueprint for interoperable, high-quality and ethically governed personal health data across Europe.