Periodic Reporting for period 2 - DataTools4Heart (A European Health Data Toolbox for Enhancing Cardiology Data Interoperability, Reusability and Privacy)
Reporting period: 2024-04-01 to 2025-09-30
DataTools4Heart (DT4H) addresses these challenges by building a European health data “toolbox” for cardiology. The project develops a standards-based Common Data Model and ingestion pipelines to harmonise cardiology data, a multilingual clinical NLP suite for cardiology reports, a federated and privacy-preserving AI infrastructure, and an open platform with virtual assistants and audit trails. These tools are demonstrated in real-world heart-failure use cases across several European centres. Social sciences, law and ethics are integrated through a dedicated work package that interprets EU law, designs governance mechanisms and drafts a sector-specific EU Code of Conduct, while clinicians and patients help shape requirements and workflows. Together, these elements are expected to deliver a reusable reference infrastructure for trustworthy cardiology AI with potential to scale to other diseases and to support EU strategies on health data and AI.
On the technical side, the project delivered a mature data backbone. A heart-failure Common Data Model based on HL7 FHIR and major clinical terminologies was finalised and implemented; a modular open-source Data Ingestion Suite, built on the toFHIR engine, was deployed and validated in multiple European centres; and a Feature Extraction Suite formalised cohorts and AI-ready variables as machine-readable definitions. A federated metadata catalogue was introduced to expose harmonised feature availability and codebooks across sites. For unstructured data, DT4H developed a multilingual clinical NLP stack with cardiology-tuned language models, annotated corpora and named entity recognition systems for seven European languages, integrated with terminology services. In parallel, the project built a federated learning framework (FLCore) with machine learning and survival-analysis models, fairness-oriented and uncertainty-aware aggregation, and complementary tools for differentially private synthetic data generation that underpin the CardioSynth pipeline. All components were integrated in a single web-based platform that includes shared workspaces, the catalogue, NLP and synthetic-data services, federated learning, a permissioned blockchain for audit trails and a virtual assistant capable of interacting with the ecosystem in natural language. Clinical partners, through regular clinical–technical meetings and demonstrations in realistic heart-failure scenarios, guided the design and initial validation of these tools.
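As an illustration of how federated learning lets sites contribute to a shared model without centralising patient-level data, the following minimal sketch shows sample-size-weighted federated averaging (FedAvg-style). It is not FLCore's API or the project's actual aggregation method, which the text describes as fairness-oriented and uncertainty-aware. The function name `federated_average` and its parameters are hypothetical, and the numbers in the usage example are made up.

```python
from typing import List, Sequence


def federated_average(
    site_params: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site model parameters into one global vector.

    Hypothetical helper, not part of FLCore. Each site trains locally
    and shares only its parameter vector and its sample count; no
    patient-level records leave the site.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one parameter vector and one size per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_params[0])
    if any(len(p) != dim for p in site_params):
        raise ValueError("all parameter vectors must have the same length")
    # Weight each site's parameters by its share of the total samples.
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]


# Usage with made-up values: two sites holding 1 and 3 samples.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)  # [2.5, 3.5]
```

Plain weighted averaging is only the simplest aggregation rule. A fairness-oriented or uncertainty-aware scheme would replace the sample-count weights with weights derived from other criteria.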
These results open the door to large-scale, multi-centre, multilingual research in cardiology without centralising patient-level data. To ensure further uptake, the project focuses on open-source releases, clear documentation and containerised deployment, as well as alignment with major standards and EU regulation. Remaining needs for full exploitation include extended real-world validation in additional centres, continued hardening and maintenance of the open-source components, suitable funding and procurement instruments for hospitals and SMEs to adopt federated infrastructures, and sustained collaboration with other European initiatives to ensure interoperability and long-term sustainability.