PHASE IV AI has made significant progress:
• Strong progress was made in defining the user, data, legal, ethical, technical, and architectural requirements that support the PHASE IV AI project.
• The project developed and validated AI pipelines for generating synthetic health data. These tools are tailored to three use cases (lung cancer, prostate cancer, and ischemic stroke) and incorporate privacy-preserving techniques such as differential privacy and diffusion models to comply with the General Data Protection Regulation (GDPR).
• A structured data collection process was set up, and key datasets requiring harmonization were identified early on. The project harmonized real-world datasets using the OMOP (Observational Medical Outcomes Partnership) Common Data Model, facilitating interoperability across countries and institutions. Legal and ethical frameworks were established to ensure compliance with GDPR and the EU AI Act.
• The development of synthetic data services under PHASE IV AI has focused on several complementary objectives: generating new data to augment cohort sizes, creating de-identified datasets for privacy-preserving analytics, imputing missing data from observed values, and ultimately simulating disease progression through synthetic data modeling.
• A foundation for federated model training and validation in real-world healthcare scenarios was established. Key achievements include the deployment and validation of a distributed secure multi-party computation (SMPC) network across eight partner organizations, the development of preliminary federated machine learning workflows, and the initial preparation of hardware acceleration strategies.
• A prototype of the Health Data Hub was designed, integrating services for anonymization, harmonization, and synthetic data generation. The project also explored decentralized physical infrastructure networks (DePIN) and blockchain-based certification to ensure trust and traceability.
• Two rounds of use case workshops for stakeholders were held across four countries to engage and gather input from clinicians, researchers, industry, and regulators. These insights were translated into user stories and usage scenarios that guide system development.
• The Pilot Plan includes study protocols covering the three use cases: lung cancer, prostate cancer, and ischemic stroke. These pilots will validate the utility of synthetic data and AI models in real clinical environments.