Periodic Reporting for period 2 - SYNTHEMA (Synthetic generation of hematological data over federated computing frameworks)
Reporting period: 2024-06-01 to 2025-11-30
The SYNTHEMA Research and Innovation Action was launched under Horizon Europe to address a central challenge in modern healthcare: how to unlock the value of sensitive health data for research and innovation while fully respecting privacy, ethics and European regulations. The project focuses on rare haematological diseases, using two representative clinical cases: sickle cell disease, a non-oncological condition, and acute myeloid leukaemia, a haematological malignancy.
The overall objective of SYNTHEMA is to enable secure, privacy-preserving reuse of health data by developing advanced methods to anonymise data and generate high-quality synthetic data. Synthetic data are artificially generated datasets that reproduce the statistical and clinical properties of real patient data without revealing personal information. By combining synthetic data generation with federated learning – an approach that allows AI models to be trained across multiple institutions without moving raw data – SYNTHEMA aims to overcome data silos and support GDPR-compliant research across Europe.
The project's pathway to impact is built around three interconnected elements. First, SYNTHEMA establishes a cross-border federated computing infrastructure connecting health data centres, research organisations and technology providers. Second, it develops and validates innovative AI pipelines for anonymisation and synthetic data generation, aiming for an optimal balance between data utility and privacy protection. Third, it embeds ethical, legal and social considerations into the technical design, in support of trustworthy AI and responsible data governance.
By widening access to realistic, privacy-safe data, SYNTHEMA is expected to significantly increase the scale and quality of research in rare haematological diseases. Its results support European health data strategies, contribute to the development of the European Health Data Space and provide reusable tools and standards that can be transferred to other disease areas beyond haematology.
On this basis, the project developed a secure federated learning platform deployed across participating clinical sites. The platform enables local training of AI models while keeping sensitive patient data within the originating institutions. Advanced privacy-enhancing technologies, including secure multi-party computation and differential privacy, were integrated to protect data during model training and aggregation.
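The aggregation step described above can be illustrated with a minimal sketch of federated averaging. This is not SYNTHEMA's implementation: the function names (`clip_update`, `federated_average`), the parameters `max_norm` and `noise_std`, and the toy site updates are all hypothetical. Each site sends only a model update, the server clips each update to bound any single site's influence, averages by local sample count, and optionally adds Gaussian noise in the spirit of differential privacy. The noise here is not calibrated to a formal privacy budget, and secure multi-party computation is not shown.

```python
import random
from typing import List, Optional, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def federated_average(
    updates: Sequence[Sequence[float]],
    sizes: Sequence[int],
    max_norm: float = 1.0,
    noise_std: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Average clipped site updates, weighted by local sample counts.

    Raw patient records never appear here: only model updates are shared.
    noise_std is an illustrative knob, not a calibrated privacy guarantee.
    """
    rng = rng or random.Random(0)
    total = sum(sizes)
    dim = len(updates[0])
    clipped = [clip_update(u, max_norm) for u in updates]
    averaged = [
        sum(u[i] * n for u, n in zip(clipped, sizes)) / total
        for i in range(dim)
    ]
    if noise_std > 0:
        averaged = [a + rng.gauss(0.0, noise_std) for a in averaged]
    return averaged


# Hypothetical updates from two clinical sites with equal sample counts.
site_updates = [[3.0, 4.0], [0.0, 0.0]]
site_sizes = [1, 1]
global_update = federated_average(site_updates, site_sizes, max_norm=5.0)
print(global_update)
```

With `max_norm=5.0` the first update (norm 5) passes through unclipped, so the printed average is simply the mean of the two updates; lowering `max_norm` shrinks the contribution of any site whose update is unusually large.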
In parallel, SYNTHEMA designed and implemented robust anonymisation pipelines capable of de-identifying and minimising clinical data while preserving their analytical value. These pipelines were validated using quantitative privacy and utility metrics and applied to the project’s clinical datasets.
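As an illustration of what a quantitative privacy metric for anonymisation can look like, the sketch below computes k-anonymity before and after a simple generalisation step. The report does not state which metrics SYNTHEMA used, so k-anonymity is an assumption chosen because it is a widely known measure; the records, field names and the `generalise_age` helper are invented for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(records: List[Dict], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def generalise_age(record: Dict, width: int = 10) -> Dict:
    """Replace an exact age with a band such as '30-39'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Invented toy records; not real patient data.
records = [
    {"age": 34, "sex": "F", "diagnosis": "SCD"},
    {"age": 37, "sex": "F", "diagnosis": "SCD"},
    {"age": 52, "sex": "M", "diagnosis": "AML"},
    {"age": 58, "sex": "M", "diagnosis": "AML"},
]
quasi = ["age", "sex"]

print(k_anonymity(records, quasi))  # 1: every exact age is unique
generalised = [generalise_age(r) for r in records]
print(k_anonymity(generalised, quasi))  # 2: each age band and sex pair occurs twice
```

The trade-off between privacy and utility is visible even in this toy case: banding ages raises k from 1 to 2 but discards the exact age, which may matter for some analyses.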
The project also developed multiple synthetic data generation models for structured clinical data and medical images. These models were trained and tested in both centralised and federated settings. A dedicated validation framework was established to assess the clinical realism, statistical fidelity and privacy safety of the generated synthetic data, combining automated metrics with expert clinical review.
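Two automated checks of the kind such a validation framework might combine are sketched below: a two-sample Kolmogorov-Smirnov statistic as a rough measure of statistical fidelity for one numeric variable, and the distance from each synthetic record to its closest real record as a rough privacy signal. These particular metrics are assumptions, not the project's documented choices, and the function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples.
    0.0 means identical empirical distributions; 1.0 means no overlap."""
    points = sorted(set(real) | set(synthetic))
    n, m = len(real), len(synthetic)
    return max(
        abs(sum(x <= p for x in real) / n - sum(x <= p for x in synthetic) / m)
        for p in points
    )


def distance_to_closest_record(
    real_rows: Sequence[Tuple[float, ...]],
    synthetic_rows: Sequence[Tuple[float, ...]],
) -> List[float]:
    """Euclidean distance from each synthetic row to its nearest real row.
    A distance of 0.0 flags a synthetic row that copies a real record."""
    return [
        min(math.dist(s, r) for r in real_rows) for s in synthetic_rows
    ]


# Invented values for a single laboratory variable.
real_values = [9.1, 10.4, 11.0, 12.3]
synthetic_values = [9.0, 10.5, 11.2, 12.1]
print(ks_statistic(real_values, synthetic_values))

# Invented two-feature rows; the second synthetic row duplicates a real one.
real_rows = [(0.0, 0.0), (10.0, 10.0)]
synthetic_rows = [(3.0, 4.0), (10.0, 10.0)]
print(distance_to_closest_record(real_rows, synthetic_rows))  # [5.0, 0.0]
```

Automated scores like these only cover part of the picture, which is consistent with the report's point that they are complemented by expert clinical review.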
Together, these activities resulted in a fully integrated technical ecosystem that links real-world data collection, federated AI training, anonymisation, synthetic data generation and validation within a single, interoperable framework.
The project delivers reusable technical components, including federated learning infrastructures, anonymisation engines, synthetic data pipelines and validation methodologies, all aligned with European standards and regulatory requirements. These results lay the groundwork for broader uptake in clinical research, regulatory science and innovation, enabling safer data sharing and AI development.
To ensure further uptake and long-term impact, additional steps will be needed beyond the project lifetime. These include continued clinical validation, extension to additional diseases and data types, alignment with emerging regulatory frameworks, and engagement with public and private stakeholders to support deployment, sustainability and potential commercialisation. By addressing both technical and governance challenges, SYNTHEMA provides a solid foundation for the future of trustworthy, data-driven healthcare in Europe.