CORDIS - EU research results

Synthetic generation of hematological data over federated computing frameworks

Periodic Reporting for period 2 - SYNTHEMA (Synthetic generation of hematological data over federated computing frameworks)

Reporting period: 2024-06-01 to 2025-11-30

Haematological diseases affect millions of people in Europe, yet more than 70% of these conditions are classified as rare, meaning that clinical data are scarce, fragmented and often locked within isolated hospitals or national registries. This lack of accessible, high-quality data slows medical research, limits the development of innovative treatments, and hampers the use of artificial intelligence (AI) methods that rely on large, diverse datasets. At the same time, the strict European data protection rules that are essential to safeguard patients’ privacy create additional challenges for cross-border research.
Against this background, the SYNTHEMA Research and Innovation Action was launched under Horizon Europe to address a central challenge in modern healthcare: how to unlock the value of sensitive health data for research and innovation, while fully respecting privacy, ethics and European regulations. The project focuses on rare haematological diseases, using two representative clinical cases: sickle cell disease, a non-oncological condition, and acute myeloid leukaemia, a haematological malignancy.
The overall objective of SYNTHEMA is to enable secure, privacy-preserving reuse of health data by developing advanced methods to anonymise data and generate high-quality synthetic data. Synthetic data are artificially generated datasets that reproduce the statistical and clinical properties of real patient data without revealing personal information. By combining synthetic data generation with federated learning – an approach that allows AI models to be trained across multiple institutions without moving raw data – SYNTHEMA aims to overcome data silos and support GDPR-compliant research across Europe.
The project's pathway to impact is built around three interconnected elements. First, SYNTHEMA establishes a cross-border federated computing infrastructure connecting health data centres, research organisations and technology providers. Second, it develops and validates innovative AI pipelines for anonymisation and synthetic data generation, balancing data utility against privacy protection. Third, it embeds ethical, legal and social considerations into the technical design to support trustworthy AI and responsible data governance.
By widening access to realistic, privacy-safe data, SYNTHEMA is expected to significantly increase the scale and quality of research in rare haematological diseases. Its results support European health data strategies, contribute to the development of the European Health Data Space and provide reusable tools and standards that can be transferred to other disease areas beyond haematology.
During the reporting period, SYNTHEMA completed the core technical development of its platform and pipelines. A major achievement was the harmonisation and integration of clinical data from multiple European healthcare centres into two high-quality reference datasets for sickle cell disease and acute myeloid leukaemia. These datasets were standardised using common European data models and prepared for advanced AI processing.
On this basis, the project developed a secure federated learning platform deployed across participating clinical sites. The platform enables local training of AI models while keeping sensitive patient data within the originating institutions. Advanced privacy-enhancing technologies, including secure multi-party computation and differential privacy, were integrated to protect data during model training and aggregation.
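As an illustration of the aggregation step in federated learning, the sketch below averages model parameters reported by several sites, weighted by local sample counts (the widely used federated averaging scheme). It is a minimal sketch and not the SYNTHEMA platform's code: the function name `federated_average`, the flat parameter lists and the site sizes are assumptions made for this example, and the secure multi-party computation and differential privacy layers mentioned above are not shown.

```python
from typing import List, Sequence


def federated_average(
    site_params: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Weighted average of per-site model parameters.

    Each site trains locally and shares only its parameters and its
    sample count; raw patient records never leave the site.
    (Hypothetical helper written for this illustration.)
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one non-empty parameter list per site size")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_params = len(site_params[0])
    averaged = [0.0] * n_params
    for params, size in zip(site_params, site_sizes):
        if len(params) != n_params:
            raise ValueError("all sites must report the same number of parameters")
        for i, value in enumerate(params):
            # Sites with more samples contribute proportionally more.
            averaged[i] += value * size / total
    return averaged


# Two hypothetical sites: one with 1 sample, one with 3 samples.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this toy setting the coordinator sees only parameter vectors and counts; in a real deployment those updates would additionally be protected during aggregation.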
In parallel, SYNTHEMA designed and implemented robust anonymisation pipelines capable of de-identifying and minimising clinical data while preserving their analytical value. These pipelines were validated using quantitative privacy and utility metrics and applied to the project’s clinical datasets.
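The report does not name the privacy metrics used. As one commonly cited example of a quantitative privacy metric, the sketch below computes k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. The function name, the field names and the records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest group of records that share
    identical values on all quasi-identifier fields.
    (Hypothetical helper; not taken from the project.)
    """
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Made-up, already generalised records (age bands instead of exact ages).
records = [
    {"age_band": "30-39", "sex": "F", "diagnosis": "SCD"},
    {"age_band": "30-39", "sex": "F", "diagnosis": "AML"},
    {"age_band": "40-49", "sex": "M", "diagnosis": "AML"},
]
k = k_anonymity(records, ["age_band", "sex"])
```

A low k flags records that stand out on their quasi-identifiers; coarser generalisation raises k but reduces analytical detail, which is the utility and privacy trade-off the pipelines have to manage.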
The project also developed multiple synthetic data generation models for structured clinical data and medical images. These models were trained and tested in both centralised and federated settings. A dedicated validation framework was established to assess the clinical realism, statistical fidelity and privacy safety of the generated synthetic data, combining automated metrics with expert clinical review.
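One simple way to quantify statistical fidelity for a single categorical variable is the total variation distance between its distribution in the real data and in the synthetic data. The sketch below is only an example of such an automated metric, under the assumption that a per-variable comparison is of interest; the report does not specify which metrics the validation framework uses, and the function name and sample values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def total_variation_distance(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """Total variation distance between two empirical categorical
    distributions: 0.0 means identical frequencies, 1.0 means no overlap.
    (Hypothetical helper written for this illustration.)
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_counts, synth_counts = Counter(real), Counter(synthetic)
    categories = set(real_counts) | set(synth_counts)
    # Counter returns 0 for categories missing from one sample.
    return 0.5 * sum(
        abs(real_counts[c] / len(real) - synth_counts[c] / len(synthetic))
        for c in categories
    )


# Made-up category labels for a single variable.
real_values = ["A", "A", "B", "B"]
synthetic_values = ["A", "B", "B", "B"]
distance = total_variation_distance(real_values, synthetic_values)
```

A metric like this covers only one marginal distribution; it says nothing about correlations between variables or about privacy, which is why the framework described above pairs automated metrics with expert clinical review.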
Together, these activities resulted in a fully integrated technical ecosystem that links real-world data collection, federated AI training, anonymisation, synthetic data generation and validation within a single, interoperable framework.
SYNTHEMA advances the state of the art by demonstrating, at scale, that high-quality synthetic health data can be generated and validated within a federated, privacy-preserving environment. Unlike traditional approaches, the project combines anonymisation, synthetic data generation and federated learning into a single end-to-end workflow, designed specifically for sensitive and rare disease data.
The project delivers reusable technical components, including federated learning infrastructures, anonymisation engines, synthetic data pipelines and validation methodologies, all aligned with European standards and regulatory requirements. These results lay the groundwork for broader uptake in clinical research, regulatory science and innovation, enabling safer data sharing and AI development.
To ensure further uptake and long-term impact, additional steps will be needed beyond the project lifetime. These include continued clinical validation, extension to additional diseases and data types, alignment with emerging regulatory frameworks, and engagement with public and private stakeholders to support deployment, sustainability and potential commercialisation. By addressing both technical and governance challenges, SYNTHEMA provides a solid foundation for the future of trustworthy, data-driven healthcare in Europe.