Synthetic hEalthcare dAta goveRnanCe Hub

Project information

Grant agreement ID: 101172997

DOI

10.3030/101172997

EC signature date: 13 September 2024

Start date: 1 October 2024

End date: 30 September 2028

Funded under

Health

Total cost

€13,573,062.50

EU contribution

€7,916,214.30

Coordinated by

THE PROVOST, FELLOWS, FOUNDATION SCHOLARS & THE OTHER MEMBERS OF BOARD, OF THE COLLEGE OF THE HOLY & UNDIVIDED TRINITY OF QUEEN ELIZABETH NEAR DUBLIN
Ireland

Periodic Reporting for period 1 - SEARCH (Synthetic hEalthcare dAta goveRnanCe Hub)

Reporting period: 2024-10-01 to 2025-09-30

Europe faces persistent barriers to accessing high-quality, privacy-preserving and interoperable health data. Hospitals, researchers and industry partners work in highly fragmented data environments, constrained by the GDPR, strict governance processes and limited mechanisms for cross-institutional data sharing. These barriers slow research and the development of trustworthy AI tools across major disease areas, including oncology, cardiovascular disease, stroke and gastrointestinal disorders. At the same time, emerging EU policy frameworks such as the European Health Data Space (EHDS) and the AI Act highlight the need for secure, FAIR-aligned approaches to health data access, reuse and innovation.

SEARCH directly addresses these challenges by combining synthetic data generation, federated analytics, and robust governance frameworks to enable secure, scalable, and regulation-ready medical data innovation. Through extensive engagement with clinicians, data managers, and technical stakeholders, the project ensures that solutions are grounded in real operational requirements.

During the first reporting period, SEARCH established the foundational elements of this ecosystem: user and technical requirements (56 prioritised needs across 25 organisations), a mapping of 140 real-world clinical datasets, and a modular federated architecture for secure data discovery, harmonisation and model training. The project also delivered the first version of the Synthetic Data Assessment and Credibility (SDAC) Framework, which provides structured methods for evaluating the privacy, fidelity, utility and fairness of synthetic datasets, a prerequisite for regulatory readiness.

SEARCH’s overall objectives are to:

Develop high-fidelity multimodal synthetic data aligned with FAIR principles.

Build a federated platform enabling secure distributed analytics while preserving data sovereignty.

Apply and validate these methods across six clinical studies.

Establish pathways for exploitation, regulatory alignment, and long-term sustainability.

Through these activities, SEARCH aims to reduce barriers to data access, accelerate AI development, and strengthen Europe’s capacity for safe, trustworthy digital health innovation.

During the first reporting period, the project established the core data, architectural, and methodological foundations required for secure, FAIR-aligned use of real and synthetic health data.

A consortium-wide consultation captured input from across partner organisations, resulting in prioritised user, technical, and clinical requirements that now guide the platform’s development. In parallel, partners completed a comprehensive mapping of 140 datasets across oncology, cardiovascular, neurological and gastrointestinal domains, representing more than 6.5 million data items. Metadata schemas and interoperability specifications were defined using established standards including HL7 FHIR, SNOMED CT, DICOM and OMOP, providing a harmonised structure for future federated data preparation.
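Part of the harmonisation problem can be illustrated with a minimal sketch that renames site-specific fields and converts units into one shared target schema. The schema, the field names (`body_weight_kg`, `wt_lb` and so on) and the `harmonise` helper are hypothetical and are not SEARCH's metadata specification; real mappings to HL7 FHIR or OMOP also bind values to controlled vocabularies such as SNOMED CT and are considerably richer.

```python
from dataclasses import dataclass
from typing import Callable

LB_TO_KG = 0.45359237  # exact: 1 international avoirdupois pound in kilograms
IN_TO_CM = 2.54        # exact: 1 international inch in centimetres


def _identity(value: float) -> float:
    return value


@dataclass(frozen=True)
class FieldMapping:
    source_field: str  # column name used at the contributing site
    target_field: str  # column name in the (hypothetical) common schema
    convert: Callable[[float], float] = _identity  # unit conversion into target units


# Hypothetical mappings: two sites record the same variables under different names and units.
SITE_MAPPINGS: dict[str, list[FieldMapping]] = {
    "site_a": [
        FieldMapping("weight_kg", "body_weight_kg"),
        FieldMapping("height_cm", "body_height_cm"),
    ],
    "site_b": [
        FieldMapping("wt_lb", "body_weight_kg", lambda v: v * LB_TO_KG),
        FieldMapping("ht_in", "body_height_cm", lambda v: v * IN_TO_CM),
    ],
}


def harmonise(site: str, record: dict) -> dict:
    """Rename fields and convert units so every site emits the same schema.

    Unmapped fields are dropped; mapped fields missing from the record are skipped.
    """
    harmonised = {}
    for mapping in SITE_MAPPINGS[site]:
        if mapping.source_field in record:
            harmonised[mapping.target_field] = mapping.convert(record[mapping.source_field])
    return harmonised


if __name__ == "__main__":
    print(harmonise("site_a", {"weight_kg": 70.0, "height_cm": 175.0}))
    print(harmonise("site_b", {"wt_lb": 154.0, "ht_in": 69.0}))
```

Keeping the mappings as data rather than code means a new site can be onboarded by adding a mapping table, while downstream federated steps only ever see the common schema.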

The project delivered the initial technical architecture for the SEARCH platform, detailing modular components for data curation, harmonisation, federated learning and synthetic data generation.

Foundational synthetic data generation methods were developed across structured/tabular data, physiological signals, radiology and endoscopy images, and genomics. Early evaluation procedures were aligned with the SDAC framework to assess privacy, fidelity and utility.
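The report does not describe SEARCH's generators, so the following is only a deliberately naive tabular baseline, assumed for illustration: it fits each column independently (a Gaussian for numeric columns, empirical frequencies for categorical ones) and samples new rows. The function names `fit_marginals` and `sample` are hypothetical. Because it ignores correlations between columns, a baseline like this typically scores poorly on fidelity checks, which is exactly the kind of weakness a structured evaluation framework is meant to expose.

```python
import random
import statistics


def fit_marginals(rows: list[dict]) -> dict:
    """Fit an independent per-column model (hypothetical baseline; rows must be non-empty)."""
    model = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: summarise with mean and population standard deviation.
            model[col] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Categorical column: keep the observed categories and their counts.
            counts: dict = {}
            for v in values:
                counts[v] = counts.get(v, 0) + 1
            model[col] = ("categorical", list(counts), list(counts.values()))
    return model


def sample(model: dict, n: int, seed: int = 0) -> list[dict]:
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mu, sigma = spec
                row[col] = rng.gauss(mu, sigma)
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    real = [{"age": 60, "sex": "F"}, {"age": 70, "sex": "M"}, {"age": 80, "sex": "F"}]
    print(sample(fit_marginals(real), 3, seed=42))
```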

Clinical teams defined the structure, data flows and variable requirements for six validation studies in oncology, cardiovascular disease and gastrointestinal medicine, establishing the basis for downstream model testing and clinical decision-support evaluation.

Together, these advances provide the essential groundwork for federated platform deployment, synthetic data generation at scale, and clinical validation activities planned for the next period.

During the reporting period, the project delivered advances that significantly improve how health data can be accessed and validated for clinical research and AI development. A unified set of clinical, technical and regulatory requirements was established, enabling an integrated approach to federated data use and synthetic data generation that is not yet common in European healthcare. The mapping of 140 datasets across major disease areas provides an unprecedented view of cross-site data availability and variability, forming a foundation for harmonised, privacy-preserving analytics.

A major step beyond current practice is the Synthetic Data Assessment and Credibility (SDAC) framework, which consolidates privacy, fidelity, utility and fairness checks into a single validation process for synthetic medical data. It is one of the first structured approaches of its kind in Europe.
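The idea of folding several checks into one validation step can be sketched as follows. This is not the SDAC framework's actual metric set; the metrics are common stand-ins chosen as assumptions: the two-sample Kolmogorov-Smirnov statistic for fidelity, the minimum distance from a synthetic row to any real row (distance to closest record) for privacy, and the shift in outcome-rate gap between groups for fairness. Utility (for example, train-on-synthetic, test-on-real) is omitted for brevity, and the `assess` function and its thresholds are hypothetical.

```python
import bisect
import math


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the pooled sample points
        gap = max(gap, abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b)))
    return gap


def min_distance_to_real(real: list[dict], synth: list[dict], cols: list[str]) -> float:
    """Smallest Euclidean distance from any synthetic row to any real row (unscaled columns)."""
    return min(
        math.dist([s[c] for c in cols], [r[c] for c in cols]) for s in synth for r in real
    )


def outcome_rate_gap(rows: list[dict], group_col: str, outcome_col: str) -> float:
    """Spread (max minus min) of the positive-outcome rate across groups."""
    rates = []
    for g in {r[group_col] for r in rows}:
        members = [r[outcome_col] for r in rows if r[group_col] == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)


def assess(real, synth, numeric_cols, group_col, outcome_col,
           max_ks=0.2, min_dcr=1e-6, max_fairness_shift=0.1) -> dict:
    """Run all checks and combine them into one pass/fail verdict (hypothetical thresholds)."""
    fidelity = max(ks_statistic([r[c] for r in real], [s[c] for s in synth]) for c in numeric_cols)
    privacy = min_distance_to_real(real, synth, numeric_cols)
    fairness_shift = abs(outcome_rate_gap(real, group_col, outcome_col)
                         - outcome_rate_gap(synth, group_col, outcome_col))
    return {
        "fidelity_max_ks": fidelity,
        "privacy_min_dcr": privacy,
        "fairness_gap_shift": fairness_shift,
        "passed": fidelity <= max_ks and privacy > min_dcr and fairness_shift <= max_fairness_shift,
    }
```

A verbatim copy of the real data would pass the fidelity check perfectly yet fail on privacy, which illustrates why a single consolidated verdict is more informative than any one metric in isolation.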

The project also defined a modular architecture for federated learning and synthetic data generation that clearly separates local (on-premises) components from central ones, maintaining data sovereignty while enabling multi-institution model development in line with the GDPR and emerging EU health data policies.
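The local/central split can be illustrated with federated averaging (FedAvg), used here as an assumed, well-known stand-in because the report does not name SEARCH's federated algorithm. Each hypothetical `Site` trains a one-variable linear model on its own records and returns only the updated parameters and its sample count; the central `federated_averaging` step combines them as a sample-weighted mean. Sharing parameters instead of records is not by itself a formal privacy guarantee; deployments commonly add measures such as secure aggregation or differential privacy.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical hospital node; its records stay inside this object (on-premises)."""
    name: str
    xs: list[float]
    ys: list[float]

    def local_update(self, w: float, b: float, lr: float = 0.05,
                     epochs: int = 50) -> tuple[float, float, int]:
        """Gradient descent on local data only (mean squared error, model y = w * x + b)."""
        n = len(self.xs)
        for _ in range(epochs):
            residuals = [w * x + b - y for x, y in zip(self.xs, self.ys)]
            grad_w = sum(2 * r * x for r, x in zip(residuals, self.xs)) / n
            grad_b = sum(2 * r for r in residuals) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n  # only parameters and the sample count leave the site


def federated_averaging(sites: list[Site], rounds: int = 20) -> tuple[float, float]:
    """Central aggregator: broadcast the global model, then average local results by sample count."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [site.local_update(w, b) for site in sites]
        total = sum(n for _, _, n in updates)
        w = sum(wi * n for wi, _, n in updates) / total
        b = sum(bi * n for _, bi, n in updates) / total
    return w, b


if __name__ == "__main__":
    site_a = Site("hospital_a", [0, 1, 2, 3, 4], [2 * x + 1 for x in [0, 1, 2, 3, 4]])
    site_b = Site("hospital_b", [1, 2, 3, 4, 5], [2 * x + 1 for x in [1, 2, 3, 4, 5]])
    print(federated_averaging([site_a, site_b]))  # approaches (2.0, 1.0)
```

Weighting by sample count keeps a small site from pulling the global model as strongly as a large one, which is the standard FedAvg design choice.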

To maximise future uptake, further progress will require continued technical refinement of the synthetic data pipelines, targeted engagement with regulators and health technology assessment (HTA) bodies, contributions to international standardisation, and alignment of exploitation pathways with clinical priorities. Together, these will support sustainable deployment across healthcare, research and industry.
