The SYNTHIA project addresses a critical challenge in modern healthcare: the difficulty of accessing high-quality patient data due to incomplete records, stringent privacy laws and complex regulations. This scarcity of real-world data severely limits the training of robust Artificial Intelligence models, slowing down medical research and, in turn, the development of personalised treatments.
SYNTHIA's overall objective is to build trust in, and accelerate the adoption of, Synthetic Data: artificially generated data that statistically mimics real patient records without compromising individual privacy. The project is developing and rigorously validating cutting-edge Synthetic Data Generation tools and methods for diverse medical data types, including imaging data, genomic data, and clinical notes.
Our focus is on six high-impact diseases: Lung Cancer, Breast Cancer, Multiple Myeloma, Diffuse Large B-cell Lymphoma, Alzheimer’s Disease, and Type 2 Diabetes Mellitus. Through dedicated technical use cases across these disease areas, SYNTHIA will provide the scientific evidence needed to show that results derived from synthetic data are as reliable as those derived from real-world data, while striking an appropriate balance between data utility and patient privacy.
The project’s pathway to impact involves creating a comprehensive Synthetic Data Evaluation Framework and a sustainable, federated synthetic data publishing platform. This platform will offer researchers certified, fit-for-purpose synthetic datasets and validated synthetic data generation tools. By establishing this widely accessible resource, SYNTHIA will accelerate the development of Artificial Intelligence-based diagnostic and prognostic tools, enable faster clinical trials (for example, by using synthetic control arms), and contribute to the emerging European Health Data Space. The initiative is positioned to benefit from the expected substantial growth of the synthetic data market, strengthening Europe’s leadership in data-driven personalised medicine.