Synthetic and scalable data platform for medical empowered AI

Project Information

AISym4MED

Grant agreement ID: 101095387

DOI

10.3030/101095387

EC signature date 22 November 2022

Start date 1 December 2022

End date 30 November 2026

Funded under

Health

Total cost

€ 6 341 765,00

EU contribution

€ 6 341 765,00


Coordinated by

ASSOCIACAO FRAUNHOFER PORTUGAL RESEARCH
Portugal

Periodic Reporting for period 2 - AISym4MED (Synthetic and scalable data platform for medical empowered AI)

Reporting period: 2024-06-01 to 2025-11-30

AISym4MED aims to develop a platform that will give healthcare data engineers, data scientists, practitioners and researchers access to a trustworthy dataset system, augmented with controlled data synthesis for machine learning purposes, along with data and model auditing modules.
This platform will address data privacy and security by combining new anonymization techniques, attribute-based privacy measures and trustworthy tracking systems. Data quality evaluation measures and model inspection methods will be available to identify shortcomings in the different stages of the AI models’ development pipeline, such as biased or unreliable datasets and models, and on-demand controlled data synthesis will be provided to address potential limitations. Real-world and synthetic data quality assessment and human-centred design for validation purposes will also be implemented to guarantee the representativeness of the platform’s datasets. Furthermore, this platform will exploit federated technologies to allow the secure usage of private data from closed borders, promoting indirect access to a broader number of databases, while respecting the privacy, security and General Data Protection Regulation requirements.
The proposed platform will support the development of new robust artificial intelligence-based solutions for health and streamline their integration in clinical scenarios. By leveraging distributed tools, digital technologies and state-of-the-art AI approaches, it will benefit researchers, innovators, patients and providers of health services, while maintaining a high level of data privacy and its ethical usage.
This platform will be validated against local, national and cross-border use cases targeting different types of stakeholders (data scientists and engineers, artificial intelligence software developers, researchers, clinical professionals, among others), to assess its different functionalities and its usability in real-world settings.

The AISym4MED project has successfully transitioned from conceptual design to a functional, federated prototype, defining a distributed architecture that supports secure health data collection, aggregation, and analysis. By implementing a backend based on modular containerization, the project has established a secure environment where isolated code execution mitigates adversarial risks and ensures data integrity. Simultaneously, a human-centred frontend was deployed, aiming to make complex functionalities, such as synthetic data generation and model auditing, intuitive for diverse stakeholders, including bioengineers and clinicians. This is supported by a cross-border health database architecture using standardised schemas and clinical terminology, enabling seamless integration across diverse datasets.
Data auditing is anchored in the pyMDMA (Multimodal Data Metrics for Auditing) library, which provides a standardised framework for validating both real and synthetic medical images, time series, and tabular data. Regarding model auditing, new modules for explainability, fairness and bias mitigation, uncertainty estimation, and privacy have been implemented to quantify and correct model performance across protected demographic groups, assess confidence in predictions, and evaluate privacy risks, ensuring the development of trustworthy AI. The GASTeN framework for stress-testing models against edge-case data was conceptualised.
Regarding synthetic medical data generation, the project has moved beyond preliminary testing to deliver high-fidelity generative models across multiple modalities and targeting the project’s use cases. Key breakthroughs include a controllable model for retinal fundus images and specific generative models for ECG, EEG, and clinical tabular data. A quantitative approach to evaluating synthetic data was proposed, unifying the dimensions of fidelity, diversity, privacy, and utility. Specific additional metrics to evaluate synthetic clinical time series were conceptualised. To bridge the gap between quantitative metrics and clinical trust, the project launched the "Doctor-in-the-Loop" evaluation workflow.
To ensure trustworthiness and data privacy, the project evolved its framework into a functional, multi-layered security architecture, based on a robust legal foundation for cross-border processing.
An iterative validation process for the platform's functionalities was set up. Early pairing of technical partners with use-case owners yielded the essential "building blocks" for validation, including data dictionaries, feature extraction methods, and predictive and generative models tailored to real-world clinical scenarios.

The AISym4MED project is progressing from conceptual design toward a functional, federated ecosystem for health data, with validation currently underway across five clinical use cases. The technical achievements so far include the development of a modular, containerised backend and a human-centred frontend. Scientific impact is evidenced by over 23 research publications and the release of open-source tools like Audinter and pyMDMA.
The project has explored high-fidelity generative models using Stable Diffusion and pathology-controlled generation, and proposed specific adaptations for small real datasets, showing preliminary promise in improving disease classification by mitigating real-world data scarcity. Further advancements, such as GASTeN and the "Doctor-in-the-Loop" workflow, are intended to provide evidence on the reliability and clinical applicability of machine learning models, with the goal of supporting their suitability for real-world scenarios as validation continues. These technical advancements are complemented by a multi-layered governance architecture, designed to ensure that data handling remains strictly GDPR-compliant.
Future efforts will focus on demonstrating the consistency of these features in broader operational environments and refining exploitation strategies to ensure long-term scalability and market alignment. To ensure further uptake and success, continued research into federated learning and model auditing will be crucial for adapting to evolving data landscapes and regulatory requirements. Demonstrating the effectiveness of synthetic data in real-world applications will help validate its benefits and encourage broader adoption. Engaging with industry partners and conducting pilot projects will also be essential for aligning the technology with practical needs and establishing its value in various sectors.
