Periodic Reporting for period 1 - INSAFEDARE (INNOVATIVE APPLICATIONS OF ASSESSMENT AND ASSURANCE OF DATA AND SYNTHETIC DATA FOR REGULATORY DECISION SUPPORT)
Reporting period: 2023-11-01 to 2025-04-30
Development of realistic synthetic datasets offers a potential solution to these challenges. These are computer-generated datasets that exhibit the same statistical properties as the equivalent real dataset. Compared with anonymised and de-identified datasets, synthetic datasets have three advantages: a) they avoid the lengthy approval processes required for access to anonymised data; b) they offer access to variables that may be considered too sensitive to include in anonymised and de-identified datasets; and c) they are immune to cross-referencing and harvesting of information with other datasets.
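As a minimal illustration of the core idea, the sketch below generates a synthetic numeric column that reproduces the mean and standard deviation of a real one without containing any real record. This is a deliberately simple, univariate example under assumed normality; the project's actual generators are considerably richer, and all names and values here are hypothetical.

```python
import random
import statistics

def synthesize_numeric(real_values, n, seed=0):
    """Generate n synthetic values matching the mean and standard
    deviation of a real numeric column (a simple univariate sketch,
    not the project's actual generation algorithms)."""
    rng = random.Random(seed)
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical "real" column: systolic blood pressure readings.
real = [random.Random(1).gauss(120, 15) for _ in range(5000)]
synthetic = synthesize_numeric(real, 5000)

# The synthetic column reproduces the summary statistics of the
# real one, while no synthetic value is a copy of a real record.
print(round(statistics.mean(synthetic)), round(statistics.stdev(synthetic)))
```

Richer generators additionally preserve correlations between variables and categorical distributions, which is what makes the synthetic dataset usable as a stand-in for analysis and validation.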
The INSAFEDARE project is developing a toolkit to enable cost-effective and high assurance decision-making within regulatory compliance processes for medical devices. The project will provide guidance on quality and safety assurance of datasets as a tool for validation, and identify how synthetic datasets can be used to establish assurance in advance of formal certification processes to reduce risks for device developers and provide improved efficiencies for regulatory bodies. The project will develop tools for discovery, integration, and query of multiple datasets, and for supporting the sustainable, dynamic, and through-life surveillance of medical devices, while capturing the impact of new evidence from newly published datasets.
An important technology milestone was achieved with the development of the Data Integration Pipeline Tools, which enable users to define workflows as a series of interconnected tasks, each encapsulated within a Docker container. The tool's modular architecture and model-based design simplify the management of complex data-handling processes. It supports integration with various data sources, such as files and databases, and offers flexibility in how these sources are connected. Key features include visual workflow design, automated execution, and built-in monitoring, and the tool has been designed to be user-friendly and adaptable to a range of synthetic data generation scenarios. The accompanying deliverable includes a comprehensive review of current data pipeline orchestration technologies, evaluated against 12 key characteristics derived from the project's requirements; the analysis concludes that no existing solution fully meets all of these criteria.
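The workflow model described above can be sketched as a small dependency graph of containerized tasks. The sketch below is an assumption-laden simplification: task names, image names, and the orchestration logic are hypothetical, and it prints the container invocations rather than executing them (a real orchestrator would also detect dependency cycles and monitor execution).

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Task:
    """One pipeline step, encapsulated in a Docker container."""
    name: str
    image: str                       # hypothetical Docker image
    command: str
    depends_on: List[str] = field(default_factory=list)

def execution_order(tasks: Dict[str, Task]) -> List[str]:
    """Topologically sort tasks so each runs after its dependencies
    (assumes the dependency graph is acyclic)."""
    order, seen = [], set()
    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in tasks[name].depends_on:
            visit(dep)
        order.append(name)
    for name in tasks:
        visit(name)
    return order

# A three-step example workflow: extract -> clean -> load.
pipeline = {
    "extract": Task("extract", "example/extract:1.0", "python extract.py"),
    "clean":   Task("clean", "example/clean:1.0", "python clean.py", ["extract"]),
    "load":    Task("load", "example/load:1.0", "python load.py", ["clean"]),
}

for name in execution_order(pipeline):
    t = pipeline[name]
    # A real orchestrator would invoke each container, e.g. via `docker run`.
    print(f"docker run --rm {t.image} {t.command}")
```

Encapsulating each task in its own container is what gives the pipeline its modularity: steps can be swapped, versioned, and connected to different data sources without changes to the surrounding workflow.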
+ Innovative data quality framework
+ Advanced data integration framework
+ Novel algorithms for synthetic data generation
+ Regulatory compliance-driven metrics for assessing datasets
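To make the last item concrete, the sketch below shows one simple dataset-assessment metric, field completeness. The metric choice, field names, and records are illustrative assumptions only; the project's compliance-driven metrics cover many more dimensions of quality.

```python
def completeness(records, field):
    """Fraction of records with a non-missing value for `field` --
    one elementary example of a dataset quality metric."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

# Hypothetical records with some missing values.
records = [
    {"age": 34, "sex": "F"},
    {"age": None, "sex": "M"},
    {"age": 58, "sex": "F"},
    {"age": 41, "sex": None},
]

print(completeness(records, "age"))  # 0.75
```

In a regulatory context, such metrics would be reported against agreed thresholds, so that a dataset's fitness for validating a medical device can be assessed objectively and repeatably.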
These technologies will also form the basis for stakeholder engagement aimed at extending Europe's regulatory practices and standards.