Periodic Reporting for period 2 - IDEA4RC (Intelligent Ecosystem to improve the governance, the sharing and the re-use of health Data for Rare Cancers)
Reporting period: 2024-03-01 to 2025-08-31
Taken together, rare cancers account for 20-25% of all new cancer diagnoses; however, knowledge of rare cancers is scarce because of small sample sizes and a lack of data. Clinical and translational research therefore requires large international collaborations that exploit networks specialized in rare cancers, which pool knowledge and data together.
The IDEA4RC ambition is to establish a first-in-the-field European data ecosystem for rare cancers, linking in a federated network the data provided by eleven centres of the European Reference Network on rare adult solid cancers (EURACAN), a network that includes about 70 healthcare providers in 26 EU Member States.
This ecosystem builds on a federation of fully interoperable local data repositories, the IDEA4RC capsules, hosted at each EURACAN centre. Access to the capsules will be governed by data access permits, rules and conditions tailored to the specific needs and constraints of each centre. Thus, not only will data never leave the centre, but data analysis and processing will also be performed on each local capsule, and only aggregated data will be shared. AI tools will be built to enable multi-language data processing and analysis. A data navigator will assist clinicians and researchers in finding and accessing available data of stipulated quality, and modern trust-building technologies will orchestrate data governance. The developed tools will be tested in relevant pilot cases across 11 EURACAN centres.
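The principle that analysis runs inside each capsule and only aggregates leave the centre can be illustrated with a minimal sketch. The Capsule class, the federated_count function and the min_cell small-count suppression threshold below are hypothetical illustrations, not the IDEA4RC capsule interface; the sketch only shows the pattern of local computation followed by central summation of counts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Hypothetical stand-in for a local capsule: records never leave it."""
    centre: str
    records: list = field(default_factory=list, repr=False)

    def run_local_query(self, variable: str) -> Counter:
        # The analysis runs where the data lives; only counts are returned.
        return Counter(r[variable] for r in self.records if variable in r)


def federated_count(capsules, variable, min_cell=5):
    """Sum per-centre aggregates, dropping small cells (assumed privacy rule)."""
    total = Counter()
    for capsule in capsules:
        local = capsule.run_local_query(variable)
        # Hypothetical disclosure control: counts below min_cell are not shared.
        total.update({k: v for k, v in local.items() if v >= min_cell})
    return dict(total)


capsules = [
    Capsule("A", [{"histology": "leiomyosarcoma"}] * 6 + [{"histology": "liposarcoma"}] * 2),
    Capsule("B", [{"histology": "leiomyosarcoma"}] * 5 + [{"histology": "liposarcoma"}] * 7),
]
print(federated_count(capsules, "histology"))
```

In this sketch the coordinating side sees only per-centre counts, never individual records. Suppressing small cells before summation is one possible design choice; it protects individual centres but means the federated totals can undercount rare categories.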
The IDEA4RC data ecosystem will advance clinical research, contributing to improved quality of care and to better patient access to optimal diagnosis and treatment, so that all patients have equal access to high-quality specialist care across the EU, in compliance with principle 16 of the European Pillar of Social Rights.
Pathways to scale towards significant impact revolve around: (i) coordination with EURACAN governing bodies to propose the data ecosystem to other centres; (ii) dissemination towards other ERNs on rare diseases; (iii) creation of a community of interest among a wider audience of interested stakeholders; (iv) possible commercial exploitation of software technologies developed within the project.
1. Agreements have been signed among Consortium partners to enable data sharing and data processing
2. Data governance rules have been finalized, and the tools for data governance and access control have been defined. The Independent Legal and Ethics Advisors supervised and advised on the work performed to ensure legal and ethical compliance.
3. A first set of data has been produced by all clinical partners to be made available in each partner's data capsule
4. The IDEA4RC data model and the associated data dictionary have been finalized, the FHIR implementation guide has been upgraded to include more concepts, the metadata taxonomy has been finalized.
5. The multilingual dictionaries and the tools for the textual data annotation have been completed and delivered
6. Quality checks are ongoing at each clinical centre to ensure that high-quality data are injected into the capsules
7. NLP models for three languages (Swedish, Italian and Polish) have been produced and are currently being improved with more data
8. Version 2 of the capsules has been implemented in 10 pilot sites and has been successfully tested for data injection
9. Stakeholder engagement for a wider community of interest has progressed, involving patient advocates, clinical stakeholders outside the IDEA4RC Consortium and other interested parties
10. Exploitation activities are ongoing: definition of the project results, the value proposition and a preliminary business model canvas.
IDEA4RC's data model is being designed following the most recent and most widely adopted standard terminologies to facilitate collaboration and data sharing among healthcare stakeholders across Europe.
The project also emphasizes the identification, protection, management, and exploitation of valuable intellectual property arising from its outcomes. It employs mechanisms such as open-source licensing, patents, copyrights, and trade secrets to safeguard innovations and maximize their impact.
By aligning with industry standards such as OMOP and FHIR, IDEA4RC facilitates the exchange of healthcare data in a format that makes it easier to comply with the European Health Data Space (EHDS) Regulation.
IDEA4RC's zero-trust philosophy and service mesh technology ensure a secure environment for privacy-preserving data processing. IDEA4RC's robust security measures and encryption protocols safeguard sensitive healthcare data from unauthorized access or breaches, in alignment with EHDS principles for the protection of patient privacy and confidentiality in all data processing activities.
IDEA4RC's innovative data governance approach, based on social sciences and humanities research, reflects the EHDS goal of promoting responsible research and innovation. IDEA4RC's transparent policies and procedures for data collection, storage and usage ensure compliance with data governance regulations. The integration of technologies such as blockchain enhances data traceability and integrity, further aligning with the EHDS objectives of fostering trust and accountability in healthcare data management.
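The traceability and integrity property that blockchain contributes rests on hash chaining: each log entry stores a hash computed over its content and the previous entry's hash, so altering any past entry breaks every later link. The sketch below is a simplified, single-node hash-chained audit log written for illustration only. It is not a blockchain (no distribution or consensus), and the AuditLog class and the event fields are hypothetical, not the IDEA4RC governance components.

```python
import hashlib
import json


def _digest(entry: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only, hash-chained log of data access events."""
    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []  # list of (entry, hash) pairs

    def append(self, entry: dict) -> str:
        prev = self.blocks[-1][1] if self.blocks else self.GENESIS
        h = _digest(entry, prev)
        self.blocks.append((entry, h))
        return h

    def verify(self) -> bool:
        # Recompute every link; any edited entry changes its hash and breaks the chain.
        prev = self.GENESIS
        for entry, h in self.blocks:
            if _digest(entry, prev) != h:
                return False
            prev = h
        return True


log = AuditLog()
log.append({"event": "permit_granted", "permit": "P-001", "centre": "A"})
log.append({"event": "query_executed", "permit": "P-001", "centre": "A"})
print(log.verify())
```

A distributed ledger adds replication of such a chain across independent parties, so that no single party can silently rewrite it; the sketch shows only the tamper-evidence part.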
Finally, IDEA4RC will develop Natural Language Processing (NLP) algorithms to tackle the scarcity of structured health data, since most clinical information is currently available only as natural-language text. By transforming such texts into structured variables suitable for automated analysis, IDEA4RC's NLP algorithms will enhance the usability and interoperability of healthcare data for secondary use, including scientific research, in line with one of the main EHDS objectives.
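To make "transforming texts into structured variables" concrete, the sketch below turns a free-text pathology sentence into two structured fields. It deliberately swaps in simple regular expressions for the trained NLP models described above and handles English only, whereas the project targets Swedish, Italian and Polish. The variable names tumour_size_mm and grade and the patterns are hypothetical examples, not the IDEA4RC data dictionary.

```python
import re

# Hypothetical English-only patterns; a stand-in for trained NLP models.
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)
GRADE_RE = re.compile(r"\bgrade\s*([123]|I{1,3})\b", re.IGNORECASE)
ROMAN = {"I": 1, "II": 2, "III": 3}


def extract_variables(text: str) -> dict:
    """Map free text to structured variables; None when a value is not found."""
    out = {"tumour_size_mm": None, "grade": None}
    size = SIZE_RE.search(text)
    if size:
        value = float(size.group(1).replace(",", "."))  # accept decimal commas
        out["tumour_size_mm"] = value * 10 if size.group(2).lower() == "cm" else value
    grade = GRADE_RE.search(text)
    if grade:
        token = grade.group(1).upper()
        out["grade"] = ROMAN[token] if token in ROMAN else int(token)
    return out


print(extract_variables("Leiomyosarcoma, 4.5 cm, FNCLCC grade 3."))
```

Normalizing units (here, everything to millimetres) at extraction time is what makes the output comparable across centres; a rule-based approach like this is brittle across languages and phrasings, which is why trained multilingual models are needed in practice.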