CORDIS - EU research results

A new EU Framework for an Ethical Re-use of Health Data

Periodic Reporting for period 1 - DataCom (A new EU Framework for an Ethical Re-use of Health Data)

Reporting period: 2023-07-01 to 2025-06-30

In recent years, health data sets have become an economic asset, directly or indirectly monetized by companies and institutions, to the extent that a health data marketplace has emerged. They are also used in the public sector for a wide range of applications: disease surveillance and control; pandemic-related decision making; estimation of public health expenses; machine learning training, etc. The European Commission has stated its intention to take action on citizens' secure access to and sharing of health data across borders, and on better data to advance research, disease prevention, and personalised health and care. However, without appropriate safeguards, citizens' data are exposed to the risks of data accumulation and exploitation, and data subjects may lose control over such sensitive information. Public entities (e.g. hospitals) hold a huge amount of health data; however, the legal framework governing the re-use and exploitation of such data is still unclear and fragmented across Member States (MS), and must be reconciled with the relevant data protection laws and regulations (GDPR, Regulation (EU) 2018/1725, Convention 108+). Health data is therefore pseudonymized, de-identified, or anonymized using different techniques. Because anonymized data sets are no longer considered personal data, they are left without legal protection. They are often commercially exploited and re-used without assessing the risks and expectations of data subjects, sometimes without informing them, and occasionally for detrimental purposes. Existing scholarly opinions have focused either on treating personal data as untradeable or on allowing its full exploitation with the consent of the data subject.

OVERALL OBJECTIVE: DataCom aims to build a new framework to facilitate an ethical secondary use of health data held by public bodies, with the aim of improving accountability and enhancing responsible re-use, bridging gaps in scholarly efforts while also providing policy suggestions to EU institutions, facilitating decision making, and ensuring that health data in the EU does not become a mere exploitable commodity. To build this new framework, DataCom will develop and test, in intersectoral practical environments, the innovative concept of “Ethical Commodification”: the possibility of exploiting personal data in an ethical way for the public good, in accordance with data subjects’ expectations and needs and taking into account the risks associated with the exploitation of anonymized data sets, focusing on 3 MS: Italy, Spain, and the Netherlands. To reach this objective, DataCom takes a strongly interdisciplinary and intersectoral approach, bridging law, ethics, and computer science, and taking into consideration the needs of the public sector and those of citizens.

To address this urgency, DataCom focuses on the following specific objectives:
SO1) developing a “New EU Framework of Ethical Commodification”, by examining the existing theoretical frameworks of re-use of health data based on desk research;

SO2) mapping the actors involved in health data re-use (i.e. health care public servants, researchers, and professional/patient associations in the 3 selected MS), to assess their needs and knowledge about the consequences of health data handling and re-use for citizens (surveys, interviews and focus groups);

SO3) promoting awareness strategies among public servants in the 3 selected MS, drafting the manual “Guidelines for public bodies to responsibly re-use citizens’ health data” in order to provide them with the appropriate means to exploit data in a privacy-preserving and ethical way.

To answer DataCom’s research question, an interdisciplinary approach combining insights from computer science (anonymization, security), law (privacy), and ethics (data ethics, human rights) is highly advantageous. AI, ethics, and law are essential to the study of health data re-use and the development of the “New EU Framework of Ethical Commodification”. DataCom will build upon legal and ethical scholarship on data re-use, and study data science techniques through the lens of applied ethics, which allows the project to move beyond an exclusive focus on either law or ethics. Methodologically, DataCom will combine law and sociology: the use of process tracing and the combination of document analysis, interviews, and focus groups is in line with social science methodology, while legal sources will be mapped and analysed on the basis of legal scholarship. DataCom also has a strong intersectoral dimension, as it will be hosted both by universities and public institutions.
The project successfully delivered substantial scientific and technological advancements at the intersection of AI ethics, health data governance, and regulatory compliance. These achievements were realized through a combination of theoretical contributions, methodological developments, and empirical applications.

1. Advancement of Theoretical Frameworks
A core scientific achievement was the development of a framework for the ethical reuse of health data within AI-driven systems. This framework addresses the risks of health data commodification, the challenges of anonymization in light of the European Health Data Space (EHDS) proposal, and the regulatory implications of the AI Act. These contributions are articulated in key publications, such as:
Redefining Anonymization: Legal Challenges and Emerging Threats in the Era of EHDS (Springer, 2025)
The Risks of Health Data Commodification in the EU Digital Market (Yearbook of Antitrust and Regulatory Studies, 2024)
The fellow also created a collection of laws and other sources on the project’s website, as detailed in the proposal: the “Ethical Health Data secondary use” collection, which helps colleagues navigate the regulatory landscape in the three countries of study (Italy, Spain, and the Netherlands).

2. Development of the FanFAIR Tool
A notable technological output is the enhancement of FanFAIR, a semi-automated tool for the ethical and legal assessment of health datasets. The tool now integrates advanced functionalities for dataset pre-processing and fairness assessment, making it a practical instrument for ensuring dataset compliance with ethical standards and supporting AI Act obligations. Its capabilities were validated and discussed in:
Investigating Fairness with FanFAIR: Is Pre-Processing Useful Only for Performances? (IEEE Symposium on Computational Intelligence in Health and Medicine, 2025)
Assessment of Health Data Sets Fairness (BMC Bioinformatics, Special Issue, 2025 - forthcoming)

3. Contribution to AI Act Compliance Methodologies
The project has directly contributed to the scientific debate on AI regulatory compliance, especially concerning dataset assessment and bias mitigation. Outputs in this area include:
Dataset Assessment Tools for the AI Act Compliance of High-Risk AI Systems (Journal of Law, Market & Innovation, forthcoming, 2025)
The AI Act Proposal: A New Right to Technical Interpretability? (Milano University Press, 2025)
Software Systems Compliance with the AI Act. Lessons Learned from an International Challenge (ACM ICSE 2024 Proceedings)

4. Impactful Contributions to AI Ethics and Medical AI
Additional scientific achievements relate to the application of these frameworks and tools in real-world contexts, such as:
Credit Scoring Judicial Review Between the Court of Justice of the EU and Comparative Case Law (Media Laws, 2025)
Identifying Bias in Data Collection: A Case Study on Drugs Distribution (IEEE WCCI IJCNN, 2024)
Assessing Cardiac Functionality by Means of Interpretable AI and Myocardial Strain (submitted to CIBCB 2025)
These outputs have significantly contributed to advancing knowledge in the fields of AI ethics, law, and health data governance, offering both theoretical insights and practical solutions applicable within the current European regulatory framework.
The project delivered several high-impact innovation outputs, advancing both scientific knowledge and providing practical solutions for healthcare, AI developers, and regulatory actors.

1. FanFAIR Software – Enhanced Version
An essential technological innovation is the improved FanFAIR tool, now equipped with:
- Pre-processing techniques for bias detection and mitigation.
- Dataset auditing functionalities aligned with the upcoming AI Act.
- Ethical and legal assessment modules specific to health datasets.

This tool is designed to support both researchers and practitioners, including healthcare providers and AI developers, in assessing the fairness and legal compliance of datasets used in AI-based health applications.

2. New Framework for Ethical Reuse of Health Data in AI Systems
The project produced a New Framework of Ethical Commodification, integrating legal, ethical, and socio-technical perspectives. This framework innovatively addresses:
- Risks of health data commodification in AI systems.
- Challenges related to data sharing and AI-driven processing.
- Guidance for complying with AI Act and EHDS requirements.

3. Innovation for Hospitals and Healthcare Organizations
A key applied innovation is a methodology supporting hospitals and healthcare organizations that develop AI systems. This methodology enables healthcare institutions to:
- Evaluate their datasets for biases and legal risks before they are used in AI development.
- Align their internal processes with AI Act and GDPR requirements.
- Promote trustworthy AI systems, minimizing potential harm to patients and improving transparency.

The practical implementation of this innovation was piloted in collaboration with Padua’s hospital, where legal developments for data sharing between healthcare institutions were integrated into the project’s outputs.

4. Practical Guidelines for AI Act Compliance
The Guidance Manual for Public Bodies offers actionable steps for public administrators and healthcare entities to ensure AI Act compliance, focusing on ethical, transparent, and legally sound data handling and AI development.

5. AI Act Compliance Methodology
The project produced a specialized dataset assessment methodology designed to:
- Be directly integrated into AI development pipelines.
- Support compliance with the AI Act’s fairness, transparency, and risk management obligations.
- Be easily adopted by companies and hospitals developing AI systems.

Contribution to the state of the art
The project contributed significantly to advancing the state of the art in multiple interrelated fields: health data governance, AI ethics, and the legal implications of AI and data regulation in Europe.

1. Bridging AI Ethics and Health Data Governance
The project introduced a new theoretical framework on the ethical commodification of health data, addressing the urgent issue of how health data is reused under EU regulations such as the AI Act, EHDS, and Data Act. Existing literature primarily focused on privacy and cybersecurity risks, often overlooking how data reuse practices may reinforce structural inequalities and diminish patient rights. This project was among the first to apply critical legal and socio-technical approaches to systematically investigate how power asymmetries and forced data reuse shape AI-driven healthcare applications.

2. Advancement in Dataset Fairness Assessment
The enhanced FanFAIR tool and related research (Gallese et al., BMC Bioinformatics, 2024; Gallese et al., IEEE 2025) pushed forward the methodological state of the art by operationalizing fairness assessment and bias mitigation for health datasets. This is one of the first tools explicitly aligning dataset auditing with AI Act obligations, addressing a critical gap in AI compliance tools where most frameworks focused on algorithmic auditing but neglected the dataset level.

3. Contribution to the Understanding of Regulatory Overlaps
The project is among the first to systematically explore the intersection between the EHDS, AI Act, and Data Act, particularly regarding:
- The legal contradictions in the opt-out mechanisms and anonymization loopholes.
- The implications of the public interest clause for patients' rights.
- The potential for regulatory gaps to enable systemic discrimination, especially in the use of health data for AI development.

This contribution is highly original, as it reframes health data governance from a compliance-centered to a justice-centered perspective, filling a major gap in the literature identified by recent EU-funded reports and academic critiques.

4. Influencing the Policy-Making Debate
The findings from the project contributed directly to European policy development through active participation in the AI Office's working groups for the first General-Purpose AI Code of Practice. The project's insights on the risks of forced data reuse and the need for patient-centered safeguards have fed directly into regulatory discussions.

5. Integration of Empirical Evidence
While the project faced challenges in fieldwork, the integration of empirical insights from collaborations with hospitals, public bodies, and health professionals (e.g. Padua Hospital, Ca' Foscari University) has enriched the state of the art by grounding theoretical insights in real-world practices and highlighting practical obstacles in implementing AI governance at the institutional level.
Overall, the project established a novel approach that merges law, ethics, and computer science, positioning itself at the frontier of interdisciplinary research on responsible AI in healthcare.
Scientific and/or technological quality of the results
The scientific and technological quality of the results achieved in the project is demonstrated by their originality, rigor, and direct applicability to real-world challenges in AI and healthcare.

Contribution to the Development of an Ethical Framework for the Reuse of Health Data by Public Institutions
A key scientific and policy-oriented achievement of the project is the development of a new ethical framework for the reuse of health data by public institutions, critically addressing the growing risks associated with data commodification, regulatory gaps, and insufficient ethical oversight.

1. Identifying Ethical and Legal Gaps in Current Frameworks
Through my research, particularly in The Risks of Health Data Commodification in the EU Digital Market, I highlighted how the European Digital Strategy Corpus (DSC)—comprising the AI Act, EHDS, Data Act, and other initiatives—tends to prioritize data-driven innovation without sufficiently safeguarding citizens’ rights. I demonstrated how anonymization, once perceived as a sufficient safeguard, often fails in practice due to re-identification risks and ambiguous legal interpretations. This work filled a critical gap by systematically examining the risks of commodification not just by private companies but also by public institutions, an area often overlooked in both academic and policy circles.

2. Redefining Anonymization as a Political and Ethical Act
In Redefining Anonymization: Legal Challenges and Emerging Threats in the Era of EHDS, I argued that anonymization should not be treated as a mere technical procedure but as a political and ethical decision. Public institutions, in particular, often apply anonymization techniques without sufficient awareness of their limitations, thereby exposing vulnerable populations to exploitation and privacy violations. I proposed a reconceptualization of anonymization processes, recommending that public entities systematically consider power asymmetries, potential harms, and data subjects’ expectations in their decision-making.

3. Developing the “New EU Framework of Ethical Commodification”
Building upon these findings, the project articulated a comprehensive ethical framework specifically aimed at public institutions engaging in health data reuse. This framework includes:
- Guidelines for balancing public interest with individual rights.
- Recommendations for ethical and legally robust anonymization practices.
- Considerations for preventing discrimination and protecting vulnerable groups.
- Procedures for institutional accountability and public transparency.


Technological quality of the results
The technological quality of the results is high.

1. High-Quality Development of FanFAIR
The enhanced version of FanFAIR, presented at the IEEE Symposium on Computational Intelligence in Health and Medicine (SSCI 2025), represents a robust and scientifically validated contribution to AI fairness assessment. The tool employs a rule-based fuzzy inference system to assess dataset fairness semi-automatically. This approach was carefully designed to balance automation and expert judgment, acknowledging the inherent complexity of fairness evaluation.
The system was further strengthened with:
- A novel sensitive variable analysis to detect correlations between sensitive attributes and predicted outcomes.
- An advanced outlier detection algorithm capable of handling missing values through an improved Isolation Forest technique.
- Seamless integration with Pandas DataFrames, increasing usability and facilitating adoption by data scientists and practitioners.
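The two analyses described above can be sketched in a few lines of Python. This is an illustrative approximation only, not the actual FanFAIR implementation: the column names, the median-imputation step, and the use of scikit-learn's stock IsolationForest are assumptions made for the example.

```python
# Illustrative sketch (NOT the FanFAIR code) of two dataset-level checks:
# correlation between a sensitive attribute and the outcome, and outlier
# flagging that tolerates missing values via simple median imputation.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def sensitive_correlation(df: pd.DataFrame, sensitive: str, outcome: str) -> float:
    """Pearson correlation between a numeric sensitive attribute and the outcome."""
    return float(df[sensitive].corr(df[outcome]))

def flag_outliers(df: pd.DataFrame, features: list, contamination: float = 0.05) -> np.ndarray:
    """Return a boolean mask of outlier rows; NaNs are median-imputed first."""
    X = df[features].fillna(df[features].median())
    labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(X)
    return labels == -1  # IsolationForest marks outliers with -1

# Toy data: 200 plausible patients, one implausible age, one missing age.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "age": rng.normal(60, 10, 200),
    "sex": rng.integers(0, 2, 200).astype(float),
    "outcome": rng.integers(0, 2, 200).astype(float),
})
data.loc[0, "age"] = 300.0   # data-entry error
data.loc[1, "age"] = np.nan  # missing value, handled by imputation

corr = sensitive_correlation(data, "sex", "outcome")
mask = flag_outliers(data, ["age"])
```

The point of the sketch is the division of labour: a cheap statistical screen for sensitive-attribute leakage, and an unsupervised model for implausible records, both operating directly on a Pandas DataFrame as the text describes.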

2. Empirical Validation on a Real-World Dataset
The application of FanFAIR to a real-world dataset on COVID-19 patients demonstrated both its scientific soundness and practical value. The results revealed that appropriate pre-processing not only improved machine learning performance but also significantly increased dataset fairness. Specifically, the fairness score improved progressively across the data cleaning stages (from 75.9% to 84.3%), confirming FanFAIR's capacity to detect and quantify fairness improvements in datasets.
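As a toy illustration of how data cleaning can raise a dataset-level fairness score, the sketch below uses a simple statistical-parity proxy (one minus the gap in positive-outcome rates between two groups). This is not FanFAIR's fuzzy score, and the data, group labels, and plausibility threshold are invented for the example.

```python
# Toy demonstration that removing implausible records can improve a simple
# dataset-level fairness proxy. NOT FanFAIR's actual fuzzy fairness score.
import pandas as pd

def parity_score(df: pd.DataFrame) -> float:
    """1 minus the absolute gap in positive-outcome rates between groups."""
    rates = df.groupby("group")["outcome"].mean()
    return 1.0 - abs(rates[0] - rates[1])

raw = pd.DataFrame({
    "group":   [0] * 10 + [1] * 10,
    "age":     [60] * 10 + [60] * 7 + [999] * 3,  # 999 = data-entry errors
    "outcome": [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7,
})

# Cleaning stage: drop physiologically implausible ages.
cleaned = raw[raw["age"] <= 120]

before = parity_score(raw)      # group rates 0.6 vs 0.3
after = parity_score(cleaned)   # group rates 0.6 vs 3/7
```

Here the erroneous records all sit in one group with negative outcomes, so dropping them narrows the inter-group gap and the score rises, mirroring the direction of the improvement reported above (though not its magnitude or metric).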

3. Strong Methodological Foundation
The development of FanFAIR and the associated studies were grounded in state-of-the-art AI ethics, machine learning, and legal frameworks. The methodological rigor is evident in the tool's design:
- It integrates statistical indicators with qualitative legal and ethical assessments.
- It is aligned with EU regulatory requirements, especially the AI Act, ensuring that the scientific contribution is not only academically sound but also practically useful.

4. Contributions to AI Fairness Research
FanFAIR contributes to addressing a recognized gap in AI fairness literature — the lack of tools that assess fairness at the dataset level rather than solely at the algorithmic level. This shift is crucial, as fairness issues often originate from biased or incomplete data rather than model behavior alone. The project thus offers a novel and needed approach within the AI fairness ecosystem.

5. Efficiency and Accessibility
The tool was designed with accessibility in mind. It can operate efficiently on standard consumer-grade hardware and does not require advanced machine learning expertise from the user, making it suitable for deployment in both research and professional settings, including hospitals and public administrations.
These qualities ensure that the scientific outputs are not only innovative but also robust, impactful, and ready for practical application in settings where fairness in AI is mission-critical, such as healthcare.