Bayesian Models and Algorithms for Fairness and Transparency

Project Information

BayesianGDPR

Grant agreement ID: 851538

DOI

10.3030/851538

Project closed

EC signature date 28 January 2020

Start date 1 April 2020

End date 30 September 2025

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 443 697,00

EU contribution

€ 1 443 697,00


Coordinated by

THE UNIVERSITY OF SUSSEX
United Kingdom

Periodic Reporting for period 4 - BayesianGDPR (Bayesian Models and Algorithms for Fairness and Transparency)

Reporting period: 2024-10-01 to 2025-09-30

Machine learning systems are increasingly used by government agencies, businesses, and other organisations to assist in making life-changing decisions such as whether to grant someone bail, whether to invite a candidate to a job interview, or whether to give someone a loan. However, the data used to train machine learning models consists of examples of decisions made by humans and therefore reflects societal biases. The danger is that machine learning-based decisions trained on such data perpetuate, and can even exacerbate, these biases.

The ambition of the BayesianGDPR project is to develop models and algorithms that will enable large-scale applications of fair machine learning systems (taking into account fairness under uncertainty in collected data, in models, in future data predictions, and in future consequences of decisions or actions) that are also transparent in various challenging domains in science, industry, and decision making. We set out to address this grand challenge by: 1) developing a machine learning framework for addressing fairness under uncertainty in a static setting, 2) extending the framework to address fairness under uncertainty in a dynamic setting, and 3) allowing stakeholders to understand what changes are required for fairness to be met, thereby ensuring transparency in fairness.

The success of the BayesianGDPR project would benefit many other machine learning-based disciplines, such as computer vision, natural language processing, and data mining. In the short term, organisations relying on machine learning technologies would have concrete tools to comply with the non-discrimination principles of the GDPR and similar laws. In the medium term, BayesianGDPR would influence research in computational law and its integration into mainstream legal practice. In the long term, BayesianGDPR would help sustain the confidence of the general public in the deployment of machine learning systems.

Research at the Predictive Analytics Lab (wearepal.ai) lies in the area of machine learning, with an emphasis on ethical and trustworthy machine learning (auditing and mitigating inappropriate bias against protected subgroups, and improving transparency of algorithmic systems); safe and robust machine learning (ensuring reliably good performance even when encountering extreme situations); and interactive machine learning (facilitating an understanding between a user and an algorithmic system).

Key publications around fairness under uncertainty in a static setting:
* M. Bartlett, S. Romiti, V. Sharmanska, N. Quadrianto. Okapi: Generalising Better by Making Statistical Matches Match. Neural Information Processing Systems NeurIPS, 2022.
* S. Romiti, C. Inskip, V. Sharmanska, N. Quadrianto. RealPatch: A Statistical Matching Framework for Model Patching with Real Samples. European Conference on Computer Vision ECCV, 2022.
* T. Kehrenberg, M. Bartlett, V. Sharmanska, N. Quadrianto. Addressing Attribute Bias with Adversarial Support-Matching. Transactions on Machine Learning Research TMLR, 2024.
* V. Sharmanska, L. A. Hendricks, T. Darrell, N. Quadrianto. Contrastive Examples for Addressing the Tyranny of the Majority. arXiv, 2020.
* A. Barrainkua, S. Mazuelas, N. Quadrianto, J. A. Lozano. Safe Fairness Without Demographics: Spectral Uncertainty Set Perspective. IEEE TPAMI, 2026.

Key publications around fairness under uncertainty in a dynamic setting:
* G. D. Németh, M. A. Lozano, N. Quadrianto, N. Oliver. A Snapshot of the Frontiers of Client Selection in Federated Learning. TMLR, 2022.
* A. Barrainkua, P. Gordaliza, J. A. Lozano, N. Quadrianto. Preserving the Fairness Guarantees of Classifiers in Changing Environments: a Survey. ACM Computing Surveys, 2023.
* A. Barrainkua, P. Gordaliza, J. A. Lozano, N. Quadrianto. Uncertainty Matters: Stable Conclusions under Unstable Assessment of Fairness Results. Artificial Intelligence and Statistics AISTATS, 2024.
* T. Kehrenberg, J. S. Bautiste, J. A. Lozano, N. Quadrianto. Dissecting Performative Prediction: A Comprehensive Survey. arXiv, 2026.
* J. S. Bautiste, T. Kehrenberg, J. A. Lozano, N. Quadrianto. Strategically Deceptive Model Deployment in Performative Prediction. Submitted, 2026.

Key publications around transparency in fairness:
* T. Kehrenberg, M. Bartlett, O. Thomas, N. Quadrianto. Null-sampling for Interpretable and Fair Representations. ECCV, 2020.
* O. Thomas, M. Zilka, A. Weller, N. Quadrianto. An Algorithmic Framework for Positive Action. ACM EAAMO, 2021.
* L. Gee, W. Y. Li, V. Sharmanska, N. Quadrianto. Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers. TMLR, 2025.
* A. Barrainkua, G. De Toni, J. A. Lozano, N. Quadrianto. Revisiting (Un)Fairness in Recourse by Minimizing Worst-Case Social Burden. AAAI, 2026.

We have released 12 open-source software packages which can be found at https://github.com/wearepal/: "nifr"; "positive action framework"; "RealPatch"; "okapi"; "support-matching"; "compression-subgroup"; "std-al"; "UncertaintyMatters"; "SPECTRE"; "MISOB"; "Visual Word Tokenizer"; "PerformativeGYM".

The project has also contributed to capacity building through training activities, supervision of early-career researchers, and international collaborations between UK and Spanish institutions. Exploitation efforts are ongoing through an ERC Proof of Concept Act.AI project, a Horizon Europe TANGO project, and the preparation of new EU funding proposals, which build directly on the results of BayesianGDPR. In parallel, work is underway to develop a joint UK–Spain university start-up focused on trustworthy machine learning applications.

The BayesianGDPR project has made significant progress in developing fair, transparent, and robust machine learning systems. We introduced a novel combination of statistical matching and feature representation learning to address subgroup imbalances in training data and to improve robustness to real-world distribution shifts. Subgroups refer to environments or domains, such as Europe or Asia, as well as demographic attributes such as female or male, including overlapping dimensions of race, gender, age, disability, and environment. Building on this foundation, we coupled fairness with robustness and uncertainty through a framework that leverages unlabelled data and domain information to improve generalisation. Our approach makes fairness interventions both effective and interpretable, enabling stakeholders to understand how fairness is achieved. This addresses a critical blind spot in existing fairness research. For example, our analysis revealed that a fairness intervention model can learn to disentangle gender from non-gender-related information. However, within the remaining non-gender-related information, undesirable residual correlations, such as with skin tone, may still persist. As another example of transparency in fairness, when matching with real samples to balance datasets, we can inspect whether the resulting matches are related in semantically meaningful ways.
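To make the idea of matching samples across subgroups concrete, the following is a minimal illustrative sketch and not the project's released implementation. It assumes two subgroups labelled 0 and 1, a plain Euclidean distance in some feature space, and a hypothetical `caliper` threshold that discards poor matches; the function name `match_across_subgroups` and the toy data are likewise assumptions made for illustration.

```python
import numpy as np


def match_across_subgroups(features, subgroup, caliper=None):
    """Pair each sample of subgroup 0 with its nearest sample of subgroup 1.

    Illustrative only: Euclidean nearest-neighbour matching with an
    optional caliper (maximum allowed distance). Returns a list of
    (index_in_group_0, index_in_group_1, distance) tuples.
    """
    features = np.asarray(features, dtype=float)
    subgroup = np.asarray(subgroup)
    idx_a = np.flatnonzero(subgroup == 0)
    idx_b = np.flatnonzero(subgroup == 1)

    # Pairwise distances between every group-0 and group-1 sample.
    diffs = features[idx_a][:, None, :] - features[idx_b][None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    nearest = dists.argmin(axis=1)
    pairs = []
    for row, col in enumerate(nearest):
        d = dists[row, col]
        # Keep the pair only if it is close enough to count as a match.
        if caliper is None or d <= caliper:
            pairs.append((int(idx_a[row]), int(idx_b[col]), float(d)))
    return pairs


# Toy data (hypothetical): four samples in a 2-D feature space.
features = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]])
subgroup = np.array([0, 0, 1, 1])
pairs = match_across_subgroups(features, subgroup, caliper=2.0)
for a, b, d in pairs:
    print(f"sample {a} matched with sample {b} at distance {d:.3f}")
```

Because each match is an explicit pair of real samples, the pairs can be inspected directly, which is the kind of check on whether matches are semantically meaningful that the paragraph above refers to.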

Together, these advances position uncertainty and interpretability as key components of trustworthy machine learning and lay the groundwork for large-scale, fair, and transparent machine learning applications that align with non-discrimination principles such as those in the GDPR.

