Bayesian Models and Algorithms for Fairness and Transparency

Periodic Reporting for period 2 - BayesianGDPR (Bayesian Models and Algorithms for Fairness and Transparency)

Reporting period: 2021-10-01 to 2023-03-31

Machine learning systems are increasingly used by government agencies, businesses, and other organisations to assist in making life-changing decisions, such as whether to grant someone bail, whether to invite a candidate to a job interview, or whether to give someone a loan. However, the data used to train machine learning models consists of examples of decisions made by humans and therefore reflects societal biases. The danger is that machine learning-based decisions perpetuate the biases present in the training data: the resulting systems reflect these biases and can even exacerbate them.

The ambition of the BayesianGDPR project is to develop models and algorithms that enable large-scale applications of fair and transparent machine learning systems in various challenging domains in science, industry, and decision making, taking into account fairness under uncertainty in the collected data, in the models, in predictions on future data, and in the future consequences of decisions or actions. We set out to meet this grand challenge by: 1) developing a machine learning framework for addressing fairness under uncertainty in a static setting, 2) extending the framework to address fairness under uncertainty in a dynamic setting, and 3) allowing stakeholders to understand what changes are required for fairness to be met, thereby ensuring transparency in fairness.

The success of the BayesianGDPR project would benefit many other machine learning-based disciplines, such as computer vision, natural language processing, and data mining. In the short term, organisations relying on machine learning technologies would have concrete tools to comply with the non-discrimination principles of the GDPR and similar laws. In the medium term, BayesianGDPR would impact research in computational law and its integration into mainstream legal practice. In the long term, BayesianGDPR would also help ensure the continued confidence of the general public in the deployment of machine learning systems.

The research of the Predictive Analytics Lab (wearepal.ai) lies in the area of machine learning, with an emphasis on ethical and trustworthy machine learning (auditing and mitigating inappropriate bias against protected subgroups, and improving the transparency of algorithmic systems); safe and robust machine learning (ensuring reliably good performance even in extreme situations); and interactive machine learning (facilitating understanding between a user and an algorithmic system).

Key publications around fairness under uncertainty in a static setting:
* Myles Bartlett, Sara Romiti, Viktoriia Sharmanska, Novi Quadrianto. Okapi: Generalising Better by Making Statistical Matches Match. Thirty-Sixth Conference on Neural Information Processing Systems NeurIPS, New Orleans, Louisiana, USA, 2022.
* Sara Romiti, Christopher Inskip, Viktoriia Sharmanska, Novi Quadrianto. RealPatch: A Statistical Matching Framework for Model Patching with Real Samples. European Conference on Computer Vision ECCV, Tel Aviv, Israel, 2022.
* Thomas Kehrenberg, Myles Bartlett, Viktoriia Sharmanska, Novi Quadrianto. Addressing Missing Sources with Adversarial Support-Matching. arXiv, 2022.
* Bradley Butcher, Vincent Huang, Christopher Robinson, Jeremy Reffin, Sema Sgaier, Grace Charles, Novi Quadrianto. Causal datasheet for datasets: An evaluation guide for real-world data analysis and data collection design using Bayesian Networks. Frontiers in Artificial Intelligence, p. 18. ISSN 2624-8212, 2021.
* Viktoriia Sharmanska, Lisa Anne Hendricks, Trevor Darrell, Novi Quadrianto. Contrastive Examples for Addressing the Tyranny of the Majority. arXiv, 2020.

Key publications around fairness under uncertainty in a dynamic setting:
* Ainhize Barrainkua, Paula Gordaliza, Jose A. Lozano, Novi Quadrianto. A Survey on Preserving Fairness Guarantees in Changing Environments. arXiv, 2022.
* Gergely D. Németh, Miguel Angel Lozano, Novi Quadrianto, Nuria Oliver. A Snapshot of the Frontiers of Client Selection in Federated Learning. arXiv, 2022.

Key publications around transparency in fairness:
* Oliver Thomas. Fair Representations in the Data Domain. PhD Thesis, University of Sussex, 2022.
* Oliver Thomas, Miri Zilka, Adrian Weller, Novi Quadrianto. An Algorithmic Framework for Positive Action. ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization EAAMO, Virtual, 2021.
* Thomas Kehrenberg, Myles Bartlett, Oliver Thomas, Novi Quadrianto. Null-sampling for Interpretable and Fair Representations. European Conference on Computer Vision ECCV, Glasgow, UK, 2020.

The first achievements of the BayesianGDPR project stem from a novel combination of statistical matching and feature representation learning, aimed at mitigating unexpected failures of machine learning models caused by subgroup imbalance in the training data. Subgroups can refer to environments or domains, such as Europe or Asia, or to demographic attributes, such as female or male, including intersections of race, gender, age, disability, environment, and so on. We have used matching both at the level of individual samples and at the level of groups of samples. Matching with real samples can also improve transparency in fairness, for example by checking whether the produced matches are related in semantically meaningful ways. We have shown that the proposed statistical matching framework is robust to real-world distribution shifts, and we have now started investigating it in an online, dynamic setting.

Going forward, we will work towards a machine learning framework that enables large-scale applications of fair machine learning systems in challenging domains in science, industry, and decision making, and demonstrate its added value on different time-scales: in the short term, by providing concrete tools to comply with the non-discrimination principles of the GDPR and similar laws; in the medium term, by impacting research in computational law and its integration into mainstream legal practice; and in the long term, by ensuring the continued confidence of the general public in the deployment of machine learning systems.
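To make the idea of sample-level matching between subgroups more concrete, here is a minimal, generic sketch of nearest-neighbour statistical matching. It is not the project's RealPatch or Okapi implementation. It assumes that each sample has already been mapped to a feature vector (in the project this would be a learned representation; here the vectors are plain Python lists), and the function names `match_samples` and `euclidean` and the `caliper` parameter are hypothetical, chosen for illustration only. Each sample in one subgroup is paired with its closest sample in the other subgroup, and pairs that are further apart than the caliper are dropped, so samples without a sufficiently similar counterpart stay unmatched.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_samples(
    group_a: List[Vector],
    group_b: List[Vector],
    caliper: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Pair each sample in group_a with its nearest neighbour in group_b.

    Hypothetical illustration of sample-level statistical matching:
    - matching is done with replacement (a group_b sample may be reused);
    - if `caliper` is given, pairs farther apart than it are discarded.
    Returns a list of (index_in_group_a, index_in_group_b) pairs.
    """
    pairs = []
    for i, a in enumerate(group_a):
        # Find the closest group_b sample and its distance.
        j, dist = min(
            ((j, euclidean(a, b)) for j, b in enumerate(group_b)),
            key=lambda t: t[1],
        )
        if caliper is None or dist <= caliper:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical 2-D representations for two subgroups.
    group_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    group_b = [[0.1, 0.0], [0.9, 1.2], [10.0, 10.0]]
    print(match_samples(group_a, group_b, caliper=1.0))  # → [(0, 0), (1, 1)]
```

Because the matches are real samples rather than synthetic ones, the returned index pairs can be inspected directly, which is the kind of check the paragraph above mentions for transparency: one can look at whether matched samples are related in semantically meaningful ways. The caliper trades coverage for match quality; a group-level variant would instead compare statistics (for example, means) of whole sets of samples.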

