Large-scale pre-trained language models have revolutionized the field of natural language processing (NLP). They hold great promise for many real-life applications such as translation, search, and question answering. Recently, a new generation of language models, so-called Large Language Models (LLMs), has been developed and released in applications like ChatGPT, which reached 100 million active users within two months; by comparison, Google Translate took 78 months to reach that threshold. It is therefore important to gain a deeper understanding of how these models work and, in particular, in which scenarios they might not work as expected.

Machine learning-based models such as LLMs rely heavily on predefined training data to solve particular tasks. In order to generalize well, i.e. to perform on a wide range of unseen data, a large amount of training data is needed. One problem that arises with this training procedure is that the datasets are too large to be curated, and no one knows their contents in detail. Furthermore, decisions made by these models are often difficult to trace back, so they largely function as black boxes, and explaining their decisions is not trivial. Biases such as stereotypes about certain demographics that appear in the training data are thus passed on to the models and influence their decisions. It is therefore of critical importance to thoroughly understand these models in order to prevent them from harming certain demographics, often those who already suffer from implicit biases in society. Language models not only need to be correct, they need to be “right for the right reasons”.
As these models are meant to interact with humans and to base their decisions on human-like reasoning, I argue that understanding them in depth requires investigating how well they align with human behaviour across different demographics. Therefore, I want to investigate:
a) the reasoning behind a decision, i.e. align attention- and gradient-based importance attributions produced by models with human fixation patterns obtained through eye-tracking, and
b) whether task performance aligns differently between models and humans for different demographics and languages,
c) how a model’s decisions can be explained with gradient-based explainability methods to further open the black box and make models more transparent; this way, we can also trace a model’s decision back to the input, which reveals what part of the data is responsible for a certain outcome (a minimal sketch of such an attribution method follows this list). Finally, this research needs
d) to be carried out in a multilingual setting and extended to languages other than English.
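To make c) concrete, the following is a minimal sketch of one gradient-based attribution method (gradient × input) using the Hugging Face `transformers` library, together with a rank correlation against human fixation data as envisaged in a). The model name, example sentence, and fixation values are illustrative placeholders, not the actual experimental setup; in practice the comparison would be computed on gaze-annotated corpora.

```python
# Sketch: gradient x input token attribution and a rank correlation with
# (placeholder) human fixation durations. Model name, sentence, and fixation
# values are illustrative assumptions only.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

sentence = "The movie was surprisingly good."
inputs = tokenizer(sentence, return_tensors="pt")

# Embed the tokens explicitly so gradients can be taken w.r.t. the embeddings.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
predicted_class = outputs.logits.argmax(dim=-1).item()
outputs.logits[0, predicted_class].backward()

# Per-token importance: L2 norm of gradient * embedding.
saliency = (embeddings.grad * embeddings).norm(dim=-1).squeeze(0).detach()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>15s}  {score:.4f}")

# Placeholder "human fixation durations" (random here); in the proposed work
# these would come from eye-tracking corpora aligned to the same tokens.
human_fixation = torch.rand(len(tokens)).tolist()
rho, p = spearmanr(saliency.tolist(), human_fixation)
print(f"Spearman correlation with human fixations: rho={rho:.2f} (p={p:.3f})")
```

Gradient × input is only one candidate attribution method; attention weights (available via `output_attentions=True`) or integrated gradients could be substituted into the same model-human comparison.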