CORDIS - EU research results

Fairness in Language Models: Equally right for the right reasons

Project description

Investigating impartiality in language models that apps use to understand language

Natural language processing (NLP) enables digital devices to analyse, understand and synthesise human language, be it text or speech. Most systems are built on language models trained on large corpora collected automatically from internet sources. However, this makes them vulnerable to unchecked prejudice, stereotypes and exclusion. The EU-funded FairER project will investigate NLP language models and their solution strategies in a multilingual context. It will assess how impartial and inclusive they are, not only in demographic terms (e.g. race, gender, age) but also with respect to literacy level. The work is expected to make NLP applications more equitable and provide a basis for further research.

Objective

Most of us use technology based on natural language processing (NLP), such as Google Search or virtual assistants on phones and other devices, on a daily basis. Large-scale pre-trained language models play a crucial role here, as they often form the basis of these technologies. These models are trained on very large amounts of data (e.g. the entire English Wikipedia and the Brown corpus). At this scale it is impossible to curate the training corpus, so stereotypes and biases present in the data can become encoded in the model, often without researchers noticing. This can lead to problematic and unfair behaviour towards certain demographic groups, often those who already suffer from implicit biases in society.

With FairER, I aim to gain a deeper understanding of the inner workings of these language models. In particular, I want to investigate how well their solution strategies align with those of humans and whether this alignment depends on demographic attributes such as gender, race and age, but also on reading ability and level of education. I will also probe these language models for fairness and inclusiveness, i.e. find out whether the performance of an NLP application depends on the demographic attributes of its user. Furthermore, I will conduct this project in a multilingual setting and apply interpretability methods to better understand the rationale behind a model's decisions.

The main impact of FairER will be a better understanding of how language models treat different demographic groups. These insights will help to improve the fairness and inclusiveness of NLP applications. Furthermore, the datasets I will record and publish along with the code will encourage other researchers to replicate my findings and continue this line of research. Ultimately, this project will have both scientific and societal impact, on the NLP community and on the users of NLP applications.

Funding Scheme

HORIZON-TMA-MSCA-PF-EF - HORIZON TMA MSCA Postdoctoral Fellowships - European Fellowships

Call for proposal

HORIZON-MSCA-2021-PF-01

Coordinator

KOBENHAVNS UNIVERSITET
Net EU contribution

€ 214 934,40
Address
NORREGADE 10
1165 KOBENHAVN
Denmark

Region
Danmark Hovedstaden Byen København
Activity type
Higher or Secondary Education Establishments
Total cost

No data