Language technology is all around us, in the form of smart speakers, translation tools, and the ever-expanding range of chatbots and language models. You've probably interacted with at least one language model since waking up. All of this technology, however, assumes that language is a uniform thing. Yet we do not use the same language with friends as we do in a business meeting, with children as with adults, or with dialect speakers as with standard speakers. As human speakers, we are constantly adjusting how we speak and how we listen to the people around us.
Language technology, on the other hand, makes no such adjustment. As a result, it only works well for a small subset of the population: those whose languages have been modeled. We and others have demonstrated that even simple language technology is ineffective for dialect speakers, women, young people, and a variety of other underrepresented groups.
The INTEGRATOR project aims to change all of that. We are collaborating with experts from various fields to make language technology more equitable, less biased, and more inclusive. The project develops the theory, data sets, algorithms, and models required to achieve those objectives.
The main goal is to algorithmically incorporate demographic factors into NLP models in order to improve performance and mitigate demographic bias in language technology.
The second goal is to theoretically ground the work by identifying which demographic factors influence which NLP applications.
The third goal is to provide data sets and data representations.
NLP has evolved significantly in the years since this project was conceived, but the issues it addresses have in many ways become more pressing. Large language models, such as GPT-4, have emerged as a driving force in NLP. While they have replaced many previous technologies and rendered some traditional tasks (and some aspects of the proposed work) obsolete, they continue to operate on the same assumptions as earlier language technology.
Throughout the project, we have made significant contributions to the scientific literature on the understanding and treatment of bias. These contributions have resulted in 30 publications, an expanding network of collaborators, successful placement of former project members, and an impact on current language technology.
This project's work has been presented in nearly 30 different venues, including keynote speeches at workshops and invited talks at universities.