Obj. 1: Advancing research in responsible creation and application of multilingual, multimodal pretrained XR models.
We have contributed novel multilingual data for XR model pretraining, as well as novel XR models. Our contributions to data creation span both the speech and text modalities: in speech, mHuBERT-147 (147 languages) and Speech-MASSIVE (12 languages); in text, ELITR-bench, MAIA, PMIndiaSum, and other datasets addressing machine translation, language identification, and multilingual natural language understanding. In terms of models, we built mHuBERT-147 (for speech), TowerLM (for text translation tasks), and Spire (an extension of Tower to the speech modality). In RP2 we released the EuroLLM suite, three multilingual foundation models (1.7B, 9B, and 22B parameters) trained from scratch through a EuroHPC extreme-scale grant. These models support all 24 official EU languages plus 11 additional languages and achieve state-of-the-art performance on various multilingual benchmarks. We also built two task-specific efficient models: Multilingual DistilWhisper for automatic speech recognition, and an approach for efficient CTC regularization for speech translation.
Obj. 2: Development of new methods and algorithms for adapted, contextualised, and robust dialogue assistant XR models.
We have contributed datasets, methodology, software, and empirical observations to advance various aspects of adaptation, contextualisation, uncertainty-awareness, explainability, and robustness of XR models. Our contributions span 160 publications and associated open-source repositories, and have won 5 awards at research conferences. To accelerate progress on these topics, we co-organised several international events (6 shared tasks and 1 workshop).
Obj. 3: Development of tools for online meetings and customer support agents.
We developed a customer support assistant prototype that incorporates technology from UTTER (the Tower+ and xCOMET models), covering machine translation, quality estimation, grammatical error correction, cultural appropriateness detection and adaptation, and emotion recognition of customer messages.
We developed a meeting assistant prototype to test long-context large language models (LLMs) in realistic settings. In the first year, we built a general-purpose, LLM-powered assistant for friendly, informal meeting interactions. The second-year version added robustness to ambiguity, noise, and edge cases. The third year led to a trustworthy-by-design assistant built on NAV’s Trust Mediator (TM) framework, incorporating input filtering, safeguards, and compliance checks, which are core components for building accountable AI systems. We have demonstrated these prototypes at user days, and they have undergone evaluation.
Obj. 4: Sustainable, maintainable platform and services.
We released TowerEval, an open-source LLM evaluation repository and toolkit for several text-based tasks, ranging from translation to grammatical error correction. Models and datasets developed in Obj. 1 have been released with open weights in the Hugging Face ecosystem and downloaded over 1.7M times so far. Unbabel’s Widn.ai translation service was built by leveraging the Tower models developed in UTTER and combining them with proprietary technology and resources (Figure 3). The Naver trustworthy-by-design assistant, which builds on the Trust Mediator (TM) framework, a key contribution to Obj. 3, also supports this objective.