
DEep COgnition Learning for LAnguage GEneration

Periodic Reporting for period 1 - DECOLLAGE (DEep COgnition Learning for LAnguage GEneration)

Reporting period: 2023-08-01 to 2026-01-31

Large-scale language models have led to impressive results in many NLP tasks, exhibiting transfer and few-shot learning capabilities. When interacting with such systems, users commonly find them capable of reasoning, planning, and explaining their decisions, often in convincing ways. However, despite the enormous advances of recent years, current deep learning models for NLP remain fundamentally limited, and many important ingredients are still missing to achieve a satisfactory level of "intelligence". These limitations partly stem from their monolithic architectures, which are well suited to some perceptual tasks but unsuitable for tasks requiring higher-level cognition.

The overarching goal of DECOLLAGE is to attack these fundamental problems by bringing together tools and ideas from machine learning, sparse modeling, information theory, and cognitive science, in an interdisciplinary approach.

The project pursues three research directions:
1. Designing new components for utility guidance, control, and contextualization. This will endow the model with the ability to predict its own quality (an "inner voice" or a critic) and to handle contextual information (e.g. document-level, conversation-level, meta-information about the surrounding environment), doing so in a modular, selective, and efficient manner.
2. Developing dynamic memory structures that facilitate continual learning, by supporting efficient reading and writing access, fast adaptation, and representation of world and self-knowledge. We will exploit synergies with sparse modeling and information retrieval.
3. Formalizing and implementing new mathematical models for sparse communication, bridging the gap between discrete (symbolic) and continuous representations, and developing techniques to integrate multiple modalities (such as text, speech, and image signals) into a shared representation space. This will draw links between information theory, formal languages, and neuroscience; a minimal sketch of one such sparse transformation appears after this list.
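
For illustration, the sketch below shows sparsemax (Martins and Astudillo, 2016), a canonical example of the kind of sparse transformation that direction 3 builds on: unlike softmax, it can assign exactly zero probability, yielding distributions that interpolate between continuous scores and discrete (symbolic) selections. This is a minimal NumPy sketch, not code from the project's releases.

```python
import numpy as np

def sparsemax(z):
    """Project logits z onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, the result can contain exact zeros, giving a sparse
    distribution that bridges continuous scores and discrete selections.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # sort scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum      # indices kept in the support
    tau = (cumsum[support][-1] - 1) / k[support][-1]   # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.2, 0.8, -1.0]))   # -> [0.7 0.3 0. ]  (an exact zero)
```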

We will apply these innovations to highly challenging language generation tasks, including machine translation and open-ended generation.
Below I summarize the research and technological achievements along the main objectives/activities from 1 August 2023 until 31 January 2026, including the main results and the released code and datasets.

In WP1, we published a survey (Fernandes et al., TACL 2023) and developed frameworks for quality-guided generation (Farinhas et al., EMNLP 2023), for comparing alignment strategies for MT (Ramos et al., EAMT 2024), and for quality-aware sampling (Faria et al., NeurIPS 2024). We pioneered finetuning and in-context learning techniques to steer LLMs for MT (Alves et al., Findings of EMNLP 2023), later improved with continued pretraining to yield TowerLLM (Alves et al., COLM 2024), downloaded 200K+ times on Hugging Face. A later iteration of this model (Tower v2) obtained the best results in the WMT 2024 General MT task (Rei et al., WMT 2024). We also advanced multilingual embedding models through the release of EuroBERT (Boizard et al., COLM 2025). We advanced interpretability techniques to analyze and understand contextual contributions in LLMs (Zaranis et al., Findings of EMNLP 2024), and we developed efficient state-space model architectures to handle long contextual information in MT (Pitorro et al., WMT 2024). We advanced multilingual contextualization for translation (Ramos et al., COLM 2025). Finally, we developed new uncertainty quantification techniques using conformal prediction (Campos et al., TACL 2024; Campos et al., AISTATS 2025), extended these techniques to non-exchangeable data, which is important for language generation (Farinhas et al., ICLR 2024; Ulmer et al., EACL 2024), and implemented an uncertainty-based trigger for MT (Farinhas et al., EMNLP 2025).
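
To make the conformal prediction thread concrete, here is a minimal sketch of standard split conformal prediction applied to a quality-estimation regressor. The papers cited above develop non-exchangeable and structured extensions that this toy (with synthetic scores and illustrative function names) does not capture.

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction intervals for a quality-estimation regressor.

    cal_pred / cal_true: predicted and gold quality scores on a held-out
    calibration set; test_pred: predictions we want intervals for.
    Guarantees ~(1 - alpha) coverage when calibration and test data are
    exchangeable (the non-exchangeable case needs the weighted variants
    studied in Farinhas et al., ICLR 2024).
    """
    residuals = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = len(residuals)
    # finite-sample-corrected quantile of the nonconformity scores
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level, method="higher")
    test_pred = np.asarray(test_pred)
    return test_pred - q, test_pred + q

# toy usage with synthetic calibration scores
rng = np.random.default_rng(0)
cal_pred = rng.uniform(0, 1, 500)
cal_true = cal_pred + rng.normal(0, 0.05, 500)
lo, hi = conformal_interval(cal_pred, cal_true, [0.8, 0.3])
print(lo, hi)   # ~90% coverage intervals around each prediction
```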

In WP2, we proposed new sparse and structured associative memories (Martins et al., NeurIPS 2023 Workshop; Santos et al., NeurIPS 2024), the latter distinguished as a spotlight paper, in which we develop a new variant of modern Hopfield networks with exact retrieval. We also developed a new framework for continuous-time long-term memories, with application to video data, leading to a new model called ∞-video (Santos et al., ICML 2025; Santos et al., JMLR 2025) and to continuous-time Hopfield networks (Santos et al., ICLR 2025 Workshop). We proposed and developed a new efficient sparse attention algorithm (AdaSplash) that scales transformers to very long contexts (Gonçalves et al., ICML 2025); this work was accepted as an oral paper at ICML 2025 (around 1% of paper submissions).
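
The following sketch illustrates the mechanism behind exact retrieval in sparse Hopfield networks: a modern Hopfield update is attention over stored patterns, and replacing softmax with a sparse transformation lets the attention weights hit exact zeros, so a single pattern can be retrieved exactly. This is a simplified illustration on Gaussian toy data, not the Fenchel-Young formulation of the papers above.

```python
import numpy as np

def sparsemax(z):
    """Sparse alternative to softmax (see the earlier sketch)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum
    tau = (cumsum[support][-1] - 1) / k[support][-1]
    return np.maximum(z - tau, 0.0)

def hopfield_update(X, q, beta=4.0, transform=sparsemax):
    """One retrieval step of a modern Hopfield network.

    X: (num_patterns, dim) matrix of stored patterns; q: query vector.
    With softmax this is the classical dense update q <- X^T softmax(beta X q);
    with a sparse transform the attention weights can be exactly zero, which
    is what enables exact retrieval of a single stored pattern.
    """
    p = transform(beta * X @ q)     # (sparse) attention over stored patterns
    return X.T @ p                  # convex combination of patterns

# toy usage: recover a stored pattern from a corrupted query
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 16))
q = X[3] + 0.3 * rng.normal(size=16)   # noisy version of pattern 3
for _ in range(3):
    q = hopfield_update(X, q)
print(np.allclose(q, X[3]))   # with sparsemax, typically exactly True
```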

In WP3, we developed a communication-theoretic framework for generation-reranker systems, accepted as a spotlight paper at NeurIPS 2024 (Farinhas et al., NeurIPS 2024), and we published a book (Niculae et al., 2025, Foundations and Trends in Signal Processing) surveying discrete latent structured models. We initiated work on Task 3.2 with a video-text model (Santos et al., ICML 2025) and a speech model (Fucci et al., ACL 2025), and we released SPIRE, a multimodal text-speech LLM (Ambilduke et al., Findings of EMNLP 2025). We developed a new framework, xTower, which verbalizes explanations for translation error spans in LLMs and corrects those errors (Treviso et al., Findings of EMNLP 2024).
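
As a toy illustration of the communication-theoretic view of reranking, the simulation below treats N candidate generations like redundant transmissions over a noisy channel, with a noisy reranker picking the best candidate: the failure rate decays roughly exponentially in N. All probabilities and noise levels here are synthetic, chosen only for illustration.

```python
import numpy as np

# Toy model of generate-then-rerank as communication over a noisy channel:
# each of n candidate hypotheses is independently "acceptable" with
# probability p, and a noisy reranker observes quality up to Gaussian noise.
rng = np.random.default_rng(0)
p, noise = 0.3, 0.2

def failure_rate(n, trials=20000):
    """Probability that best-of-n reranking returns an unacceptable output."""
    quality = rng.random((trials, n)) < p                   # true acceptability
    observed = quality + noise * rng.normal(size=(trials, n))
    picked = observed.argmax(axis=1)                        # reranker's choice
    return 1.0 - quality[np.arange(trials), picked].mean()

for n in (1, 2, 4, 8, 16):
    print(n, round(failure_rate(n), 3))   # decays roughly like (1 - p)**n
```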

In WP4, we studied the limitations of automatic evaluation metrics in assessing high-quality translations (Agarwal et al., EMNLP 2024a) and created a new high-quality user preference dataset for machine translation (Agarwal et al., EMNLP 2024b). We also annotated data to assess the extent to which context contributes to chat translation evaluation and under what conditions (Agrawal et al., TACL 2024). We created several new datasets for the WMT 2023 and 2024 shared tasks (Freitag et al., WMT 2023; Blain et al., WMT 2023; Zerva et al., WMT 2024; Mohammed et al., WMT 2024). On Task 4.2 (“Automatic evaluation”), several important steps have been taken, with the development and public release of a new evaluation metric for translation, CometKiwi v2, which won the Quality Estimation Shared Task (Rei et al., WMT 2023), followed by an explainable metric which predicts MQM error spans, xCOMET (Guerreiro et al., TACL 2024), which has received 100+ citations. We extended some of the automatic metrics above to provide confidence intervals, using conformal prediction (Zerva et al., TACL 2024). We also developed and studied LLM-based fine-grained metrics (Fernandes et al., WMT 2023), created new metrics for evaluating and analyzing the robustness of translation models and LLMs to noisy source sentences (Peters et al., ACL 2025), and studied the biases of automatic evaluation metrics (Zaranis et al., ACL 2025). We developed a zero-shot benchmarking strategy (Pombal et al., COLM 2025), a question-answering based evaluator for MT (Fernandes et al., COLM 2025), and M-Prometheus, a suite of open multilingual LLM judges (Pombal et al., COLM 2025b).
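
As a pointer to the released artifacts, below is a usage sketch of xCOMET through the open-source unbabel-comet package; the checkpoint identifier and output fields are as in recent package versions and may change.

```python
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

# Download and load the public xCOMET checkpoint from the Hugging Face Hub
# (checkpoint name and output fields may differ across package versions).
model_path = download_model("Unbabel/XCOMET-XL")
model = load_from_checkpoint(model_path)

data = [{
    "src": "Boa noite, como posso ajudar?",
    "mt":  "Good night, how can I help?",
    "ref": "Good evening, how may I help you?",
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)         # segment-level quality scores
print(output.system_score)   # corpus-level average
# xCOMET additionally predicts MQM-style error spans, exposed in the
# prediction metadata in recent package versions.
```
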
I highlight the following scientific publications:

1) Guerreiro et al. (TACL 2024). “xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection.” - xCOMET has become widely adopted by the community, and the paper has received 130+ citations as of September 2025.

2) Alves et al. (COLM 2024 oral spotlight). “Tower: An Open Multilingual Large Language Model for Translation-Related Tasks.” - The Tower LLM models have been released openly on Hugging Face and have been downloaded 200K+ times. The paper has received 140+ citations as of September 2025. A later iteration of this model (Tower v2) obtained the best results in the WMT 2024 General MT task, outperforming strong commercial systems such as OpenAI’s GPT-4, Google’s Gemini, Google Translate, and DeepL (Rei et al., WMT 2024).

3) Santos et al. (ICML 2024 spotlight). “Sparse and Structured Hopfield Networks.”

4) Farinhas et al. (NeurIPS 2024 spotlight). “Reranking Laws for Language Generation: A Communication-Theoretic Perspective.”

5) Gonçalves et al. (ICML 2025 oral). “AdaSplash: Adaptive Sparse Flash Attention.”