Periodic Reporting for period 4 - DREAM (Distributed dynamic REpresentations for diAlogue Management)
Reporting period: 2024-03-01 to 2025-05-31
1) A suite of implemented conversational agent models. In a series of publications, we have developed machine learning models that generate referring expressions that are discriminative, exploit the conversational common ground, and are adapted to the expertise of the dialogue partner.
2) A comprehensive analysis and evaluation of multimodal systems. In another series of publications, we have provided evidence that vision-language models learn better semantic representations than language-only models for concepts that are visually grounded; we have shed light on how and when these models perform multimodal integration, and have uncovered patterns of alignment and misalignment between their internal representations and human cognitive signals.
3) An in-depth analysis of linguistic priming and interactive alignment in humans and language models. In another series of publications, we have carried out empirical corpus-based analyses of human-human dialogue in first- and second-language acquisition setups and shown that large language models exhibit structural priming effects that are similar to those observed in humans.
4) A framework based on notions from Information Theory to analyse and model language use in conversation. In another series of publications, we have shown that information content tends to decrease over the course of a dialogue rather than remain constant, as argued by well-known theories, and have proposed a novel model of utterance processing effort that relies on alternative sampling operationalised with large language models.
5) An analysis of model uncertainty and its practical consequences. In yet another series of publications, we have studied model uncertainty and its relation to human production variability, proposed methods to exploit uncertainty to generate clarification questions, and exposed the dangers of unrecognised ambiguity and lack of uncertainty awareness.
In addition to the scientific publications, the project's output also includes public code repositories and datasets, which are available from the project's website.