Periodic Reporting for period 1 - DivCon (Divergence and convergence in dialogue: The dynamic management of mismatches)
Reporting period: 2023-05-01 to 2025-10-31
With DivCon, we aim to transform basic understanding of human interaction by showing how successful dialogue is driven by incremental, local and dynamic processes of mismatch management.
In everyday interactions, we continuously make predictions about what will happen next, based on how our own and others' behaviour affects the world, to open up new possible courses of action. In dialogue, these predictions are about sounds, words, inferences and even non-speech actions such as gestures or eye gaze. If our expectations are not met, we have to ascertain whether the mismatching input can be resolved, or integrated as a surprising but rewarding outcome (as in the case of humour).
DivCon will produce a suite of corpus and experimental data for exploring the timely issue of communication via different forms of computer-mediated communication, including text-based chats, video calls and virtual reality meetings. To do this, the project will create a novel platform for experiments on real-time, live multimodal interactions using avatars and virtual reality. The formal arm of the project will develop a precise theory of divergence and convergence in interaction which unites verbal and non-verbal dialogue phenomena using the core notions of prediction and underspecification. This model provides an important step on the path to genuinely adaptive conversational AI systems, which remain beyond the reach of researchers despite the promise of recent decades.
The research and technological achievements are organised in thematic areas which cross-cut the work packages described in the description of the action, while all contributing to the bigger picture of the project.
In text-based chats, we have mostly been focussing on pragmatic mismatches. This work has included an analysis of memes (presented at conferences), showing non-propositional effects which differ between participants who do or do not get the meme, in line with our previous analysis of humour. We have collected experimental data from triadic text chats in which we manipulated some of the interlocutors' turns to include congruent or incongruent emojis. The work is currently being written up, but results indicate that incongruent emojis (e.g. a happy face when talking about killing someone), which we take to be a pragmatic indication of non-seriousness, prompt conversational partners to increase their emoji use. In collaboration with Sara Amido (a visiting PhD student with us for three months from Pompeu Fabra University in Barcelona), we have also collected data in which we intervene with pragmatic markers that we have identified as potential indicators of confrontational dialogue. These data are currently being analysed.
For face-to-face dialogues, we have investigated the interplay of metaphor and disfluencies (Han Qiu et al., 2024). This work is currently being extended to include co-speech gestures, since previous research suggests an association between speaker repairs and gesture use, but the influence of the presumed increased cognitive load of producing or understanding a metaphor is not known. We are also currently collecting a corpus of instructive dialogues, in which an expert player of the game Carcassonne explains the game to a novice. This task mirrors much real-world dialogue in terms of mismatched knowledge distribution, and we are currently mapping points in the dialogues where there is a breakdown (e.g. because the expert has miscalculated how well the novice is keeping up). The data are rich and reveal different strategies which could be transferred to a dialogue system.
Due to the development of large language models (LLMs) and other technologies, which were not envisaged in the original proposal, a key strand of research has been investigating mismatches in interactions with dialogue systems and generative AI. We are currently undertaking several data collections in this area, including recordings of dyads or individuals playing a voice-activated game and of people discussing aesthetic assessments with a voice dialogue system. We have further conducted a number of experiments with the Furhat social robot, including on gaze and laughter coordination, as reported in Giannitzi et al., 2025. We have also had access to a corpus of data from German households interacting with a smart assistant (Alexa), in which we are studying cases of laughter in terms of how active a conversational participant the Alexa is taken to be. This work, which includes developing a fine-grained taxonomy for laughter since human-human taxonomies cannot capture all the cases in our data, is in progress with Mathias Barthel in Mannheim.
In terms of formal work, research is progressing along a number of dimensions. Work on how topoi (rules of thumb underpinning reasoning) operate in dialogue in general, and in offensive humour in particular as a test case (Howes et al., 2025), is ongoing. We have further investigated topoi produced by children in collaboration with Claire Prendergast in Oslo, as reported in Howes et al., 2024, and in work currently under review. This includes a comparison of how children's responses differ from those given by adults and by an LLM; the LLM is found not to reason in a human-like manner, although it does seem to have access to the same topoi that the human respondents use. In work currently under review, we have also examined LLM counterfactual explanations, showing that there are differences in notions of actionability in LLMs, but that these misalignments are primarily due to problems in language generation rather than inherent properties of the models. One of the PhD students on the project is further looking at impersonal pronouns in dialogue corpora, and the PI is guest editing a special issue of the journal Languages on Dynamic Syntax (one of the theoretical frameworks used in the project).
Full publications (including conference contributions) are detailed at http://www.christinehowes.com/research/divcon
The project, which has been interdisciplinary from the outset, now also encompasses new areas of research in human-robot interaction (HRI), including consideration of the ethical aspects, as presented in Lagerstedt, E. & Howes, C. (2024). I robot, you Jane? Ethics in the age of social robots. In Unfolding Ethics in Research and Society: Beyond Ethical Principles and Guidelines; WASP-HS Workshop in conjunction with the conference AI for Humanity and Society.
Preliminary results suggest that the incremental, mismatch-driven model of dialogue can better explain both grounded dialogue data and abstract linguistic puzzles. These early results offer evidence that large language models, as they are currently designed, will never achieve the flexibility of human-human dialogue, and the project thus has far-reaching consequences for the development of conversational AI systems.