
MULTImodal and MULTIparty Social Interactions MOdelling

Periodic Reporting for period 1 - MULTISIMO (MULTImodal and MULTIparty Social Interactions MOdelling)

Reporting period: 2016-09-01 to 2018-08-31

The goal of the project was to model the multimodal behavior of speakers participating in multiparty social interactions. The target has been the development of new knowledge, i.e. dialogue and behavioral models that are indispensable for technologies capable of incorporating human-like communicative abilities into human-computer intelligent interfaces. To this end, the project addressed multimodal and multiparty dialogue management driven by cognitive models of human interactive behavior.

The research objective was to understand and model the multiparty interaction configuration and the underlying communicative and social behavior of the participants. This novel research aimed to respond to the following challenges in terms of analysis and modeling: (a) interpretation of conversational behavior and the turn-taking mechanism, (b) detection of personality traits and behavior analysis, and (c) modeling the multimodal strategies and regulatory actions reflected through the participants’ verbal and non-verbal signals and psychological variables.
The analysis and interpretation of multimodal signals provided the necessary semantic and communicative context to decode the speakers’ interactional behavior. The research has also been ambitious in going beyond two-party interaction to focus on multiparty interaction, a so far under-studied yet very common, rich and informative everyday communication phenomenon. Specifically, it has examined and modeled the set of communicative strategies that participants must acquire, develop and manage in order to reach a communicative goal. The project also examined the effect of personality traits and emotions on the degree of speakers’ engagement in the interaction, their level of attention, and their tendency to create, or ability to manage, conversational conflicts. In this context, achievement of the communicative goal has been assessed through measurable multimodal signals produced by the interlocutors and has been correlated with the participants’ personality traits and turn-taking activity.
The project built on models trained on a new dataset of multimodal communicative human behavior - obtained through rigorous recording of natural interactions in multiparty sessions carried out within the scope of the project (4 hours of recordings in total, involving 50 human participants) - combined with state-of-the-art techniques in multimodal communication and language technology, leading to entirely fresh views on human-machine interaction.
The project has also innovated in the fields of affective computing and behavioral analytics by investigating the perception and automatic detection of psychological variables, group leadership and emotion-related features in group interaction, exploiting linguistic, acoustic and visual features.

The work performed in the project can be summarised in the following activities:
- Experimental design, implementation and collection of a multimodal corpus of three-party interactions.
- Conversational dominance quantification and detection.
- Personality detection from audio and text.
- Measuring engagement from linguistic repetitions and turn-taking elements.
- Dialogue laughter classification and modeling.
- Speech pause analytics for the detection of the next speaker in multi-party interaction.
- Development of an innovative scoring system to measure collaboration and task success in small groups from voice, facial expressions, turn-taking, and personality features.
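As a purely illustrative sketch (not the project's actual method, which also draws on voice, facial expression and personality features), a first-pass quantification of conversational dominance in a three-party session can be derived from speaking-time shares over turn segments. All names and data below are hypothetical.

```python
# Hypothetical sketch: approximate conversational dominance as each
# participant's share of total speaking time in a three-party session.
from collections import defaultdict

def dominance_shares(turns):
    """turns: list of (speaker_id, start_sec, end_sec) tuples.
    Returns each speaker's fraction of total speaking time."""
    totals = defaultdict(float)
    for speaker, start, end in turns:
        totals[speaker] += end - start
    grand_total = sum(totals.values()) or 1.0  # guard against empty input
    return {speaker: t / grand_total for speaker, t in totals.items()}

# Example three-party exchange (times in seconds, invented data).
turns = [
    ("P1", 0.0, 5.0), ("P2", 5.0, 7.0),
    ("P3", 7.0, 12.0), ("P1", 12.0, 20.0),
]
shares = dominance_shares(turns)
# P1 spoke 13 of 20 seconds, so shares["P1"] == 0.65
```

Speaking-time share is only one turn-taking element; richer dominance measures would also weight interruptions, overlaps and turn counts.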

Research carried out in the project resulted in 9 publications and the release of the project’s dataset, the MULTISIMO corpus.
The Fellow’s paper “Quantifying dominance in the Multisimo corpus”, presented at the 9th IEEE Conference on Cognitive Infocommunications, received a best paper award.
To disseminate the project results and to achieve maximum outreach, the Fellow participated in more than 10 conferences and other networking events, gave two invited talks, and was actively involved in the European Researchers’ Night events in 2016 and 2017.
To create and maintain a vibrant community around the area of multimodal interaction analysis, the Fellow co-organised two scientific tracks in international conferences.
As part of her career development plan, and in order to maximise the impact of the research results of the project, the researcher fully exploited training opportunities provided by the host institution and the EU, including academic training, research development and exploitation of research results.
The Fellow also demonstrated effectiveness in directing researchers by contributing to the supervision of both final-year undergraduate and postgraduate students; a measure of this success is that such projects have culminated in peer-reviewed publications. She also contributed to the teaching mission of the host institution through training on research project supervision and by giving guest lectures in modules that form part of the core curriculum of both undergraduate and postgraduate courses.
The MULTISIMO corpus of annotated and analysed human-human interactions developed in the project has been made freely available to the research community. The impact of this corpus will grow as the designed analyses progress and as others in the research community avail of the resource the Fellow has designed and constructed. It will also serve the community as a reference for resource development, and significant contributions towards the integration of multimedia and multi-layered data encoding and standardization are expected.
Through its novel perspective, work carried out in the project has substantially covered the multiparty interaction setting and has integrated the underlying emotional states and communicative intent into models of interactional behavior. It has thus laid the foundations of intuitive and context-sensitive multimodal communication through interfaces that have the potential to automatically adapt to the users and their state and take into account their cognitive, auditory and visual cues. The integration of technologies emerging from the research output will influence the design of collaborative human-machine interfaces and, in the long run, improve the quality or efficiency of private or professional services and positively influence user communities.