Until now, social signal processing has focused mainly on audio analysis of overlapping speech. This approach excluded such nuances as mimicry, rapport and empathy. The interplay of facial, head, eye, arm, body and vocal behaviour, which was left out, has as much to say as the sound of speech itself. Because conflict escalation and resolution are rooted in human behaviour, researching these non-verbal cues can provide valuable insight for better conflict management.

The EU-funded project CONFER (Automatic detection of conflict escalation and resolution in social interactions) conducted an automatic analysis of conflict in interactions using real-world audiovisual data. It aimed to facilitate automated modelling of conflict escalation and resolution based on the co-occurrence, frequency, duration and temporal evolution of behavioural cues. Additionally, it created and analysed tools for automatic audiovisual recognition and prediction of conflict escalation and resolution that take the temporal interplay between interlocutors into account.

Over 60 hours of live political debates televised in 2011 and 2012 were collected as video data. Debate participants are a good model of real-world conversation since they have real motivations that lead to real conflicts. The data were annotated by 10 experts for continuous conflict intensity, valence and arousal using a joystick annotation tool. Both physical and inferential aspects of the conversations were taken into account.

Since the data are real-world recordings, they contain errors, including missing and incomplete samples. Various novel methods were developed to handle such issues.

Release of the annotated data is expected to have a significant impact on the progress of research in the cognitive and social sciences. Novel applications such as conflict management systems, negotiation support systems and conflict detection will also become possible.
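To illustrate the kind of processing such continuous annotations involve, the sketch below shows one common way to fuse several annotators' joystick traces into a single gold-standard signal: gaps are filled by linear interpolation, each trace is normalised to remove rater-specific offset and scale, and the per-frame mean is taken. The function name and the method are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

def fuse_annotations(traces):
    """Fuse continuous conflict-intensity traces from several annotators.

    traces: 2-D array-like (annotators x frames); NaN marks missing samples.
    Returns a single fused trace of length `frames`.
    """
    fused = []
    for trace in np.asarray(traces, dtype=float):
        idx = np.arange(trace.size)
        missing = np.isnan(trace)
        # Fill missing frames by linear interpolation (a simple stand-in
        # for more sophisticated missing-data methods).
        trace[missing] = np.interp(idx[missing], idx[~missing], trace[~missing])
        # Z-normalise to remove each rater's individual offset and scale.
        trace = (trace - trace.mean()) / (trace.std() + 1e-8)
        fused.append(trace)
    # Per-frame mean across annotators as the gold-standard trace.
    return np.mean(fused, axis=0)
```

In practice, per-rater normalisation matters because annotators use a joystick's range very differently; averaging raw traces would let one rater's wide swings dominate the fused signal.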
Conflict escalation and resolution, automatic sensing, non-verbal cues, interlocutors, social interactions, negotiation