Automatic Analysis of Group Conversations via Visual Cues in Non-Verbal Communication

Final Report Summary - NOVICOM (Automatic Analysis of Group Conversations via Visual Cues in Non-Verbal Communication)

Social interaction is a fundamental aspect of human life and a key research area in psychology and cognitive science. Social psychologists have been researching the dimensions of social interaction for decades and have found that a variety of social communicative cues strongly determine social behavior and interaction outcomes. Many of these cues are consciously produced, in the form of spoken language. Beyond the spoken words, however, human interaction also involves nonverbal elements, which are extensively and often unconsciously used in communication. Nonverbal communication is conveyed as wordless messages, in parallel to the spoken words, through aural cues (voice quality, speaking style, rhythm, intonation) and through visual cues (gestures, body language, facial expression, and gaze). All of us use these nonverbal cues every day to infer the mood and personality of others, as well as to make sense of social relations, in a very wide range of situations.

Computational analysis of social interaction, in particular of face-to-face group conversations, is an emerging field of research in several communities, including human-computer interaction, machine learning, speech and language processing, and computer vision. Close connections with other disciplines, such as psychology and linguistics, also exist, aimed at understanding which verbal and nonverbal signals are used in diverse social situations to infer human behavior. The ultimate aim is to develop computational systems that can automatically infer human behavior by observing a group conversation via sensing devices such as cameras and microphones. Besides their value for the social sciences, these systems could open doors to a number of relevant applications that support interaction and communication, including tools that improve collective decision making, that help keep remote users in the loop in teleconferencing systems, and that support self-assessment, training, and education.

Our aim in the “Automatic Analysis of Group Conversations via Visual Cues in Nonverbal Communication (NOVICOM)” project is to develop computational systems that can automatically analyze social behavior by observing conversations via cameras and microphones. We focus on group conversations and aim to infer aspects of the underlying social context, including both individual actions and interactions among the people in the group. Examples of such aspects are dominance, leadership, and roles.

In the NOVICOM project, conducted at the Social Computing group at Idiap, we are exploring models that can estimate social behavior from both audio and visual nonverbal cues, with a specific focus on visual cues. We concentrated on a selected number of key research tasks in social interaction analysis: the automatic estimation of dominance in groups, the emergence of leadership, and personality. In these situations, people unconsciously display visual cues, in the form of gestures and body postures, which partly reveal their social attributes. For each task, our objectives are twofold. First, we attempt to automatically detect the visual nonverbal cues that are displayed during interaction. Second, we investigate multimodal approaches that integrate audio and visual nonverbal cues to infer social concepts.
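The multimodal integration step described above can be illustrated with a minimal sketch. This is not the NOVICOM system itself: the feature names (speaking-time fraction, body-motion energy), the input values, and the fusion weights are all hypothetical placeholders, and the early-fusion weighted score stands in for whatever models the project actually uses.

```python
# Illustrative sketch only (not the NOVICOM implementation): early fusion
# of per-person audio and visual nonverbal cue scores, followed by a
# dominance ranking. All names, values, and weights are hypothetical.

def fuse_cues(audio, visual, w_audio=0.6, w_visual=0.4):
    """Combine normalized audio and visual cue scores for each person
    into a single fused score via a weighted sum (early fusion)."""
    return {p: w_audio * audio[p] + w_visual * visual[p] for p in audio}

def rank_dominance(scores):
    """Order participants from most to least dominant by fused score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-person cues, each normalized to [0, 1]:
# audio  = fraction of total speaking time in the meeting
# visual = body-motion energy extracted from the video
audio_cues = {"A": 0.5, "B": 0.3, "C": 0.2}
visual_cues = {"A": 0.4, "B": 0.5, "C": 0.1}

fused = fuse_cues(audio_cues, visual_cues)
ranking = rank_dominance(fused)  # most dominant participant first
```

In practice, the fusion weights would be learned from annotated meeting data rather than fixed by hand, and the cue set would be far richer than two scalars per person.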