Community Research and Development Information Service - CORDIS

H2020

SEWA Report Summary

Project reference: 645094
Funded under: H2020-EU.2.1.1.4.

Periodic Reporting for period 1 - SEWA (Automatic Sentiment Estimation in the Wild)

Summary of the context and overall objectives of the project

The overall aim of the SEWA project is to enable computational models for machine analysis of facial, vocal, and verbal behaviour in the wild. This is to be achieved by capitalising on state-of-the-art methodologies, adapting them, and combining them so that they are applicable to naturalistic human-centric human-computer interaction (HCI) and computer-mediated face-to-face interaction (FF-HCI). The target technology uses data recorded by a device as cheap as a webcam, in almost arbitrary recording conditions, including semi-dark, dark and noisy rooms with dynamically changing room impulse response and distance to the sensors. It comprises a set of audio and visual spatiotemporal methods for automatic analysis of human spontaneous (as opposed to posed and exaggerated) patterns of behavioural cues, including analysis of rapport, mimicry, and sentiment such as liking and disliking.

In summary, the objectives of the SEWA project are:
(1) development of technology comprising a set of models and algorithms for machine analysis of facial, vocal and verbal behaviour in the wild,
(2) collection of the SEWA database, a publicly available, multilingual dataset of annotated recordings of facial, vocal and verbal behaviour made in the wild, serving as a benchmark for efforts in automatic analysis of audio-visual behaviour in the wild,
(3) deployment of the SEWA results in mass-market analysis tools based on automatic behaviour-based analysis of users' sentiment towards marketed products, together with a sentiment-driven recommendation engine, and
(4) deployment of the SEWA results in a novel social-network-based FF-HCI application – sentiment-driven Chat Social Game.

The SEWA project is expected to have many benefits. Technologies that can robustly and accurately analyse human facial, vocal and verbal behaviour and interactions in the wild, as observed by webcams in digital devices, would have a profound impact on both basic sciences and the industrial sector. They could open up tremendous potential to measure behaviour indicators that have heretofore resisted measurement because they were too subtle or fleeting to be captured by the human eye and ear. They would effectively lead to the development of the next generation of efficient, seamless and user-centric human-computer interaction (affective multimodal interfaces, interactive multi-party games, and online services). They would have a profound impact on business (automatic market research analysis would become possible, and recruitment would become greener as travel would be reduced drastically), and they could enable next-generation healthcare technologies (remote monitoring of conditions such as pain, anxiety and depression), to mention but a few examples.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

WP1 – SEWA DB collection, annotation and release
• Obtained ethical approval for the SEWA experiment.
• Designed the SEWA experiment protocol and implemented the data collection website.
• Conducted 199 successful data recording sessions using the aforementioned website. A total of 398 participants from 6 different cultural backgrounds (British, German, Hungarian, Serbian, Greek and Chinese) were recorded, resulting in an audio-visual corpus of more than 44 hours covering a wide range of spontaneous expressions of emotions and sentiment.
• Extracted the low-level acoustic features (ComParE and GeMAPSv01a) from all SEWA recordings (see the extraction sketch after this list).
• Automatically tracked the 49 facial landmarks in all SEWA recordings. These results will be further refined through semi-automatic correction.
• Identified a total of 540 representative segments (high/low arousal, high/low valence, and liking/disliking; in total 90 segments per culture group) from the SEWA corpus. These segments – titled “the core SEWA dataset” – will be annotated fully in terms of facial landmarks, vocal and verbal cues, facial action units (FAUs), continuously valued emotion dimensions (valence and arousal), mimicry, sentiment, rapport, and template behaviours.
• Released the SEWA database version 0.1 internally, in accordance with the data management plan, and prepared the web portal and EULA for the subsequent public database release.
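
For illustration, acoustic LLD extraction of the kind mentioned above is commonly performed with the openSMILE toolkit. The following is a minimal sketch, assuming the SMILExtract binary is on the PATH and a GeMAPSv01a configuration file is available; the configuration path and output format vary between openSMILE releases, so both are assumptions rather than the project's actual pipeline.

    # Hypothetical LLD-extraction step: calling openSMILE's SMILExtract
    # binary on a single recording. The config path is an assumption and
    # differs between openSMILE releases.
    import subprocess

    def extract_llds(wav_path, out_path,
                     config="config/gemaps/GeMAPSv01a.conf"):
        # -C: configuration file, -I: input audio, -O: output file
        # (whether -O yields CSV or ARFF depends on the configuration)
        subprocess.run(["SMILExtract", "-C", config,
                        "-I", wav_path, "-O", out_path],
                       check=True)

    extract_llds("session_001.wav", "session_001_llds.csv")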

WP2 – Low-level Feature Extraction
• Implementation and evaluation of a software tool, openWord, for generating Bag-of-Audio-Words (BoAW) representations from acoustic low-level descriptors (LLDs) as robust acoustic features (a minimal BoAW sketch follows this list).
• Feature enhancement by deep neural networks to improve acoustic features computed from noisy speech signals.
• Cross-corpus emotion analysis, i.e., testing models for emotion analysis on languages which are not included in the training data.
• Implementation of the incremental in-the-wild face alignment method for automatic facial landmark localisation.
• Generation of multilingual dictionaries for BoAW representations using multiple databases in different languages.
• Application of state-of-the-art linguistic features employed in text retrieval to the sentiment analysis task.
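
To make the BoAW representation concrete, the following is a minimal sketch (not the openWord tool itself, whose internals are not described here): frame-level LLD vectors are quantised against a k-means codebook, and each recording is represented by a normalised histogram of codeword counts. The codebook size and normalisation are assumptions.

    # Minimal Bag-of-Audio-Words sketch: quantise frame-level LLDs against
    # a learned codebook, then describe a whole recording by a normalised
    # histogram of codeword counts.
    import numpy as np
    from sklearn.cluster import KMeans

    def learn_codebook(train_frames, n_words=500, seed=0):
        # train_frames: (n_frames, n_llds) array pooled over training recordings
        return KMeans(n_clusters=n_words, random_state=seed).fit(train_frames)

    def boaw_histogram(codebook, recording_frames):
        words = codebook.predict(recording_frames)  # one codeword per frame
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)          # term-frequency normalisation

    # usage (illustrative):
    # codebook = learn_codebook(np.vstack(all_training_llds))
    # x = boaw_histogram(codebook, llds_of_one_recording)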

WP3 – Mid-level feature extraction
• An existing state-of-the-art tracking algorithm has been used to extract features such as facial landmarks, 3D head pose, nods and tilts (WP3.1).
• Three methods for Facial Action Unit (AU) detection and intensity estimation have been developed (WP3.2).
• The methods for AU detection were trained and tested on two publicly available datasets of naturalistic facial behaviour coded in terms of AU intensity. On both datasets the proposed methods improved on the state of the art in automatic AU detection and intensity estimation.

WP4 – Continuous Affect and Sentiment Sensing in the Wild
• Work on WP4 will start in M15 (April 2016).

WP5 – Behaviour Similarity in the Wild (M12–M30)
• Work on WP5 will start at the end of M12 (from 1st February 2016).

WP6 – Temporal Behaviour-Patterning and Interpersonal Sentiment in the Wild
• Work on WP6 only started in M12 (January 2016).

WP7 – Integration, Applications and Evaluation

PlayGen have advanced the definition and design of the Chat Social Game.
• Refined and tested the concept underlying the Chat Social Game so that it is focused on a practical application with potential social and financial benefit.
• Clarified target user group and signed up 3 universities as partners to support user recruitment.
• Carried out 2 focus groups to define user needs.
• Developed initial game design concepts and mockups.
• Implemented an initial prototype of a two-player chat-based debating game called Sumobate.
• Progressed core technical functionality and advanced technical integration discussions.
• Planned evaluation approach.

RealEyes has focused on three major activities. First, they redefined …

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

WP1 – SEWA DB collection, annotation and release
• A total of 398 participants from 6 different cultural backgrounds (British, German, Hungarian, Serbian, Greek and Chinese) were recorded in the wild, resulting in an audio-visual corpus of more than 44 hours covering a wide range of spontaneous expressions of emotions and sentiment during both video-watching and computer-mediated face-to-face communication sessions. The data will be annotated in terms of facial landmarks, vocal and verbal cues, facial action units (FAUs), continuously valued emotion dimensions (valence and arousal), mimicry, sentiment, rapport, and template behaviours. The SEWA database will be publicly released in a web-based searchable form. It is the very first database to contain in-the-wild recordings of people's reactive and interactive behaviours; to be demographically balanced (an equal number of male and female subjects; an equal number of subjects in each of the age groups 20s, 30s, 40s, 50s and 60s; and, within each age group, at least one of the interactive dyads male-male, male-female and female-female); to include only native speakers; and to be annotated in terms of all visual, vocal and verbal cues, affective dimensions, sentiment, and social signals such as rapport and mimicry.

WP2 – Low-level Feature Extraction
• Showed that Bag-of-Audio-Words representations predict emotion in terms of arousal and valence better than all other known approaches and published results.
• The deteriorating effect of noise on acoustic features was overcome using de-noising auto-encoders (feature enhancement); a minimal sketch follows this list.
• Development of a hybrid system combining BoAW (acoustic features) and BoW (Bag-of-Words; linguistic features) with different feature fusion schemes.
• Implemented the incremental in-the-wild face alignment method for automatic facial landmark localisation. The tracker is capable of accurately tracking the 49 facial landmarks in real-time and is robust against illumination change, partial occlusion and head movements.
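
The feature-enhancement result above can be illustrated with a minimal de-noising auto-encoder sketch. It assumes paired noisy/clean feature vectors are available for training; the layer sizes, activation, and training loop are illustrative assumptions, not the project's actual architecture.

    # Minimal de-noising auto-encoder for feature enhancement: the network
    # is trained to map noisy feature vectors to their clean counterparts;
    # at test time its output is used as the enhanced feature.
    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        def __init__(self, n_features, n_hidden=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
            self.decoder = nn.Linear(n_hidden, n_features)

        def forward(self, noisy):
            return self.decoder(self.encoder(noisy))

    def train_step(model, optimiser, noisy_batch, clean_batch):
        optimiser.zero_grad()
        loss = nn.functional.mse_loss(model(noisy_batch), clean_batch)
        loss.backward()
        optimiser.step()
        return loss.item()

    # usage (illustrative):
    # model = DenoisingAutoencoder(n_features=130)
    # opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # loss = train_step(model, opt, noisy_batch, clean_batch)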

WP3 – Mid-level feature extraction
• Showed that modelling AU segments in videos using a mixture of nominal and ordinal states improves AU segmentation/detection over state-of-the-art conditional random field models that employ only one type of state (i.e., nominal or ordinal).
• The proposed extension of the CORF (Conditional Ordinal Random Field) model, which defines its feature functions by means of neural networks, resulted in better estimation of AU intensities (see the sketch after this list). The model was also applied to the task of agreement-level estimation on the MAHNOB database, where it outperformed the existing methods applicable to this task.
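
The ordinal-state idea behind this CORF extension can be illustrated with a cumulative-link (proportional-odds) output layer placed on top of a neural feature function. The per-frame sketch below is a simplification: the full CORF model additionally couples neighbouring frames with temporal potentials, and the level count and parameterisation here are assumptions.

    # Cumulative-link (ordinal) output head: P(y <= k) = sigmoid(b_k - f(x))
    # with strictly increasing thresholds b_1 < ... < b_{K-1}. Class
    # probabilities follow as differences of consecutive cumulative terms.
    import torch
    import torch.nn as nn

    class OrdinalHead(nn.Module):
        def __init__(self, n_levels=6):              # e.g. AU intensity 0..5
            super().__init__()
            # unconstrained parameters, mapped to increasing thresholds below
            self.raw = nn.Parameter(torch.zeros(n_levels - 1))

        def forward(self, score):                    # score: (batch, 1) from f(x)
            b = torch.cumsum(nn.functional.softplus(self.raw), dim=0)
            cdf = torch.sigmoid(b - score)           # P(y <= k), (batch, K-1)
            upper = torch.cat([cdf, torch.ones_like(score)], dim=1)
            lower = torch.cat([torch.zeros_like(score), cdf], dim=1)
            return upper - lower                     # P(y = k), (batch, K)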

WP4 – Continuous Affect and Sentiment Sensing in the Wild
• Work on WP4 will start in M15 (April 2016).

WP5 – Behaviour Similarity in the Wild
• Work on WP5 will start at the end of M12 (from 1st February 2016).

WP6 – Temporal Behaviour-Patterning and Interpersonal Sentiment in the Wild
• No findings so far, as work on WP6 only started in M12 (January 2016).

WP7 – Integration, Applications and Evaluation
• Social Chat Game
Through a series of discussions with the Valorisation Board and the project partners, it has been concluded that the application of the SEWA technology in the Chat Social Game should represent a new approach to communication skills training, utilising the emotion detection technologies developed in SEWA together with validation methodologies. The application targets young people aged 18+ who are either in educational institutions or have recently completed their education, who are shortly embarking on a career, and who would benefit from a light-touch, fun and meaningful way of practising the negotiations and discussions that are part of everyday working life (e.g. a job interview, dealing effectively with customers, negotiating a reduction in rent with a landlord, or being more effective at dealing with work colleagues). Feedback from employers and end-users …

Record Number: 186633 / Last updated on: 2016-07-14