Community Research and Development Information Service - CORDIS


SEWA Report Summary

Project ID: 645094
Funded under: H2020-EU.

Periodic Reporting for period 2 - SEWA (Automatic Sentiment Estimation in the Wild)

Reporting period: 2016-02-01 to 2017-07-31

Summary of the context and overall objectives of the project

The overall aim of the SEWA project is to enable computational models for machine analysis of facial, vocal, and verbal behaviour in the wild. This is to be achieved by capitalising on state-of-the-art methodologies, adjusting and combining them so that they are applicable to naturalistic human-centric human-computer interaction and computer-mediated face-to-face interaction. The target technology uses data recorded by a device as cheap as a web-cam, in almost arbitrary recording conditions, including semi-dark, dark and noisy rooms with dynamically changing room impulse response and distance to sensors. It represents a set of audio and visual spatiotemporal methods for automatic analysis of human spontaneous patterns of behavioural cues, including analysis of rapport, mimicry, and sentiment such as liking and disliking.
The objectives of the SEWA project are:
- Development of a technology comprising a set of models and algorithms for machine analysis of facial, vocal and verbal behaviour in the wild;
- Collection of the SEWA database, a publicly available, annotated, multilingual dataset of facial, vocal and verbal behaviour recordings made in-the-wild, serving as a benchmark for efforts in automatic analysis of audio-visual behaviour in the wild;
- Deployment of the SEWA results in mass-market analysis tools based on automatic behaviour-based sentiment analysis of users towards marketed products, and in a sentiment-driven recommendation engine;
- Deployment of the SEWA results in a novel social-network-based FF-HCI application: a sentiment-driven Social Game.
Technologies that can robustly and accurately analyse human facial, vocal and verbal behaviour and interactions in the wild, as observed by webcams in digital devices, would have profound impact on both basic sciences and the industrial sector. They could open up tremendous potential to measure behaviour indicators that heretofore resisted measurement because they were too subtle or fleeting to be measured by the human eye and ear. They would effectively lead to development of the next generation of efficient, seamless and user-centric human-computer interaction (affective multimodal interfaces, interactive multi-party games, and online services). They would have profound impact on business and they could enable next generation healthcare technologies.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

WP1 – SEWA DB collection, annotation and release
Annotated valence and arousal in all recordings of the subjects watching the 4th stimulus clip and in all full video-chat recordings. Released the SEWA database version 1.0 publicly, in accordance with the data management plan.
WP2 - Low-level Feature Extraction
Application of state-of-the-art linguistic features employed in text retrieval to sentiment analysis. Investigation of acoustic landmarks as robust linguistic features for emotion recognition. Implementation of an incremental in-the-wild face alignment method for automatic facial landmark localisation.
WP3 – Mid-level feature extraction
Development of a copula-based model for intensity estimation of action units (AUs). Annotated 100 sequences from the SEWA data for AU detection. Implementation of a deep convolutional model for mid-level feature extraction.
WP4 – Continuous Affect and Sentiment Sensing in the Wild
Investigation of confidence measure-based Semi-Supervised Learning (SSL) for multimodal emotion recognition. Deep Neural Network (DNN)-based Multi-Task Learning using the uncertainty of the labels (disagreement between annotators) as an additional target. Enhancement of the Automatic Speech Recognition (ASR) module for German and English.
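The multi-task target construction described above can be sketched as follows: a primary target (the mean annotator rating) and an auxiliary uncertainty target (inter-annotator disagreement) are derived from raw annotation traces. This is a minimal illustration, assuming the standard deviation as the disagreement measure; the function name and that choice are illustrative, not the project's exact formulation.

```python
from statistics import mean, pstdev

def multitask_targets(annotations):
    """Build multi-task regression targets from per-frame annotator traces.

    annotations: list of per-annotator rating traces (e.g. valence in [-1, 1]),
    one inner list per annotator, aligned frame by frame.
    Returns two traces: the mean rating (primary target) and the
    inter-annotator standard deviation (auxiliary uncertainty target).
    """
    frames = list(zip(*annotations))           # group ratings frame by frame
    gold = [mean(f) for f in frames]           # primary target: mean rating
    uncertainty = [pstdev(f) for f in frames]  # auxiliary target: disagreement
    return gold, uncertainty

# Example: three annotators rating two frames; they disagree on frame 0
# but agree perfectly on frame 1.
gold, unc = multitask_targets([[0.2, 0.8], [0.4, 0.8], [0.6, 0.8]])
```

A DNN trained on such pairs regresses both traces jointly, so frames with high annotator disagreement can be down-weighted or flagged at prediction time.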
WP5 – Behaviour Similarity in the Wild
Development, implementation, and evaluation of a novel methodology for unsupervised temporal segmentation of behaviour based on multimodal data. Development of a novel framework for dynamic behaviour modelling, analysis, and prediction.
WP6 – Temporal Behaviour-Patterning and Interpersonal Sentiment in the Wild
An audiovisual fusion method based on cross-prediction of each modality has been modified. An approach based on low-order linear dynamical systems has been developed. Experiments have been conducted on the SEWA database for behaviour prediction of valence, arousal and liking in-the-wild, facial, vocal and audio-visual behaviour similarity estimation and (semi)-unsupervised behaviour understanding in-the-wild.
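As a toy, one-dimensional sketch of the low-order linear dynamical systems idea used for behaviour prediction: fit a first-order transition coefficient by least squares on an observed trace, then roll the model forward. The SEWA models operate on multidimensional behavioural features; the names below are illustrative.

```python
def fit_ar1(series):
    """Fit a first-order linear dynamical model x[t+1] ~ a * x[t]
    by least squares and return the coefficient a."""
    num = sum(x * y for x, y in zip(series, series[1:]))
    den = sum(x * x for x in series[:-1])
    return num / den

def predict(series, a, steps):
    """Roll the fitted model forward to predict future behaviour values."""
    out, x = [], series[-1]
    for _ in range(steps):
        x = a * x
        out.append(x)
    return out

# Example: a decaying arousal trace; the fitted coefficient is 0.5
trace = [1.0, 0.5, 0.25, 0.125]
a = fit_ar1(trace)
future = predict(trace, a, 2)
```

The appeal of such low-order models is that they remain estimable from the small amounts of (possibly noisy) past observations available in-the-wild.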
WP7 – Integration, Applications and Evaluation
SEWA audio and video tools were tested and integrated into the processing pipeline to provide richer behavioural input for the emotional profiles. Evaluation of the SEWA tools. Exploration of commercial opportunities through multiple meetings with a variety of potential partners.
WP8 – Dissemination, Ethics, Communication and Exploitation: reported in Part B.
WP9 – Project management
Overall strategic and operational management and steering of the project, ensuring the accuracy, quality and timeliness of deliverables. Management of liaison with the European Commission; management of the public face of the project and networking with other related projects. Co-ordination of coherence of all developments between WPs.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

WP1: We released the SEWA database (SEWA DB), a multilingual dataset of annotated facial, vocal and verbal behaviour recordings made in-the-wild. SEWA DB will be used for a number of challenges and benchmarking efforts and is expected to have more than 200 active users worldwide by the end of the project. The SEWA DB can be accessed online.
WP2: Development of a hybrid system combining BoAW (bag-of-audio-words, acoustic features), BoW (bag-of-words, linguistic features) and BoVW (bag-of-visual-words) with different feature-fusion schemes. The toolbox openXBOW has been released and has already been used on different tasks.
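The bag-of-words principle behind BoAW/BoVW can be sketched as follows: frame-level features are quantised against a learned codebook, the clip is represented by a normalised histogram of codeword counts, and histograms from different modalities can be fused by concatenation. This is a minimal illustration of the principle under simplified assumptions (tiny 1-D features, a hand-picked codebook), not openXBOW's implementation.

```python
def bag_of_features(frames, codebook):
    """Quantise frame-level feature vectors against a codebook and count
    codeword occurrences, yielding a fixed-length bag-of-features histogram."""
    hist = [0] * len(codebook)
    for f in frames:
        # assign the frame to its nearest codeword (squared Euclidean distance)
        nearest = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])),
        )
        hist[nearest] += 1
    # normalise so clips of different length are comparable
    total = sum(hist)
    return [h / total for h in hist]

# Early fusion: concatenate acoustic (BoAW) and visual (BoVW) histograms
audio_hist = bag_of_features([(0.1,), (0.2,), (0.9,)], [(0.0,), (1.0,)])
video_hist = bag_of_features([(0.8,)], [(0.0,), (1.0,)])
fused = audio_hist + video_hist
```

The fused vector has a fixed length regardless of clip duration, which is what makes bag-of-words representations convenient inputs for standard classifiers and regressors.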
WP3: We have presented the robust mid-level visual feature detection component developed for the SEWA project.
WP4: Realisation of a fully automatic predictor of continuously-valued sentiment and affect dimensions from audio-visual data recorded in the wild, achieving performance competitive with or better than other state-of-the-art approaches. Definition of meaningful confidence measures for regression problems.
WP5: Experiments have been conducted on the SEWA database for (i) behaviour prediction of valence, arousal and liking in-the-wild, and (ii) facial, vocal and audio-visual behaviour similarity estimation for behaviour template discovery and, more generally, (semi-)unsupervised behaviour understanding in-the-wild. Experiments on the naturalistic data of the SEWA database demonstrate the robustness and effectiveness of the proposed framework. The predictive framework is shown to be capable of predicting future labels of behaviour even when given a small number of past observations characterised by (possibly) corrupted and/or non-informative annotations. The similarity estimation framework is shown to be capable of discovering representative templates of affective behaviour.
WP6: We have developed a facial, vocal and audio-visual behaviour similarity measurement framework which can be used (i) to find typical templates of affective behaviour and (ii) to classify never-before-seen sequences as being similar or dissimilar to the identified behaviour templates.
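A minimal sketch of the template-based classification idea, using dynamic time warping (DTW) as the similarity measure between one-dimensional behaviour traces: a new sequence is assigned to the template it lies closest to. The actual SEWA framework is multimodal and more sophisticated; the helper names and example templates here are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D behaviour traces,
    tolerant to differences in speed and length."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping paths
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def nearest_template(sequence, templates):
    """Classify a never-before-seen sequence by its closest behaviour template."""
    return min(templates, key=lambda name: dtw_distance(sequence, templates[name]))

# Hypothetical templates: a rising-then-falling trace vs a falling one
templates = {"smile": [0.0, 0.5, 1.0, 0.5], "frown": [0.0, -0.5, -1.0]}
label = nearest_template([0.0, 0.4, 0.9, 0.6], templates)
```

DTW-style alignment is a natural fit here because two displays of the same affective behaviour rarely unfold at exactly the same speed.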
WP7: Reported in Part B.
WP8: SEWA partners have increased the interest of the general public and industry in the field.
