CORDIS - EU research results

Voice Emotion detection by Appraisal Inference

Periodic Reporting for period 1 - VocEmoApI (Voice Emotion detection by Appraisal Inference)

Reporting period: 2015-11-01 to 2017-04-30

The automated sensing of human emotions, emotional disturbances, and other affective states is gaining ever-increasing scientific and commercial attention. Automated voice analysis has also become a hot topic for research on clinical diagnosis, and commercial products have emerged for markets such as call-center monitoring, telecom services, entertainment and games, and household robots.

To date, existing approaches to voice emotion recognition rely on machine-learning techniques: detection algorithms are trained to identify specific emotions in a learning set of expression clips coded for ground truth and are subsequently applied to classify test sets. This widely used method has two major drawbacks: 1) the difficulty of generalizing detection algorithms to new types of speech corpora and to the on-the-fly analysis of spontaneous speech, and 2) the need to rely on a relatively small set of basic, highly prototypical emotions such as joy, fear, anger, and sadness. However, many real-life applications of emotion detection, such as clinical diagnosis, marketing research, media impact analysis, audience research, forensics, and public security, require subtle differentiation of feeling states, and mixed emotions are of central interest. In this context, coarse classifications of basic emotions are of limited use, especially as prototypical, strong, and pure emotions occur very rarely in everyday interaction.
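To make the conventional corpus-trained approach concrete, the sketch below fits a minimal nearest-centroid classifier on toy acoustic feature vectors labelled with basic-emotion categories and applies it to an unseen clip. The feature names and values are illustrative placeholders, not real acoustic measurements or the method of any particular product.

```python
# Minimal sketch of corpus-trained emotion classification: labelled feature
# vectors fit a model, which is then applied to unseen clips. All numbers
# below are toy placeholders for illustration only.
from statistics import mean

def fit_centroids(samples):
    """Compute one mean feature vector (centroid) per emotion label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(map(mean, zip(*vecs)))
            for label, vecs in by_label.items()}

def classify(centroids, features):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=sq_dist)

# Toy training set: (pitch_mean_hz, energy, speech_rate) -> basic-emotion label
training = [
    ((300.0, 0.9, 1.4), "joy"), ((310.0, 0.8, 1.5), "joy"),
    ((180.0, 0.3, 0.7), "sadness"), ((170.0, 0.2, 0.6), "sadness"),
]
centroids = fit_centroids(training)
print(classify(centroids, (295.0, 0.85, 1.3)))  # nearest to the "joy" centroid
```

The sketch also makes the report's first drawback tangible: the centroids are tied to the training corpus, so a new corpus with different recording conditions or speakers requires refitting.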

The ERC Proof of Concept project VocEmoApI (Vocal Emotion detection by Appraisal Inference) has developed and implemented a patent-pending, innovative architecture for the automatic detection of a large gamut of emotions and other affective states in running speech that avoids both of these problems. The approach is based on the pioneering notion, drawn from Scherer’s Component Process Model of emotion, that emotion processes are lawfully produced by the sequential-cumulative appraisal of events and situations on a small number of appraisal checks or criteria. This theory, strongly confirmed by extensive experimental evidence (gathered in part within Scherer’s ERC-AdG PROPEREMO), predicts that if one can reconstruct the results of this appraisal process, one has all the elements necessary to understand the nature of the resulting emotion or affective state.

Within the VocEmoApI project, extensive analyses were conducted on four major vocal emotion portrayal corpora (with 4000+ speech samples) and on two targeted commercial experimental studies that manipulated specific appraisals. This research yielded a set of vocal acoustic parameters that serve as powerful predictors of the four major appraisal criteria: the novelty/unpredictability and the pleasantness/goal conduciveness of the event, the urgency of preparing physiological and motor responses to cope with its consequences, and the degree of control/power/coping potential available to the individual. These results were used to construct an appraisal-check prediction algorithm, which is commercially available as a software module (SDK) and a cloud API service from audEERING. The software identifies vocal cues related to appraisal checks in real time while running speech is produced, and from these cues predicts the most likely coordinate in a four-dimensional appraisal space. Building on this, an Emotion Inference Module was developed that identifies the most likely emotional states from the inferred appraisal-score coordinates. This component is based on data from a massive international study on the semantic profiles of the major emotion words most likely to be used in real-life situations to describe a wide range of emotional states.
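As an illustration of the second inference step, the sketch below maps a predicted coordinate in the four-dimensional appraisal space (novelty, pleasantness, urgency, control) to the nearest emotion-category prototype. The prototype coordinates and the nearest-neighbour rule are hypothetical placeholders chosen for this example; the project's actual module derives its mapping from the semantic-profile study described above.

```python
import math

# Hypothetical prototype coordinates in the four-dimensional appraisal space
# (novelty, pleasantness, urgency, control), each scaled to [-1, 1].
# These values are illustrative placeholders, not the project's actual data.
EMOTION_PROTOTYPES = {
    "joy":     (0.3,  0.9, 0.2,  0.5),
    "fear":    (0.8, -0.8, 0.9, -0.7),
    "anger":   (0.6, -0.7, 0.8,  0.7),
    "sadness": (0.2, -0.6, 0.1, -0.8),
}

def infer_emotion(appraisal_point):
    """Return the emotion label whose prototype lies closest to the
    predicted appraisal coordinate (simple nearest-neighbour inference)."""
    def distance(label):
        return math.dist(appraisal_point, EMOTION_PROTOTYPES[label])
    return min(EMOTION_PROTOTYPES, key=distance)

print(infer_emotion((0.7, -0.75, 0.85, -0.6)))  # closest to the "fear" prototype
```

Because inference operates on appraisal coordinates rather than on a fixed label set, finer-grained or mixed states can in principle be represented by adding prototypes or reporting the nearest few labels instead of a single winner.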
A prototype web-service API and a stand-alone Android version have been built for feature extraction, appraisal-criteria score prediction, and emotion-category inference, and they have been evaluated in a number of applications. A patent application has been submitted, and numerous dissemination and commercialization activities are underway.