
Connected headset adapting music to emotions

Periodic Reporting for period 1 - ONTBO (Connected headset adapting music to emotions)

Reporting period: 2023-06-01 to 2024-01-31

The initial objective of the project was to build an industrial demonstrator of a connected audio headset equipped with electrodes to measure the user's brain activity through EEG, in order to capture their emotional state and influence it. This represents a first use case of a more general solution for emotion regulation through digital content across various mediums.
A significant portion of the work carried out this year has focused on implementing processing pipelines to identify the emotional state of a user from sensor data. We primarily used four modalities to detect a user's emotional state:
• Video streams of facial expressions, captured by a camera.
• Prosody, based on the audio stream captured from a microphone.
• Text data, obtained from manual input (for example in chatbots), from online data, or from a speech-to-text algorithm.
• Data from biometric sensors.
Following a study of the existing literature on each modality, we implemented differentiated approaches to capture the emotional state of a user:
• For video, we adopted a Deep Learning approach, notably applying Convolutional Neural Networks (CNN) to the video stream.
• Prosody and biosensor data also rely on a machine learning approach, but one based on metrics derived from a pre-processing phase using 'classic' signal processing algorithms.
• Lastly, text relies on classical Natural Language Processing (NLP) algorithms combined with models that associate the semantic meaning of words or expressions with emotional intensities. In our initial approach, we attempted a Deep Learning method for consistency with the other modalities, but contrary to our expectations, the NLP approach proved more fruitful.
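The text modality described above can be sketched as a lexicon-based pipeline. The lexicon entries, emotion labels, and intensities below are illustrative placeholders, not Ontbo's actual model:

```python
from collections import defaultdict

# Illustrative lexicon mapping words to (emotion, intensity) pairs.
# A production model would cover thousands of entries per language.
LEXICON = {
    "happy":    [("joy", 0.9)],
    "great":    [("joy", 0.6)],
    "afraid":   [("fear", 0.8)],
    "worried":  [("fear", 0.5), ("sadness", 0.3)],
    "terrible": [("sadness", 0.7), ("anger", 0.4)],
    "furious":  [("anger", 0.9)],
}

def text_to_emotion(text):
    """Aggregate per-word emotional intensities into a normalized profile."""
    scores = defaultdict(float)
    for token in text.lower().split():
        for emotion, intensity in LEXICON.get(token.strip(".,!?"), []):
            scores[emotion] += intensity
    total = sum(scores.values())
    if total == 0:
        return {"neutral": 1.0}
    return {emotion: value / total for emotion, value in scores.items()}

profile = text_to_emotion("I was worried, then furious.")
# 'worried' contributes fear 0.5 and sadness 0.3; 'furious' anger 0.9.
```

Here 'furious' dominates, so the profile's strongest component is anger; unknown words simply fall back to a neutral profile.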
We have also developed a multi-modal fusion algorithm. It analyzes the results obtained from each modality and, assuming they observe the same phenomenon (synchronized data streams), computes the most probable emotional state across all modalities. This algorithm, derived from our internal research and, to our knowledge, not described in the academic literature, takes into account belief functions expressing each modality's ability to capture each emotion as well as the reliability of the emotional measurements, and can accommodate any combination of sensors. This represents a strong competitive differentiator compared to existing approaches.
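A fusion step of this kind can be sketched with standard belief-function machinery: each modality's output is discounted by its reliability and the discounted masses are combined. This is a minimal sketch in that spirit; the emotion frame, reliability values, and the actual combination rule used by Ontbo are not public:

```python
EMOTIONS = ("joy", "anger", "fear", "sadness")  # illustrative frame

def discount(masses, reliability):
    """Shafer discounting: scale each singleton mass by the modality's
    reliability and move the remainder onto 'unknown' (the whole frame)."""
    out = {e: reliability * masses.get(e, 0.0) for e in EMOTIONS}
    out["unknown"] = 1.0 - sum(out.values())
    return out

def combine(m1, m2):
    """Dempster's rule restricted to singleton emotions plus 'unknown'."""
    combined = {e: 0.0 for e in EMOTIONS}
    combined["unknown"] = m1["unknown"] * m2["unknown"]
    conflict = 0.0
    for e1 in EMOTIONS:
        for e2 in EMOTIONS:
            product = m1[e1] * m2[e2]
            if e1 == e2:
                combined[e1] += product   # both modalities agree on e1
            else:
                conflict += product       # contradictory singletons
    for e in EMOTIONS:
        # 'unknown' is compatible with every singleton.
        combined[e] += m1[e] * m2["unknown"] + m1["unknown"] * m2[e]
    norm = 1.0 - conflict                 # renormalize away the conflict
    return {k: v / norm for k, v in combined.items()}

# Two synchronized readings: a confident camera, a less reliable microphone.
video = discount({"joy": 0.7, "anger": 0.2, "fear": 0.1}, reliability=0.9)
audio = discount({"joy": 0.5, "sadness": 0.4, "fear": 0.1}, reliability=0.6)
fused = combine(video, audio)
best = max(EMOTIONS, key=lambda e: fused[e])
```

Because both modalities lean toward joy and the camera is weighted as more reliable, the fused distribution concentrates on joy; a modality with reliability 0 would contribute only 'unknown' mass and leave the result unchanged, which is what lets the scheme accommodate arbitrary sensor combinations.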
In particular, we created several validation datasets, both general and tailored to specific use cases. Notable datasets include:
• A 'natural' audio dataset dedicated to prosody. These recordings were tested with our model, allowing us to verify that the performance of our system is independent of the culture of the observed individual.
• A video dataset representing various potential use cases of our technology:
• Clips from television programs, such as political debates.
• Streams from the eSports domain.
• For audio-only content, we annotated emergency call recordings (such as 911 calls in the United States).
We had the opportunity to engage with a major Japanese banking institution, whose challenge was the analysis of transcriptions of recorded telephone conversations with its clients. These efforts allowed us to test our text-to-emotion algorithms on a language other than English (the provided texts were in Japanese), which until then had been the only language supported by our system.
For this work, we attempted three different approaches:
• Translation of the provided text into English. This approach proved unsuccessful as many nuances specific to Japanese were lost.
• Training on data translated into Japanese.
• Training on a dataset native to Japanese, augmented by the available data. This latter approach proved very effective, with performance rates approaching 95%.
We have also progressed on other work packages of the overall project, notably the determination of personality traits influencing emotional response. The work began with a literature review of existing personality and socio-cultural models, which laid the foundation for the model proposed by Ontbo. For instance, we drew inspiration from the OCEAN model (also known as the Big Five; Plaisant et al., 2010), widely considered a reference in the field.

An analysis of use cases for our technology allowed us to expand this model with metrics beyond purely psychological aspects, categorized into four groups:

• Personality analysis
• Socio-cultural factors
• Interests
• Personal history and contextual factors
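The four groups above suggest a natural structure for a user profile. The field names below are purely illustrative assumptions; the actual inputs of Ontbo's digital psyche are not public:

```python
from dataclasses import dataclass, field

@dataclass
class PersonalityAnalysis:
    """Illustrative OCEAN / Big Five traits, each in [0, 1]."""
    openness: float = 0.5
    conscientiousness: float = 0.5
    extraversion: float = 0.5
    agreeableness: float = 0.5
    neuroticism: float = 0.5

@dataclass
class DigitalPsyche:
    """One record per user, grouped along the four categories above."""
    personality: PersonalityAnalysis = field(default_factory=PersonalityAnalysis)
    socio_cultural: dict = field(default_factory=dict)  # e.g. language, region
    interests: list = field(default_factory=list)       # e.g. ["music", "esports"]
    history: dict = field(default_factory=dict)         # personal/contextual factors

psyche = DigitalPsyche()
psyche.interests.append("music")
psyche.socio_cultural["language"] = "ja"
```

Using `default_factory` keeps each user's mutable groups independent, so profiles can be created and filled incrementally as measurement algorithms for each input become available.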

The current model of the digital psyche comprises a total of 62 inputs, enabling a combinatorial space that exceeds the number of humans living on Earth by several orders of magnitude (2^62, about 4.61x10^18, assuming Boolean inputs; this is an underestimate, as many inputs of the digital psyche use richer representation formats such as integers and enumerations).

These efforts serve as input data for the work package 'measuring personality traits of the digital psyche', which involves identifying and developing algorithms to measure each input of the digital psyche.

ONTBO's vision is to quickly enter the market to validate product-market fit. Thus, as part of our business strategy, we have chosen to market our technological module.

We obtained the support of the Ministry of Culture, and its label, through the demonstration held in April 2023. We continue to work with them on various use cases.
In May, we proposed our solution to the military, defense, and aviation sectors, aiming to improve operational efficiency through the evaluation of emotional reactions in virtual situations, stress resistance, and detection of fatigue and stress in pilots. The process required integration into their annual budget.
We were approached by the Ministry of the Interior to help the relevant authorities ensure the security of the Olympics. Here, ONTBO could assist in detecting suspicious behaviors and crowd movements, or in training the recruits responsible for security during the event.
We participated in last June's VivaTech and were invited by IBM to give a presentation on the challenges of artificial intelligence. We were also contacted by a player in the automotive sector regarding multimodal emotion detection.
We have also identified very interesting use cases in the Luxury and Banking sectors.