A ground-breaking Expressive Text-To-Speech platform to create emotionally resonant virtual voices

Project Information

ETTS

Grant agreement ID: 190181809

DOI

10.3030/190181809

Project closed

EC signature date 17 June 2022

Start date 1 January 2022

End date 30 June 2024

Funded under

The European Innovation Council (EIC)

Total cost

€ 2 960 380,15

EU contribution

€ 2 072 266,00

Remaining cost not covered by the EU contribution

€ 888 114,15

Coordinated by

VOISEED SRL
Italy

Periodic Reporting for period 2 - ETTS (A ground-breaking Expressive Text-To-Speech platform to create emotionally resonant virtual voices)

Reporting period: 2023-01-01 to 2024-06-30

Voice is the most powerful medium for delivering content to people, in any language and any culture. Today we are experiencing an ever-increasing demand for global content that is instantly available worldwide; however, a huge amount of this content is only subtitled, not dubbed. As a result, many stories are never told, and people cannot enjoy content in their native language. This is because traditional expressive voice recording is time-consuming and expensive, while many content creators face budget and time constraints. Text-To-Speech (TTS) solutions are increasingly emerging to overcome the complexity of the standard voice production pipeline, but they are not yet mature enough to achieve the levels of emotion and prosody needed to give the same sensation as human voices. To succeed in increasingly competitive international markets, content creators cannot rely on poor, robotic artificial voices.
We have developed the first deep-learning-based technology able to synthesize controllable, highly expressive speech in multiple languages with multiple voices. This technology will be the core engine of specific solutions and applications for creating and localising expressive voice content for multiple markets and verticals, with the primary aim to “Voice the Unvoiced”.

We have developed and optimised our multilingual deep learning models for AI voice dubbing and production, and we developed a user-friendly project management interface. We collected a unique, proprietary, high-quality multi-emotion multilingual (HEMM) dataset to train our models and reach high quality standards.
By the end of the project, we had released an enhanced version of the ETTS platform (Revoiceit), which features improved functionality in 16 supported languages, consolidating Revoiceit as a powerful tool for AI dubbing and production. The latest version improves the efficiency and accuracy of the virtual voice production process and offers an even more intuitive and user-friendly experience.

Our technology is based on pioneering studies on prosody and emotions. Our proprietary models, optimised and trained during this project, make it possible to move away from the artificial, robotic and inexpressive voices synthesized by standard AI systems. With the ETTS project, we have developed the first generative AI technology able to synthesize expressive speech, controllable at phoneme level, in multiple languages with multiple voices. We implemented this technology in a user-friendly, cloud-based software platform that improves the efficiency and accuracy of the virtual voice production process. We created a new, highly professional solution that can change the audio localisation workflow, helping customers streamline their processes and achieve their audiovisual localisation goals.

ETTS dubbing and production platform
