The ARIA-VALUSPA Platform (AVP) is the main output of this project. It is a modular architecture with three major blocks that run as independent binaries and communicate through ActiveMQ. It is available from https://github.com/ARIA-VALUSPA/AVP.
The ARIA framework's Input block includes state-of-the-art behaviour sensing, many components of which have been specially developed as part of the project. From audio, we can recognise gender, age, emotion, speech activity and turn taking, and a separate module provides speech recognition. Speech recognition is available for the three languages targeted by the project. From video, we have implemented face recognition, emotion recognition, detailed face and facial point localisation, and head pose estimation.
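The idea of independent blocks that exchange messages through a broker can be illustrated with a minimal sketch. This is not AVP code: the in-process Broker class stands in for ActiveMQ, and the topic names (aria.input, aria.behaviour), the message fields, and the agent_core and output_block handlers are all hypothetical names chosen for illustration.

```python
import queue
from collections import defaultdict


class Broker:
    """In-process stand-in for a message broker such as ActiveMQ (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._pending = queue.Queue()

    def subscribe(self, topic, handler):
        # Register a callback that receives every message published on `topic`.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Queue the message; delivery happens in run().
        self._pending.put((topic, message))

    def run(self):
        # Deliver queued messages until none remain.
        while not self._pending.empty():
            topic, message = self._pending.get()
            for handler in self._subscribers[topic]:
                handler(message)


def build_pipeline(broker, log):
    # Hypothetical topic names and message fields; not taken from the AVP repository.
    def agent_core(observation):
        # Middle block: turns a sensed observation into a behaviour plan.
        angry = observation["emotion"] == "angry"
        reply = "calm response" if angry else "neutral response"
        broker.publish("aria.behaviour", {"utterance": reply})

    def output_block(plan):
        # Output block: a real system would drive speech and face synthesis here.
        log.append(plan["utterance"])

    broker.subscribe("aria.input", agent_core)
    broker.subscribe("aria.behaviour", output_block)


broker = Broker()
rendered = []
build_pipeline(broker, rendered)

# The input block would publish sensed behaviour; here one observation is sent by hand.
broker.publish("aria.input", {"emotion": "angry", "speech_active": True})
broker.run()
print(rendered)
```

Because each block only knows the topics it publishes to or subscribes to, any block could be replaced or run as a separate process without changing the others, which is the property a broker-based design is generally chosen for.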
Based on the ARIA framework, we delivered a complete tri-lingual (English, French, German) Virtual Human representing Alice in Wonderland. It has been realised in Greta (Ogre), Living Actor (LA), and Unity3D. It makes full use of the behaviour analysis provided from audio and video.
An important contribution in the first period of the ARIA-VALUSPA project is the NoXi database of mediated Novice-Expert interactions. It consists of 83 dyads recorded at 3 locations (Paris, Nottingham, and Augsburg) and spoken in 7 languages (English, French, German, Spanish, Indonesian, Arabic and Italian). The aim of the endeavour was to collect data to study how humans exchange knowledge in a setting that is as close as possible to the intended human-agent setting of the project.
We attained state-of-the-art results by embracing the hugely popular and successful Deep Learning approach to Machine Learning, but doing so in a smart manner. A combination of Deep Learning, Cooperative/Transfer/Active Learning, state-of-the-art sub-systems such as facial point localisation and voice activity detection, state-of-the-art databases, and the highest possible expertise in the behaviour analysis domain has resulted in novel systems that go well beyond the previous state of the art in terms of accuracy and speed.
The project has delivered a completely reworked Integrated Speech and Gesture Behaviour Generation system. It is instructed by a novel parallel focus Dialogue Manager Architecture and makes use of behaviour generation markup standards, which allows us to visualise the behaviour with either Greta or Living Actor. Both technologies deliver synchronised speech and face synthesis, and aim to include ever more accurately timed reactive behaviour.
In terms of impact, we have reached a high-impact agreement with a major multinational company that will be the sponsor of the Industry ARIA. In terms of academic impact, the consortium has published 54 peer-reviewed, open-access publications as part of the project, which equates to 3 publications per month. Of these, 17 are joint public/private publications.