Periodic Reporting for period 2 - ARIA-VALUSPA (Artificial Retrieval of Information Assistants - Virtual Agents with Linguistic Understanding, Social skills, and Personalised Aspects)
Reporting period: 2016-07-01 to 2017-12-31
As part of the project, the consortium will develop two specific implementations of ARIAs for two different industrial applications. A ‘speaking book’ application will create an ARIA with a rich personality capturing the essence of a novel, so that users can ask questions about anything related to the novel. Secondly, an ARIA scenario proposed in a business case by one of the Industry Associates at the end of year one will be implemented. Both applications will have provable commercial value, either to our Industry Partners or to our Industry Associates. ARIA-VALUSPA prototypes will be developed during the project supporting three European languages, English, French, and German, to increase the variety and number of potential user groups. The framework to be developed will be suitable for multiple platforms, including desktop PCs, laptops, tablets, and ultimately smartphones. The ARIAs will also be able to be displayed and operated in a web browser.
The ARIAs will have to deal with unexpected situations that occur during the course of an interaction. Interruptions by the user, unexpected task switching by the user, or a change in who is communicating with the agent (e.g. when a second user joins the conversation) will require the agent to interrupt its ongoing behaviour, execute a repair behaviour, re-plan mid- and long-term actions, or even adapt on the fly to the behaviour of its interlocutor.
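As a minimal sketch of what such interruption handling could look like in code, the example below models the agent's reaction to a user barge-in as a small state machine. All class and method names here are illustrative assumptions; the report does not specify the actual dialogue manager at this level of detail.

```python
from enum import Enum, auto

class AgentState(Enum):
    LISTENING = auto()
    SPEAKING = auto()
    REPAIRING = auto()

class Agent:
    """Dummy stand-in for the embodied agent (hypothetical API)."""
    def stop_current_behaviour(self):
        print("stopping current utterance")
    def play_repair_behaviour(self):
        print('playing repair: "Oh, sorry, go ahead."')
    def replan_dialogue(self):
        print("re-planning mid- and long-term actions")

class InterruptionHandler:
    def __init__(self, agent: Agent):
        self.agent = agent
        self.state = AgentState.SPEAKING  # assume the agent is mid-utterance

    def on_user_interrupt(self):
        # On a barge-in: stop, repair, re-plan, then yield the turn.
        if self.state is AgentState.SPEAKING:
            self.agent.stop_current_behaviour()
            self.state = AgentState.REPAIRING
            self.agent.play_repair_behaviour()
            self.agent.replan_dialogue()
        self.state = AgentState.LISTENING

InterruptionHandler(Agent()).on_user_interrupt()
```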
The ARIA framework's Input block includes state-of-the-art behaviour sensing, many components of which have been specially developed as part of the project. From audio, we can recognise gender, age, emotion, speech activity, and turn taking, and a separate module provides speech recognition for the three languages targeted by the project. From video, we have implemented face recognition, emotion recognition, detailed face and facial point localisation, and head pose estimation.
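To make the shape of these analysis results concrete, here is a hedged sketch of how one synchronised slice of the Input block's output could be represented; the class and field names are assumptions for illustration, not the framework's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AudioAnalysis:
    gender: str            # from the audio gender classifier
    age: int               # estimated speaker age
    emotion: str           # audio-based emotion recognition
    is_speech: bool        # speech activity / turn-taking cue
    transcript: str        # ASR output (English, French, or German)

@dataclass
class VideoAnalysis:
    identity: str                          # face recognition
    emotion: str                           # video-based emotion recognition
    landmarks: List[Tuple[float, float]]   # facial point localisation
    head_pose: Tuple[float, float, float]  # yaw, pitch, roll

@dataclass
class InputFrame:
    """One synchronised slice of multimodal analysis results."""
    timestamp: float
    audio: AudioAnalysis
    video: VideoAnalysis
```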
Based on the ARIA-Framework we delivered a complete tri-lingual (English, French, German) Virtual Human representing Alice in Wonderland. It has been realised in Greta (Ogre), Living Actor, and Unity3D, and it makes full use of the behaviour analysis provided by the audio and video components.
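For illustration, selecting the language and rendering back end for such a character might be expressed as a configuration along the following lines; the keys and values are hypothetical, not the framework's actual configuration schema.

```python
# Hypothetical configuration for the Alice in Wonderland ARIA.
# Key names are illustrative assumptions, not the real schema.
ALICE_CONFIG = {
    "character": "alice_in_wonderland",
    "language": "en",        # one of "en", "fr", "de"
    "renderer": "greta",     # "greta" (Ogre), "living_actor", or "unity3d"
    "behaviour_analysis": {
        "audio": True,       # gender, age, emotion, speech activity, ASR
        "video": True,       # face/emotion recognition, landmarks, head pose
    },
}

print(f"Starting {ALICE_CONFIG['character']} with {ALICE_CONFIG['renderer']}")
```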
An important contribution in the first period of the ARIA-VALUSPA project is the NoXi database of mediated Novice-Expert interactions. It consists of 83 dyads recorded in 3 locations (Paris, Nottingham, and Augsburg) and spoken in 7 languages (English, French, German, Spanish, Indonesian, Arabic, and Italian). The aim of the endeavour was to collect data to study how humans exchange knowledge in a setting as close as possible to the project's intended human-agent setting.
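As a sketch of the kind of metadata such a corpus carries, a per-session record might look like the following; the field names are illustrative assumptions, not NoXi's actual annotation schema.

```python
from dataclasses import dataclass

@dataclass
class NoxiSession:
    session_id: int
    location: str          # "Paris", "Nottingham", or "Augsburg"
    language: str          # one of the 7 recorded languages
    topic: str             # subject of the expert's knowledge
    duration_s: float      # length of the dyadic recording

example = NoxiSession(1, "Nottingham", "English", "chess", 1240.0)
print(example)
```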
We attained state-of-the-art behaviour analysis results by embracing the hugely popular and successful Deep Learning approach to Machine Learning, but doing so in a smart manner. A combination of Deep Learning, Cooperative/Transfer/Active Learning, state-of-the-art sub-systems such as facial point localisation and voice activity detection, state-of-the-art databases, and the highest possible expertise in the behaviour analysis domain has resulted in novel systems that go well beyond the previous state of the art in terms of accuracy and speed.
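As one hedged illustration of the transfer-learning ingredient of this recipe, the PyTorch sketch below fine-tunes an ImageNet-pretrained backbone for facial emotion recognition. The backbone choice, class count, and hyperparameters are assumptions for illustration and do not describe the project's actual models.

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumed setup: 6 emotion classes; the real label set may differ.
NUM_EMOTIONS = 6

# Start from ImageNet features and train only a new classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One transfer-learning update on a batch of face crops."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch to show the expected shapes (8 face crops, 224x224 RGB).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_EMOTIONS, (8,))
print(train_step(images, labels))
```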
The project has delivered a completely reworked Integrated Speech and Gesture Behaviour Generation system. Instructed by a novel parallel-focus Dialogue Manager architecture and making use of behaviour generation markup standards, the system can visualise behaviour with either Greta or Living Actor. Both technologies deliver synchronised speech and face synthesis, and aim to include ever more accurately timed reactive behaviour.
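Behaviour generation markup standards in this area, such as BML from the SAIBA framework (which Greta supports), describe speech and gestures on a shared timeline. The sketch below emits a simplified BML-like block; the elements and attributes are abridged for illustration rather than a complete, standard-conformant document.

```python
def make_bml(utterance: str, gesture: str = "beat") -> str:
    """Emit a simplified BML-like block that starts a gesture
    in sync with the speech; attributes are illustrative."""
    return (
        f'<bml id="bml1">\n'
        f'  <speech id="s1"><text>{utterance}</text></speech>\n'
        f'  <gesture id="g1" lexeme="{gesture}" start="s1:start"/>\n'
        f'</bml>'
    )

print(make_bml("Who in the world am I?", gesture="open_palms"))
```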
In terms of impact, we have reached a high-impact agreement with a major multinational company, which will be the sponsor of the Industry ARIA. In terms of academic impact, the consortium has published 93 peer-reviewed, open-access publications as part of the project, which equates to almost 3 publications per month. Of these, 17 are joint public/private publications.