Periodic Reporting for period 2 - COMPRISE (Cost-effective, Multilingual, Privacy-driven voice-enabled Services)
Reporting period: 2020-06-01 to 2021-11-30
COMPRISE has defined a fully private-by-design methodology that reduces the cost and increases the inclusiveness of voice interaction technology. The innovative software tools developed have increased privacy to an unprecedented level, allowed the development of dialogue systems without any training resources in the target language, and reduced the cost of integrating voice features in mobile applications by more than 70%. These tools are now part of the COMPRISE SDK and the COMPRISE Cloud Platform, which are available in open source for voice technology companies and application developers.
With respect to the inclusiveness objective, the COMPRISE Speech-to-Text Translation tool can translate spoken language in a way that is robust to STT errors and disfluencies (e.g. hesitations, missing words). We also introduced a multilingual NLU system, which addresses the detection of user intent in any language without any training resources in that language, and an STT personalisation method, which improves STT performance by 27% relative for users with regional or foreign accents with only 1 h of untranscribed training data per accent.
With respect to the cost-effectiveness objective, COMPRISE Weakly Supervised STT reduces the amount of human-annotated data needed to train STT systems by more than 40%, while COMPRISE Weakly Supervised NLU benefits from as few as 100 labeled training examples and scales seamlessly down to a zero-shot setting, requiring no training at all. All these innovative software tools combine cutting-edge deep learning, speech processing and language processing approaches with new approaches developed within COMPRISE.
Existing and new software tools have been integrated into an SDK interoperating with a Cloud Platform, which together provide a full-fledged open-source solution for voice technology companies and application developers. The COMPRISE SDK includes the COMPRISE Client Library, which can be deployed on any Android or iOS device and integrates all required voice functionalities, the COMPRISE App Wizard, which allows quick configuration of these functionalities, and the COMPRISE Personal Server, which runs computationally demanding services outside the device while still preserving privacy. The COMPRISE Cloud Platform provides services for data collection and curation and for system training.
We have also developed six demonstrators to showcase these innovative tools: Cookbook, Notes, Remote Presentation Control, Shoplay, Hospital Concierge, and Doctor’s Assistant. The integration of voice features in the Remote Presentation Control demonstrator took 2 person-months (PMs) with COMPRISE vs. 7 PMs without it, which translates into cost savings above 70%. These demonstrators were evaluated by potential end-users, who appreciated the new user experience offered by voice features and rated the demonstrators positively. This validates the benefits of COMPRISE, especially in the sectors of smart consumer apps, e-commerce, and e-health.
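The savings figure above can be checked with a short calculation. The sketch below assumes that integration cost is proportional to effort in person-months; the function name `relative_savings` is hypothetical and only serves this illustration.

```python
def relative_savings(effort_with: float, effort_without: float) -> float:
    """Return the fraction of effort saved relative to the baseline.

    Assumption: cost scales linearly with effort (person-months).
    """
    if effort_without <= 0:
        raise ValueError("baseline effort must be positive")
    return (effort_without - effort_with) / effort_without


# Effort reported for the Remote Presentation Control demonstrator,
# in person-months: 2 with COMPRISE, 7 without it.
savings = relative_savings(effort_with=2, effort_without=7)
print(f"{savings:.1%}")  # prints "71.4%"
```

Under that assumption, (7 - 2) / 7 is roughly 71%, which is consistent with the reported savings above 70%.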
All of these advances have been closely monitored through rigorous management tasks, through a comprehensive summary and analysis of the main aspects of the General Data Protection Regulation (GDPR) that need to be considered for the implementation of the project and the development of COMPRISE, and through efficient dissemination, communication and exploitation activities.
The COMPRISE SDK, the COMPRISE Cloud Platform, the COMPRISE Voice and Text Transformers and the COMPRISE Weakly Supervised STT and NLU tools are freely available in open source. Customisation and high-level support are also available at a cost. Thanks to our ambitious exploitation strategy, these COMPRISE outcomes are expected to enable many businesses in the Digital Single Market to quickly develop voice-enabled applications in many languages. They will also positively impact European citizens by offering unprecedented privacy guarantees, facilitating their access to voice-enabled content and services in other languages, and improving their overall experience. COMPRISE will find application in many sectors beyond those demonstrated, e.g. e-government, e-justice, e-learning, tourism, culture, or media.