Cost-effective, Multilingual, Privacy-driven voice-enabled Services

Periodic Reporting for period 1 - COMPRISE (Cost-effective, Multilingual, Privacy-driven voice-enabled Services)

Reporting period: 2018-12-01 to 2020-05-31

Alongside visual and tactile interfaces, voice interfaces are becoming an increasingly popular means of interaction with smart objects and applications. Some of the underlying technologies, namely Speech-to-Text (STT) and Natural Language Understanding (NLU), must be trained on large amounts of speech and text data stored in the Cloud. To do so, voice technology companies typically collect voice data from users and hire human annotators to transcribe them into text. Application developers then define a list of possible user requests and associated answers for every application. This process must be repeated for every language. This approach (i) raises critical privacy concerns relating to the users’ voice characteristics and the spoken contents, (ii) is not inclusive, in the sense that it fails to address languages and categories of users for which little data is available, and (iii) incurs high costs both for voice technology companies, which has led to market domination by tech giants, and for application developers.
The overall objective of COMPRISE is to define a fully private-by-design methodology that will reduce the cost and increase the inclusiveness of voice interaction technology through research advances on privacy-driven data transformation, on-the-fly translation, user personalisation, and automated data annotation. The resulting software tools will be integrated in an easy-to-use Software Development Kit (SDK) interoperating with a Cloud Platform. The sustainability of this new ecosystem will be demonstrated for three sectors with high commercial and societal impact: smart consumer apps, e-commerce, and e-health.
With respect to the privacy objective, we released two software tools which protect the voice of the users and their personal information: the COMPRISE Voice Transformer and the COMPRISE Text Transformer. The COMPRISE Voice Transformer aims to prevent biometric identification of the users by converting their voice to another random person’s voice. It provides a significant increase in privacy, as validated through state-of-the-art biometric protocols. The COMPRISE Text Transformer aims to identify potentially privacy-threatening words or phrases in a piece of text and to replace them with harmless alternatives that preserve the text’s structure. The main innovation lies in our word and phrase replacement strategy, which offers formal privacy guarantees.
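The identify-and-replace idea behind the Text Transformer can be illustrated with a minimal Python sketch. This is not the COMPRISE Text Transformer and it does not reproduce its replacement strategy or its formal privacy guarantees. The function name `transform_text`, the regular expressions, the `known_names` list, and the replacement pools are all hypothetical and chosen for illustration only. A real system would presumably detect sensitive spans with a trained tagger rather than hand-written patterns.

```python
import random
import re

# Hypothetical pools of harmless alternatives, one pool per entity type.
REPLACEMENTS = {
    "EMAIL": ["alex@example.org", "sam@example.org"],
    "PHONE": ["555-0100", "555-0199"],
    "NAME": ["Alex Martin", "Sam Keller"],
}

# Hypothetical, deliberately simple detectors (assumption: not the project's method).
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{4}\b")),
]


def transform_text(text, known_names, rng):
    """Replace detected sensitive spans with a random alternative of the same type.

    Replacing each span with a value of the same type keeps the sentence
    structure intact, so the transformed text stays usable as training data.
    """
    for label, pattern in PATTERNS:
        text = pattern.sub(
            lambda match, label=label: rng.choice(REPLACEMENTS[label]), text
        )
    # Names are matched literally from a caller-supplied list in this sketch.
    for name in known_names:
        text = re.sub(
            re.escape(name), lambda match: rng.choice(REPLACEMENTS["NAME"]), text
        )
    return text


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    original = "Call Maria Lopez at 555-1234 or maria.lopez@mail.example today."
    print(transform_text(original, ["Maria Lopez"], rng))
```

The sketch replaces each sensitive span with a same-type value instead of deleting it, which is what keeps the surrounding sentence structure unchanged. Uniform random sampling from a small pool, as used here, gives no formal privacy guarantee.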
With respect to the inclusiveness objective, we introduced methods to translate spoken language on-the-fly in a way that is robust to STT errors and disfluencies (e.g. hesitations, missing words), and to perform multilingual NLU with comparable performance to a monolingual method. We also designed personalised machine learning methods that adapt STT and NLU systems to each user. This improved STT performance for users with a regional or foreign accent, and NLU performance when training on privacy-transformed data.
With respect to the cost-effectiveness objective, we released two software tools called COMPRISE Weakly Supervised STT and COMPRISE Weakly Supervised NLU that significantly decrease the amount of human-annotated data needed to train STT or NLU systems. All these software tools combine cutting-edge deep learning and speech and language processing approaches with new approaches developed within COMPRISE.
Existing and new software tools are being integrated into an SDK interoperating with a Cloud Platform, which will provide a full-fledged open-source solution for voice technology companies and application developers. The COMPRISE SDK includes the COMPRISE Client Library, which can be deployed on any Android or iOS device and integrates all required voice functionalities, the COMPRISE App Wizard, which allows quick configuration of these functionalities, and the COMPRISE Personal Server, which runs computationally demanding services outside the device while still preserving privacy. The COMPRISE Cloud Platform provides services for data collection and curation and for system training.
We are also developing a series of demonstrators to showcase the benefits of this new technology in the sectors of smart consumer apps, e-commerce, and e-health. So far, initial versions of three demonstrators have been implemented. These use cases embed different voice features and interactions with the user in order to validate the COMPRISE tools. In fact, the adoption of these tools by the three demonstrators allows us to highlight multiple possible improvements at different levels (documentation, training materials, features, etc.).
All of these advances have been supported by rigorous project management, by a comprehensive summary and analysis of the main aspects of the General Data Protection Regulation (GDPR) that need to be considered for the implementation of the project and the development of COMPRISE, and by efficient dissemination, communication and exploitation activities.
COMPRISE is the first project worldwide to address the issue of privacy in voice technology. Pioneering privacy preservation solutions have been developed based on research advances in speech processing, natural language processing and machine learning. Additional research has allowed us to significantly reduce data annotation and application development costs, which opens a market for European SMEs against tech giants, and to narrow the performance gap between users whose speech is easy to recognise and users with an accent, so as to provide a more inclusive user experience.
By November 2020, initial prototypes of the COMPRISE SDK, the COMPRISE Cloud Platform, and the demonstrators will be developed and ready for demonstration. The feedback collected from end users will guide the finalization of these prototypes by November 2021 and their exploitation.
Ultimately, these COMPRISE outcomes are expected to enable businesses in the Digital Single Market to quickly develop voice-enabled applications in many languages. They will also positively impact European citizens by offering unprecedented privacy guarantees, facilitating their access to voice-enabled contents and services in other languages, and improving their overall experience. COMPRISE will find application in many sectors beyond those demonstrated, e.g. e-government, e-justice, e-learning, tourism, culture, or media.