Community Research and Development Information Service - CORDIS


SpeechXRays Report Summary

Project ID: 653586
Funded under: H2020-EU.3.7.

Periodic Reporting for period 1 - SpeechXRays (Multi-channel biometrics combining acoustic and machine vision analysis of speech, lip movement and face)

Reporting period: 2015-05-01 to 2016-10-31

Summary of the context and overall objectives of the project

Several biometric modalities may be used for access control. Low-cost solutions based on existing embedded sensors (camera-based face recognition) are now provided as standard features of laptops and smartphones, but their accuracy is low.
New sensors (fingerprint readers) can be embedded in laptops and smartphones, but they add cost because they are used only for identification. Iris recognition is a promising technology because it is extremely accurate and insensitive to ageing; however, it is not easily applicable to mobile devices (all publicly deployed iris recognition systems acquire iris images in the near-infrared band of the electromagnetic spectrum and cannot use standard cameras). Worse, all these systems (fingerprint, face and iris) can be spoofed by fake biometrics as simple as high-resolution colour printouts.
The most convenient and cost-effective biometric modality is voice, which can be easily captured on a mobile device using its embedded microphone. To the human ear, a voice remains recognisable even in noise. However, today’s commercial voice biometrics solutions fail to deliver the required accuracy and are very sensitive to ambient noise, because they rely mostly on machine learning techniques employing statistical analysis and, unlike the human ear, do not consider the acoustic correlates of voice quality.

The SpeechXRays project will develop and test in real-life environments a user recognition platform based on voice acoustics analysis and audio-visual identity verification. SpeechXRays will outperform state-of-the-art solutions in the following areas:
• Security: high-accuracy solution (cross-over accuracy of 1/100, i.e. twice as accurate as commercial voice/face solutions)
• Privacy: biometric data stored in the device (or in a private cloud under the responsibility of the data subject)
• Usability: text-independent speaker identification (no pass phrase), low sensitivity to surrounding noise
• Cost-efficiency: use of standard embedded microphones and cameras (smartphones, laptops)
The project will combine and pilot two proven techniques: acoustic-driven voice recognition (using acoustic rather than statistical-only models) and multi-channel biometrics incorporating dynamic face recognition (machine vision analysis of speech, lip movement and face). The vision of the SpeechXRays project is to provide a solution that combines the convenience and cost-effectiveness of voice biometrics with video to achieve better accuracy and superior anti-spoofing capabilities. The technology will be deployed on 2000 users in 3 pilots: a workforce use case, an eHealth use case and a consumer use case. The project lasts 36 months and is coordinated by a world leader in digital security solutions for the mobility space.
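The security target above is stated as a cross-over accuracy of 1/100. Assuming this refers to the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR), often called the equal error rate, the following minimal sketch shows how such a figure can be estimated from comparison scores. The function names and the score lists are illustrative assumptions, not project data or project code.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def crossover_error_rate(genuine, impostor):
    """Scan every observed score as a threshold and return the
    (threshold, rate) pair where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up similarity scores, for illustration only (higher = more similar).
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.73, 0.81, 0.59, 0.97]
impostor = [0.12, 0.35, 0.41, 0.28, 0.62, 0.19, 0.07, 0.50, 0.33, 0.22]

threshold, rate = crossover_error_rate(genuine, impostor)
print(f"threshold={threshold}, crossover error rate={rate:.2f}")
```

With these toy scores the crossover falls at 1 error in 10; under the reading assumed here, a 1/100 target would mean one false acceptance and one false rejection per hundred attempts at that operating point.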

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The work during this period followed two main tracks, which we brought together during the last six months: the specification of the Use Case Tests and the main biometrics and security developments.
The Use Case Test definitions are the real starting point of the project, providing a general framework for all our activities. In particular, our aim of providing the best security and privacy solutions must rely on real applications. A very important aim of the project is to demonstrate that biometric authentication, under the constraints of a high level of security and privacy enforcement, can be accepted by end-users as a convenient way to enable access control. With that in view, the partners in charge of this work succeeded in delivering complete and final specifications, together with a public perspective on them.
For component development, the main outputs concern the biometric modalities (voice and face) and the security function of the cancellable biometrics library. These designs and implementations face considerable technical challenges and, although there have been some delays, the early outputs show very interesting results. The originality of the approach was widely recognised at the EAB conference in Darmstadt in September 2016. Much of our thinking has gone into validation and the search for an adequate procedure.
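The report does not describe how the cancellable biometrics library works. As a general illustration of the idea only, the sketch below uses a keyed random projection followed by sign binarisation (a BioHashing-style scheme, chosen here as an assumption): the stored template depends on both the feature vector and a user key, so a compromised template can be revoked by issuing a new key. All names, sizes and values are hypothetical.

```python
import random


def cancellable_template(features, user_key, n_bits=32):
    """Project the feature vector onto key-seeded random directions and
    keep only the sign of each projection as one template bit.

    Note: random.Random is not a cryptographic generator; a real system
    would need a vetted construction. This is an illustration only.
    """
    rng = random.Random(user_key)
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(w * x for w, x in zip(direction, features))
        bits.append(1 if projection >= 0 else 0)
    return bits


def hamming_distance(a, b):
    """Count the positions at which two equal-length templates differ."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up feature vector standing in for a voice or face embedding.
features = [0.8, -1.2, 0.3, 2.1, -0.5, 0.9]

enrolled = cancellable_template(features, user_key="key-v1")
probe = cancellable_template(features, user_key="key-v1")
reissued = cancellable_template(features, user_key="key-v2")

print(hamming_distance(enrolled, probe))     # same features, same key
print(hamming_distance(enrolled, reissued))  # same features, new key
```

Matching is done on the transformed templates, so the raw features never need to be stored; changing the key yields a new template from the same biometric.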

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

The overall objective of SpeechXRays is to develop an innovative solution for biometric recognition, especially for the voice and face modalities. Beyond developing those components, we target an evaluation that includes real field experiments enabling practical use cases.
During this period, the use case definitions were a major milestone. To initiate the work, we wanted a clear and common view of what each partner will bring to the Use Case Tests. Once this view is set and accepted, the most important part of each individual effort will be clearly specified. We underline that our design and development have been carried out with constant attention to better security and privacy enforcement. This can be seen as a “privacy by design” approach.

WP2 has been dedicated to developing a multimodal biometrics solution based on acoustic and machine vision analysis of speech, lip movement and face. It relies on a method never considered before: combining acoustic analysis of the speech spectrogram with statistical analysis of the soundwave patterns. This system (with small changes) achieved good performance in the speaker evaluations on the MOBIO database.
In addition, TSP adapted and combined the baseline reference systems for speech (a GMM-based system) and face (SudFrog). The fusion of the speech and face modalities improved on either single modality (as tested on the MOBIO database under common protocols).
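The report does not say which fusion rule was used to combine the speech and face systems. One common option is score-level fusion, sketched below under that assumption: each modality's score is z-normalised against a cohort of impostor scores for that modality, and the normalised scores are combined with a weighted sum. The function names, the weight and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_norm(score, cohort):
    """Express a score in standard deviations from the cohort mean, so
    that scores from different systems share a comparable scale."""
    sigma = pstdev(cohort)
    if sigma == 0:
        raise ValueError("cohort scores must not all be equal")
    return (score - mean(cohort)) / sigma


def fuse(speech_score, face_score, speech_cohort, face_cohort,
         speech_weight=0.5):
    """Weighted sum of the z-normalised speech and face scores."""
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    zs = z_norm(speech_score, speech_cohort)
    zf = z_norm(face_score, face_cohort)
    return speech_weight * zs + (1.0 - speech_weight) * zf


# Made-up impostor cohorts on two very different score scales.
speech_cohort = [-2.0, -1.0, 0.0, 1.0, 2.0]
face_cohort = [0.1, 0.2, 0.3, 0.4, 0.5]

fused = fuse(speech_score=3.5, face_score=0.72,
             speech_cohort=speech_cohort, face_cohort=face_cohort)
print(f"fused score: {fused:.2f}")
```

The fused score is then compared with a single decision threshold. In practice the weight would be tuned on development data, which is one reason common evaluation protocols such as those mentioned above matter.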

As mentioned, security and privacy are major challenges. OT targets a fine-grained security analysis of the environment and of the use case to ensure an adequate evaluation of the security and privacy level. Again, we have underlined the importance of linking this analysis and the solution to our objectives for the Use Cases. Tech has implemented a secure and privacy-preserving mechanism for user enrolment and for speaker recognition.

We surveyed the national and European regulations for the protection of biometric data and assessed the impact that future changes will have on the project in general and on the use cases in particular (e.g. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016). We determined the life-cycle of biometric data based on the national and European regulations and the internal security requirements at the Use Case locations.

In a variety of government and private domains, biometric recognition is being promoted as a technology that can provide better control of access to physical facilities and IT accounts, and increase the efficiency of access to services and their utilization. Biometric recognition has been applied to patient tracking in medical informatics and to the personalization of social services, among other things. Our approach will yield solid knowledge about end-user adoption of very complete and innovative solutions in Use Cases that cover many societal questions: e-health, multimedia usage and the daily workforce.
