Multimodal Verification for Teleservices and Security Applications

Objective

The primary goal of the M2VTS project is to address the issue of secured access to local and centralised services in a multimedia environment. The main objective is to extend the scope of application of network-based services by adding novel and intelligent functionalities, enabled by automatic verification systems combining multimodal strategies (secured access based on speech, image and other information). A further objective is to show that the limitations of individual technologies (speech recognition, speaker verification, etc.) can be overcome by relying on multimodal decisions (combination or fusion of these technologies), and that such decisions can find practical and important applications in the emerging field of advanced interfaces for tele-services.
The main goals of the project are therefore:
- to implement and validate secured access schemes embedded in existing voice-based services.
- to develop new security services exploiting emerging speech and image-based recognition technologies.
- to provide secured services on non-secured networks (such as PSTN, ISDN, LAN).
- to develop new services for security applications (for alarm verification and access control).
A flexible software and hardware platform has been realised. From this platform, four demonstrators have been developed, evaluated and installed at end-user sites:
Level 1: A network-based voice mail system with access control using robust speech recognition technology and rejection (one speech modality is used: speaker-dependent password recognition)
Level 2: The same application reinforced by two additional modalities (text-dependent and text-independent verification of the voice)
First Level 3: An access control to buildings realised using the level 1 system complemented with a profile recognizer
Second Level 3: An access control to rooms realised using the level 1 system, the profile recognizer and a face recognizer
Expected impact
The results of the project will feed a broad range of applications in many sectors.
Particularly in the telecommunication field, the results should have a direct impact on network services, as security of information and access will become increasingly important (telephone fraud in the US has recently been estimated at several billion dollars).

Main contributions to the programme objectives:
Main deliverables
Complete system for secure access to local and centralised services based on multimodal verification
Contribution to the programme
Robust solutions for secured access are key to the implementation of trusted services in many sectors

Technical Approach
The project will provide a first pilot demonstrator after 12 months, which will be tested by end-users who have expressed interest in the project (most of them are project partners). Several modalities will be used for verification. These can be categorised as either visual-based or speech-based. Among the visual information, the face is the most significant for verification and will be the basis of the image modalities to be dealt with. Traditionally, faces are handled as 2D objects, as acquired by a camera. To make verification independent of head pose, 3D information will also be used. The following ways of recovering 3D information will be investigated:
- motion pictures (video)
- structured light with one camera
- several cameras (stereo)
One of the most important aspects of this project consists in combining the information from all available modalities.
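As an illustration of what combining modalities can mean in practice, the following is a minimal sketch of score-level fusion by weighted averaging. It is not the project's fusion method, which is not described here; the class `ModalityScore`, the functions `fuse_scores` and `verify`, the weights and the threshold are all hypothetical, and each modality is assumed to deliver a match score already normalised to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModalityScore:
    """One verifier's output (hypothetical structure, for illustration only)."""
    name: str      # illustrative label such as "speech", "profile" or "face"
    score: float   # assumed normalised to [0, 1]; higher means more likely genuine
    weight: float  # assumed relative trust in this modality


def fuse_scores(scores: List[ModalityScore]) -> float:
    """Combine per-modality scores into one score by weighted averaging."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("at least one modality needs a positive weight")
    return sum(s.score * s.weight for s in scores) / total_weight


def verify(scores: List[ModalityScore], threshold: float = 0.5) -> bool:
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(scores) >= threshold


if __name__ == "__main__":
    # A weak speech score alone would be rejected at threshold 0.6,
    # but a strong face score lifts the fused decision above it.
    speech = ModalityScore("speech", score=0.4, weight=1.0)
    face = ModalityScore("face", score=0.9, weight=1.0)
    print(verify([speech], threshold=0.6))
    print(verify([speech, face], threshold=0.6))
```

A weighted average is only one of many possible combination rules; it is used here because it shows, in a few lines, how a weak score in one modality can be compensated by a strong score in another.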
Key Issues
The project will extend the usability of network-based services by adding secured access. It will allow mobility by providing a service from any location. It will be validated and tested in at least three European languages. It will demonstrate novel technologies for user authentication based on speech and image recognition, leading to a fusion of multimodal information. It will provide secured access on non-secured networks. It will give solutions for access control (e.g. to tele-shopping, tele-banking or to buildings), surveillance, intrusion detection and alarm verification.
Summary of trial
The goals have been reached, and the field tests are now being completed. Some hardware problems on the level 3 demonstrators have resulted in a one-month delay in the field tests for these demonstrators. First results are nevertheless already available. The demonstrators are installed in places where the systems are publicly reachable. The level 1 and level 2 demonstrators can be called from the PSTN. The level 3 demonstrators are currently undergoing field tests. The achievements of the period also cover the innovative, state-of-the-art results obtained on the algorithmic side.
A synergy has been established with the VIDAS project (ACTS 057) and, within the MPEG4 ACTS Concertation for Facial Feature Extraction and Tracking, potentially with other projects such as VANGUARD (ACTS 074) and projects from other EU programmes. The AVBPA conference (Audio and Video-Based Biometric Person Authentication) has been organized by the algorithmic consortium of M2VTS, and will be attended by researchers from around the world. A special Open Day of AVBPA, the EFFACES Forum, will be held on 11 March '97 (one day before AVBPA) to allow EU projects working on Facial Feature Extraction and related image processing topics to present their main results and discuss future common issues and action plans.
Continuing studies
In parallel with the development of the Pilot Demonstrators, algorithmic developments have started. The fields investigated include:
- text-dependent speaker verification from speech
- text-independent speaker verification from speech
- facial feature extraction and tracking from moving images
- verification from overall frontal view
- verification from lip shape
- verification from 3-D information obtained through structured light
- verification from profile
- synchronization of speech and lip movement
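To make the last item more concrete, the following is a minimal sketch of one generic way to measure how well speech and lip movement line up: cross-correlating a per-frame audio energy track with a per-frame lip-opening track and picking the lag with the highest correlation. This is an assumption chosen for illustration, not the method developed in the project; the function `best_lag` and both input tracks are hypothetical, and the two tracks are assumed to be sampled at the same frame rate.

```python
from typing import Sequence


def best_lag(audio_energy: Sequence[float],
             lip_opening: Sequence[float],
             max_lag: int) -> int:
    """Return the lag, in frames, at which the two tracks correlate best.

    A positive result means the lip track trails the audio track by that
    many frames; a negative result means the lips lead the audio.
    """
    if len(audio_energy) != len(lip_opening):
        raise ValueError("both tracks must have the same number of frames")
    n = len(audio_energy)
    if n == 0:
        raise ValueError("tracks must not be empty")

    # Remove the mean so that constant offsets do not dominate the correlation.
    mean_a = sum(audio_energy) / n
    mean_l = sum(lip_opening) / n
    a = [v - mean_a for v in audio_energy]
    l = [v - mean_l for v in lip_opening]

    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += a[i] * l[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    # Synthetic example: the lip track is the audio track delayed by 2 frames.
    audio = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
    lips = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
    print(best_lag(audio, lips, max_lag=4))
```

In a verification setting, a lag far from the expected value, or a very low peak correlation, could be treated as a hint that the audio and the video do not come from the same live speaker; how such a cue would be weighted is left open here.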
The flexible platforms will be used to record a real-condition database for enhancement of the algorithms.
Dedicated hardware will be used to run the algorithms in real-time for the final systems. Two final demonstrators will be realised, one stand-alone and one to be installed in a PC.
Finally, an API layer is being specified for easy integration of the various algorithms into applications. Application Generation Tools are also being developed in order to add flexibility in the prototyping of applications covering the wide range of end-user needs. Commercial Application Generation Tools will also be investigated in parallel.

Coordinator

Matra Communication
Address
Rue J P Timbaud
78392 Bois D'arcy
France

Participants (12)

BBV
Spain
Cerberus
Switzerland
EPFL
Switzerland
IDIAP
Switzerland
IMT Neuchâtel
Switzerland
Ibermatica
Spain
Renaissance
Belgium
UCL
Belgium
UNIVERSITY OF SURREY
United Kingdom
Address

GU2 5XH Guildford
Unidad Tecnica Auxiliar de la Policia
Spain
University of Carlos III
Spain
University of Thessaloniki -Aristotle
Greece