
Speech Driven Multi-modal Automatic Directory Assistance

Objective

SMADA addresses all the technology aspects needed to automate a large proportion of the Directory Assistance (DA) requests handled by Network Operators: Automatic Speech Recognition (ASR), dialogue design and Human Factors (HF). SMADA intends to improve the robustness and accuracy of very-large-vocabulary ASR by developing confidence measures for isolated words and multi-word expressions, progressive search, and unsupervised learning techniques that update acoustic and language models on the basis of recordings from operational services. Access from the fixed and cellular networks is investigated, both for uni-modal voice and for multi-modal Web versions of the service. SMADA will investigate Distributed Speech Recognition, especially for this type of service. Realistic field tests will be performed with uni- and multi-modal demonstrators to keep the HF and ASR research on the right track.

Objectives:
SMADA aims to improve all aspects of the technology needed to automate a large proportion of the calls to the Directory Assistance service without compromising customer satisfaction. SMADA aims at a breakthrough in the robustness and accuracy of ASR technology by developing confidence measures, noise-robust decoding, progressive search and techniques for unsupervised learning from the speech recorded during operation of the service. SMADA will build demonstrators for mono- and multi-modal access to DA, which will be used in realistic field trials. Human Factors research addresses caller-system interaction and, in the case of partial automation, operator-system interaction. Attention will be paid to calls from both the fixed and the cellular networks. SMADA addresses both mono-modal voice access and multi-modal access to Web-based DA services. Protocols supporting distributed speech recognition are also investigated.
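Confidence measures are what allow an automated DA service to decide whether to accept a recognised listing, ask the caller to confirm it, or hand the call over. One common way to obtain such a measure, shown here only as a minimal sketch and not as the method developed in SMADA, is to normalise the scores of an N-best list into a posterior probability for the top hypothesis. The function names, the `scale` parameter and the accept/reject thresholds are hypothetical.

```python
import math
from typing import Sequence


def nbest_confidence(log_scores: Sequence[float], scale: float = 1.0) -> float:
    """Posterior-style confidence of the best hypothesis in an N-best list.

    log_scores: combined acoustic + language model log scores, one per hypothesis.
    scale: hypothetical tuning factor applied before normalisation; raw
           log-likelihoods usually yield over-confident posteriors.
    """
    if not log_scores:
        raise ValueError("empty N-best list")
    scaled = [scale * s for s in log_scores]
    best = max(scaled)
    # Softmax denominator, shifted by the maximum for numerical stability.
    total = sum(math.exp(s - best) for s in scaled)
    # The best hypothesis contributes exp(0) = 1 to the numerator.
    return 1.0 / total


def dialogue_action(confidence: float, accept: float = 0.8, reject: float = 0.3) -> str:
    """Map a confidence value to a dialogue move; thresholds are illustrative only."""
    if confidence >= accept:
        return "accept"   # use the recognised listing directly
    if confidence >= reject:
        return "confirm"  # ask the caller to confirm
    return "reject"       # re-prompt or transfer to an operator


if __name__ == "__main__":
    for nbest in ([-100.0, -110.0, -112.0], [-100.0, -100.5], [-100.0] * 5):
        c = nbest_confidence(nbest)
        print(f"{c:.3f} -> {dialogue_action(c)}")
```

A clear winner in the N-best list yields a confidence close to 1, two near-equal hypotheses land in the "confirm" band, and a flat list is rejected. In a real service the thresholds would be tuned on field data rather than fixed by hand.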

Work description:
SMADA starts from existing prototypes of systems that automate part of the DA service of four Network Operators. These systems are meant to support the operators and, eventually, to replace human operators for the simplest routine calls. SMADA will evaluate and improve these prototypes iteratively. The existing prototypes will first be improved by optimising existing technology (whose effectiveness has been demonstrated in off-line tests) and by integrating the results of human factors research (which can partly be carried out using Wizard of Oz methods). Subsequently, the prototypes will be improved further by implementing new and improved technology developed within the project itself. Technology development will be guided by the results of the field trials. Besides automating conventional operator-based DA services, SMADA will design and test a prototype for multi-modal access to Web-based DA information, combining speech input and text output. The project has a strong focus on improving technology for robust ASR.

There are five technological work packages (WPs):
- WP1 deals with an in-depth evaluation of the needs of existing systems and services.
- WP2, by far the largest, addresses the development of new technologies intended to improve the performance and user-friendliness of the service. Its topics are unsupervised learning, multi-pass decoding with progressive search, confidence measures for multi-word expressions, and adaptive acoustic and language models. This WP will use the data collected in the other WPs.
- WP3 addresses the use of Distributed Speech Recognition for DA services. This WP will investigate the protocol issues involved in mixing voice and data in one call, and will develop and test prototype systems.
- WP4 focuses on the design, implementation and testing of multi-modal access to DA information on the Internet.
- WP5 evaluates automated mono-modal voice access to DA services by means of HF research and field trials.
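Multi-pass decoding with progressive search, one of the WP2 topics, can be illustrated with a simple two-pass scheme: a cheap first pass prunes a very large vocabulary (for example, all listing or city names) to a short list, and a more expensive second pass rescores only that list. The sketch below assumes both passes are available as plain scoring functions; `progressive_search`, `cheap_score` and `detailed_score` are hypothetical names, and real decoders work on lattices or word graphs rather than flat candidate lists.

```python
from typing import Callable, List, Sequence, Tuple


def progressive_search(
    candidates: Sequence[str],
    cheap_score: Callable[[str], float],
    detailed_score: Callable[[str], float],
    shortlist_size: int = 10,
) -> List[Tuple[str, float]]:
    """Two-pass search: prune with a cheap model, rescore the survivors.

    Higher scores are better (e.g. log-likelihoods).
    """
    # Pass 1: rank every candidate with the inexpensive model and keep the best few.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:shortlist_size]
    # Pass 2: apply the detailed (slower) model only to the shortlist.
    rescored = [(c, detailed_score(c)) for c in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


if __name__ == "__main__":
    # Toy log scores for five city names (made-up numbers for illustration).
    cheap = {"Amsterdam": -5.0, "Rotterdam": -4.0, "Nijmegen": -1.0,
             "Utrecht": -9.0, "Arnhem": -2.0}
    detailed = {"Amsterdam": -1.0, "Rotterdam": -30.0, "Nijmegen": -20.0,
                "Utrecht": -2.0, "Arnhem": -15.0}
    print(progressive_search(list(cheap), cheap.get, detailed.get, shortlist_size=3))
```

In this toy run Amsterdam, which the detailed model would rank first, is lost in the first pass. The shortlist size therefore controls the trade-off between decoding speed and search errors.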

Milestones:
The milestones are the availability of speech data for training and testing at several points during the project, several versions of mono- and multi-modal demonstrator systems for use in the field tests, and the availability of improved technology for integration into the demonstrators. The results of SMADA are a breakthrough in the robustness and accuracy of ASR under real-world conditions, HF improvements in automated DA services, and insight into the issues involved in multi-modal access to Web-based services.
The SMADA project contributed successfully to the standardisation of a noise-robust front-end algorithm for distributed speech recognition, in which the front-end calculations are performed in a standard mobile terminal and the acoustic decoding is handled by a back-end recogniser. Intermediate proposals were submitted to ETSI jointly by the SMADA partners Alcatel and France Telecom; the candidate that was finally adopted as the standard was submitted jointly by Alcatel and France Telecom together with Motorola. The field tests conducted within the SMADA project produced databases of real-world recordings of actual customers of a telephone-based DA service. These databases provided a rich source of information about the different ways in which real customers formulate their requests. As a result, new algorithms for updating existing formulation variants and for discovering formulation variants that were new to the DA service could be developed and tested successfully. Moreover, these databases made it possible to develop confidence measures that are better suited to an operational DA service.
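In distributed speech recognition the terminal computes the acoustic features and transmits a compact encoding of them, while the network-side back-end performs the decoding. The ETSI DSR front-end standards compress each feature frame with split vector quantisation, coding pairs of coefficients with separate codebooks. The sketch below illustrates only that general idea: the codebooks are random toy values rather than the standardised ones, and the frame layout (14 coefficients in 7 pairs, 64 codewords per codebook) as well as all function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Codebook = List[Tuple[float, float]]


def nearest_index(codebook: Codebook, pair: Sequence[float]) -> int:
    """Index of the codeword closest to a 2-D sub-vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: (codebook[i][0] - pair[0]) ** 2 + (codebook[i][1] - pair[1]) ** 2,
    )


def terminal_encode(frame: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Terminal side: split the feature frame into pairs and quantise each pair."""
    if len(frame) != 2 * len(codebooks):
        raise ValueError("expected one codebook per pair of coefficients")
    return [nearest_index(cb, frame[2 * k:2 * k + 2]) for k, cb in enumerate(codebooks)]


def backend_decode(indices: Sequence[int], codebooks: List[Codebook]) -> List[float]:
    """Back-end side: replace each index by its codeword to rebuild the frame."""
    frame: List[float] = []
    for idx, cb in zip(indices, codebooks):
        frame.extend(cb[idx])
    return frame


def toy_codebooks(n_pairs: int = 7, size: int = 64, seed: int = 0) -> List[Codebook]:
    """Random stand-in codebooks; real systems train them on speech data."""
    rng = random.Random(seed)
    return [[(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(size)]
            for _ in range(n_pairs)]


if __name__ == "__main__":
    books = toy_codebooks()
    rng = random.Random(1)
    frame = [rng.uniform(-1, 1) for _ in range(14)]  # one toy 14-coefficient frame
    sent = terminal_encode(frame, books)              # only 7 small integers are sent
    rebuilt = backend_decode(sent, books)
    error = max(abs(a - b) for a, b in zip(frame, rebuilt))
    print(sent, f"max abs error {error:.3f}")
```

With 64 codewords per pair, each index fits in 6 bits, so a frame in this toy layout costs 7 × 6 = 42 bits instead of 14 floating-point values. The price is a quantisation error that the back-end recogniser has to tolerate.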

Finally, the successive field tests in the SMADA project made it possible to develop and evaluate effective HF improvements. In summary, the SMADA project resulted in improved functionality and usability of automated services that use ASR in their user interface. The improvements in the actual field services, together with the improved feasibility of using speech input as part of a multi-modal user interface for Internet-based applications, have helped widen the range of e-commerce services that can be offered in the fixed and mobile telephone networks as well as on the Internet.

In Germany, multi-modal access to Internet services was demonstrated for different scenarios, such as travel and hotel reservation and the management of personal and group schedules, in which natural speech could be used together with key input and handwriting recognition as input modalities. These applications were shown at the CeBIT 2002 trade fair in Hanover. In France, successive field trials with a telephone DA service were run in different parts of the country during 2002, each of them benefiting from the SMADA results. In Italy, the technology developed and improved within the SMADA project is being deployed in an actual DA field service with real customers, and as a result the cost of offering the DA service has been significantly reduced. In the Netherlands, fully automated DA was introduced in July 2003, partly on the basis of the experience gained in SMADA.

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

STICHTING KATHOLIEKE UNIVERSITEIT
Address
Geert Grooteplein-noord 9
6525 EZ Nijmegen
Netherlands

Participants (5)

ALCATEL SEL AG
Germany
Address
Lorenzstrasse 10
70435 Stuttgart
FRANCE TELECOM
France
Address
6 Place D'alleray
75505 Paris 15
LOQUENDO SPA
Italy
Address
Via Valdellatorre 4
10100 Torino
POLITECNICO DI TORINO
Italy
Address
Corso Duca Degli Abruzzi 24
10129 Torino
UNIVERSITE D'AVIGNON ET DU PAYS-VAUCLUSE
France
Address
Rue Louis Pasteur 74
84029 Avignon