Speech Understanding and Dialogue

Objective

SUNDIAL addresses the problem of speech-based cooperative dialogue as an interface for computer-based information services. The main technologies to be developed are continuous speech recognition and understanding, and oral dialogue modelling and management.

The project started with a number of definition studies for the general architecture and studies of application scenarios. A common architecture has been defined, together with the interfaces between the major modules.
A small 50-word vocabulary for the speaker-independent telephone recognizer has been developed, suitable for a banking-by-phone application. Tests on the recognizer using the recognizer sensitivity analysis (RSA) technique have shown 95.6% correct recognition on the RSA 31-word vocabulary.

Preliminary results for the acoustic-phonetic decoding module show that continuous density HMMs (CDHMM) achieve 77.6% word accuracy on sentences, compared with 68.5% for discrete density HMMs, using 275 phonetic units for the Italian language and a vocabulary of nearly 1000 words. These results are for speaker-independent recognition of telephone-quality sentences, but do not take into account the effect of the linguistic processing module on sentence understanding performance.
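The word accuracy figures above can be made concrete with a minimal sketch. It assumes the conventional definition, accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment. The project summary does not state which scoring procedure was actually used, so the function names and the example sentences here are purely illustrative.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimum-edit
    word alignment of a hypothesis against a reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # cell[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cell = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, n)          # match, no cost
            else:
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cell[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cell[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare on total_errors first, so min() keeps the
            # cheapest alignment; ties are broken arbitrarily.
            cell[i][j] = min(diagonal, deletion, insertion)
    return cell[-1][-1][1:]


def word_accuracy(reference, hypothesis):
    """Word accuracy under the assumed (N - S - D - I) / N definition."""
    n_words = len(reference.split())
    subs, dels, ins = edit_counts(reference, hypothesis)
    return (n_words - subs - dels - ins) / n_words


def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline word errors removed (accuracies in percent)."""
    return (improved_acc - baseline_acc) / (100.0 - baseline_acc)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the project corpora.
    print(word_accuracy("a train to rome please", "a train rome peas"))
    # Applied to the reported 68.5% and 77.6% figures.
    print(relative_error_reduction(68.5, 77.6))
```

Under this definition, moving from 68.5% to 77.6% word accuracy corresponds to removing roughly 29% of the baseline word errors (31.5% down to 22.4%).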
Results for the English language using CDHMM show that phoneme recognition accuracy on the DARPA TIMIT database is comparable to that achieved by Kai-Fu Lee in the Carnegie Mellon SPHINX system.
A common dialogue manager architecture has been defined and work is in progress on its implementation.
Speech input will consist of naturally spoken sentences of telephone quality, with a vocabulary of 1000-2000 words for each application. The grammar will be based on a subset of each of the four partners' languages (English, French, German and Italian). The project has begun with speaker-independent recognition of sub-word units. The second phase will consider automatic online speaker adaptation with a view to improving performance. The dialogue manager will allow users to express themselves in a restricted natural language.

Prototypes will demonstrate the technology for three main information service applications: intercity train timetables (German), flight enquiries and reservations (English and French) and a hotel database (Italian). The spoken language phenomena to be covered will be determined from analysis of both human dialogue corpora and human-machine simulations. Each demonstration system will be evaluated through extensive user trials.

For all demonstrators, the project has to define a common general architecture, common formalisms for grammar representation across languages, and common semantic representations for dialogue management and message generation.

Coordinator

Logica Ltd
Address
64-68 Newman Street
W1A 4SE London
United Kingdom

Participants (9)

CAP Gemini Innovation
France
Address
7 Chemin Du Vieux Chêne
38240 Meylan
CNET France Télécom
France
Address
2 Route De Tregastel
22300 Lannion
Centre National de la Recherche Scientifique (CNRS)
France
Address
Campus Du Beaulieu
35042 Rennes
Centro Studi e Laboratori Telecomunicazioni SpA
Italy
Address
Via G. Reiss Romoli, 274
10148 Torino
DAIMLER-BENZ AG
Germany
Address
Wilhelm Runge Straße 11
89081 Ulm
Friedrich-Alexander-Universität Erlangen Nürnberg
Germany
Address
Martensstraße 3
91058 Erlangen
POLITECNICO DI TORINO
Italy
Address
Corso Duca Degli Abruzzi 24
10129 Torino
SIEMENS AG
Germany
Address
Otto-Hahn-Ring 6
81739 München
Sarin Telematica SpA
Italy
Address
Statale 148 Pontina Km 29.100
00040 Pomezia Roma