SUNDIAL addresses the problem of speech-based cooperative dialogue as an interface for computer-based information services. The main technologies to be developed are continuous speech recognition and understanding, and oral dialogue modelling and management.
The project started with a number of definition studies for the general architecture and studies of application scenarios. A common architecture has been defined, together with the interfaces between the major modules.
A small 50-word vocabulary has been developed for the speaker-independent telephone recognizer, suitable for a banking-by-phone application. Tests using the recognizer sensitivity analysis (RSA) technique have shown 95.6% correct recognition on the RSA 31-word vocabulary.
Preliminary results for the acoustic-phonetic decoding module show that continuous density HMMs (CDHMM) achieve 77.6% word accuracy on sentences, compared to 68.5% for discrete density HMMs, using 275 phonetic units for the Italian language and a vocabulary of nearly 1000 words. These results are for speaker-independent recognition of telephone-quality sentences, but do not take into account the effect of the linguistic processing module on sentence understanding performance.
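The summary does not say how word accuracy was scored. As a minimal sketch, assuming the conventional definition (one minus the word-level edit distance divided by the number of reference words), the figures above can be interpreted as follows. The function names and the example sentences are illustrative assumptions, not part of the project's tooling.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed conventional definition:
    1 - (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a spurious word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return 1.0 - prev[-1] / len(ref)


def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction of the baseline word error removed by the improved system.
    Both arguments are accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


if __name__ == "__main__":
    # Hypothetical utterance: one substitution and one deletion in four words.
    print(word_accuracy("a b c d", "a x c"))
    # Reported figures: discrete density 68.5%, continuous density 77.6%.
    print(relative_error_reduction(68.5, 77.6))
```

Under this assumed definition, moving from 68.5% to 77.6% word accuracy corresponds to removing about 29% of the word errors made by the discrete density models.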
Results for the English language using CDHMM show that phoneme recognition accuracy on the DARPA TIMIT database is comparable to that achieved by Kai-Fu Lee in the Carnegie Mellon SPHINX system.
A common dialogue manager architecture has been defined and work is in progress on its implementation.
Speech input will consist of naturally spoken utterances of telephone quality, with a vocabulary of 1000-2000 words for each application. The grammar will be based on a subset of the four partners' languages (English, French, German and Italian). The project has begun with speaker-independent recognition of sub-word units. The second phase will consider automatic online speaker adaptation with a view to improving performance. The dialogue manager will allow users to express themselves in a restricted natural language.
Prototypes will demonstrate the technology for three main information service applications: intercity train timetables (German), flight enquiries and reservations (English and French) and a hotel database (Italian). The spoken language phenomena to be covered will be determined from analysis of both human dialogue corpora and human-machine simulations. Each demonstration system will be evaluated through extensive user trials.
For all demonstrators, the project has to define a common general architecture, common formalisms for grammar representation across languages, and common semantic representations for dialogue management and message generation.