CORDIS - EU research results

Dealing with uncertainty in spoken language processing: Reasoning and problem-management

Final Activity Report Summary - DEAWU (Dealing with uncertainty in spoken language processing: Reasoning and problem-management)

The DEAWU project was concerned with the foundations of future spoken dialog systems (SDS). Because the quality of speaker-independent speech recognition has improved significantly in recent years, SDS have found practical application in areas such as voice banking and telephone-based timetable or other travel information services.

Most of these systems operate in a question-answer manner: the caller is asked a question and is expected to answer it cooperatively, i.e. in a way that the system can 'understand'. Several user studies in recent years have confirmed that callers often dislike SDS, largely because the systems react inflexibly to any kind of 'unexpected' input. As soon as the caller says something the system is not prepared for, system behaviour becomes highly unnatural. This frustrates the caller, which in turn often results in further utterances the system cannot handle. The heart of the problem is the lack of smart and reliable error-handling strategies. At basically every level of processing (phonetic, syntactic, semantic and pragmatic), an SDS typically cannot be entirely sure of what the caller has said and meant. Instead, it has to guess, accumulate evidence from different sources and then reason under uncertainty.
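To make the idea of accumulating evidence and then deciding under uncertainty more concrete, the following is a minimal illustrative sketch and not the project's actual method. It assumes a simple Bayesian update over competing interpretations of an utterance, followed by a threshold-based choice of the next dialog step. The function names, the hypothesis labels and the threshold values are all hypothetical.

```python
from __future__ import annotations


def update_belief(belief: dict[str, float], evidence: dict[str, float]) -> dict[str, float]:
    """Combine the current belief with one new evidence source (Bayes' rule).

    `evidence` maps each hypothesis to a likelihood; hypotheses that the
    evidence source does not mention are treated as having likelihood 0.
    """
    unnormalised = {h: p * evidence.get(h, 0.0) for h, p in belief.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The evidence contradicts every hypothesis: keep the old belief.
        return dict(belief)
    return {h: p / total for h, p in unnormalised.items()}


def choose_action(belief: dict[str, float],
                  accept: float = 0.8,      # assumed threshold
                  confirm: float = 0.5):    # assumed threshold
    """Pick the next dialog step from the probability of the best hypothesis."""
    best, prob = max(belief.items(), key=lambda item: item[1])
    if prob >= accept:
        return ("accept", best)       # act on the interpretation
    if prob >= confirm:
        return ("confirm", best)      # ask "Did you mean ...?"
    return ("ask_again", None)        # too uncertain: request a repetition


# Hypothetical example: two interpretations of one spoken command.
belief = {"move": 0.5, "rotate": 0.5}
belief = update_belief(belief, {"move": 0.6, "rotate": 0.4})   # e.g. acoustic score
first_step = choose_action(belief)
belief = update_belief(belief, {"move": 0.9, "rotate": 0.1})   # e.g. game context
second_step = choose_action(belief)
```

In this sketch, weak evidence from a single source leads only to a confirmation question, whereas agreement between two sources pushes the belief high enough for the system to act directly.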

The DEAWU project aimed at devising strategies for precisely this task: deciding on the next dialog step in the light of uncertainty and potential or actual misunderstanding. In order to undertake empirical studies and build a prototype SDS, a suitable application domain had to be defined. We chose an interactive two-dimensional board game, 'Pentomino'. It allowed us to study different types of interesting misconceptions in the dialog while keeping the overall vocabulary and the syntactic variation in utterances relatively small. As a result, no large-scale grammar construction or wide-domain speech recognition was necessary; either would have taken too much of the project's time.

One major contribution of the project was the collection and annotation of data. The data consisted of conversations between dialog partners playing Pentomino; various kinds of acoustic 'disturbance' were artificially inserted into these conversations in order to prompt the participants to display their error-handling behaviour. A suitable annotation scheme was devised, and a corpus of speech data and transcriptions was thus created. This corpus, containing German and English data, was made available to other researchers for their own purposes, in particular to study linguistic behaviour in the face of misunderstandings. Our own evaluations led to a series of publications outlining models of human behaviour in dialog when ambiguity and uncertainty were present.

To develop a prototype dialog manager component, we first implemented a voice-controlled Pentomino game. The design included a graphical user interface showing boards and tiles, and an externally developed speech recognition component that enabled users to control the game by speaking commands. In the case of ambiguity, the system produced clarification questions; for example, users were prompted to specify more clearly which tile they wanted to move when their description could not be resolved uniquely. Step by step, we then extended the dialog management component to handle different types of uncertainty, incorporating some of the insights from the data collection phase of the project. By the time the project was completed, the implementation was able to serve as an assistant to a human Pentomino player, moving tiles in response to voice commands and engaging in clarification dialogs when misunderstandings were encountered.
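The clarification behaviour described above can be illustrated with a short sketch. It is not the DEAWU implementation: the `Tile` attributes, the function names and the wording of the questions are assumptions made purely for illustration. The idea is that a tile description is matched against the tiles on the board, and the system acts only if exactly one tile matches; otherwise it asks a clarification question.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    shape: str   # letter name of the Pentomino piece, e.g. "L", "T", "X"
    color: str   # assumed attribute; the real game may describe tiles differently


def resolve(description: dict[str, str], tiles: list[Tile]) -> list[Tile]:
    """Return every tile consistent with the attributes the user mentioned."""
    return [t for t in tiles
            if all(getattr(t, key) == value for key, value in description.items())]


def next_step(description: dict[str, str], tiles: list[Tile]):
    """Move the tile if the reference is unique, otherwise ask for clarification."""
    candidates = resolve(description, tiles)
    if len(candidates) == 1:
        return ("move", candidates[0])
    if not candidates:
        return ("clarify", "I could not find a tile like that. Which tile do you mean?")
    # Several tiles match: ask about an unmentioned attribute that tells them apart.
    for attr in ("shape", "color"):
        values = sorted({getattr(t, attr) for t in candidates})
        if attr not in description and len(values) > 1:
            return ("clarify", f"Which {attr} do you mean: {', '.join(values)}?")
    return ("clarify", "Several tiles match. Can you describe the tile more precisely?")


# Hypothetical board with three tiles.
board = [Tile("L", "red"), Tile("L", "blue"), Tile("T", "red")]
unique = next_step({"shape": "T"}, board)
ambiguous = next_step({"shape": "L"}, board)
unknown = next_step({"color": "green"}, board)
```

Here the description "the T" resolves to a single tile and is executed, whereas "the L" matches two tiles and triggers a question about the attribute that distinguishes them.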