Trusting in words: novel speech-driven services

The successful deployment of new speech-driven Directory Assistance (DA) services and other applications will depend on the accuracy and reliability of the recognition results. Reliable confidence measures thus play a crucial role and are necessary in all practical applications where a decision must be made whether a recognised word or sentence should be accepted or rejected.


Automatic speech recognition (ASR) takes the user's speech as input and produces a transcription, potentially containing errors, of what the user said. Real applications of speech recognition technology need reliable systems that achieve consistently correct results across different tasks and environments. However, current speech recognition systems are not yet perfect, and identifying errors in the recognition output remains important. In particular, recognising a large vocabulary of proper names is very difficult.

Confidence measures are a means of managing the uncertainty about the accuracy of a speech recognition system's results. Acoustic confidence measures are thus useful in many aspects of speech recognition, such as error rejection, out-of-vocabulary word detection and keyword spotting.

The EU-funded SMADA project conducted research on the impact of two different causes of ASR errors: confusion between acoustically similar names and words, and problems caused by background noise or unclear articulation. The project found that confidence measures based on simple acoustic likelihood give the best results for confusions between acoustically similar words. For problems caused by noise or poor articulation, confidence measures should be based on a more complex algorithm, one that compares the share of the probability mass held by the first-best hypothesis with the share held by the competing hypotheses.

The project resulted in confidence measures that are sufficiently reliable to allow the dialogue manager of an automated Directory Assistance (DA) service to decide what task the user is trying to accomplish and how to resolve any ambiguity. They ensure that when the system makes a decision, the Word Error Rate (WER) is acceptably low. These measures reduce the number of dialogue exchanges, and thus the duration of the interaction. They can also limit the percentage of wrong proposals made by the automatic service. The results of this project can also be used for unsupervised training and adaptation.
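
The idea of comparing the probability mass of the first-best hypothesis with that of its competitors can be illustrated with a minimal sketch. This is not the SMADA algorithm, which the article does not specify. It assumes the recogniser returns an N-best list of log-domain scores, and the function names (nbest_confidence, decide), the scale factor and the two thresholds are all hypothetical choices made for illustration.

```python
import math


def nbest_confidence(log_scores, scale=1.0):
    """Share of the probability mass held by the best hypothesis.

    log_scores: log-domain scores of the N-best hypotheses (assumed input).
    scale: hypothetical factor that flattens or sharpens the distribution.
    """
    if not log_scores:
        raise ValueError("need at least one hypothesis")
    scaled = [scale * s for s in log_scores]
    top = max(scaled)
    # Subtract the maximum before exponentiating so that very negative
    # log scores do not underflow to zero.
    total = sum(math.exp(s - top) for s in scaled)
    # The best hypothesis contributes exp(0) = 1 to the total.
    return 1.0 / total


def decide(confidence, accept_at=0.8, reject_below=0.4):
    """Map a confidence value to a dialogue action (illustrative thresholds)."""
    if confidence >= accept_at:
        return "accept"
    if confidence < reject_below:
        return "reject"
    return "confirm"


if __name__ == "__main__":
    # One clear winner versus two equally scored candidate names.
    for scores in ([-10.0, -20.0, -25.0], [-10.0, -10.0]):
        c = nbest_confidence(scores)
        print(scores, round(c, 3), decide(c))
```

With these illustrative thresholds, a list with one clearly dominant hypothesis is accepted outright, while two equally scored names each hold half of the mass and trigger a confirmation question. In a real service the thresholds would be tuned on held-out dialogues so that the error rate on accepted results stays acceptably low.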