Community Research and Development Information Service - CORDIS

Speech Driven Multi-modal Automatic Directory Assistance (SMADA): modelling user formulation variants

The SMADA project aimed to improve the functionality and usability of automated services that use automatic speech recognition (ASR) in their user interface, either as the only input/output modality (i.e. over the telephone) or as one of the modalities in a multi-modal interface. The results of the experiments on modelling user formulation variants include:

- Generating business entry variants:
Analysis of the traffic has shown that about 80% of Directory Assistance (DA) customer calls relate to business listings. It is therefore important to raise the success rate of the automatic system for this class of calls. DA for business listings is a challenging task, however: one of its main problems is that customers formulate requests for the same listing with great variability. Since the content of the original database records typically does not match the linguistic expressions used by callers, a complex processing step is needed to derive a set of possible formulation variants (FVs) from each original record in the listing book.

A large percentage of user expressions, however, still remain uncovered by the FV database. Thus, we have proposed a procedure for detecting, from field data, user formulations that were not foreseen by the designers. These formulations can be added, as variants, to the denominations already included in the system to reduce its failures.

Our approach is based on partitioning the field data into phonetically similar clusters from which new user formulations can be derived.

Our working hypothesis, confirmed by the experimental results, was that when a large number of requests for the same denomination is collected, there is a high probability of obtaining clusters of phonetically similar strings characterized by high cardinality and small dispersion. The central element of such a cluster, defined as the string with the minimum sum of distances to all the other elements of the cluster, is then a fairly accurate phonetic transcription of a (possibly new) user formulation.
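The definition of the central element can be illustrated with a minimal sketch. It assumes that phonetic strings are compared with a plain Levenshtein edit distance over phone symbols and that dispersion is the mean distance from the central element to the other members; the source names neither the distance nor the dispersion measure, so both choices, and the function names `edit_distance` and `central_element`, are illustrative only.

```python
from typing import Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def central_element(cluster: Sequence[str]) -> Tuple[str, float]:
    """Return the member with the minimum summed distance to all others,
    plus a dispersion value (mean distance from that member to the rest)."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    best, best_sum = None, None
    for candidate in cluster:
        total = sum(edit_distance(candidate, other) for other in cluster)
        if best_sum is None or total < best_sum:
            best, best_sum = candidate, total
    dispersion = best_sum / max(len(cluster) - 1, 1)
    return best, dispersion


if __name__ == "__main__":
    # Hypothetical cluster; each character stands for one phone symbol.
    cluster = ["farmatSia", "farmaSia", "farmatSia", "farmatSja"]
    print(central_element(cluster))
```

A cluster with many members and a low dispersion value would, under these assumptions, be a good candidate for manual inspection.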

During the project we collected tens of millions of phonetic strings referring to business listings that were routed to the operators because the automatic system was unable to complete the transaction with the customer. Our procedure is able to filter a huge number of calls routed to the operators and to detect a limited number of phonetic strings that can be inspected by human operators, easily transcribed orthographically, and associated with the corresponding phone number. This approach has been used to update the system vocabulary, yielding a significant reduction in system failures on a field test set.

- Dealing with lexical variants for proper name recognition:
Recognition of a large vocabulary of proper names is a difficult task with very high perplexity. A suitable dialog strategy can substantially reduce the false automation rate if the Word Error Rate (WER) on proper name recognition is kept low.

In principle, a good strategy should evaluate an input with an initial set of ASR systems and produce an indication of acceptance or rejection. In each case, suitable further processes, which may involve specialized discriminative recognisers, should then be executed to refine the confidence in the decision until it is high enough to reach the phase known in computer transactions as "commit".
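The staged strategy described above can be sketched as a simple loop. This is only an illustration under stated assumptions: each stage is a hypothetical callable that takes the utterance and the current hypothesis and returns a (hypothesis, confidence) pair, and the thresholds `commit_at` and `reject_at` are invented values, not figures from the project.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stage signature: (utterance, current hypothesis) -> (hypothesis, confidence)
Stage = Callable[[str, Optional[str]], Tuple[str, float]]


def run_strategy(utterance: str,
                 stages: Sequence[Stage],
                 commit_at: float = 0.95,
                 reject_at: float = 0.10) -> Tuple[str, Optional[str]]:
    """Run recognition stages in order until confidence allows a decision.

    Returns ("commit", hyp), ("reject", hyp), or ("undecided", hyp) when
    every stage has run without reaching either threshold.
    """
    hypothesis: Optional[str] = None
    for stage in stages:
        hypothesis, confidence = stage(utterance, hypothesis)
        if confidence >= commit_at:
            return "commit", hypothesis
        if confidence <= reject_at:
            return "reject", hypothesis
    return "undecided", hypothesis


if __name__ == "__main__":
    # Stub stages standing in for a general and a specialised recogniser.
    first_pass = lambda utt, hyp: ("Dupont", 0.60)
    specialised = lambda utt, hyp: (hyp, 0.97)
    print(run_strategy("<audio>", [first_pass, specialised]))
```

In a deployed service an "undecided" outcome would presumably trigger a further dialogue turn or a transfer to a human operator, but that policy is outside this sketch.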

Of particular interest for DA are decoders based on lexical models that account for distortions of a canonical pronunciation as they appear in surface phonetic representations of words.

Search based on a network with all possible distortions of canonical forms may lead to an increase in word error rate because the knowledge used includes a large number of distortion models which are inconsistent with the distortion types introduced by a given speaker.

A methodology has been introduced, based on the above considerations, for rescoring the N-best hypotheses generated, after a short dialogue, by a system developed at France Telecom R&D for the recognition of proper names pronounced in isolation and belonging to the whole French directory. A blackboard-based architecture has been proposed for scheduling the execution of different recognition processes using different lexical models.

Using this architecture, a consensus-based verification strategy has been developed and tested with a French directory of more than 100,000 entries. Results have shown much better performance than verification based on posterior probability.

A journal paper is in preparation on this topic.
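One simple way to picture consensus-based verification is a vote among the top hypotheses produced by recognisers that use different lexical models. The sketch below is an assumption, not the project's actual method: it accepts a name only when a strict majority of recognisers agree on it, and the function name `consensus_verify` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_verify(top_hypotheses: Sequence[str]) -> Optional[str]:
    """Accept the name proposed by a strict majority of recognisers.

    `top_hypotheses` holds the best hypothesis from each recogniser.
    Returns the agreed name, or None when there is no strict majority.
    """
    if not top_hypotheses:
        return None
    name, votes = Counter(top_hypotheses).most_common(1)[0]
    return name if votes > len(top_hypotheses) / 2 else None


if __name__ == "__main__":
    # Hypothetical outputs of three recognisers for one utterance.
    print(consensus_verify(["Martin", "Martin", "Marin"]))
    print(consensus_verify(["Martin", "Marin", "Morin"]))
```

Requiring a strict majority avoids having to break ties; a real system could instead weight each recogniser's vote by its confidence score.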

- Generating pronunciation variants for city-names recognition:
Recognizing city names is mandatory in many applications, such as directory assistance and tourism information. However, this task is quite difficult in France, as it involves a large vocabulary (40,000 city names). Furthermore, some names are short monosyllabic words, while others, such as long official compound names, are frequently abbreviated to shorter common names. Rules were therefore defined to predict these short common names automatically, so that they can be added as extra variants to the recognition vocabulary.

The principle of this rule-based approach has been published at ICASSP 2003.
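The actual rules are not listed here, so the following is only a sketch of what one such rule might look like. It assumes a single hypothetical rule: a hyphenated compound name is cut just before its first linking word (for example "sur" or "en"), and the list `LINKERS` is an illustrative guess rather than the project's rule set.

```python
from typing import Dict, List, Optional

# Hypothetical linking words that introduce a qualifier in compound names.
LINKERS = {"sur", "sous", "en", "lès", "les", "de"}


def short_variant(official_name: str) -> Optional[str]:
    """Return the part of a compound name before its first linking word,
    or None when the rule does not apply."""
    parts = official_name.split("-")
    for i, part in enumerate(parts):
        if i > 0 and part.lower() in LINKERS:
            return "-".join(parts[:i])
    return None


def expand_vocabulary(names: List[str]) -> Dict[str, List[str]]:
    """Map each official name to the variants to add to the vocabulary."""
    vocabulary = {}
    for name in names:
        short = short_variant(name)
        vocabulary[name] = [name] if short is None else [name, short]
    return vocabulary


if __name__ == "__main__":
    print(expand_vocabulary(["Saint-Germain-en-Laye", "Boulogne-sur-Mer", "Paris"]))
```

A rule like this can map several official names to the same short form, so the added variants increase ambiguity in the vocabulary and would need to be resolved elsewhere, for instance in the dialogue.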

More information on the SMADA project can be found on the project’s website:


Reported by

Politecnico di Torino - DAUIN
C. Duca degli Abruzzi, 24
10143 Torino