Speech Recognizer Quality Assessment for Linguistic Engineering


The project aims at developing an assessment paradigm for large-vocabulary, speaker-independent, continuous speech recognition in Europe, taking into account the distinctive characteristics of a multilingual environment and identifying the problems it raises. The project will also begin defining guidelines for future assessment actions. If the SQALE project proves successful, these guidelines could be extended to an evaluation paradigm for future large-scale European language/speech programs.

The project will also directly contribute to the assessment and evaluation of NLP systems in at least three ways:

- a general framework will be established for comparing machine generated output with reference corpora;
- a first step will be taken toward handling real-world phenomena, such as false starts and hesitations;
- the effects of differing test set perplexities across various European languages will be quantified.
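
To make the notion of test set perplexity concrete, the following minimal sketch computes it from the probabilities a language model assigns to each word of a test text. It assumes the usual definition (the exponential of the average negative log probability per word); the function name and the example probabilities are hypothetical and are not taken from the SQALE protocol.

```python
import math


def perplexity(word_probabilities):
    """Test set perplexity from per-word language model probabilities.

    Assumes the standard definition: exp(-(1/N) * sum(ln p_i)), where p_i is
    the probability the model assigns to the i-th test word given its history.
    """
    probs = list(word_probabilities)
    if not probs:
        raise ValueError("at least one word probability is required")
    if any(p <= 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in the interval (0, 1]")
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


# Hypothetical example: a model that assigns probability 0.25 to every word
# behaves like a uniform choice among four words at each position.
uniform_like = perplexity([0.25, 0.25, 0.25, 0.25])
```

A higher value means the model finds the test text harder to predict, which is why comparing recognition results across languages requires taking the perplexity of each test set into account.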

The SQALE experiment will therefore not only extend European standards in speech recognition assessment (which are limited to isolated and connected word systems, without a direct link with language models) but also initiate the necessary and much awaited integration between speech and NL assessment methodologies.

Approach and Methods

The project takes into account the experience gained by the partners in the 1992 DARPA RM and WSJ evaluations in order to investigate how the US protocols can be improved and extended into a multilingual experimental design, as required for a European approach.

The basic idea of the project is to form a small consortium, made up of a coordinating laboratory - having a high technical expertise in the field - and three other laboratories testing their "in house" recognition systems. The "testing" laboratories are located in three different countries, where three different languages are spoken. Hence two dimensions of the research paradigm are investigated: the recognition algorithms (at least 3) and the languages (at least 3). In particular, the experiment will focus on two independent research questions:

- the merits of different recognition algorithms applied to the same data, and
- the relative difficulties in speech recognition across different languages.

Having multiple sites applying their algorithms on the same database makes it possible to discuss the merits of different methods on the same data. Testing the same algorithm on different databases in different languages will reveal the relative difficulties of speech recognition for different languages, and the degree of robustness of the algorithm with respect to a given language.

Each testing laboratory will be responsible for providing data in its own language - both written and spoken corpora - for assessing its systems according to a commonly accepted protocol, and for performing the assessment procedure for English and at least one other language. The coordinator will organize the assessment experiment and will be responsible for the timely distribution of training and test materials and for gathering the test results. It will also score the recognizers' output and analyze the results.
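
As an illustration of what scoring recognizer output against reference transcriptions can involve, the sketch below computes a word error rate by aligning each hypothesis with its reference using word-level edit distance. The choice of word error rate is an assumption made for this example, since the text does not name the metric; the function names and sample sentences are likewise hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (error_count, reference_length) for one utterance.

    error_count is the minimum number of word substitutions, deletions and
    insertions needed to turn the reference into the hypothesis
    (Levenshtein distance computed over whitespace-separated words).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # hypothesis word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)], len(ref)


def word_error_rate(pairs):
    """Pooled word error rate over (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    if total_words == 0:
        raise ValueError("reference transcriptions contain no words")
    return total_errors / total_words


# Hypothetical test set: one substitution and one deletion in the first
# utterance, a perfect result in the second.
wer = word_error_rate([
    ("the cat sat on the mat", "the cat sit on mat"),
    ("good morning", "good morning"),
])
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps long and short utterances weighted by their word counts; this is a design choice of the sketch, not a statement about the SQALE scoring procedure.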

The high quality of the three test sites and their recognition systems (all three labs performed at the top level in the DARPA 92 benchmark test) and the high technical standard of the coordinating laboratory are considered essential ingredients for the success of the project.

Exploitation and Future Prospects

SQALE intends to bridge the gap between the state of the art in commercial systems assessment - as exemplified by the SAM Esprit project - and the state of the art in research systems assessment - as represented by ARPA. It will therefore have direct relevance to current leading-edge research and development, and should also have a pull-through effect on future application-driven and technology-driven research. Furthermore, SQALE will operate in a multilingual European context and will therefore go beyond the current ARPA scope. Cross-language assessment and evaluation have never previously been performed on this scale: SQALE will be a pioneer project in this respect.

As far as more immediate and practical results of the project are concerned, the dissemination of the following material is envisaged (through EAGLES):

- the speech corpora, including the speech signal and the associated transcription, the lexica and the text corpora;
- the results obtained by each testing participant in its own language and in the common language;
- the guidelines and recommendations on how to conduct and organize systems evaluation in a multinational, multilingual context.

These results will constitute a baseline from which it will be possible to improve the methodology, enlarge the number of participants, augment the difficulty of the tasks and ensure the coordination with closely related research areas, such as written language processing and machine translation.

In the near future, a primary basis for interaction between speech and NL systems will in fact be the common use of text corpora and statistically based language models. The development of common assessment methodologies and protocols will be equally relevant for NL and speech integration.


TNO Institute for Human Factors
Kampweg 5, PO Box 23
3769 ZG Soesterberg

Participants (3)

Centre National de la Recherche Scientifique (CNRS)

Philips GmbH
Weisshausstraße 2
52066 Aachen

University of Cambridge
Trumpington Street
CB2 1PZ Cambridge
United Kingdom