Investigation into the Effective Use of Speech at the Human-Machine Interface

Objective

The SPEECH project addressed voice applications in a wide range of environments. The objectives were to:
-determine the current state of the art (hardware and software) and the current areas of actual and imminent application
-explore the potential for future applications
-determine the additional new requirements for hardware and software so that the potential application areas can be realised
-provide a five-year forecast of the likely development of speech technology, with specific reference to application areas.

The results of the research showed that the effectiveness of an application was often governed by the appropriateness of the microphones and other ancillary equipment. With properly calibrated equipment, consistently high background noise did not detract from recognition. A well-designed vocabulary on a low-quality recognizer could outperform a badly designed vocabulary on superior equipment. Many of the current successful applications used small vocabularies organized into context-selected sets. Large vocabularies tended to encourage the notion that unrestricted language can be used, which is not yet possible.

The most important considerations for the development of a successful system were that:
isolated word recognition systems perform satisfactorily;
several reliable speech synthesis systems can be used with speech recognition systems to give a completely hands free environment;
speech should be part of the overall design of a system;
successful implementation requires more knowledge than the average potential system designer has available;
the greatest expenses do not arise from the cost of the voice equipment but the cost of thorough system design and integration.

The major considerations for future development are discussed.
There were several significant findings:
-The effectiveness of an application is often governed by the appropriateness of the microphones and other ancillary equipment.
-With properly calibrated equipment, consistently high background noise (up to 90 dBA) does not detract from recognition.
-A well-designed vocabulary on a low-quality recogniser can out-perform a badly designed vocabulary on superior equipment.
-Many of the current successful applications use small vocabularies organised into context-selected sets. Large vocabularies tend to encourage the notion that unrestricted language can be used, which is not yet possible.
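The idea of small vocabularies organised into context-selected sets can be illustrated with a minimal sketch. It is not taken from the project's final report: the contexts, the words, the `VOCABULARY` table and the `pick_word` and `step` functions are all hypothetical, and the recogniser is assumed to return a list of scored word hypotheses. At each point in the dialogue, only the few words valid in the current context are considered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary: context name -> {allowed word -> next context}.
# Each context exposes only a handful of words to the recogniser.
VOCABULARY: Dict[str, Dict[str, str]] = {
    "idle": {"measure": "measuring", "report": "reporting"},
    "measuring": {"length": "measuring", "width": "measuring", "done": "idle"},
    "reporting": {"read": "reporting", "done": "idle"},
}


def pick_word(context: str, hypotheses: List[Tuple[str, float]]) -> Optional[str]:
    """Return the best-scoring hypothesis that is allowed in this context."""
    active = VOCABULARY[context]
    allowed = [(word, score) for word, score in hypotheses if word in active]
    if not allowed:
        return None  # nothing in the active set matched: reject the utterance
    return max(allowed, key=lambda pair: pair[1])[0]


def step(context: str, hypotheses: List[Tuple[str, float]]) -> Tuple[str, Optional[str]]:
    """Advance the dialogue: return (next context, accepted word or None)."""
    word = pick_word(context, hypotheses)
    if word is None:
        return context, None  # stay in the same context on a rejection
    return VOCABULARY[context][word], word


if __name__ == "__main__":
    # "width" scores higher but is not valid while idle, so "measure" wins.
    print(step("idle", [("width", 0.9), ("measure", 0.6)]))
```

Restricting the match to the active set means a higher-scoring but out-of-context word is discarded, which is one way a well-designed vocabulary could compensate for a weaker recogniser.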
A comprehensive final report is available which includes a list of conclusions and recommendations. These are the most important considerations for the development of a successful system:
-Isolated word-recognition systems perform satisfactorily; connected speech is possible with good equipment and good design. Continuous speech recognition is unreliable or extremely specialised and expensive.
-There are several reliable speech synthesis systems that can be used with speech recognition systems (for example, with touch-tone telephones for dial-in enquiry systems) to give a completely hands-free environment.
-Speech should be part of the overall design of a system; success is less likely when speech is added to a current package.
-Successful implementation requires more knowledge than the average potential system designer has available. The suppliers' supporting software, technical descriptions and documentation tend to be poor.
-The greatest expenses do not arise from the cost of the voice equipment but the cost of thorough system design and integration.
The major considerations for future developments are:
-Continuous speech recognition systems require research into and analysis of phonetic and linguistic factors and need to be implemented via knowledge-based interpreters on faster and cheaper processors. This is unlikely within the next ten years.
-Speech synthesis applications are the most likely for early widespread development, especially by telephone companies.
-The industrial area is the most amenable to speech recognition applications with current equipment. Further exploitation in the office environment can only come from speech synthesis and with better continuous speech recognition.
-The next generation of systems will analyse and store speech based on phonemes; this will cut down storage requirements, but will result in language and dialect dependencies.
-The achievement of true speaker-independent recognition, removing the need for "enrolment" or "voice training" for each new speaker, will be highly dependent upon the outcome of current knowledge-based system and algorithm research and is definitely some way off.
Exploitation
Speech technology is already being successfully used, and, provided the current limitations are observed and accounted for in the design, there are good prospects for increased use of the technology in selected and restricted situations. Unrealistic expectations of the customer and over-selling by the suppliers, coupled with poor documentation, are producing a large number of failed projects, causing prospective beneficiaries to delay their commitment. Improvements in the basic technology are continuing. Continuous speech recognition is not available commercially, but neither are the system ideas or designs that could make effective use of it. It has become clear from limited experimentation that considerable complexity in software may be necessary to deal with even quite limited vocabularies and restricted syntax where interpretation is called for. On the other hand, the benefits of simple speech input coupled with synthesised voice prompting have been demonstrated publicly by the project team. They have shown the benefit of totally hands-free control and the value of a well-designed, simple command syntax.

Coordinator

British Maritime Technology Cortec Ltd
Address
Wallsend Research Station
NE28 6UY Wallsend
United Kingdom

Participants (3)

Fincantieri Cantieri Navali Italiani SpA
Italy
Address
Corso Cavour 1
34100 Trieste
International Computers Ltd (ICL)
United Kingdom
Address
Lovelace Road
RG12 4SN Bracknell
Voice Systems International
United Kingdom
Address
15 St Margaret's Road
CB3 0LT Cambridge