Advanced Algorithms and Architectures for Speech and Image Processing

Objective

The objective of the Speech and Image Processing (SIP) project was to develop the algorithmic and architectural techniques required for recognising and understanding spoken and visual signals, and to demonstrate these techniques by means of suitable applications.
The work was planned in three parallel areas: speech analysis, image analysis and pattern recognition and understanding.
With respect to speech, the initial application target was to extend as far as possible current state-of-the-art techniques for speech recognition. The resulting system was to be tested using a vocabulary of the order of 1,000 words with constrained syntax and using continuous speech.
For image processing, the project attempted to go beyond treating the image merely as sampled data. Applications in medical imaging and industrial inspection were used to test the tools and to study architectural and implementation issues. At the higher level of processing, close commonality can be expected between techniques for speech and image processing. Subsequent work will study architectures suitable for the higher levels, which can interface with the lower-level systems.
Algorithms and prototype equipment are available for recognition of continuous speaker-dependent speech and for understanding phrases within a restricted semantic domain, and also for image feature extraction and recognition. Applications are in fields such as medicine, robotics and telecommunications.
A prototype was made available on 29/03/89.
The operating environment is as follows:
Hardware: special hardware coupled with a Symbolics machine

Progress on speech processing was made along two complementary lines: a statistical approach and a knowledge-based approach. Preliminary results were obtained from the statistical approach, based on a first implementation, using very large lexicons. For the knowledge-based approach, a methodology for representation of the lexical and acoustical knowledge was chosen. In addition, the architecture of the acoustical front-end was realised and the first digital signal processing boards were tested. Lexical access and verification based on a Hidden Markov Model were demonstrated on a VAX machine on a set of short sentences uttered by a single speaker in a noisy environment. Methods to incorporate syntactic and semantic information were studied to achieve understanding of uttered sentences. A small question-answering system running on a Symbolics machine was demonstrated. The system starts from the word lattice produced by the speech system, builds a representation of the query using syntax and semantics, and finally answers the query.
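As an illustration of what verification against a Hidden Markov Model can involve, the following is a minimal sketch of Viterbi scoring for a discrete-observation HMM. It is not the project's implementation: the function names, the two-state toy model and the symbols "a" and "b" are assumptions made up for this example. In a verification setting, one would typically compare such scores across the word models proposed by lexical access.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(observations, states, start, trans, emit):
    """Return (log-probability, state path) of the best path through the HMM.

    start[s], trans[r][s] and emit[s][o] are plain probabilities; the search
    is carried out in the log domain to avoid numerical underflow.
    """
    # Log-score of the best path ending in each state after the first symbol.
    best = {s: safe_log(start[s]) + safe_log(emit[s][observations[0]])
            for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor state that maximises the path score.
            score, arg = max((prev[r] + safe_log(trans[r][s]), r)
                             for r in states)
            best[s] = score + safe_log(emit[s][obs])
            pointers[s] = arg
        backpointers.append(pointers)
    # Trace the best path back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical two-state left-to-right model (all values are illustrative).
STATES = ("s1", "s2")
START = {"s1": 1.0, "s2": 0.0}
TRANS = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 0.0, "s2": 1.0}}
EMIT = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

if __name__ == "__main__":
    log_p, path = viterbi(["a", "a", "b"], STATES, START, TRANS, EMIT)
    print(path, math.exp(log_p))
```

Working in the log domain is a common design choice here, since products of many small probabilities underflow quickly on longer utterances.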
A coordinated set of algorithms and architectures for image recognition and understanding was developed and demonstrated. Layered approaches based on Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD) machines were realised for image feature extraction. A heterogeneous approach was taken, linking a SIMD GAPP array for the low-level processing and an MIMD transputer-based machine or an array processor for the medium-level processing. The interfaces and the I/O of the data were developed and optimised. Estimates of performance were derived from a set of algorithms running on the different parts of the architecture. These estimates were then refined by setting up real benchmarks. Specific work was done to provide a coordinated set of algorithmic tools for digital angiography applications.
Implementation aspects of the physical architecture for high-level processing based on transputers fully interconnected through a switching network were analysed in detail; a switching element for non-local communication was designed outside the project, and the first building-block, comprising two processing elements and a hardware emulation of the interconnection network, is now available.
PIPES, the first prototype realisation of a Prolog transputer-based machine where the transputers are fully interconnected using a packet-switched network, was demonstrated. It will be implemented on the high-level architecture for speech and image understanding and applied to real-time tasks.
Exploitation
SIP has been the source of applications in sound, vision and robotics through the development of a coordinated set of algorithms and architectures for image recognition and understanding. It provides the foundation for applications in medicine, in industry and in other domains. Project results also underpin the development of intelligent workstations for both graphics and image processing.
The successful combination of statistical techniques and knowledge-based techniques for speech recognition will result in a major breakthrough in the field. The complete real-time stand-alone system displaying spoken Italian, which is now under development, will be adapted for French and German.

Coordinator

Centro Studi e Laboratori Telecomunicazioni SpA
Address
Via G. Reiss Romoli, 274
10148 Torino
Italy

Participants (4)

Daimler-Benz AG
Germany
Address
Wilhelm-Runge-Straße 11
89013 Ulm
GEC-Marconi Materials Technology Ltd
United Kingdom
Address
Elstree Way
WD6 1RX Borehamwood
Thomson CSF
France
Address
3 Avenue de Belle Fontaine
35510 Cesson-Sévigné
Université de Strasbourg I (Université Louis Pasteur)
France
Address
7 Rue de l'Université
67000 Strasbourg