The main objective of this project is to learn how artificial neural networks (ANNs) can be used for continuous speech recognition to significantly improve state-of-the-art systems and, using dedicated hardware, to develop fast implementations of the resulting algorithms, ie real-time recognition and fast turnaround of training. More specifically, this project addresses the problem of improving state-of-the-art, hidden markov model (HMM)-based, large vocabulary, speaker dependent and independent, continuous speech recognition systems by means of hybrid HMM/ANN structures.
In this framework, different ANN architectures will be compared and speaker adapation methods will be developed. This project contains two parts with very strong inter-dependencies:
- development and evaluation of theories and methods to improve hybrid HMM/ANN systems, and
- development of hardware and software tools to help the research and to implement resulting algorithms.
Hybrid structures consisting of combinations of hidden Markov models (HMM) and artificial neural networks (ANN) are being exploited to improve the state of the art in large vocabulary, continuous speech recognizers. Building on existing prototypes that were available the project includes state of the art HMMs and ANNs and explores aspects such as theory, implementation, improved training and speaker adaptation in hybrid HMM/ANN systems.
The results have shown that the hybrid approach was able to achieve a recognition performance comparable to much more sophisticated state of the art HMMs, with around 5% error rate on the DARPA Resource Management database (1000 words, speaker independent, continuous speech recognition task).
APPROACH AND METHODS
The consortium brings together partners with existing skills and baseline systems in the area: LHS and ICSI (Intl. Computer Science Institute, Berkeley, CA, subcontractor) in hybrid hidden Markov model (HMM)/multilayer perceptron (MLP) structures and CUED in recurrent neural network (RNN) structures, both of which perform competitively with state-of-the-art HMM technology; INESC in artificial neural networks (ANNs) and speaker adaptation, and ICSI in their development of the Ring Array Processor (RAP) which provides over 500 Mflops and which is now being used for computation by each partner.
The main research themes include further development and improvement of the baseline HMM/MLP hybrid, and development of an HMM/RNN hybrid; definition of common recognition software to be used as a basis for comparison and assessments of research results; comparison of both MLP and RNN hybrid systems; development of better acoustic features with enhanced speaker and communication channel robustness; incorporation of improvements in hybrids analogous to those used in state-of-the-art HMM recognisers; development of better training procedures; investigation of fast speaker adaptation in hybrids; demonstration of real-time recognisers and their evaluation against state-of-the-art HMMs and international reference databases such as DARPA Resource Management (1000 words, speaker independent, continuous speech) and Wall Street Journal (5000 and 20000 words, speaker independent, continuous speech).
The training of hybrid structures is highly computer intensive. The inclusion of ICSI as a subcontractor gives the consortium access to the very high performance hardware (RAP and a VLSI processor called SPERT) and software tools which ICSI has developed and will further adapt as the project progresses. These hardware and software tools will be used as a common platform of this project.
This project is expected to make a significant technical and scientific contribution to the use and understanding of HMM/ANN hybrids and of HMMs and ANNs separately in speech recognition, pattern recognition, and to the neural computing involved. It will also provide a testbed for a new generation of commercial speech recognition systems exploiting hybrid HMM/ANN technology.
CB2 3RF Cambridge