Synthesised talking face derived from speech for hearing disabled users of voice channels

Speeding up sound recognition systems

A high-level phoneme recognition system with minimal delay has been developed for the Synface prototype, a multilingual technology that drives a synthetic talking face from speech. The face gives hearing-impaired users of telephones and other voice channels important visual speech information.

Digital Economy

Hearing-impaired people are at a considerable disadvantage when it comes to accessing spoken information delivered through technology. Visible face movements are an effective means of improving speech intelligibility, for hearing-impaired persons in particular and for everyone in noisy conditions. Such developments are made possible by the rapid evolution of multi-modal speech technology and PC processing power. The recogniser in the Synface prototype is a hybrid of recurrent neural networks (RNNs) and hidden Markov models (HMMs). The RNNs act as frame-by-frame estimators of the posterior probability of each speech sound given the acoustic evidence. These probabilities are then fed into HMMs, which model how speech sounds evolve over time. Finally, a decoder extracts the best phonetic sequence for a given speech segment. Because of its low delay, the recogniser may also be useful in other situations that require quick recognition, such as pronunciation training software. The recogniser is currently available in versions trained for English, Swedish and Flemish.
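The hybrid decoding step described above can be illustrated with a minimal sketch. It is not the Synface implementation: it assumes a plain Viterbi search over a whole segment, one HMM state per phoneme, and posteriors that are already available as lists of numbers. The function names `viterbi` and `collapse` and all parameter names are hypothetical. Dividing each posterior by the phoneme prior to obtain a scaled likelihood is a common convention in hybrid systems and is likewise an assumption here, as is the requirement that all priors are positive.

```python
import math


def viterbi(posteriors, priors, trans, init):
    """Return the most likely state index per frame.

    posteriors: one list per frame holding P(state | acoustics), e.g. RNN output
    priors:     P(state), assumed positive; used to form scaled likelihoods
    trans:      trans[i][j] = P(next state j | current state i)
    init:       init[i] = P(first state is i)
    """
    n = len(priors)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def emit(frame, s):
        # Scaled likelihood in the log domain: log(posterior / prior).
        return log(frame[s]) - log(priors[s])

    # Best log score of any path ending in each state after the first frame.
    score = [log(init[s]) + emit(posteriors[0], s) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at frame t + 1

    for frame in posteriors[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][j]) + emit(frame, j))
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def collapse(path, labels):
    """Merge runs of identical states into a phoneme sequence."""
    out = []
    for s in path:
        if not out or out[-1] != labels[s]:
            out.append(labels[s])
    return out


# Toy example with two made-up phoneme labels and five frames.
labels = ["s", "i"]
priors = [0.5, 0.5]
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
frames = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]

path = viterbi(frames, priors, trans, init)
phonemes = collapse(path, labels)
```

This sketch waits for the complete segment before tracing back, so it does not reproduce the low-delay behaviour the project reports. A low-latency decoder would have to commit to phonemes after only a short look-ahead, and how Synface does that is not described here.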

Project Information

SYNFACE

Grant agreement ID: IST-2001-33327