Community Research and Development Information Service - CORDIS

A phoneme recognition system with a very low delay has been developed for the SYNFACE prototype

A state-of-the-art phoneme recognition system with a very low delay has been developed for the SYNFACE prototype. The conditions faced by the SYNFACE recogniser are adverse in many respects. The system must be speaker independent, because the identity of the caller is not known in advance; task independent, because the conversation is not restricted to any particular domain; narrow band, because the conversation takes place over a telephone line; and low latency, because only a very short delay is allowed between the incoming speech and the lip movements of the avatar if the turn-taking mechanism of a telephone conversation is to be preserved.

The system is based on a hybrid of recurrent neural networks (RNNs) and hidden Markov models (HMMs). The RNNs are used as frame-by-frame estimators of the posterior probability of each speech sound given the acoustic evidence. These probabilities are then fed into the HMMs, which model how speech evolves over time. A Viterbi-like decoding scheme is employed to obtain the best phonetic sequence for a given speech segment.

The recogniser could be used in many situations where very fast recognition is useful, for example in pronunciation training software. Today the recogniser exists in versions trained for English, Swedish and Flemish.
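
The hybrid decoding step described above can be illustrated with a minimal sketch. It is not the SYNFACE implementation: it runs a standard offline Viterbi search over a whole segment, whereas a low-latency recogniser would have to limit how far the search looks ahead. The three-phoneme inventory, the posterior values, the transition matrix and the function names are all hypothetical, and the frame posteriors are used directly as emission scores for simplicity.

```python
import math
from itertools import groupby


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_decode(posteriors, transitions, initial):
    """Return the most likely state index for each frame.

    posteriors[t][s]  : network output for state s at frame t (assumed given)
    transitions[r][s] : probability of moving from state r to state s
    initial[s]        : probability of starting in state s
    """
    n_states = len(initial)
    # score[s] is the best log score of any path ending in state s so far.
    score = [safe_log(initial[s]) + safe_log(posteriors[0][s])
             for s in range(n_states)]
    backpointers = []
    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor that maximises the path score into s.
            best_prev = max(
                range(n_states),
                key=lambda r: score[r] + safe_log(transitions[r][s]),
            )
            new_score.append(score[best_prev]
                             + safe_log(transitions[best_prev][s])
                             + safe_log(frame[s]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def collapse(path, labels):
    """Merge runs of identical frame labels into a phoneme sequence."""
    return [labels[state] for state, _ in groupby(path)]


# Hypothetical toy data: 3 phoneme classes, 4 frames of network output.
labels = ["sil", "a", "t"]
posteriors = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
transitions = [
    [0.6, 0.3, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.1, 0.6],
]
initial = [0.8, 0.1, 0.1]

path = viterbi_decode(posteriors, transitions, initial)
phones = collapse(path, labels)
print(path, phones)
```

Working in the log domain avoids numerical underflow on long segments. The frame-level path is collapsed into a phoneme sequence only after the search, so the transition model can still favour staying in the same phoneme across consecutive frames.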

Related information

Reported by

Research dept.
KTH, Speech, Music and Hearing, Lindstedtsv 24
100 44 Stockholm