Speeding up sound recognition systems
The hearing-impaired population is at a considerable disadvantage when it comes to accessing spoken information delivered through technology. Visible face movements can improve speech intelligibility, particularly for hearing-impaired people, and for everyone in noisy conditions. Such aids are made possible by the rapid evolution of multi-modal speech technology and PC processing power. The Synface prototype in particular is a system built on a hybrid of recurrent neural networks (RNNs) and hidden Markov models (HMMs). The RNNs act as frame-by-frame estimators of the posterior probability of each speech sound given the acoustic evidence. These probabilities are then fed into HMMs that contain a time evolution model. A decoder extracts the best phonetic sequence for a given speech segment. The main advantage of the recogniser is that it may be useful in several situations that require quick recognition, such as pronunciation training software. The recogniser is currently available in versions trained for English, Swedish and Flemish.
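The RNN-to-HMM-to-decoder pipeline described above can be illustrated with a minimal sketch. It is not the Synface decoder: it assumes a plain offline Viterbi search, one HMM state per speech sound, and posteriors used directly as emission scores (hybrid systems often divide them by class priors first, which is omitted here). The function name `viterbi`, the three-sound inventory and all probability values are hypothetical and chosen only for illustration.

```python
import math


def viterbi(posteriors, transitions, initial):
    """Return the most likely state sequence for frame-wise posteriors.

    posteriors[t][s]  : P(sound s | acoustic frame t), e.g. from an RNN
    transitions[r][s] : P(next sound s | current sound r), the time evolution model
    initial[s]        : P(first sound is s)
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(initial)
    # Best log score of any path ending in each state at the current frame.
    score = [log(initial[s]) + log(posteriors[0][s]) for s in range(n_states)]
    backpointers = []

    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Choose the best predecessor for state s.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log(transitions[r][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + log(transitions[best_prev][s])
                             + log(frame[s]))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy data: three speech sounds, four acoustic frames.
SOUNDS = ["sil", "a", "t"]
posteriors = [[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.7]]
transitions = [[0.6, 0.3, 0.1],
               [0.1, 0.6, 0.3],
               [0.3, 0.1, 0.6]]
initial = [0.8, 0.1, 0.1]

best_path = viterbi(posteriors, transitions, initial)
print([SOUNDS[s] for s in best_path])  # prints ['sil', 'a', 'a', 't']
```

This sketch waits for the whole segment before tracing back. A recogniser aimed at quick recognition would presumably need to commit to its output after a limited look-ahead, which is not shown here.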