When human performance is taken as the target, universal automatic speech recognition is far from a solved problem. This appears to be largely due to weaknesses in feature extraction, modelling and adaptability, as discussed in recognized publications. Strikingly, these weaknesses persist in full even for clean speech in known conditions, which highlights deficiencies in dealing with intrinsic speech variabilities and in extracting information from the signal itself. This has, however, been partly hidden by the more pressing problem of making state-of-the-art systems usable in real noisy situations, under constrained tasks, with the implicit target of reaching "clean speech" performance, an effort that has met with deserved success.
The goal of DIVINES is to develop new knowledge leading to renewed feature extraction and modelling techniques with better capacities, particularly in handling intrinsic speech variabilities. First, human and machine performance, and the effect of intrinsic variabilities, will be compared using a diagnostic procedure. The outcomes of this analysis will then be exploited to target feature extraction and acoustic and lexical modelling. Compatibility with noise-handling techniques and integration within current systems are also part of the objectives.
The project is relevant to the "multimodal interfaces" objective, as it concerns more accurate and adaptable recognition of spoken language. This is central to the concept of multimodal man-machine interaction, where the speech understanding service is likely to remain an independent component in a modular design. Advances in this field could be decisive in realizing the vision of natural interactivity.
Funding Scheme: STREP - Specific Targeted Research Project