The project work concentrated on two pillars. The first was wireless acoustic sensor networks (WASNs) for voice enhancement regardless of the speaker’s location and orientation in the smart home. The second was a large-vocabulary speech recognition system for web-based services.
LISTEN achieved significant progress and notable innovations along the lines of the planned research. On the WASN side, we proposed WASN-based sound localization and directional speech enhancement, as well as an innovative WASN sensor design. A real-time voice enhancement system built from low-cost stand-alone sensors was designed, studied, and implemented. The novelty of this system is evidenced by several top-quality research publications. The “heart” of the system is the estimation of the direction of arrival (DoA) of multiple sound signals at each microphone array sensor, which turns exact sound localization into a “bearing-only” estimation problem. For multiple concurrent sound sources this is a difficult problem, and our approach has been one of very few in this area, demonstrated not only by the relevant publications but also by real-time demonstrations. In addition, we examined DoA estimation using spherical microphone arrays, which provide information in 3D space (as opposed to the 2D information to which planar arrays are restricted).
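To illustrate what “bearing-only” means here, the following minimal sketch estimates a single source position in 2D from the DoA angles reported by several sensors at known positions, by finding the least-squares intersection of the bearing lines. It is an illustration under simplifying assumptions (one source, noise handled only through least squares, angles measured from the positive x axis) and is not the LISTEN implementation; the function name `locate_from_bearings` and the sensor layout are hypothetical.

```python
import math


def locate_from_bearings(sensors, bearings):
    """Least-squares intersection of 2D bearing lines.

    sensors  -- list of (x, y) sensor positions (assumed known)
    bearings -- list of DoA angles in radians, measured from the +x axis
    Returns the (x, y) point minimizing the summed squared
    perpendicular distance to all bearing lines.
    """
    if len(sensors) != len(bearings) or len(sensors) < 2:
        raise ValueError("need at least two sensors, one bearing per sensor")

    # Accumulate the 2x2 normal equations (A^T A) p = A^T b, where each
    # row of A is the unit normal of one bearing line.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (px, py), theta in zip(sensors, bearings):
        nx, ny = -math.sin(theta), math.cos(theta)  # normal to the bearing
        b = nx * px + ny * py                        # line: n . p = b
        s11 += nx * nx
        s12 += nx * ny
        s22 += ny * ny
        t1 += nx * b
        t2 += ny * b

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; position is ambiguous")

    # Closed-form solution of the symmetric 2x2 system.
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


if __name__ == "__main__":
    # Hypothetical layout: three sensors, noise-free bearings to one source.
    source = (2.0, 2.0)
    sensors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    bearings = [math.atan2(source[1] - py, source[0] - px) for px, py in sensors]
    print(locate_from_bearings(sensors, bearings))
```

With several concurrent sources the sensors' bearings must additionally be associated with the correct source before such an intersection can be computed, which is part of what makes the multi-source case difficult.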
Similarly, the large-vocabulary speech recognition system, optimized for the smart home and web-based services, was implemented in the four languages of interest (English, Greek, German, Italian). The LISTEN consortium developed this solution as a fully local (on-device) implementation, which offers the advantages of scalability (i.e., a WASN solution with tens of sensors per home, as envisioned by the consortium) and privacy for the users (no audio is sent outside the home). Additionally, joint work on acoustic sensor networks and speech recognition was performed, and the innovation achieved is evidenced by the second place our technology received in the widely recognized CHiME speech recognition challenge.
Overall, the technological objectives of LISTEN were clearly achieved. Progress in project management and in the dissemination and exploitation of results is evidenced by the submission of 2 US patent applications, 30 top-quality scientific publications, 2 international workshops organized by the consortium’s academic partners, radio interviews with LISTEN staff, newspaper reports on the project, school visits to the partners’ premises, and other outreach activities.