Periodic Reporting for period 2 - LISTEN (Hands-free Voice-enabled Interface to Web Applications for Smart Home Environments)
Reporting period: 2017-06-01 to 2019-05-31
The objectives of the LISTEN project have been:
• Objective 1: To develop a large-vocabulary speech recognition system for the smart home, covering multiple domains (e.g. command and control of smart home functionalities, web access, web search, email/message dictation, calendars and to-do lists, social networks) and multiple languages. Focus has been on English, Greek, Italian, and German to demonstrate the system’s easy adaptation to further languages.
• Objective 2: To develop a front-end for robust speech capture in a distributed fashion, performing real-time localisation of multiple speakers and enhancing the speech capture accordingly for those specific locations. The front-end design includes the hardware design of a low-cost distributed acoustic sensor network, where each sensor is equipped with an on-board processor so that signal localisation and enhancement can be performed collaboratively.
• Objective 3: To create a prototype and evaluate it in practical situations, including in the Ambient Intelligence Facility of the coordinator (FORTH). To widely disseminate the project results and maximise the potential societal and commercial impact of our activities and the developed platform.
LISTEN achieved significant progress and remarkable innovations along the lines of the planned research. On the acoustic front, we proposed WASN-based sound localization and directional speech enhancement, along with an innovative WASN sensor design. A real-time voice-enhancement system consisting of low-cost stand-alone sensors was designed, studied, and implemented, and its novelty is evidenced by several top-quality research publications. The “heart” of the system is the estimation of the direction of arrival (DoA) of multiple sound signals at each microphone-array sensor, which casts exact sound localization as a “bearing-only” estimation problem. For multiple concurrent sound sources this is a difficult problem to solve, and our approach has been one of very few in this area, demonstrated not only by the relevant publications but also via real-time demonstrations. In addition, we examined DoA estimation using spherical microphone arrays, which provide information in 3D space (vs. the 2D information we are restricted to when using planar arrays).
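To illustrate the bearing-only idea described above: once each WASN sensor reports a DoA angle, the source position can be recovered as the least-squares intersection of the bearing lines. The sketch below is a minimal 2D illustration of that generic formulation, not the project's actual algorithm; the function name and the linear least-squares setup are our own illustrative assumptions.

```python
import numpy as np

def bearing_only_localization(sensor_positions, doa_angles):
    """Least-squares intersection of bearing lines (2D sketch).

    Each sensor at position p_i reports a DoA angle theta_i, so the source
    lies on the line p_i + t * u_i, with u_i = (cos theta_i, sin theta_i).
    Minimising the summed squared perpendicular distance to all lines gives
    the linear system  (sum_i A_i) x = sum_i A_i p_i,  A_i = I - u_i u_i^T.
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, theta in zip(sensor_positions, doa_angles):
        u = np.array([np.cos(theta), np.sin(theta)])
        Pi = np.eye(2) - np.outer(u, u)  # projector orthogonal to the bearing
        A += Pi
        b += Pi @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)  # needs >= 2 sensors with distinct bearings

# Example: three sensors with exact bearings toward a source at (2, 3)
sensors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
source = np.array([2.0, 3.0])
angles = [np.arctan2(source[1] - p[1], source[0] - p[0]) for p in sensors]
estimate = bearing_only_localization(sensors, angles)
```

With noisy DoA estimates (as in practice), the same system returns the point minimising the perpendicular-distance error rather than an exact intersection.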
On the speech front, the large-vocabulary speech recognition system optimized for the smart home and web-based services was implemented in the four languages of interest (English, Greek, German, Italian). The LISTEN consortium developed this solution as a fully local (on-device) implementation, which offers scalability (i.e. a WASN solution with tens of sensors per home, as envisioned by the consortium) and user privacy (no audio is sent outside the home). Additionally, joint work on acoustic sensor networks and speech recognition was performed, and the innovation achieved is evidenced by the second place our technology received in the widely acknowledged CHiME speech recognition challenge.
Overall, the technological objectives of LISTEN were clearly achieved. Progress in project management, dissemination, and exploitation of results is evidenced by the filing of 2 US patent applications, 30 top-quality scientific publications, 2 international workshops organized by the consortium’s academic partners, radio interviews with LISTEN staff, newspaper reports on the project, school visits to partner premises, etc.