Service-Oriented Ubiquitous Network-Driven Sound

Periodic Reporting for period 2 - SOUNDS (Service-Oriented Ubiquitous Network-Driven Sound)

Reporting period: 2023-01-01 to 2025-06-30

The SOUNDS European Training Network (ETN) revolves around a new and promising paradigm coined as Service-Oriented, Ubiquitous, Network-Driven Sound. Inspired by the ubiquity of mobile and wearable devices capable of capturing, processing, and reproducing sound, the SOUNDS ETN has aimed to bring audio technology to a new level by exploiting network-enabled cooperation between devices. The Internet of Things (IoT) has created numerous opportunities for audio technology by allowing devices to mutually connect and establish ad-hoc networks. However, it has also led to a technology gap, in that IoT-connected audio devices could achieve much more than simply streaming audio from one device to another.

The SOUNDS ETN has contributed to filling this gap by adding a software layer in which audio devices cooperate, on top of the connectivity provided by the IoT. We thus envision next-generation audio technology capable of providing enhanced hearing assistance, creating immersive audio experiences, enabling advanced voice control, and much more, by seamlessly exchanging signals and parameter settings, and by spatially analyzing, processing, and reproducing sound jointly with other nearby audio devices and infrastructure. Moreover, such functionality should be self-organizing, flexible, and scalable, requiring minimal user interaction to adapt to changes in the environment or network. It is anticipated that this paradigm will eventually result in an entirely new way of designing and using audio technology, by considering audio as a service enabled through shared infrastructure, rather than as a device-specific functionality limited by the capabilities and constraints of a single user device.

Attaining this paradigm shift in audio technology not only requires additional research but also calls for a new generation of qualified researchers with a transdisciplinary and international scientific profile, strong collaborative research and research management skills, and the intersectoral expertise needed to carry research results from academia to industry. The SOUNDS ETN has offered the best possible framework for achieving these goals, by organizing advanced, interdisciplinary research training, developing solid transferable skills, and providing intersectoral and international experience in a network of qualified and complementary industrial and academic institutions.
Various problems relating to speech, audio, and acoustic signal processing have been considered for the specific scenario in which microphone and/or loudspeaker arrays are distributed in space. Four challenges, arising due to the ad-hoc and spatially scattered nature of such distributed acoustic transducer arrays, have been investigated and have led to the following final project results:
Challenge 1.A: Transducer subset selection
• Optimal microphone subset and reference microphone selection for speech dereverberation;
• Dereverberation software toolbox.
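The project's optimal selection criteria are not reproduced here, but the core idea of ranking microphones and keeping only a useful subset can be sketched with a naive SNR-based baseline (the function name and the selection rule are illustrative assumptions, not the project's method):

```python
import numpy as np

def select_top_k_mics(noisy, noise_only, k):
    """Naive microphone subset selection: estimate each microphone's SNR
    from a noisy segment and a noise-only segment, then keep the k best.

    noisy, noise_only: (n_mics, n_samples) arrays; returns mic indices,
    best first.
    """
    noise_pow = np.mean(noise_only ** 2, axis=1)
    # Subtract the noise floor to approximate the speech power per mic.
    speech_pow = np.maximum(np.mean(noisy ** 2, axis=1) - noise_pow, 1e-12)
    snr = speech_pow / noise_pow
    return np.argsort(snr)[::-1][:k]
```

In practice the criterion would be task-driven (e.g. dereverberation performance of the selected subset) rather than raw input SNR, which is precisely where the optimal selection results above go beyond this baseline.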
Challenge 1.B: Privacy
• Optimal privacy-utility trade-off in voice anonymization with neural audio codecs;
• Privacy-aware automatic speech recognition and speech processing method using pseudo-speech representations;
• Privacy-aware data augmentation and training methods for speech processing.
Challenge 1.C: Large, unknown, and changing array geometries
• Relative acoustic features for distance estimation in smart-homes;
• Distributed-microphone dereverberation with microphone-dependent prediction delays;
• Room equalization with distributed arrays by means of receiver distance estimation.
Challenge 1.D: Perceptual cues & quality attributes
• Binaural speech enhancement using intelligibility-optimal masks;
• Deep learning methods for speech enhancement in binaural assistive listening devices;
• Deep complex transformer models for speech enhancement in binaural assistive listening devices;
• Binaural speech enhancement software toolbox;
• Literature review on sensory evaluation of spatially dynamic audiovisual sound scenes;
• Audio quality evaluation protocol for spatially dynamic content reproduction;
• Sound field equalization for audio rendering in reverberant environments.
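The mask-based enhancement results above build on time-frequency masking. As a point of reference only, the following sketch shows the standard oracle ratio mask (not the project's intelligibility-optimal mask), computed from clean-speech and noise STFTs under assumed Hann windowing:

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Minimal STFT: Hann-windowed frames, one-sided FFT per frame."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(S, N, eps=1e-12):
    """Oracle ratio mask: speech energy over speech-plus-noise energy,
    per time-frequency bin. Enhancement multiplies the mixture STFT
    by this mask before inverse transformation."""
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps)
```

Intelligibility-optimal and deep-learning masks replace this oracle with masks estimated from the noisy (binaural) input, optimized for perceptual criteria rather than energy ratios.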

In the SOUNDS paradigm, ubiquitous processing is to be achieved by exploiting IoT-enabled inter-device cooperation rather than relying on massive, dedicated audio infrastructure. Three challenges, arising due to the envisaged network-driven approach to audio signal processing, have been investigated and have led to the following final project results:
Challenge 2.A: Distributed algorithm design
• In-car speech zone detection and head orientation estimation;
• Decoupled sound field reproduction and sound zone control;
• Sound field estimation from moving microphones;
• MoveBox: a Python-based software toolbox for simulating sound fields with moving sources and/or receivers;
• Low-rank-based multichannel active noise control.
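Sound field reproduction and sound zone control are often introduced via the classical narrowband pressure-matching formulation, which the decoupled and low-rank results above refine. A minimal sketch of that textbook baseline (the function name and regularization choice are assumptions, not the project's algorithm):

```python
import numpy as np

def pressure_matching(G, p_des, reg=1e-3):
    """Regularized least-squares loudspeaker weights at one frequency:
    minimize ||G q - p_des||^2 + reg * ||q||^2, where G holds the
    loudspeaker-to-control-point transfer functions (complex)."""
    n_src = G.shape[1]
    A = G.conj().T @ G + reg * np.eye(n_src)
    return np.linalg.solve(A, G.conj().T @ p_des)
```

For sound zone control, p_des stacks the desired pressures at bright-zone points with zeros at dark-zone points; decoupling reproduction from zone control, as in the result above, avoids solving one monolithic system of this kind.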
Challenge 2.B: Bandwidth reduction
• Low-transmit-power sound zone control;
• Spatial covariance estimation from room acoustics prior knowledge and using Riemannian optimization;
• Sound field estimation from uncertain data;
• Reduced-information sound zone control;
• One-shot distributed node-specific signal estimation;
• Low-latency deep joint source-channel coding for speech transmission;
• Channel-configurable deep coding for speech transmission;
• Low-latency deep coding for speech transmission and enhancement.
Challenge 2.C: Synchronization and network topology
• Distributed adaptive node-specific signal estimation robust to sampling rate offsets;
• Fast-converging topology-independent distributed adaptive node-specific signal estimation;
• SendBox: a software toolbox to simulate wireless transmission effects of compressed audio.
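A sampling rate offset (SRO) between two wireless nodes, the impairment the robust estimators above must handle, can be simulated by re-reading one node's signal with a slightly detuned clock. This sketch is a generic illustration under linear-interpolation assumptions, not the SendBox implementation:

```python
import numpy as np

def apply_sro(x, sro_ppm):
    """Simulate a sampling rate offset: re-read x at the sample instants
    of a clock running (1 + sro_ppm * 1e-6) times too fast, using
    linear interpolation between the original samples."""
    ratio = 1.0 + sro_ppm * 1e-6
    t = np.arange(len(x)) * ratio   # sample instants of the offset clock
    t = t[t <= len(x) - 1]          # stay inside the original signal
    return np.interp(t, np.arange(len(x)), x)
```

After N samples the two clocks have drifted apart by roughly N * sro_ppm * 1e-6 samples, so even a 100 ppm offset accumulates a multi-sample misalignment within seconds, which is what SRO-robust distributed estimation has to track.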

The need for intelligent, self-configuring devices and algorithms, having the capability of becoming aware of the acoustic environment in which they operate, has led to the investigation of the following three challenges and the achievement of the following final project results:
Challenge 3.A: Sound source localization
• Distributed steered response power method;
• Microphone subset selection for steered response power;
• Literature review and generalization of steered response power method;
• XSRP: eXtensible Steered Response Power software toolbox;
• Deep complex-valued neural networks for direction-of-arrival estimation;
• Graph neural networks for distributed sound source localization;
• Dual-input neural networks for sound source localization;
• Neural steered response power method for sound source localization and tracking;
• Neural drone localization method.
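The steered response power (SRP) results above all extend one classical construction: sum GCC-PHAT cross-correlations over microphone pairs at the time differences of arrival implied by each candidate source position. A minimal sketch of that baseline (this is the generic SRP-PHAT recipe, not the XSRP toolbox API):

```python
import numpy as np

def gcc_phat(x1, x2, n_fft):
    """GCC-PHAT cross-correlation between two microphone signals."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.maximum(np.abs(cross), 1e-12)  # phase transform weighting
    return np.fft.irfft(cross, n_fft)

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Steered response power over candidate source positions.

    signals: (n_mics, n_samples); mic_pos: (n_mics, 3); grid: (n_pts, 3).
    Returns the index of the grid point with maximum SRP."""
    n_mics, n_samples = signals.shape
    n_fft = 2 * n_samples
    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            # Candidate TDOAs (in samples) for every grid point.
            tdoa = (np.linalg.norm(grid - mic_pos[i], axis=1)
                    - np.linalg.norm(grid - mic_pos[j], axis=1)) / c
            lags = np.rint(tdoa * fs).astype(int) % n_fft
            power += cc[lags]
    return int(np.argmax(power))
```

The distributed, subset-selection, and neural SRP variants listed above respectively spread this pair-wise accumulation over the network, prune the pair set, and learn the combination step, rather than evaluating the full grid-and-pairs sum on a single device.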
Challenge 3.B: Room impulse response estimation
• Reduced-bandwidth distributed blind system identification.
Challenge 3.C: Speech activity estimation
• Speech enhancement with delayed remote microphone signals;
• Binary estimator selection in hearing aids with a remote microphone;
• Target speech presence estimation in hearing aids with a remote microphone.

Given the pervasiveness of audio technology in various consumer and business markets, the technology sectors that could be impacted by the emergence of the SOUNDS paradigm are highly diverse. Throughout the SOUNDS project, research results have been exploited in four relevant technology sectors: hearing assistance, voice communication, smart environments, and spatial audio.