Service-Oriented Ubiquitous Network-Driven Sound

Periodic Reporting for period 2 - SOUNDS (Service-Oriented Ubiquitous Network-Driven Sound)

Reporting period: 2023-01-01 to 2025-06-30

The SOUNDS European Training Network (ETN) revolves around a new and promising paradigm coined as Service-Oriented, Ubiquitous, Network-Driven Sound. Inspired by the ubiquity of mobile and wearable devices capable of capturing, processing, and reproducing sound, the SOUNDS ETN has aimed to bring audio technology to a new level by exploiting network-enabled cooperation between devices. The Internet of Things (IoT) has created numerous opportunities for audio technology by allowing devices to mutually connect and establish ad-hoc networks. However, it has also led to a technology gap, in that IoT-connected audio devices could achieve much more than simply streaming audio from one device to another.

The SOUNDS ETN has contributed to filling this gap by adding a software layer in which audio devices cooperate, on top of the connectivity provided by the IoT. We thus envision next-generation audio technology capable of providing enhanced hearing assistance, creating immersive audio experiences, enabling advanced voice control, and much more, by seamlessly exchanging signals and parameter settings, and by spatially analyzing, processing, and reproducing sound jointly with other nearby audio devices and infrastructure. Moreover, such functionality should be self-organizing, flexible, and scalable, requiring minimal user interaction to adapt to changes in the environment or network. It is anticipated that this paradigm will eventually result in an entirely new way of designing and using audio technology, by considering audio as a service enabled through shared infrastructure, rather than as a device-specific functionality limited by the capabilities and constraints of a single user device.

Attaining this paradigm shift in audio technology not only requires additional research but also calls for a new generation of qualified researchers with a transdisciplinary and international scientific profile, strong collaborative research and research management skills, and the intersectoral expertise needed to carry research results from academia to industry. The SOUNDS ETN has offered the best possible framework for achieving these goals, by organizing advanced, interdisciplinary research training, developing solid transferable skills, and providing intersectoral and international experience in a network of qualified and complementary industrial and academic institutions.
Various problems relating to speech, audio, and acoustic signal processing have been considered for the specific scenario in which microphone and/or loudspeaker arrays are distributed in space. Four challenges, arising due to the ad-hoc and spatially scattered nature of such distributed acoustic transducer arrays, have been investigated and have led to the following final project results:
Challenge 1.A: Transducer subset selection
• Optimal microphone subset and reference microphone selection for speech dereverberation;
• Dereverberation software toolbox.
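The project's optimal selection criteria are not reproduced here, but the core idea of ranking microphones and keeping only a useful subset can be sketched with a naive SNR-based baseline (the function name and the selection rule are illustrative assumptions, not the project's method):

```python
import numpy as np

def select_top_k_mics(noisy, noise_only, k):
    """Naive microphone subset selection: estimate each microphone's SNR
    from a noisy segment and a noise-only segment, then keep the k best.

    noisy, noise_only: (n_mics, n_samples) arrays; returns mic indices,
    best first.
    """
    noise_pow = np.mean(noise_only ** 2, axis=1)
    # Subtract the noise floor to approximate the speech power per mic.
    speech_pow = np.maximum(np.mean(noisy ** 2, axis=1) - noise_pow, 1e-12)
    snr = speech_pow / noise_pow
    return np.argsort(snr)[::-1][:k]
```

In practice the criterion would be task-driven (e.g. dereverberation performance of the selected subset) rather than raw input SNR, which is precisely where the optimal selection results above go beyond this baseline.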
Challenge 1.B: Privacy
• Optimal privacy-utility trade-off in voice anonymization with neural audio codecs;
• Privacy-aware automatic speech recognition and speech processing method using pseudo-speech representations;
• Privacy-aware data augmentation and training methods for speech processing.
Challenge 1.C: Large, unknown, and changing array geometries
• Relative acoustic features for distance estimation in smart-homes;
• Distributed-microphone dereverberation with microphone-dependent prediction delays;
• Room equalization with distributed arrays by means of receiver distance estimation.
Challenge 1.D: Perceptual cues & quality attributes
• Binaural speech enhancement using intelligibility-optimal masks;
• Deep learning methods for speech enhancement in binaural assistive listening devices;
• Deep complex transformer models for speech enhancement in binaural assistive listening devices;
• Binaural speech enhancement software toolbox;
• Literature review on sensory evaluation of spatially dynamic audiovisual sound scenes;
• Audio quality evaluation protocol for spatially dynamic content reproduction;
• Sound field equalization for audio rendering in reverberant environments.
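The mask-based enhancement results above build on time-frequency masking. As a point of reference only, the following sketch shows the standard oracle ratio mask (not the project's intelligibility-optimal mask), computed from clean-speech and noise STFTs under assumed Hann windowing:

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Minimal STFT: Hann-windowed frames, one-sided FFT per frame."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(S, N, eps=1e-12):
    """Oracle ratio mask: speech energy over speech-plus-noise energy,
    per time-frequency bin. Enhancement multiplies the mixture STFT
    by this mask before inverse transformation."""
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps)
```

Intelligibility-optimal and deep-learning masks replace this oracle with masks estimated from the noisy (binaural) input, optimized for perceptual criteria rather than energy ratios.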

In the SOUNDS paradigm, ubiquitous processing is to be achieved by exploiting IoT-enabled inter-device cooperation rather than relying on massive, dedicated audio infrastructure. Three challenges, arising due to the envisaged network-driven approach to audio signal processing, have been investigated and have led to the following final project results:
Challenge 2.A: Distributed algorithm design
• In-car speech zone detection and head orientation estimation;
• Decoupled sound field reproduction and sound zone control;
• Sound field estimation from moving microphones;
• MoveBox: a Python-based software toolbox for simulating sound fields with moving sources and/or receivers;
• Low-rank-based multichannel active noise control.
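Sound field reproduction and sound zone control are often introduced via the classical narrowband pressure-matching formulation, which the decoupled and low-rank results above refine. A minimal sketch of that textbook baseline (the function name and regularization choice are assumptions, not the project's algorithm):

```python
import numpy as np

def pressure_matching(G, p_des, reg=1e-3):
    """Regularized least-squares loudspeaker weights at one frequency:
    minimize ||G q - p_des||^2 + reg * ||q||^2, where G holds the
    loudspeaker-to-control-point transfer functions (complex)."""
    n_src = G.shape[1]
    A = G.conj().T @ G + reg * np.eye(n_src)
    return np.linalg.solve(A, G.conj().T @ p_des)
```

For sound zone control, p_des stacks the desired pressures at bright-zone points with zeros at dark-zone points; decoupling reproduction from zone control, as in the result above, avoids solving one monolithic system of this kind.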
Challenge 2.B: Bandwidth reduction
• Low-transmit-power sound zone control;
• Spatial covariance estimation from room acoustics prior knowledge and using Riemannian optimization;
• Sound field estimation from uncertain data;
• Reduced-information sound zone control;
• One-shot distributed node-specific signal estimation;
• Low-latency deep joint source-channel coding for speech transmission;
• Channel-configurable deep coding for speech transmission;
• Low-latency deep coding for speech transmission and enhancement.
Challenge 2.C: Synchronization and network topology
• Distributed adaptive node-specific signal estimation robust to sampling rate offsets;
• Fast-converging topology-independent distributed adaptive node-specific signal estimation;
• SendBox: a software toolbox to simulate wireless transmission effects of compressed audio.
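A sampling rate offset (SRO) between two wireless nodes, the impairment the robust estimators above must handle, can be simulated by re-reading one node's signal with a slightly detuned clock. This sketch is a generic illustration under linear-interpolation assumptions, not the SendBox implementation:

```python
import numpy as np

def apply_sro(x, sro_ppm):
    """Simulate a sampling rate offset: re-read x at the sample instants
    of a clock running (1 + sro_ppm * 1e-6) times too fast, using
    linear interpolation between the original samples."""
    ratio = 1.0 + sro_ppm * 1e-6
    t = np.arange(len(x)) * ratio   # sample instants of the offset clock
    t = t[t <= len(x) - 1]          # stay inside the original signal
    return np.interp(t, np.arange(len(x)), x)
```

After N samples the two clocks have drifted apart by roughly N * sro_ppm * 1e-6 samples, so even a 100 ppm offset accumulates a multi-sample misalignment within seconds, which is what SRO-robust distributed estimation has to track.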

The need for intelligent, self-configuring devices and algorithms, having the capability of becoming aware of the acoustic environment in which they operate, has led to the investigation of the following three challenges and the achievement of the following final project results:
Challenge 3.A: Sound source localization
• Distributed steered response power method;
• Microphone subset selection for steered response power;
• Literature review and generalization of steered response power method;
• XSRP: eXtensible Steered Response Power software toolbox;
• Deep complex-valued neural networks for direction-of-arrival estimation;
• Graph neural networks for distributed sound source localization;
• Dual-input neural networks for sound source localization;
• Neural steered response power method for sound source localization and tracking;
• Neural drone localization method.
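The steered response power (SRP) results above all extend one classical construction: sum GCC-PHAT cross-correlations over microphone pairs at the time differences of arrival implied by each candidate source position. A minimal sketch of that baseline (this is the generic SRP-PHAT recipe, not the XSRP toolbox API):

```python
import numpy as np

def gcc_phat(x1, x2, n_fft):
    """GCC-PHAT cross-correlation between two microphone signals."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.maximum(np.abs(cross), 1e-12)  # phase transform weighting
    return np.fft.irfft(cross, n_fft)

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Steered response power over candidate source positions.

    signals: (n_mics, n_samples); mic_pos: (n_mics, 3); grid: (n_pts, 3).
    Returns the index of the grid point with maximum SRP."""
    n_mics, n_samples = signals.shape
    n_fft = 2 * n_samples
    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            # Candidate TDOAs (in samples) for every grid point.
            tdoa = (np.linalg.norm(grid - mic_pos[i], axis=1)
                    - np.linalg.norm(grid - mic_pos[j], axis=1)) / c
            lags = np.rint(tdoa * fs).astype(int) % n_fft
            power += cc[lags]
    return int(np.argmax(power))
```

The distributed, subset-selection, and neural SRP variants listed above respectively spread this pair-wise accumulation over the network, prune the pair set, and learn the combination step, rather than evaluating the full grid-and-pairs sum on a single device.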
Challenge 3.B: Room impulse response estimation
• Reduced-bandwidth distributed blind system identification.
Challenge 3.C: Speech activity estimation
• Speech enhancement with delayed remote microphone signals;
• Binary estimator selection in hearing aids with a remote microphone;
• Target speech presence estimation in hearing aids with a remote microphone.

Given the pervasiveness of audio technology in various consumer and business markets, the technology sectors that could be impacted by the emergence of the SOUNDS paradigm are highly diverse. Throughout the SOUNDS project, research results have been exploited in four relevant technology sectors: hearing assistance, voice communication, smart environments, and spatial audio.