Reporting period: 2017-10-01 to 2019-09-30

The central objective of CROWDMERSE is to enable the real-time capture, processing, and reproduction of an immersive audio experience of any event of interest for the user, focusing on “crowded” acoustic events. Acoustic scenes with large audiences, such as sports events or music concerts, present significant challenges for immersive reproduction that have not so far been adequately considered in the signal processing community. These challenges include the need to spatially capture and reproduce numerous sound sources, such as the applause and yells of the spectators, in addition to the “main” sounds of the event (ball kicks, musical instruments, etc.). To spatially sample the large outdoor venues that are of central interest in this project, the acoustic sensors should be low-cost, small, and easy to deploy, and should support wireless operation, as in Wireless Acoustic Sensor Networks (WASNs).

There are several applications of the immersive reproduction of crowded acoustic events. An application of considerable commercial interest is sporting events: by placing a microphone sensor in a football stadium within the fans’ bleachers, it would be possible to capture and reproduce the experience of “being there” for any TV or tablet user sitting in their living room. Going one step further, imagine a stadium populated by dozens of microphone sensors, all feeding audio signals to a central server that allows a producer, or even the end-user, to select the audio feed to be heard. Users could be offered a visual representation of the stadium and interactively navigate the sonic landscape. Another application space of a similar nature is the coverage of large music concerts.

Our final objective is to derive the technology that enables the real-time delivery of such crowded acoustic events, so that the designed platform can be readily integrated with today’s High-Definition TV (HDTV) programs. CROWDMERSE focuses on spatial capturing of the sound information using multiple closely placed microphones forming a microphone array (a node of the deployed WASN) in order to offer immersive reproduction of the soundscape to the end-user in real time. The audio content is rendered via a home entertainment system equipped with multiple loudspeakers, accompanying the audio-visual content of a typical HDTV broadcast.
Research was carried out on the first stages of all technical challenges in audio processing for crowded environments. These range from the use of microphone arrays as low-cost, compact devices for capturing and reproducing an immersive sound experience in crowded environments, as an alternative to high-cost ambisonics capturing devices, to the overall system design, in which microphones and loudspeakers are examined in a unified manner that includes psychoacoustic principles.
A typical case of a crowded acoustic environment appears in the capturing and broadcasting of the spectators’ responses during an athletic event. There is a need for systems capable of capturing and transmitting the sound field to the TV sets of the end users with minimal delay, provided that the end user is equipped with a surround sound system. The acoustic conditions in such environments differ from those found in a typical case in the sense that an enormous number of sound sources is present. There may be thousands of spectators cheering and applauding simultaneously at different angles and distances with respect to the sensor array used for capturing the acoustic scene.
We worked on a technique that can deal efficiently with such environments using a planar circular array with an arbitrary number of sensors. While an arbitrary number and positioning of loudspeakers may be used at the reproduction stage, we concentrated on the applicability of our technique to the 5.1 surround system, since it is the most popular system for domestic use. We worked on designing optimal panning functions for reproduction with four loudspeakers, with a non-linear mapping of the physical sound direction to the perceived angle at the reproduction site. The idea has been to introduce loudspeaker panning functions (such as Vector Base Amplitude Panning) into the overall system design optimization, which covers both capturing and reproducing sounds in crowded environments.
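As a rough illustration of how a planar circular array yields a direction-dependent response, the following minimal sketch computes the far-field directivity of a uniform circular array with plain delay-and-sum weights. This is not the technique developed in the project, whose weights are not given in this summary; the function names, the assumed speed of sound, and the array dimensions in the usage note are all hypothetical and chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(num_mics, radius, freq_hz, azimuth_rad):
    """Far-field response of a uniform circular array to a plane wave
    arriving in the array plane from the given azimuth."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    vec = []
    for m in range(num_mics):
        phi_m = 2.0 * math.pi * m / num_mics  # angular position of mic m
        # Phase of mic m relative to the array center
        vec.append(cmath.exp(1j * k * radius * math.cos(azimuth_rad - phi_m)))
    return vec


def delay_and_sum_gain(num_mics, radius, freq_hz, look_rad, source_rad):
    """Magnitude response of a delay-and-sum beamformer steered to look_rad
    for a plane wave arriving from source_rad (1.0 in the look direction)."""
    weights = steering_vector(num_mics, radius, freq_hz, look_rad)
    wave = steering_vector(num_mics, radius, freq_hz, source_rad)
    out = sum(w.conjugate() * a for w, a in zip(weights, wave)) / num_mics
    return abs(out)
```

For example, `delay_and_sum_gain(8, 0.05, 2000.0, 0.0, math.pi)` gives the response of a hypothetical 8-microphone array of 5 cm radius, steered to 0°, for a 2 kHz plane wave arriving from the opposite side; the value is below the unit gain obtained in the look direction.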
In parallel, we looked at issues related to calibration of the audio capturing/reproduction setup, which are especially important for high-quality immersive applications.
We performed research on the signal processing considerations regarding the capturing and reproduction signal paths, optimized for the best listener experience, in order to achieve the most “faithful” resynthesis of the initial soundscape. We placed importance on deriving methodologies that can deal effectively with the challenges of crowded environments (an uncountable number of sound sources, wideband noise-type sounds such as applause, and an emphasis on ambience rather than foreground). We placed emphasis on amplitude panning functions, which are typically employed when rendering directional sound sources using multiple loudspeakers, and the question under investigation has been how the panning effect can be taken specifically into consideration in this “unified” signal processing analysis/synthesis of the soundscape. We concentrated initially on sports events, and especially football, as the main use case for our project.
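To make the notion of an amplitude panning function concrete, here is a minimal sketch of textbook two-dimensional pairwise Vector Base Amplitude Panning. It is a generic illustration and not the optimized panning functions designed in the project. The function name and the four-loudspeaker layout at ±30° and ±110° (the left, right, and surround positions of a 5.1 setup) are assumptions, and the sketch assumes that adjacent loudspeakers are less than 180° apart.

```python
import math


def vbap_2d_gains(source_deg, speaker_degs):
    """Return one gain per loudspeaker for a source at source_deg,
    using 2-D pairwise VBAP with power normalization."""
    n = len(speaker_degs)
    # Visit loudspeakers in order of increasing azimuth around the circle
    order = sorted(range(n), key=lambda i: speaker_degs[i] % 360.0)
    src = math.radians(source_deg)
    px, py = math.cos(src), math.sin(src)
    gains = [0.0] * n
    for idx in range(n):
        i, j = order[idx], order[(idx + 1) % n]
        a1, a2 = math.radians(speaker_degs[i]), math.radians(speaker_degs[j])
        l1x, l1y = math.cos(a1), math.sin(a1)
        l2x, l2y = math.cos(a2), math.sin(a2)
        # Solve p = g1 * l1 + g2 * l2 for this pair (Cramer's rule)
        det = l1x * l2y - l1y * l2x
        if abs(det) < 1e-12:
            continue  # collinear pair cannot span the plane
        g1 = (px * l2y - py * l2x) / det
        g2 = (l1x * py - l1y * px) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            # Source lies between this pair: clip and normalize the power
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = math.hypot(g1, g2)
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")
```

With the assumed layout `[30.0, -30.0, 110.0, -110.0]`, a source at 0° is shared equally by the two front loudspeakers (gains of about 0.707 each), while a source at 30° is rendered by the first loudspeaker alone.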

Panning functions normally produce an identity mapping between the incident and perceived direction, i.e. the system recreates a virtual acoustic environment which is, to some degree, identical to the original sound field at the location of capturing. In our work, the most innovative point has been that we considered a non-identity mapping between the incident and the perceived direction, so that the listener is “placed” at the center of the stadium, even though the capturing device is located far away from that center. Indeed, placing the array at the center of the stadium would be impossible in most cases without disturbing the players on the field. Moreover, the examined formulation allows a flexible segmentation of the acoustic environment, so that different arrays may be used to capture different segments of the stadium geometry. During the reporting period, we derived beamforming weights that achieve directivity patterns with the desired effect. We also performed a thorough calibration procedure in order to identify potential sources of mismatch that might degrade the overall performance, including the capturing device (microphone array), the loudspeaker array, and the sound card to be employed in the listening tests.
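One hypothetical way to picture such a non-identity mapping is purely geometric. The sketch below assumes, for illustration only, that all crowd sources lie on a circle of known radius around the stadium center, with the array somewhere inside that circle. It then converts the direction observed at the array into the direction the same source would have for a listener at the center. The function name, the circular stadium model, and the coordinates are assumptions; the mapping actually used in the project is not specified in this summary.

```python
import math


def remap_direction(incident_deg, array_xy, stand_radius):
    """Map a direction observed at the array to the direction of the same
    source as seen from the stadium center (the origin), assuming the source
    sits on a circle of radius stand_radius around the center."""
    ax, ay = array_xy
    if math.hypot(ax, ay) >= stand_radius:
        raise ValueError("array must lie inside the assumed source circle")
    theta = math.radians(incident_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    # Intersect the ray a + t * d (t > 0) with the circle |x| = stand_radius:
    # t^2 + 2 * b * t + c = 0, with c < 0 because the array is inside
    b = ax * dx + ay * dy
    c = ax * ax + ay * ay - stand_radius ** 2
    t = -b + math.sqrt(b * b - c)
    sx, sy = ax + t * dx, ay + t * dy  # assumed source position
    return math.degrees(math.atan2(sy, sx))
```

For an array at the center the mapping reduces to the identity. For a hypothetical array 40 m from the center at `(0.0, -40.0)` and a source circle of 60 m radius, a source observed at 0° is remapped to about -41.8° for the listener at the center.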

The expected impact of the project is the enhancement of the Researcher’s career, as well as the derivation of technological achievements that can improve Europe’s position in the particular area of the project. For the Fellow, this period has provided valuable exposure to the industrial landscape of a dynamic SME, while the research he performed in this period formed the basis for important results that can be obtained in future research. At the same time, the research foundations laid in this period can form a basis for important research outcomes, especially in the area of sports event broadcasting, which is currently undergoing significant technological advances in rendering quality and is open to innovations that can further enhance the quality that viewers enjoy.