Immersive Audio Rendering and Transmission Technologies

Final Activity Report Summary - MUSE (Immersive audio rendering and transmission technologies)

Multichannel audio offers significant advantages over stereophonic audio in the perceived effect of the reproduced recording. Using a large number of loudspeakers around the listener results in a more realistic reproduction of the music and the recording venue. Reproducing a recording through more than two loudspeakers in turn requires a large number of microphones at the recording stage. More versions of the same performance are thus captured, which, when played back through a large number of loudspeakers, create multiple sound directions around the listener. The additional microphone recordings increase the storage and transmission-rate demands compared to stereo. Several algorithms have been developed that reduce the transmission requirements of multichannel recordings (e.g. MPEG-AAC); however, they remain impractical for many of today's applications, e.g. transmission through low-bandwidth media such as the Internet or wireless channels. Recently, MPEG Surround was developed, which allows rates on the order of 64 kbit/s for a 5.1 multichannel recording, a highly efficient coding result. The method proposed in this project follows a similar philosophy and achieves rates similar to those of MPEG Surround. However, in contrast to MPEG Surround, our method allows resynthesis of the individual microphone signals (before mixing) at the receiving end, which can be expected to offer a more realistic reproduction of the multichannel recording. Additionally, in contrast to most multichannel audio coding methods, our approach is suitable for interactive and immersive audio applications such as collaboration among distributed musicians and remote mixing.

The research objectives of the project were twofold. First, we developed a mathematical model and a corresponding coding method that significantly reduce the transmission requirements of multichannel audio. In our method, one channel of the multichannel recording (the reference channel) is assumed to be available at the receiver of a communications channel, and all the remaining channels are resynthesised from the reference channel and a small amount of side information. This resynthesis procedure is based on parameters, extracted during encoding at the transmitter, that capture the inter-channel similarities. We applied a particular mathematical model for the audio signals, namely the source/filter model, in each of the channels. Our experiments showed that using a multiresolution filterbank to model the audio signals results in a better model and thus in highly similar source signals for all channels. In other words, to resynthesise any audio channel, its source signal can be readily obtained from the reference channel, so only the filter part of the model must be transmitted for that channel. Next, we showed that these filters can be encoded using only 5 kbit/s per channel, which is a very small addition to the transmission requirements of the reference channel (which could be coded, e.g., using MP3 encoding at 64 kbit/s). Thus, our method achieved a high degree of transmission rate reduction.
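The resynthesis idea described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a single-band linear-prediction (LPC) source/filter model instead of the multiresolution filterbank, it does not quantise the filter coefficients, and the two "microphone" signals are synthetic. All function names, the filter order and the test signals are hypothetical choices made for illustration.

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns a[0..order] with a[0] = 1."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def analysis_filter(x, a):
    """Source (residual) signal: e[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole filter: y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Synthetic stand-ins for two microphone signals (assumption): the same
# excitation shaped by two different stable all-pole filters.
random.seed(0)
excitation = [random.gauss(0.0, 1.0) for _ in range(2000)]
reference = synthesis_filter(excitation, [1.0, -1.2, 0.8])
target = synthesis_filter(excitation, [1.0, 0.5, 0.6])

ORDER = 8                                      # hypothetical model order
a_ref = lpc(reference, ORDER)                  # computable at the receiver
a_tgt = lpc(target, ORDER)                     # side information for the target
residual = analysis_filter(reference, a_ref)   # source signal from the reference
resynth = synthesis_filter(residual, a_tgt)    # resynthesised target channel
```

In this sketch only `a_tgt` would need to be sent for the target channel. The receiver derives the source signal from the reference channel it already has and shapes it with the target channel's filter.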

As planned in the project proposal, our next step was to tailor our method to the specific problem of transmission through wireless networks. In such networks, some packets of the transmitted information may be lost or delayed due to channel conditions. For audio applications in particular, delayed packets are as serious a problem as lost ones, since in either case the missing information results in an audible degradation of the audio signal. Our objective in the second part of this project was to design a model-based Packet Loss Concealment (PLC) scheme for multichannel audio, building on our previously proposed coding method. Our results showed that statistical estimation of the missing audio segments can result in good PLC performance, both objectively and subjectively.
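The summary does not specify which statistical estimator was used, so the following is only a generic sketch of the idea of estimating a missing segment from what was received. It assumes that each audio frame is described by a small vector of model parameters, that consecutive frames are jointly Gaussian, and that each parameter dimension can be treated independently. Under those assumptions a lost frame is replaced by its linear minimum mean-square-error estimate given the previous frame. The function names and the synthetic parameter track are hypothetical.

```python
import random

def fit_lag1_stats(frames):
    """Per-dimension mean, variance and lag-1 covariance of a parameter track."""
    n, dims = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dims)]
    cov = [sum((frames[t][d] - mean[d]) * (frames[t - 1][d] - mean[d])
               for t in range(1, n)) / (n - 1)
           for d in range(dims)]
    return mean, var, cov

def conceal(prev_frame, stats):
    """Estimate a lost frame from the previous one (linear MMSE, per dimension)."""
    mean, var, cov = stats
    return [mean[d] + (cov[d] / var[d]) * (prev_frame[d] - mean[d])
            if var[d] > 0 else mean[d]
            for d in range(len(prev_frame))]

# Synthetic, slowly varying two-dimensional parameter track (assumption).
random.seed(1)
track = [[0.0, 0.0]]
for _ in range(2000):
    track.append([0.9 * v + random.gauss(0.0, 0.1) for v in track[-1]])

stats = fit_lag1_stats(track)      # statistics learned from received frames
lost = 1000                        # index of a frame assumed lost in transit
estimate = conceal(track[lost - 1], stats)
```

On a slowly varying track such as this one, the estimate follows the previous frame closely, so it should sit nearer to the true lost frame than substituting the long-term mean would.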