Semantic analysis of Audio for Multimedia Management

Final Activity Report Summary - SEMANTICA (Semantic Analysis of Audio for Multimedia Management)

The main outcome of this Marie Curie European Reintegration Grant (ERG) was the successful reintegration of the fellow, Dr Roman Jarina, at the University of Zilina, together with the establishment of a laboratory and a strong research group in advanced audio processing for multimedia management.

During the SEMANTICA project, tasks in the following research areas were addressed:
1. advanced speech analysis and recognition;
2. speaker segmentation and indexing;
3. generic audio signal classification; and
4. joint audio-visual analysis.

The scientific significance of this research activity was substantial, particularly in view of the ongoing implementation of the ISO/IEC MPEG-21 standard, developed by the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG-21 defines a multimedia framework that enables transparent and augmented use of multimedia resources across the wide range of networks and terminals used by different communities. Content-based analysis of audio and video makes it possible to extract semantically important information from the multimedia stream and thus enables fast and efficient access to the information content. Such structured data are necessary for cross-media delivery. For example, fast access to multimedia content is crucial for mobile applications because of limited channel capacity and power.

In the area of speech analysis and recognition, a novel method of signal parameterisation was proposed. The method was based on two-dimensional cepstral analysis and enabled a compact representation of speech in the time-frequency domain. Compared with conventional methods, this approach reduced the number of speech features, an important advantage now that memory and computational savings have again become a key concern in multimedia content management. We believed that this new approach could be incorporated into keyword-spotting algorithms. Keyword spotting plays a fundamental role in tasks such as retrieval and topic and genre detection for audio and video content.
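To make the idea of a two-dimensional cepstrum more concrete, the following is a minimal Python/NumPy sketch of one common formulation, not the project's actual method: a log-magnitude spectrogram is cut into blocks of consecutive frames, a 2-D DCT is applied to each block (along frequency it yields cepstral coefficients, along time it captures how they evolve), and only the low-order coefficients are kept. The helper names and the parameters `frame_len`, `hop`, `block_frames`, `n_quef` and `n_mod` are hypothetical choices for illustration; a practical front end would usually apply a mel filterbank before the DCT, which is omitted here for brevity.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix: row k is the k-th cosine."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    basis[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return basis

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT with a Hann window, shape (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    frames = np.stack([signal[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)

def two_d_cepstrum(signal, block_frames=16, n_quef=12, n_mod=4,
                   frame_len=256, hop=128):
    """One vector of n_mod * n_quef low-order 2-D DCT coefficients per block of frames."""
    spec = log_spectrogram(signal, frame_len, hop)
    d_freq = dct_matrix(spec.shape[1])  # DCT along frequency -> quefrency axis
    d_time = dct_matrix(block_frames)   # DCT along time -> temporal modulation axis
    features = []
    for start in range(0, spec.shape[0] - block_frames + 1, block_frames):
        block = spec[start : start + block_frames]         # (block_frames, n_bins)
        coeffs = d_time @ block @ d_freq.T                 # separable 2-D DCT-II
        features.append(coeffs[:n_mod, :n_quef].ravel())  # keep the low-order corner
    return np.array(features)
```

With these defaults, a 16 000-sample signal yields 124 frames and 7 complete 16-frame blocks, each summarised by 4 × 12 = 48 coefficients; stacking 12 cepstral coefficients per frame over the same 16 frames would need 192 values. That reduction illustrates the kind of compactness described above, under the assumptions of this sketch.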

To support joint audio and visual analysis, an MPEG-7-based audio and video browsing tool was developed. The version of the tool available at project completion processed metadata describing the temporal decomposition of audio and video content. It was designed specifically for intelligent video navigation: it could display the video shots or audio clips of the current content, arrange the key frames of video segments in a scrollable grid, draw a timeline, and perform other functions. The tool demonstrated the usefulness of MPEG-7, the Multimedia Content Description Interface standard. It had a modular architecture intended for further extension and could be used for demonstration as well as for testing and evaluation. Moreover, with some extensions, the tool would be very helpful for the semi-automatic annotation and labelling of digital content when creating multimedia databases. These research activities could thus be viewed as innovative research into ways of automatically instantiating MPEG-7 description schemes for specific multimedia content domains.
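As a rough illustration of the kind of metadata such a browser consumes, the sketch below reads shot boundaries from an MPEG-7-style temporal decomposition and turns them into (id, start, end) tuples from which a timeline or key-frame grid could be drawn. The XML fragment is a simplified, hypothetical example (real MPEG-7 descriptions use `xsi:type` attributes and a richer structure), and `parse_time_point`, `parse_duration` and `shot_timeline` are hypothetical helpers that handle only the `Thh:mm:ss:nnFNNN` time-point form and `PT…H…M…S…N…F` durations, not the full MPEG-7 time datatypes.

```python
import re
import xml.etree.ElementTree as ET

# MPEG-7 schema namespace; the fragment below is a simplified, hypothetical example.
NS_URI = "urn:mpeg:mpeg7:schema:2001"
NS = {"m": NS_URI}

SAMPLE = """
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
  <Description>
    <MultimediaContent>
      <Video>
        <TemporalDecomposition>
          <VideoSegment id="shot1">
            <MediaTime>
              <MediaTimePoint>T00:00:00:0F25</MediaTimePoint>
              <MediaDuration>PT4S12N25F</MediaDuration>
            </MediaTime>
          </VideoSegment>
          <VideoSegment id="shot2">
            <MediaTime>
              <MediaTimePoint>T00:00:04:12F25</MediaTimePoint>
              <MediaDuration>PT6S</MediaDuration>
            </MediaTime>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
"""

# Time point "Thh:mm:ss" with an optional fraction ":nnFNNN" meaning nn/NNN of a second.
_POINT = re.compile(r"T(\d+):(\d+):(\d+)(?::(\d+)F(\d+))?")
# Duration "PT[hH][mM][sS][nN][fF]"; day and year fields are not handled in this sketch.
_DURATION = re.compile(r"P(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?(?:(\d+)N)?(?:(\d+)F)?)?")

def _fraction(n, f):
    """Return n/f seconds, or 0.0 when no fraction is given."""
    return int(n) / int(f) if n is not None and f is not None else 0.0

def parse_time_point(text):
    match = _POINT.search(text.strip())
    if match is None:
        raise ValueError(f"unsupported time point: {text!r}")
    h, m, s, n, f = match.groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + _fraction(n, f)

def parse_duration(text):
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    h, m, s, n, f = match.groups()
    whole = sum(int(v) * k for v, k in ((h, 3600), (m, 60), (s, 1)) if v is not None)
    return whole + _fraction(n, f)

def shot_timeline(xml_text):
    """Return (segment id, start s, end s) for every timed VideoSegment, sorted by start."""
    root = ET.fromstring(xml_text.strip())
    shots = []
    for seg in root.iter(f"{{{NS_URI}}}VideoSegment"):
        point = seg.find("m:MediaTime/m:MediaTimePoint", NS)
        duration = seg.find("m:MediaTime/m:MediaDuration", NS)
        if point is None or duration is None:
            continue  # segments without their own MediaTime are skipped
        start = parse_time_point(point.text)
        shots.append((seg.get("id"), start, start + parse_duration(duration.text)))
    return sorted(shots, key=lambda shot: shot[1])

if __name__ == "__main__":
    for shot_id, start, end in shot_timeline(SAMPLE):
        print(f"{shot_id}: {start:.2f}-{end:.2f} s")
```

Run as a script, this prints `shot1: 0.00-4.48 s` and `shot2: 4.48-10.48 s`. Keeping the parsing separate from any drawing code mirrors, in a small way, the modular design described above: a timeline or key-frame grid only needs the resulting tuples.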

During the ERG project, research work on speaker indexing and generic audio classification was also initiated. This work was ongoing for the research group at the University of Zilina and was expected to continue in the years following the project's completion.