Audio visual indexing and retrieval for non-IT expert users

Exploitable results

The AVIR project has created and demonstrated innovative scenarios for broadcast services and consumer applications based on descriptors of TV programmes. To realise this, novel algorithms and techniques for audio-visual content analysis and indexing have been developed to extract content descriptors that are added to existing metadata.

The newly created routines extract basic information from the audio or video channel. Notable results have been obtained in the processing of low-level information: shot and editing effect detection, camera motion parameter estimation, micro-segment separation, key-frame selection, dominant colour extraction, micro-segment based or shot based mosaic extraction, generic audio class segmentation, audio event analysis, script warping, and speech recognition. Such low-level information can be organised according to different criteria for easy browsing and search, or processed further to provide more semantically meaningful information to the user, e.g. through cross-modal analysis. Within a specific context it is possible to identify events (e.g. in a football match or a talk show) and to obtain transcriptions of dialogue in videos through speech recognition. The descriptors are all produced according to the MPEG-7 standard and are organised according to description schemes developed for AVIR using a specific DDL (description definition language).

These results have been integrated into a single system for metadata production. The AVIR Service Provider System (SPS) is designed as a distributed system with a layered architecture that incorporates the audio and video analysis routines as functional modules. It includes components for the import, storage, management, and export of MPEG-7 compliant (XML Schema) metadata. The system will be integrated into an existing commercial product of Tecmath, which thereby leapfrogs existing solutions and becomes the first to feature metadata in MPEG-7 format.

The metadata can later be transferred to a server of the Transmission Centre, also developed in AVIR, which delivers data and content to the consumer terminal via a DVB channel. The transmission of metadata is based on the DVB Object Carousel protocol, which can also carry the EPG (electronic programme guide) application in accordance with the DVB MHP (Multimedia Home Platform) specifications. A client for data reception and parsing has been developed on an experimental set-top box (STB). The transmission system has been tested and demonstrated with the successful transmission of data and with the download and execution on the STB of a DVB-J application: a customised EPG that displays, among other things, segment descriptors of current programmes.

The AVIR Consumer System is an advanced STB with storage capabilities for PersonalTV applications (automatic content filtering) and with advanced retrieval features. The intuitive, flat, multimodal graphical user interface (GUI) includes an extended multi-channel EPG, personal recommendations of programmes, automatic recording of programmes based on user profiles, search for specific programmes, and quick access to functionality using voice control. In addition, the system offers innovative, powerful methods for searching and browsing through TV programmes, which exploit the received descriptors for key-frame browsing, text search, similarity retrieval, a colour slider, etc.
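
As an illustration of how received descriptors could support similarity retrieval, the following minimal sketch ranks key frames by the distance between their dominant colours. It is not the AVIR implementation: the names `KEY_FRAMES`, `colour_distance` and `most_similar` are hypothetical, the descriptor values are invented for the example, and a single RGB triple per key frame is a strong simplification of an MPEG-7 colour descriptor.

```python
from math import sqrt

# Hypothetical descriptors: one dominant colour (R, G, B) per key frame.
# The values are made up for illustration only.
KEY_FRAMES = {
    "news_studio": (30, 60, 160),
    "football_pitch": (40, 150, 50),
    "talk_show_set": (170, 90, 60),
    "tennis_court": (60, 140, 70),
}


def colour_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, descriptors, k=2):
    """Return the names of the k key frames whose dominant colour is closest to the query."""
    ranked = sorted(
        descriptors,
        key=lambda name: colour_distance(query, descriptors[name]),
    )
    return ranked[:k]


if __name__ == "__main__":
    # A greenish query colour, as a user might pick with a colour slider.
    print(most_similar((45, 145, 55), KEY_FRAMES))
```

A colour slider could feed the selected colour into `most_similar` as the query. A real system would more plausibly compare several dominant colours per segment and use a perceptually motivated colour space.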