Combined Image and Word Spotting

Objective

This project aims to facilitate common procedures of archiving and retrieval of audio-visual material. The objective of the project is to develop and integrate a robust unrestricted keyword spotting algorithm and an efficient image spotting algorithm specially designed for digital audio-visual content, leading to the implementation and demonstration of a practical system for efficient retrieval in multimedia databases.

Specifically, a system will be developed to automatically retrieve images, video, and speech frames from an audio-visual database based on keywords entered by the user through keyboard or speech. Combined word and image spotting will provide an efficient mechanism enabling focused and precise searches with improved functionality and robustness. The CIMWOS system aims to become a valuable assistant in promoting the re-use of existing resources, thus cutting down the budgets of new productions.

Work description:
Today, a vast amount of information is accumulated in the form of video, pictures, and audio, which does not lend itself to automated searching. To make these invaluable resources usable, they must be indexed, but indexing is currently a very expensive and time-consuming task carried out mainly by hand by experts. With the expansion of digital television, video-based communications and related applications, an editor-like tool that allows the user to see/hear, select/modify and search over audio-visual databases becomes indispensable.

Although some European projects are addressing the issue of automated indexing of audio-visual material based on subtitles and speech recognition, the problem of locating important video clips based on their image contents has not been addressed. CIMWOS will use a dual audio and visual approach to locate important clips within multimedia material employing state-of-the-art algorithms for both image and speech recognition. Image processing algorithms will extract features to be used for pattern matching to recognise object classes. Continuous speech recognition algorithms will locate keywords in sound-clips and in the soundtracks of the video-clips, enabling more focused and precise searches. The search for object classes (e.g. face templates) and repeating patterns will be carried out off-line, making higher level descriptors available for the on-line search.
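The off-line/on-line split for visual content can be illustrated with a minimal sketch. It assumes that each video is already cut into shots and that some image-recognition detector returns a set of object-class labels per shot; `Shot`, `annotate_offline`, `search_online` and `fake_detector` are hypothetical names, not part of CIMWOS. The point is only that the expensive detector runs once per shot off-line, while the on-line search reduces to a lookup of precomputed descriptors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Shot:
    """A hypothetical video shot: clip identifier plus a time span in seconds."""
    clip_id: str
    start_s: float
    end_s: float
    labels: Set[str] = field(default_factory=set)


def annotate_offline(shots: List[Shot],
                     detector: Callable[[Shot], Set[str]]) -> Dict[str, List[Shot]]:
    """Run the (expensive) detector once per shot and build a label -> shots index."""
    index: Dict[str, List[Shot]] = {}
    for shot in shots:
        shot.labels = detector(shot)
        for label in sorted(shot.labels):
            index.setdefault(label, []).append(shot)
    return index


def search_online(index: Dict[str, List[Shot]], label: str) -> List[Shot]:
    """On-line search only reads the precomputed descriptors; no image processing."""
    return index.get(label, [])


def fake_detector(shot: Shot) -> Set[str]:
    # Hypothetical stand-in for real object-class recognition (e.g. face templates).
    return {"face"} if shot.clip_id == "news_01" else {"car"}


if __name__ == "__main__":
    shots = [Shot("news_01", 0.0, 4.5), Shot("news_01", 4.5, 9.0), Shot("ad_07", 0.0, 3.0)]
    index = annotate_offline(shots, fake_detector)
    for s in search_online(index, "face"):
        print(s.clip_id, s.start_s, s.end_s)
```

Keeping the descriptors as plain labels is a design choice that lets them be merged later with text produced by speech recognition.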

Similarly, automatic speech recognition will be performed off-line, and an indexing mechanism will associate text fragments with audio fragments so that the audio contents of the database can be referenced through their text transcription. Only text-based retrieval algorithms will be involved on-line. Modern text retrieval algorithms will be enhanced and applied to ensure fast, efficient, and effective information retrieval from the multimedia database. CIMWOS will create and maintain a set of indexes to the multimedia contents with initial support for three European languages, namely English, French and Greek, while the system design will ensure an open architecture so that more languages can be added in the future. Users will be able to perform speech-based, image-based, and mixed searches on multiple criteria; all of them are resolved by text-based retrieval over the automatically generated annotations. Search results will first be transmitted in a compact "preview" format, so that users can decide which information to retrieve before the actual content is downloaded.
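The idea of associating text fragments with audio fragments, and answering all queries with text retrieval only, can be sketched as an inverted index over time-aligned transcript segments. This is a minimal illustration, not the CIMWOS indexer: `SegmentIndex`, its methods and the `img:` prefix for visual labels are hypothetical, and transcripts are assumed to arrive as (clip, start, end, text) segments from an off-line recogniser. Visual annotations enter the same index as prefixed terms, which is one simple way to support mixed searches.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (clip_id, start_s, end_s): where a term occurs in the audio-visual material.
Posting = Tuple[str, float, float]


def tokenize(text: str) -> List[str]:
    # \w matches Unicode letters in Python 3, so English, French and Greek all work.
    return re.findall(r"\w+", text.lower())


class SegmentIndex:
    """Hypothetical inverted index from terms to time-aligned segments, built off-line."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Posting]] = defaultdict(list)

    def add_transcript_segment(self, clip_id: str, start_s: float, end_s: float, text: str) -> None:
        # Each distinct word of the recognised text points back to its audio span.
        for term in set(tokenize(text)):
            self.postings[term].append((clip_id, start_s, end_s))

    def add_visual_label(self, clip_id: str, start_s: float, end_s: float, label: str) -> None:
        # Visual annotations become prefixed terms, so on-line search stays text-only.
        self.postings["img:" + label.lower()].append((clip_id, start_s, end_s))

    def search(self, terms: List[str]) -> Set[str]:
        """Return the clip ids that contain every query term (AND semantics)."""
        clip_sets = [{p[0] for p in self.postings.get(t.lower(), [])} for t in terms]
        if not clip_sets:
            return set()
        return set.intersection(*clip_sets)

    def locate(self, term: str) -> List[Posting]:
        """Return the time spans of a term, usable to build a short preview."""
        return sorted(self.postings.get(term.lower(), []))


if __name__ == "__main__":
    idx = SegmentIndex()
    idx.add_transcript_segment("news_01", 12.0, 15.5, "The minister arrived in Athens")
    idx.add_transcript_segment("news_02", 3.0, 6.0, "Elections in Athens today")
    idx.add_visual_label("news_01", 10.0, 20.0, "face")
    print(idx.search(["Athens", "img:face"]))
    print(idx.locate("athens"))
```

Returning time spans rather than whole files fits the preview-first delivery described above: a client could fetch only the matching segments before deciding what to download in full.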

Milestones:
The CIMWOS system will be a powerful tool in the hands of the media and television world, video, news broadcasting, show business, advertising, and any organisation that produces, markets and/or broadcasts video and audio programmes, facilitating common procedures of retrieving audio-visual material during research, the production of a documentary, etc.

Utilising the vast amounts of information accumulated in audio and video, the CIMWOS system will become an invaluable assistant in promoting the re-use of existing resources and cutting down the budgets for new productions.

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

INSTITUTE FOR LANGUAGE AND SPEECH PROCESSING
Address
Epidavrou & Artemidos 6
15125 Maroussi - Athens
Greece

Participants (5)

BETV SA
Belgium
Address
Chaussee De Louvain 656
1030 Bruxelles
EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
Switzerland
Address
Raemistrasse 101
8092 Zuerich
IDIAP (FONDATION DE L'INSTITUT DALLE MOLLE D'INTELLIGENCE ARTIFICIELLE PERCEPTIVE)
Switzerland
Address
Rue Du Simplon 4
1920 Martigny
KATHOLIEKE UNIVERSITEIT LEUVEN
Belgium
Address
Oude Markt 13
3000 Leuven
SAIL LABS TECHNOLOGY AG
Austria
Address
Mariannengasse 14
Wien