
Immersive Multimedia Interfaces

Final Report Summary - IM3I (Immersive multimedia interfaces)

Executive summary:

IM3I was a 24-month project involving 7 partners from 6 European Union (EU) countries; its aim was to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives.

The IM3I project followed a strictly user-centred development approach from the early stages of system design and requirements definition through to the final evaluation stage. The platform developed by the IM3I project can be described as intelligent, adaptable repository software. 'Intelligence' refers to its ability to derive semantic descriptions from the media assets it stores; adaptability designates the authoring capabilities of the platform, which enable users to define and publish repositories of media artefacts with ease. As a tangible result, after two years of activity IM3I has delivered a flexible platform composed of three layers:

- The service-oriented architecture on which IM3I is based, responsible for facilitating communication between the analysis layer and the interface layer.
- The analysis layer, which contains several pipelines dedicated to the analysis of distinct media types: video, audio, images and text-based files. The pipelines extract meaningful information from the raw files themselves and provide search and retrieval mechanisms that do not rely on manual annotation. The pipelines offer the following functionality: transcoding / file adaptation, segmentation, attribution of semantic concepts, and extraction of physical characteristics.
- The interface layer, which consists of two categories of interfaces: the specialised multimedia search and annotation tools, and the IM3I authoring interfaces, which flexibly compose IM3I end-user environments.

The results of the IM3I project have been presented at a large number of events and through various communication channels: presentations at meetings and conferences, workshops, articles in newsletters, the project website, a project video, flyers and brochures, etc.

In total, IM3I has produced 12 papers published in peer-reviewed publications and has been presented at 26 events.

An exploitation plan has been established and the project has identified its target market, within which the opportunity can be successfully realised.

Project Context and Objectives:

With the explosion in the volume of digital content being generated, there is a pressing need for highly customisable interfaces tailored to both user profiles and specific types of search. IM3I provides the creative media sector with new ways of searching, summarising and visualising large multimedia archives. IM3I is based on a service-oriented architecture that allows multiple viewpoints on the multimedia data available in a repository, and provides better ways to interact with and share rich media. This paves the way for a multimedia information management platform which is more flexible, adaptable and customisable than current repository software, which in turn creates new opportunities for content owners to exploit their digital assets.

The IM3I project addresses the needs of a new generation of the media and communication industry that must confront not only changing technologies, but also a radical change in media consumption behaviour.

IM3I therefore enables new ways of accessing and presenting media content to users, and new ways for users to interact with services, offering a natural and transparent way to deal with the complexities of interaction while hiding them from the user. Above all, because it is designed according to a service-oriented architecture (SOA) paradigm, IM3I defines an enabling technology capable of integrating into existing networks, which will support organisations and users in developing their content-related services.

The objectives of the project were:

(1) Apply a user-centric development approach in order to meet user expectations, maximise the return of investment and the exploitation potential of the IM3I project results. Involve small and medium-sized enterprise (SME) users in the assessment and evaluation of the project's results in regular time periods in order to grasp changing requirements over time, and refine the project's developments. Develop interaction models of users with content by observing users and analysing their requirements.

(2) Design and develop a SOA that accommodates the elaborated user needs and the IM3I use cases, is adaptable, is supported by cross-device authoring tools, and enables transparent user navigation across different technologies, such as broadcast and internet.

(3) Develop automatic and semi-automatic methods and tools for semantic annotation of multimedia assets based on features combined from multiple information channels (image, text, motion, audio, music, speech), context and workflow information, and knowledge described in ontologies.

(4) Explore new cross-device and cross-network (broadband, mobile, internet) human-computer interaction (HCI) solutions for composing mixed media audio-visual search queries as well as multi-view point presentation interfaces by utilising content interaction models, and establish new ways of searching and retrieving content.

(5) Demonstrate and disseminate the project results to prove the viability of IM3I interface technologies and the economic advantage of its product-like prototypes. Contribute and shape open standards related to audio-visual search, retrieval, annotation and indexing.

(6) Develop innovative and viable business models and cases, with identity trust and micropayment provisions on usage of audio-visual data, that address the production and search of creative content in intra- and inter-organisational networks of SMEs.

Project results:

Based on the user requirements collected during the first phase of IM3I and the scenarios developed by the partners' organisations to steer the development of the IM3I platform, the platform has been developed and deployed.

The platform developed by the IM3I project can be described as intelligent, adaptable repository software. 'Intelligence' refers to the platform's ability to derive semantic descriptions from the media assets it stores. Adaptability designates the authoring capabilities of the platform, which enable users to define and publish repositories of media artefacts with ease.

These capabilities are facilitated by the open and flexible IM3I SOA-based architecture, which was developed and finalised during the second year of the project. The infrastructure developed within the IM3I project is composed of an analysis layer that contains several pipelines dedicated to the analysis of distinct media types: video, audio, images and text-based files. The pipelines extract meaningful information from the raw files and provide search and retrieval mechanisms that do not rely on manual annotation. The different pipelines are capable of transcoding / file adaptation, segmenting files, attributing semantic concepts to these artefacts and extracting the files' physical characteristics.
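
The division of labour among per-media-type pipelines can be sketched as a simple dispatcher; the pipeline names below are illustrative placeholders, not IM3I's actual identifiers:

```python
import mimetypes

# Hypothetical pipeline identifiers, one per media family handled by IM3I.
PIPELINES = {
    "video": "video-analysis",
    "audio": "audio-analysis",
    "image": "image-analysis",
    "text": "text-analysis",
}

def route_to_pipeline(filename: str) -> str:
    """Pick an analysis pipeline from the file's guessed MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return PIPELINES["text"]  # fall back to text handling
    family = mime.split("/")[0]
    return PIPELINES.get(family, PIPELINES["text"])
```

For example, `route_to_pipeline("clip.mp4")` selects the video pipeline, while a `.jpg` file is routed to image analysis.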

At an interface level, the IM3I project has developed and deployed two types of tools addressing the needs of different types of users:

1. The specialised search and annotation tools are targeted at users who work with media archives in a professional context, either as archivists or documentalists. These tools facilitate specialised tasks such as ontology-based search for media fragments or the correction of automatic annotations. They communicate directly with the IM3I analysis layer in order to search any annotations added and to correct false annotations at the source, before the annotations are exposed to end-user interfaces by means of the central IM3I repository.

2. The authoring environment acts as a user-friendly interface that enables content holders to author and publish web-based media applications. It allows the import of a media collection, the extension of its characteristics and the publication of elaborate workflow patterns for enriching, searching and viewing the contents. A workflow can also be designed to build up a collection of media artefacts as the result of a published workflow. The authoring interface interacts directly with the architecture layer to store and retrieve media artefacts and the characteristics of an IM3I end-user web environment. As such, the IM3I platform has been deployed as a system whose fundamental features respond to the objective, derived from the user requirements study, that IM3I should be 'more flexible, more adaptable and customisable than the current state-of-the-art repository software on the market'.

2.3.1 The IM3I infrastructure

The semantic visual content annotator

Video data can be stored and represented using several formats and standards. The MPEG compression standards have become ubiquitous, since the main ideas of inter- and intra-frame compression have proved very effective. At present the videos used within IM3I are compressed using MPEG-2 and MPEG-4 codecs, but streamable videos (typically in Flash FLV format) are also of interest. To avoid restricting the project to a limited number of formats, the video segmentation algorithm works on uncompressed data. This also allows the tool to be used on streaming videos whose locations are provided in rich site summary (RSS) feeds as real-time messaging protocol (RTMP) streams. The implemented algorithms run faster than real time.
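
As an illustration of the kind of segmentation that is feasible on uncompressed frames, the sketch below detects shot cuts by comparing grey-level histograms of adjacent frames; the bin count and threshold are assumed values, not those used in IM3I:

```python
import numpy as np

def shot_boundaries(frames, threshold=0.4):
    """Detect shot cuts via histogram differences of adjacent frames.

    `frames` is an iterable of 2-D uint8 arrays (decoded, uncompressed
    frames). The difference is the L1 distance between histograms,
    normalised by the pixel count; `threshold` is an illustrative value.
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        if prev_hist is not None:
            diff = np.abs(hist - prev_hist).sum() / frame.size
            if diff > threshold:
                boundaries.append(i)  # frame i starts a new shot
        prev_hist = hist
    return boundaries
```

Because only per-frame histograms are kept, the detector runs in a single streaming pass, which is compatible with the RTMP streaming use case described above.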

The system has been extended to deal with several image formats commonly available on the web, e.g. to process images that are viewable on websites or whose uniform resource identifiers (URIs) are provided in RSS feeds.

The tool also performs semantic annotation of the key frames and images to complement the annotation that can be obtained using the Bag of Words (BoW) approach. The system uses a set of pre-trained detectors based on the Viola & Jones algorithms (Viola 2004) for the recognition of human body parts (i.e. faces and face parts, which are connected to other annotations such as close-up), persons and crowds (based on the HoG detector by Dalal and Triggs (Dalal 2005)), and some syntactic annotations that also have a semantic counterpart for artists and video professionals (i.e. dominant colour, quantity of motion, b/w vs. colour material) and that are part of the MPEG-7 standard (see MPEG-7 section).

The tool also performs semantic annotation of the shots (or single images, if required) to recognise a set of concepts associated with basic visual elements, such as colour and motion (of interest to artists and professional video editors), and common concepts like persons, body parts, etc.
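
Two of the syntactic annotations mentioned above, dominant colour and b/w vs. colour material, can be approximated with a few lines of array arithmetic. The functions below are illustrative stand-ins, not the MPEG-7 descriptors themselves:

```python
import numpy as np

def dominant_colour(frame):
    """Mean RGB of a frame as a crude dominant-colour estimate.

    `frame` is an H x W x 3 uint8 array. MPEG-7's dominant colour
    descriptor clusters colours instead; the mean is a stand-in.
    """
    return frame.reshape(-1, 3).mean(axis=0)

def is_greyscale(frame, tol=2.0):
    """Flag b/w material: the three channels are nearly identical."""
    r = frame[..., 0].astype(float)
    g = frame[..., 1].astype(float)
    b = frame[..., 2].astype(float)
    return bool(np.abs(r - g).mean() < tol and np.abs(g - b).mean() < tol)
```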

Automatic semantic visual content annotation

The semantic analysis of images aims to recognise high-level concepts within them.
A grid sampling strategy, which covers the image areas uniformly with a large number of points, generally offers very good performance. In particular, the best performance is achieved using the 64-dimension SURF descriptor, thanks to its effectiveness and small size. Early fusion techniques generally perform slightly worse than the corresponding unimodal techniques in terms of classification accuracy, but lead to a substantial reduction in execution time. We note, in particular, the good results of early fusion combined with grid sampling, due to the fact that it keeps a dense description only of semantically relevant areas. This confirms the considerations on the description of MSER regions.

Tests of the late fusion technique show that the greatest increase in classification accuracy is achieved by combining a dense description of MSER regions with a method describing local invariant features, such as SURF. This is probably due to the fact that portions of the image not included in any MSER region are nevertheless described by the SURF key points.

The best compromise between accuracy and execution speed is offered by late fusion between a strategy based on SURF and one that uses early fusion with dense sampling within grid-10 MSER regions. If execution time is neglected, the maximum accuracy is achieved by late fusion strategies still based on grid sampling, with a pitch of no more than 10 pixels.

The IM3I implementation comprises all approaches: unimodal, late and early fusion. Expert users can select the approach that best suits their needs.
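
A minimal late-fusion sketch, assuming the two classifiers (e.g. one SURF-based, one MSER-based) emit confidence scores over the same concept vocabulary; weighted averaging is one common combination rule, not necessarily the one IM3I uses:

```python
import numpy as np

def late_fusion(scores_a, scores_b, weight=0.5):
    """Combine per-concept scores of two classifiers by weighted averaging.

    `scores_a` and `scores_b` are score vectors over the same concepts;
    `weight` balances the two classifiers (0.5 = equal trust).
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b

# Usage: pick the concept with the highest fused score.
fused = late_fusion([0.9, 0.2, 0.4], [0.5, 0.8, 0.3])
best_concept = int(np.argmax(fused))
```

In early fusion, by contrast, the feature vectors would be concatenated before a single classifier is trained, which is what yields the execution-time savings discussed above.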

Retrieval based on visual similarity:

A system for content-based image retrieval (CBIR) allows searching images, photos or key frames based on the similarity of their visual content with examples provided by the user. This approach differs from the retrieval possible using the semantic annotation produced with the BoW approach: in the latter case retrieval becomes text-based, since users formulate their queries using high-level semantic concepts.

CBIR can be considered a syntactic-level retrieval, in that the features used to formulate the query are based on low/mid-level features (typically colour and texture).

The system developed within IM3I is based on a combination of global features that capture different aspects of images. It can be used on single images or on the key frames extracted from video shots. The features capture colour and texture information and are based on standard MPEG-7 descriptors, namely: the scalable colour descriptor (SCD), the colour layout descriptor (CLD) and the edge histogram descriptor (EHD).

When retrieving images based on dominant colour the system uses the SCD, while when retrieving images based on similarity with respect to a sample provided by the user, a combination of CLD and EHD is used.

The system is integrated with the semantic annotation system, i.e. it is possible to search videos and images both at syntactic and semantic level.
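
The combination of CLD and EHD distances for example-based retrieval can be sketched as a weighted sum of per-descriptor distances; the equal weights, the L1 metric and the dict layout are assumptions for illustration:

```python
import numpy as np

def combined_distance(query, candidate, w_cld=0.5, w_ehd=0.5):
    """Weighted sum of per-descriptor L1 distances between two items.

    `query` and `candidate` are dicts holding one feature vector per
    descriptor ("cld", "ehd"); the weights are illustrative defaults.
    """
    d_cld = np.abs(np.asarray(query["cld"]) - np.asarray(candidate["cld"])).sum()
    d_ehd = np.abs(np.asarray(query["ehd"]) - np.asarray(candidate["ehd"])).sum()
    return w_cld * d_cld + w_ehd * d_ehd

def rank_by_similarity(query, collection):
    """Return collection indices ordered from most to least similar."""
    dists = [combined_distance(query, c) for c in collection]
    return sorted(range(len(collection)), key=lambda i: dists[i])
```

Dominant-colour queries would instead compare SCD vectors only, mirroring the split described above.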

Retrieval based on audio similarity and semantics

A system for content-based audio retrieval allows searching audio files, possibly extracted from video file audio tracks, based on an example provided by users, according to the query-by-example (QBE) paradigm.

The tools for content-based audio retrieval and for semantic audio annotation have been developed as command line interface (CLI) applications to allow for easy integration within the web service infrastructure of IM3I.

Similarly to CBIR, content-based audio retrieval can be considered syntactic-level retrieval, in that the features used to formulate the query are low/mid-level features (e.g. beat and amplitude) that model several perceptual aspects of audio information. Within IM3I a tool for content-based audio retrieval has been developed, as well as a tool for semantic annotation of audio content.

The similarity between audio files is calculated using predefined groups of features (Slaney 2006). Currently these groups can be timbre- or rhythm-related, and possibly subgroups of these. The rhythm-related features aim at representing the regularity of the rhythm and the relative saliencies and periods of the various levels of the metrical hierarchy. In total 18 features related to the periodicity of the sound are extracted from the audio and its calculated beat histogram. These features are: sum, high / mid / low peak amplitude, high / mid / low peak BPM, low-mid BPM ratio, maximum / average autocorrelation, harmonic product spectrum of the beat histogram, spectral flatness of the beat histogram, standard deviation of the beat histogram, periodic centroid 1 / 2, periodic spread 1 / 2, and the number of maxima above a certain (relative) threshold.
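
A few of the rhythm features above (peak BPM, maximum and average autocorrelation) can be derived from the autocorrelation of an onset-strength envelope, as in this sketch; the envelope and frame rate are hypothetical inputs, and only a handful of the 18 features are shown:

```python
import numpy as np

def beat_histogram_features(envelope, frame_rate):
    """Extract a few rhythm features from an onset-strength envelope.

    `envelope` is a 1-D array sampled at `frame_rate` frames per second.
    The dominant autocorrelation lag gives the beat period, from which
    a peak BPM value is derived.
    """
    env = np.asarray(envelope, dtype=float)
    env = env - env.mean()
    # Autocorrelation over non-negative lags only.
    ac = np.correlate(env, env, mode="full")[env.size - 1:]
    ac[0] = 0.0  # ignore the trivial zero-lag peak
    lag = int(np.argmax(ac))            # dominant periodicity in frames
    bpm = 60.0 * frame_rate / lag if lag else 0.0
    return {
        "peak_bpm": bpm,
        "max_autocorrelation": float(ac.max()),
        "avg_autocorrelation": float(ac.mean()),
    }
```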

Audio classification is currently performed in two stages: onset detection and support vector machine (SVM) classification. The onset detection algorithm cuts the source into segments, after which each segment is classified on its own using an SVM. The application can be trained to classify a variable number of labels. Currently, the concepts used to classify audio segments are speech, music and undefined, the last covering ambient noise or other undefined content.
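
The two-stage pipeline can be sketched as follows; the energy-jump onset rule and the nearest-centroid classifier are simplified stand-ins for IM3I's actual onset detector and SVM stage:

```python
import numpy as np

# Labels from the text; "undefined" covers ambient noise etc.
LABELS = ["speech", "music", "undefined"]

def energy_onsets(signal, win=256, ratio=2.0):
    """Naive onset detector: a window whose energy exceeds `ratio` times
    the previous window's energy starts a new segment."""
    energies = [float(np.sum(signal[i:i + win] ** 2.0))
                for i in range(0, len(signal) - win, win)]
    onsets = [0]
    for k in range(1, len(energies)):
        if energies[k] > ratio * (energies[k - 1] + 1e-9):
            onsets.append(k * win)  # sample index where the segment starts
    return onsets

def classify_segment(features, centroids):
    """Nearest-centroid stand-in for the SVM stage.

    `centroids` maps each label to a reference feature vector that would
    be learned from training data in the real system.
    """
    dists = {lab: float(np.linalg.norm(np.asarray(features) - np.asarray(c)))
             for lab, c in centroids.items()}
    return min(dists, key=dists.get)
```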

2.3.2 The IM3I interface layer

The IM3I project has developed and deployed two types of tools addressing the needs of different types of users: specialist tools for users in archives, aimed at annotation and search, and end-user interfaces which can be flexibly published using the IM3I authoring environment.

2.3.2.1 Overview of the four main interfaces available to specialist users

The specialised search and annotation tools developed within IM3I are targeted at specific media types and provide interfaces to perform common tasks. These tools facilitate specific tasks such as ontology-based search for media fragments or the correction of automatic annotations. The tools communicate directly with the IM3I analysis layer in order to search any annotations added and to correct false annotations at the source, before the annotations are exposed to end-user interfaces by means of the central IM3I repository.

The annotation tools

The video annotator: Pan
The manual video annotation tool has been developed with two goals:

(a) To provide a tool for the creation of ground-truth annotation of audio / visual concepts. These annotations are beneficial for the training of automatic concept detectors (audio and video).
(b) To provide a tool to inspect, validate and enhance the annotations provided by the automatic analysis pipelines, residing in the analysis layer of the platform.

Using this tool it is possible to annotate audio / video sequences associating video segments with concepts selected from different ontologies, associate geo-localised information to these segments and distribute the task of annotating videos to users with different access levels.

The image annotator / search engine: Daphnis
The image annotator has two goals: to allow easy tagging / manual annotation of images, and to allow content-based retrieval of images and key frames that exploits both the tags and the automatic annotations for filtering the results. This tool is connected to, and complements, both the media annotator (Pan) and the search engine (Sirio). It can be used both by photo repositories and by video repositories: in the latter case, the key frames extracted during the annotation process are treated as photos and can be searched by content similarity.

Retrieval tools
The main goal of these tools is to perform retrieval and browsing of multimedia material at a semantic level, exploiting the automatic annotations created by the IM3I processing pipeline or the annotations created with the annotation tools.

The search engine: Sirio
The Sirio search engine allows different modes of operation and query building, from advanced queries built using a GUI with drag and drop, to simpler keyword-based searches.

The three different types of interactions and query construction are:

(a) GUI to build composite queries that may include Boolean/temporal operators and (if required) visual examples;
(b) a natural language interface for simpler queries with Boolean / temporal operators;
(c) a free-text interface for Google-like searches.

In all the interfaces it is possible to extend queries adding synonyms and concept specialisations through ontology reasoning and the use of WordNet.
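
Query expansion of this kind can be sketched with a small synonym table standing in for the WordNet / ontology back end; the table entries and the OR-match semantics are invented for illustration:

```python
# Hand-rolled synonym table; the real system would query WordNet and
# the retrieval ontologies instead.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "person": ["human", "people"],
}

def expand_query(terms):
    """Extend a keyword query with synonyms and concept specialisations."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term.lower(), []))
    return expanded

def matches(annotations, terms):
    """True if any expanded query term appears among a media item's
    annotations -- a minimal OR-semantics match."""
    anns = {a.lower() for a in annotations}
    return any(t.lower() in anns for t in expand_query(terms))
```

With this expansion, a query for "car" also retrieves items annotated only with "automobile", which is the behaviour the ontology reasoning above provides.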

The browsing tool: Andromeda
The goal of this tool is to allow browsing of a collection of multimedia material: sometimes users do not have a clear idea of the content that they are interested in and a casual browsing may help to focus the search. In this sense this tool complements the other search engines.
Unlike other approaches, this browsing tool is based on the semantics of the annotations and exploits the same ontologies used for retrieval. It also uses an interface element that in recent years has become more familiar for users: the tag cloud.

2.3.2.2 Overview of the functionality of the authoring interfaces

In contrast to the specialist tools, which focus on a structured approach to annotation and retrieval, the authoring environment balances the IM3I platform by providing enormous flexibility in publishing search, browse and display interfaces for media collections. The authoring environment acts as a user-friendly interface which enables content holders to author and publish web-based media applications. It allows the import of a media collection, the extension of its characteristics and the publication of elaborate workflow patterns for enriching, searching and viewing the contents. A workflow can also be designed to build up a collection of media artefacts as the result of a published workflow. The authoring interface interacts directly with the architecture layer to store and retrieve media artefacts and the characteristics of an IM3I end-user web environment.

The authoring of IM3I end-user functionality typically covers five distinct stages:

(a) Importing an existing repository
The import function allows the import of data from a given (media)RSS stream. By identifying the contents of a stream and mapping it against a data model, one can import an entire repository.
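
A minimal sketch of mapping RSS items onto a data model, using only the standard library; the feed layout and the field names are illustrative:

```python
import xml.etree.ElementTree as ET

def import_rss(xml_text, field_map):
    """Map the <item> elements of an RSS feed onto a target data model.

    `field_map` maps RSS tag names to data-model field names, playing the
    role of the stream-to-model mapping described in the import stage.
    """
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        record = {}
        for rss_tag, model_field in field_map.items():
            node = item.find(rss_tag)
            if node is not None and node.text:
                record[model_field] = node.text.strip()
        records.append(record)
    return records
```

For instance, mapping `{"title": "name", "link": "uri"}` imports each feed item as a record with `name` and `uri` fields in the target model.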

(b) Extending the associated data model
Data models contain the descriptions of a repository and in this way stipulate what can be shown to or retrieved by an end user; they are the containers that hold information. In IM3I a user can easily build, change and manipulate data models to suit end-user needs over time. Extending the data model is required when new functionality or layout options are needed.

(c) Editing layout and editing features
This stage allows the editing of a variety of layout options and end-user functionalities.
The IM3I authoring tool allows the user to choose between several visualisation options depending on the type of media. For example, image collections can be visualised as an image wall, a dock representation of a series of images, or an image carousel.

(d) Editing search and retrieval interfaces
This stage allows the selection of repositories to be published to end users and the publication of search and filter options for end users. This option offers two distinct functionalities to the authors of an environment:
(i) to define and limit a repository of artefacts (objects) to be published to end users (conditions);
(ii) to define filtering and (pseudo) search interfaces to be published to end users (filters).
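
The split between conditions and filters can be sketched over a list of artefact records; the field names are illustrative:

```python
def publish_view(repository, conditions, filters, active):
    """Apply an author's repository definition to a media collection.

    `conditions` limit which artefacts are published at all (case i);
    `filters` are the end-user-facing facets (case ii), of which `active`
    holds the values the end user has currently selected.
    """
    # (i) conditions: keep only artefacts matching every fixed condition
    view = [a for a in repository
            if all(a.get(k) == v for k, v in conditions.items())]
    # (ii) filters: narrow further by the end user's active facet choices
    for field in filters:
        if field in active:
            view = [a for a in view if a.get(field) == active[field]]
    return view
```

The author thus fixes the conditions once at publication time, while the filters remain interactive in the published end-user interface.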

(e) Embedding the IM3I end-user interfaces in a (corporate) website
This stage consists only of a copy-paste action by the author into the content management system (CMS) of choice. The rendered model thereby inherits the look and feel of the host website (colour scheme, type sizes and fonts), ensuring seamless integration of an IM3I repository as part of a larger website.
Embedding a render model in a website is easily performed by copying the link available in the data inspector tab or in the preview of the retrieval modelling tab.

Potential impact:

IM3I has followed a sound methodology for developing the exploitation plan, which enabled the consortium to test the initial business assumptions of the IM3I offering, validate them with the market and refine them throughout the project, concluding in a realistic exploitation plan.

The production of digital content will be one of the major drivers of economic competitiveness in the coming decade and will make a major contribution to ensuring high levels of economic growth, robust export capacity and a highly skilled workforce. The economic multipliers arising from the digital content industry are significant, being higher than for most other categories of economic activity. The development of this emerging market has major implications for productivity growth in many other important sectors. Digital content and technology are, in fact, becoming important inputs to other industries and enablers that help transform the way they do business. Confident in its sound commercial and academic experience, the IM3I consortium has embraced the challenge of delivering a content management platform capable of supporting creative content professionals and end-users in unlocking their creativity by easing the way they search, manage, annotate, use, re-use and publish multimedia content to develop new products and services.

IM3I defines an integrated system of search, annotation, tagging, browsing, management and publishing services that supports effective content use, re-use and augmentation, driving creativity, productivity and quality of experience. IM3I has resulted in an original approach to digital content: one that recognises content's supportive value to productive activities. Not only does IM3I enable users to filter, produce and publish content; it also integrates social media features to let users interact with the content. The platform aims to become more than a single enabling technology: an infrastructure for multimedia information management and publishing, and a social space in which professional and semi-professional users - content creators, consumers and distributors - perform a variety of tasks from which they derive economic or intrinsic reward.

Based on sound and innovative technological solutions that allow easy searching, browsing and publishing of heterogeneous content repositories, IM3I allows creative media ideas to evolve into concepts.

The IM3I business proposition derives from the realisation that CMSs have to become actual content infrastructures, capable of supporting content production and publication. Current offerings do not sufficiently address the emerging needs of users, who ask for more relevant filtering, integrated content management and augmentation services, and intuitive interfaces that allow them to abstract from the domain and format in which the original information was generated and to focus on the domain of their creative experience. The lack of an integrated and systematic approach to content management across the whole spectrum of the content life cycle limits the supportive potential of digital content, dissipating the enormous wealth of the resources that are increasingly filling public and private archives.

IM3I can therefore serve as a powerful and effective tool for corporate multimedia content management, but also as an instrument for setting up specialised Web 2.0 environments that support members' creativity and their ability to realise business or other personal objectives. The intention is, in fact, to offer the system both as a product to media organisations and as a set of services delivered through the Internet, making them available to global audiences. In both configurations, IM3I will interact with other networks and enable the setting up of virtual communities of practice by integrating easily into third-party services.

IM3I's service offerings to clients, sold on the basis of yearly licences, are:

- Access to services through the application programming interface (API): this offering consists of selling single or multiple IM3I components, in terms of features, to a client. Clients can use these through APIs or services and integrate them however they need, to enhance existing workflows / representations.
- Hosting: this solution splits into two sub-offerings with the same premise: a client buys the IM3I platform, uploads data to IM3I hosting and uses its features. The difference lies in how the web presentation is produced: in the first, the client creates the web representation on their own with the authoring framework; in the second, the IM3I team creates the web representation (on the basis of the embedded authoring environment).
- In-house installation: an in-house installation of the IM3I platform requires setting up and configuring the IM3I framework on a client's existing servers. Further adjustments and training are required to enable the client's personnel to host, adjust and use the tools in the desired way.

The unique selling proposition of IM3I has been identified as: a flexible and customisable framework (from single services to complete digital asset management) targeting creative industry organisations. It is easy to maintain, install and use, and extends the content repositories of small, medium-sized and large organisations with advanced functions for searching, analysing, processing, indexing and publishing multimedia content with an 'edit once, publish everywhere' approach.

All IM3I participating SMEs are committed to exploiting the project results. The basis of this co-operation is:

- IM3I acts as a service provider and the consortium partners co-operate in delivering the service, investing resources in further development, support and marketing of the underlying IM3I system.
- The ownership percentage in the IM3I system defines each partner's share of the costs as well as the revenues from this co-operation.
- It is expected that, due to the small number of involved parties, exploitation activities will be assigned / discussed through regular telephone conferences or email.

To reach the full potential of IM3I and the B2C market, additional development has to be performed. This will be ready in the first six months after project completion and has been identified as the following:

(a) bug-fixing to reach a stable and scalable platform (authoring, processing);
(b) integration of payment methods, to enable content marketplaces;
(c) integration with social software, e.g. Twitter, Facebook;
(d) integration of an incentive scheme to invite friends;
(e) transfer of the complete IM3I framework into IN2 hardware and infrastructure.

The development needed above, however, does not affect professional creative organisations (business-to-business (B2B) customers), which are the main target of IM3I in the first year of exploitation activities. Thus, in parallel with the development above, the following marketing activities will be performed to attract creative businesses:

(a) follow-up focus group and field trial contacts to attract professional content;
(b) follow-up take-up prospects to attract professional content;
(c) disseminate a newsletter containing portal, tools and achievement news as a means of updating the IM3I community;
(d) disseminate through creative media-related blogs, and magazines;
(e) disseminate to social software (i.e. Facebook, LinkedIn) users;
(f) organise and attend meetings and events to present IM3I.

Exploitation activities in the first year of the endeavour

IN2

IN2, as the leader of the dissemination and exploitation activities, has already revamped and redesigned its product and service offering to reflect the IM3I project results. The complete IM3I framework will be installed on IN2's hardware and consolidated in one place in February 2010.
Furthermore, IN2 has been in close contact with all focus group members and take-up prospects and will continue following them up with the latest results.
Additionally, IN2 will lead the further development of the platform towards reaching the full IM3I potential and the business-to-consumer (B2C) market.

SPRING

SPRING will use the developed architecture and the evolved interfacing techniques:

(a) for the enhancement of its own products, i.e. extending its trading software products (Maran and Vectorbull) with data coming from trading news channels, and providing specialised interfaces that support its customers in evaluating market developments and elaborating their trading decisions;
(b) for disseminating the project results in the finance news sectors;
(c) by organising meetings to present the IM3I system to finance-related content providers and broadcasters.

Additionally, SPRING will support the software development activities in the first six months.

MICA

MICA plans to continue the exploitation started during the course of the project by setting up presentations, workshops and meetings with prospective customers. A list of potential B2B customers within the creative industries has been compiled, covering broadcasting companies, museums, archives, libraries, distributors, music information centres and advertising agencies. MICA will support these leads in realising the added value of IM3I for presenting, managing and publishing content, and in developing the new revenue streams that can flow from the use of the IM3I system. Furthermore, MICA plans to encourage its partners in various music-related networks to invite potential new customers in their areas to presentations and workshops where IM3I will be presented.

NEOS

NEOS intends to further exploit the IM3I platform through existing sales channels. To this purpose, a number of presentations and follow-ups will be set up with broadcasters from: Siena, Firenze, Milano, Salerno, Capri, Monza, Varese, Como, Brescia (Brescia Mobile Channel initiative), Regione Liguria (through ICT partner Insiel), Regione Friuli Venezia Giulia (through ICT partner Insiel), Regione Lombardia (through ICT partner Lombardia Informatica) and Regione Piemonte (through ICT partner CSI Piemonte). Additionally, NEOS will support the software development of the first six months.

NAVA

NAVA plans to demonstrate the IM3I technology and its possibilities to its network of partners, which basically includes educational establishments, other audiovisual archives, museums, digital archives and libraries, but also actors from the private sector, e.g. broadcasting companies. NAVA intends to communicate the project results in accordance with the needs of potential customers and plans to demonstrate the IM3I tools by showing specific user scenarios and best practices, depending on the nature of the given cultural institution or company. In addition, NAVA will take the project results and experience into account when specifying its own technical developments, as the technical objectives of the archive are in line with the IM3I developments, especially regarding automatic annotation.

List of Websites: http://www.im3i.eu
info@im3i.eu
http://www.facebook.com/pages/IM3I-Project/219820070170.