Multimodal meeting manager


This proposal concerns the processing of meetings held in a room equipped with multimodal sensors. The overall objective is to build a demonstration system that enables the offline structuring, browsing and querying of an archive of meetings. The project will include the design, collection and annotation of a multimodal meetings database, the processing of audio/video streams, and the integration and structuring of these streams using the outputs of various recognisers and analysers. We assume the availability of textual side information (e.g. an agenda), which allows useful constraints to be applied. The expected results of the project include a demonstrator system, and advances in models and algorithms for multimodal recognition, integration and information access.

Objectives: construction of a system to enable structuring, browsing and querying of an archive of meetings held in a room equipped with multimodal sensors:
1. Development of a smart meeting room; multimodal data collection and annotation;
2. Analysis and processing of audio/video streams; robust conversational speech recognition; gesture/action recognition; identification of emotion and intent; person identification; source localization and tracking;
3. Integration, structuring and information access: information management framework; multistream integration models and algorithms; meeting summarization; multimodal information retrieval and extraction;
4. Construction of a demonstration system for browsing and accessing information from an archive of processed meetings;
5. Evaluation at the system and component technology level.

Work description:
The work is divided into five work packages (WPs), plus project management.
WP1 (Smart Meeting Room, Data Collection and Annotation) is concerned with the specification of the smart room environment and of data collection and annotation protocols, resulting in the M4 meeting corpus;
WP2 (Multimodal Recognition) deals with the development of multimodal recognisers that transform raw audio and video streams into higher-level streams. The work will build on the partners' existing work in speech recognition and action/gesture recognition, porting it to the M4 domain. It will also investigate multimodal person identification, emotion and intention recognition, and source localization and tracking. The higher-level streams generated in WP2 will form the basis for the integration and information access operations of WP3;
WP3 (Multimodal Integration) focuses on the principled integration of multiple streams, and the development of information access methods to enable retrieval, browsing and summarization from an archive of multistream meeting data. WP3 is a key element of M4 since it forms a bridge between the multimodal recognition level (WP2) and the application demonstrator (WP4);
WP4 (Demonstration and Evaluation) consists of the construction of an offline demonstration system for the Multimodal Meeting Manager, along with formal and informal evaluation of the system as a whole and of its component technologies;
WP5 is concerned with Dissemination, Exploitation and Evaluation. A key aspect of this WP is the large Industrial Advisory Board set up by the project, with representatives from industrial areas which could exploit the results of M4.

Expected result: development of a demonstration system for structuring, browsing and querying an archive of meetings recorded in a room equipped with a variety of multimodal sensors.
Milestones:
1) Specification and implementation of smart meeting room environment and data collection/annotation protocol;
2) Development of multimodal recognisers;
3) Development of methods for multimodal integration and information access;
4) Design, implementation and evaluation of M4 demonstrator.

