Project ID: 315507
Funded under: FP7-SME
Country: Spain

Periodic Report Summary 1 - DOCUMEET (Transcription, summarisation and documentation of meetings using advanced speech technologies, indexing and browsing capabilities)

Project Context and Objectives:

DocuMeet will provide a meeting environment that enables you to manage each and every stage of your meetings, from planning and preparation to execution, with optimum utilisation of the outcomes.

Meetings are vital to all organisations, large and small, yet they are often not as effective as they could be. Time in meetings is often spent recording what is being said, either by a participant or by a dedicated minute taker, but these minutes may miss key details and be expensive in terms of human resources.

DocuMeet will have the potential to support the management of the complete lifecycle of a meeting (preparation, invitation, execution, documentation and follow-up) as well as its integration in the organisation’s knowledge base.

This innovative solution will consist of four important components: a hardware platform, a transcription module, a summarisation module and a meeting browser. The project will provide seamless access to all resources associated with a meeting. DocuMeet will provide agendas, audio recordings, transcriptions, and summaries, fully adaptable to the requirements of different organisations supporting different access levels, with integration into specific knowledge management systems, implementation of domain-specific vocabularies, and more.

DocuMeet is a 24-month “Research for the Benefit of the SMEs” project with a budget of 1.5 million euros. The European Commission, through the 7th Framework Programme (FP7), has provided 1.1 million euros to co-fund it. DocuMeet brings together 8 European organisations from 4 different countries: research centres, universities, and SMEs working in telecommunications, software and digital services, and multilingual communications, from Germany, Spain, Switzerland and the United Kingdom.

Project Results:

The DocuMeet project started in November 2012 and will last for 24 months. The work performed during the first reporting period has focused mainly on the following points:

Definition of the target market

The first steps of the project consisted of reviewing and redefining the targeted users and uses for DocuMeet. After all partners had studied the problem and considered different scenarios, it was decided to focus on formal, strongly structured meetings. This will allow the system to reach a higher level of performance by narrowing the scope of the meetings it covers and by selecting a kind of meeting where additional information is usually available (detailed agendas, reports, etc.) and where the kind of speech is less challenging.

Definition of how the system will be used

After identifying the targeted meetings and users, a set of use cases, scenarios, user roles and user requirements was developed. This information was developed mainly by ATEKNEA, compiling input and feedback from all the other partners. The main modules defined were:

DocuMeet Recording Unit (DRU): A set of individual recording devices and a tablet app that will be used for capturing other meeting information during the meeting.
DocuMeet Data Repository (DDR): A repository for all data gathered and produced by the DocuMeet system. It includes usage data, recordings, transcripts, summaries and attachments.
DocuMeet Transcription Engine (DTE): This module uses the information captured by the DRU to produce meeting transcripts.
DocuMeet Summarization Engine (DSE): This module will use the different meeting information contained in the DDR to produce meeting summaries and topic analysis results.
DocuMeet Meeting Browser (DMB): This software solution will allow users to interact with all the above-mentioned modules and the results produced by them.
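
The data flow between the five modules listed above can be illustrated with a minimal Python sketch. This is not the project's implementation: every class, function and field name below is hypothetical, and the transcription and summarisation steps are placeholders that only show where each module would read from and write to the repository.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingRecord:
    """Hypothetical DDR entry holding the data gathered for one meeting."""
    meeting_id: str
    agenda: list[str]
    recordings: dict[str, bytes] = field(default_factory=dict)       # speaker -> audio
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    summary: str = ""


class DataRepository:
    """Stand-in for the DDR: an in-memory store keyed by meeting id."""

    def __init__(self) -> None:
        self._meetings: dict[str, MeetingRecord] = {}

    def save(self, record: MeetingRecord) -> None:
        self._meetings[record.meeting_id] = record

    def load(self, meeting_id: str) -> MeetingRecord:
        return self._meetings[meeting_id]


def record_meeting(meeting_id: str, agenda: list[str],
                   channels: dict[str, bytes]) -> MeetingRecord:
    """Stand-in for the DRU: one audio channel per participant."""
    return MeetingRecord(meeting_id, agenda, recordings=dict(channels))


def transcribe(record: MeetingRecord) -> None:
    """Stand-in for the DTE. A real engine would run speech recognition here."""
    record.transcript = [
        (speaker, f"<transcript of {len(audio)} bytes>")
        for speaker, audio in record.recordings.items()
    ]


def summarise(record: MeetingRecord) -> None:
    """Stand-in for the DSE. A real engine would analyse transcript and agenda."""
    record.summary = (
        f"{len(record.agenda)} agenda items, {len(record.transcript)} speakers"
    )


def browse(repo: DataRepository, meeting_id: str) -> str:
    """Stand-in for the DMB: render what the repository holds for a meeting."""
    record = repo.load(meeting_id)
    lines = [f"Meeting {record.meeting_id}", f"Summary: {record.summary}"]
    lines += [f"{speaker}: {text}" for speaker, text in record.transcript]
    return "\n".join(lines)


if __name__ == "__main__":
    repo = DataRepository()
    repo.save(record_meeting("board-01", ["Budget"], {"chair": b"\x00\x01"}))
    transcribe(repo.load("board-01"))
    summarise(repo.load("board-01"))
    print(browse(repo, "board-01"))
```

In this sketch every module communicates only through the repository, which mirrors the description of the DDR as the store for all data gathered and produced by the system.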

Design and first steps of the implementation of the different modules

All three RTDs worked during this period on the design of their respective modules: the DocuMeet Recording Unit (UAB), the DocuMeet Transcription Engine (USFD) and the DocuMeet Summarization Engine (ATEKNEA). In addition, ATEKNEA started working on the design of the DocuMeet Meeting Browser (DMB), although this work was not planned until August. It was brought forward to fulfil the SMEs' requirement of having tangible results as soon as possible, which could be used among the SMEs to evaluate the project progress or outside the consortium to get external feedback.

Study of data needs and available data

USFD led the compilation of data needs and existing corpora that could be useful for the project. ATEKNEA also provided input on the part related to summarization. The result of this task is D2.1. This report emphasises the need for a large amount of in-domain data for creating robust meeting transcription models. However, the amount of data available from the meeting domain is relatively small due to the cost of manual transcription (especially for spontaneous speech) and due to privacy constraints on real data. The report also presents almost all corpora that may help in building robust meeting transcription and summarization models.

Preparation of a unit to be used for gathering data

ATEKNEA, in collaboration with USFD, has been working on the preparation of a platform that will be used for recording meetings in order to gather data for the project. This platform is not the DRU, which is being developed by UAB in WP3. It will only be used during the project for recording meetings and is not part of the final DocuMeet system. This preliminary platform was originally planned as a set of microphones wired to an audio unit. However, the redefinition of target meetings in WP1 meant that project meetings were no longer the kind of meetings that DocuMeet targets. The intention is now to record formal board meetings among LAND customers, which made it necessary to make the system as transparent as possible for the participants in these meetings. To that end, it was decided to change the approach and use a wireless system in order to make the system more user friendly. To reduce the intrusiveness of the system, it was also decided to use a multi-track recording unit instead of an audio interface and a computer. The configuration used for the first test showed that the microphones compatible with the selected wireless system (almost every manufacturer has its own format) did not provide the necessary quality. Microphones from other manufacturers have also been used, and mechanical adaptations to the system were necessary to carry out these second tests. At the time of preparing this report, these second tests have been executed and have provided good results.

Definition and execution of tests for verifying the feasibility of the use of MEMS microphones

One significant change from the initially planned system has been the selection of MEMS microphones for the DRU. While these microphones present significant advantages over conventional condenser microphones, a feasibility test was necessary in order to guarantee that the use of MEMS microphones would not degrade the performance of the DocuMeet Transcription Engine. A test platform was developed and the necessary experiments were executed, with positive results.

Potential Impact:

The main result of the DocuMeet project will be an integrated system for meeting recording, transcription, analysis and summarization, together with a tool for browsing all this information. This means that, in addition to the whole system, the individual components for meeting recording, transcription and summarization and the browsing interface might be integrated with other components or exploited separately.

DocuMeet will help organizations to hold more effective meetings, reducing the time wasted in meetings and creating effective and efficient knowledge management systems to drive competitiveness. By generating profits, the project will also add revenue and commercial benefits to the consortium SMEs.

While the market potential is vast, the consortium has chosen to focus on organizations that hold strongly structured and formal meetings. This selection has been made in order to maximize the performance of the system: the chosen market is attractive, and it implies a set of restrictions on meetings that make DocuMeet more likely to perform according to user expectations.

To understand the competition for DocuMeet, it is possible to split DocuMeet into two key components, the hardware and the software. There is currently a huge and extremely competitive market for recorders, with a range of quality and prices. However, no manufacturer currently produces the specific kit that will be developed during the DocuMeet project, which will be crucial for providing quality audio for transcription. In terms of the software, there is again a huge market for dictation and speech-to-text software. Nevertheless, key assets of the system, such as meeting transcription and summarization, do not currently exist in the open market.

List of Websites:

Albert Nieto
Tel.: +34 93 204 99 22
