CORDIS - EU research results
Content archived on 2024-05-28

Transcription, summarisation and documentation of meetings using advanced speech technologies, indexing and browsing capabilities

Final Report Summary - DOCUMEET (Transcription, summarisation and documentation of meetings using advanced speech technologies, indexing and browsing capabilities)

Executive Summary:
DocuMeet provides a meeting environment that enables you to manage each and every stage of your meetings, from planning and preparation to execution, with optimum utilisation of the outcomes.
Meetings are vital to all organisations, large and small, yet they are often not as effective as they could be. Time in meetings is often spent recording what is being spoken, either by a participant or a dedicated minute taker, but these minutes may miss key details and be expensive in terms of human resources.
DocuMeet (www.documeet.eu) has the potential to support the management of a complete lifecycle of a meeting (preparation, invitation, execution, documentation and follow-up) as well as its integration in the organisation’s knowledge base.
This innovative solution consists of four important components: a hardware platform, a transcription module, a summarisation module and a meeting browser. The project provides seamless access to all resources associated with a meeting. DocuMeet provides agendas, audio recordings, transcriptions, and summaries, fully adaptable to the requirements of different organisations supporting different access levels, with integration into specific knowledge management systems, implementation of domain-specific vocabularies, and more.
DocuMeet is a 24-month “Research for the Benefit of SMEs” project with a budget of 1.5 million euros. The European Commission, through the 7th Framework Programme (FP7), has provided 1.1 million euros to co-fund it. DocuMeet brings together 8 European organisations from 4 countries: research centres, universities, and SMEs in telecommunications, software and digital services, and multilingual communications from Germany, Spain, Switzerland and the United Kingdom.

Project Context and Objectives:
Context:
Meetings are vital to all organisations, large and small, yet they are often not as effective as they could be. Research suggests that on average we spend 60 hours a month in meetings, with more than a third deemed unproductive and with up to 50% of meeting time wasted. A survey reports that 91% of meeting attendees admit to daydreaming, 73% say they have brought other work to meetings and 39% say they have dozed during meetings. It is not surprising then that they have been described as “terrible, toxic, poisonous things”, with studies linking them to worker stress, reduced welfare and job dissatisfaction.
Yet, without meetings, it would be difficult to manage the company effectively. While most of the actual work performed by an organization occurs between meetings, meetings serve as catalysts for measuring progress towards tasks, information sharing, decision making, and problem solving. Meetings, therefore, have a huge strategic importance for an organization, by generating innovation and sharing knowledge - key intangible assets. In SMEs, often perceived as the driving force of innovation, meeting outcomes and the ability to act upon them, are of particular importance and may have crucial impact on the future of the company. It is therefore vital that SMEs have the adequate tools in order to successfully exploit the knowledge and decisions generated in meetings.
Successful meetings are measured by the effective outcomes they produce. Time in meetings is often spent recording what is being spoken, either by a participant or a dedicated minute taker, but these minutes may miss key details, are expensive in terms of human resources and are rarely distributed or integrated into the company’s knowledge base. This means that key ideas are lost, decisions fail to be followed up and meetings have to be repeated. Information and communication technologies, some widely implemented in both SMEs and large enterprises, have the potential to support more efficient meetings in organisations by supporting automatic meeting transcription, summarisation and documentation; however, to date, this potential has yet to be fully exploited.
Automatic meeting transcription has been a focus of research and development in recent years. However, it seems that the current paradigm of speech-to-text and speech recognition technologies has reached a plateau: NIST’s Rich Transcription evaluations have detected no significant improvement since 2007, with the error rate holding constant at around 20%. In the context of automatic meeting transcription, this finding implies that significant improvements in the performance of transcription engines should not be expected and that the current state of the art should be considered the baseline for any attempt to bring automatic meeting transcription products, and any related services, to market. Currently, the best performance for meeting transcription is reached when using close-talking microphones, yielding a word error rate (WER) of approx. 20%. However, these results are achievable only in controlled scenarios; in real scenarios, due to the spontaneous nature of speech and the uncontrolled recording conditions, performance decreases significantly.
During 2011, CPSL and GROUPV approached CRIC with the following idea: combine the commercial expertise of each company in order to produce an innovative product to support meeting transcription and other activities associated with automatic transcription. CPSL, as a company focused on delivering translation and transcription services, was already familiar with Automatic Speech Recognition (ASR) tools. However, it was only making use of dictation systems (e.g. Nuance) that allowed its employees to dictate text to the computer instead of typing it. GROUPV was marketing software solutions for meeting management, but lacked the automation needed to make its products more innovative. Before exploring the idea of developing an automatic transcription tool focused on meetings, a thorough study of the current state of the art of ASR and related fields was carried out. During this study, several companies operating in related fields were also contacted in order to gain a wider view of the technical and commercial perspective. This study led to the following conclusions:
• The transcription of a meeting can be a valuable output by itself, but in most cases the outputs derived from meeting transcriptions are the most valuable: summaries, search services, translations, etc.
• Several meeting management tools provide support for the pre-meeting and meeting stages; however, like GROUPV’s product, they lack automatic features and rely only on users’ input.
• The creation of such an integral system has so far been prevented by the poor performance of meeting transcription engines. However, in light of dedicated hardware platforms and recent results from the most advanced research groups in the field, it seems that high-quality transcription (with a low WER) is feasible only if the system can be adapted to the speaker (supervised adaptation), in which case the required training phase may reduce user acceptance and makes the product hard to market.

Objectives:
The main objective of DocuMeet is to develop and validate a cost-effective solution for automatic meeting transcription, summarization and documentation that will allow organisations to obtain detailed and searchable reports and summaries of their meetings at an affordable price.

Scientific objectives:
In the field of ASR, the main objective will be to test the impact of supervised and unsupervised adaptation in the DocuMeet transcription engine. The objective is to produce a robust system that will be able to reproduce results obtained in controlled scenarios in uncontrolled scenarios.
In the field of automatic meeting summarisation, the main objective will be the integration of three different kinds of features and the study of their joint contribution to the quality of automatically generated extractive summaries of meeting transcripts:
Rich ASR results (word lattices) which have already proven key to overcoming the impact of transcription errors on summarisation.
Prosodic information (rhythm, stress and intonation of speech) will be used to fully exploit the available speech information, as it has a positive impact on summarising ASR results.
Data extracted from information fed into the DocuMeet framework, such as meeting agenda, profiles of attendees, support documents, etc.
This objective will be achieved in M17 by measuring the summary engine results in task 5.5.

Technological objectives:
Perform a study of user requirements and use-case scenarios based on which the final architecture of the system and its performance goals will be defined. This objective will be achieved in M4 and its achievement will be checked in D1.2 and D1.3.
Develop a prototype of a meeting recording platform including advanced close-proximity microphones with a background-noise cancellation mechanism, with the aim of providing high-quality recordings for automatic transcription. UAB will develop a set of recording devices using directive microphones for gathering the speaker’s voice and additional microphones for capturing ambient noise for later subtraction. These recording devices will be wireless and will record the audio internally. Following the meeting, the audio will be transferred to a computer using Bluetooth. This objective will be achieved in M12.
Develop a docking station that will be used as a hardware interface between the DocuMeet recording platform and the DocuMeet software. This docking station will use induction charging for the batteries of the recording devices and Bluetooth for obtaining the audio from each recording device. This objective will be achieved in M15.
Develop an automatic transcription engine, applying advanced ASR technologies, to produce meeting transcriptions. This engine will include supervised adaptation, in order to improve accuracy by enabling the development of personalised models and lexicons, and unsupervised adaptation, in order to leverage the greater amount of recorded data that has not been transcribed. These two characteristics will make it possible to obtain, in real environments, results that are currently obtained only in controlled scenarios. This objective will be achieved in M17.
Develop a summarisation engine that uses meeting transcription results (word lattices), prosodic information obtained from the audio recordings and DocuMeet-related information (role of each participant, user profile, etc.) to produce extractive summaries of the meetings. The objective of DocuMeet is for the summaries to reach a significantly reduced WER (20% lower than the WER of the transcription), side by side with high user satisfaction with the produced summaries. The validation method will be decided in task 5.5, as there exist many different validation methods for summaries, each used in different cases. This objective will be achieved in M17.
Develop a user-friendly interface (“meeting browser”) to support organisations along the life cycle of meetings by better structuring meetings, providing important a-priori information to the summarisation unit, moderating an efficient meeting and, finally, documenting and following up on meeting results. This will also include an automatic link between speech and supporting documentation (e.g. slides used in a presentation) and an indexing and search engine covering all the information stored by the system (speech, transcriptions, summaries and supporting documentation). This interface between the user and the DocuMeet system will implement National Language Support and will be developed in a multiplatform framework (probably Java). This objective will be achieved in M17.
Integrate the developed components in order to produce 3 DocuMeet prototypes to be used in the validation process. This objective will be achieved in M20.
Verify the final prototype on-site at the SMEs premises following defined scenarios. This objective will be achieved in M24 together with the release of the final DocuMeet prototype.

Strategic objectives:
Foster a smooth and gradual technology transfer process from the RTDs to the SMEs
Secure property rights and generate a viable business plan as a means of ensuring the commercialisation of the product by the SMEs
Plan and engage in dissemination actions aiming at preparing the grounds for commercialisation of the DocuMeet system and promoting end-user acceptance
Explore additional scenarios under which the full DocuMeet system, or its individual components, may open new revenue channels for SMEs as stand-alone products or as services

Project Results:
As a whole, the DocuMeet project has produced an integrated solution for
1. planning meetings online,
2. easily recording them with a set of dedicated microphones,
3. transferring the recordings to an online server,
4. automatically computing the meeting transcripts,
5. automatically summarizing the transcripts,
6. automatically extracting the meeting keywords,
7. automatically organizing all the information in a central online repository,
8. searching the repository for particular meetings by means of a meeting search engine, and
9. listening to the meeting recordings and viewing the meeting transcript, summary, keywords and user participation rate with a web browser.

Each one of the DocuMeet modules has been a challenge by itself, while all of them have had to comply with a set of restrictions in order to be able to operate as a whole; for instance, the synchronization error of the recording system had to be below 30ms so that the transcription engine could properly recognize the speech, and consequently allow the summarization and keyword extractor engines to generate useful summaries and keyword sets. We describe below the achievements per each DocuMeet module and cite the interdependencies with other modules when relevant.


DocuMeet Recording System:

During the DocuMeet project, 3 DocuMeet recording kits have been produced. Each kit is composed of a set of 8 DRUs (DocuMeet Recording Units), an Android tablet, and a USB hub. During a meeting, each participant carries a DRU like an identity badge, hanging from the neck on a lanyard.

Each DRU contains two MEMS microphones, one at the top for recording the voice of the meeting participant that is carrying it, and another at the front bottom for recording the ambient sound. Since every microphone is subject to crosstalk, the DRUs include these two microphones so that the voice of the other meeting participants, and other ambient noise, can be subtracted from the main audio channel of each meeting participant.
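The two-channel idea can be sketched in a few lines. This is a simplified illustration with synthetic signals and a fixed-gain subtraction; the actual DRU processing chain is not described at this level of detail in the project and would use more sophisticated adaptive filtering:

```python
import random

def reduce_ambient(voice, ambient, alpha=1.0):
    """Subtract a scaled copy of the ambient channel from the voice channel.

    A crude illustration of the two-channel idea; a real system would use
    adaptive filtering or spectral subtraction rather than a fixed gain.
    """
    cleaned = [v - alpha * a for v, a in zip(voice, ambient)]
    # Clamp to the signed 16-bit PCM range used by the DRU recordings.
    return [max(-32768.0, min(32767.0, s)) for s in cleaned]

# Synthetic example: a "speech" ramp plus noise on the voice channel,
# and the same noise alone on the ambient channel.
random.seed(0)
speech = [(i % 200) * 40 - 4000 for i in range(16000)]   # 1 s at 16 kHz
noise = [random.gauss(0, 2000) for _ in range(16000)]
voice_ch = [s + n for s, n in zip(speech, noise)]
ambient_ch = noise

cleaned = reduce_ambient(voice_ch, ambient_ch)
residual = max(abs(c - s) for c, s in zip(cleaned, speech))
```

In this idealised case the ambient channel captures exactly the noise present on the voice channel, so the subtraction recovers the speech; in practice the two microphones pick up different mixtures, which is why more elaborate processing is needed.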

Transmitting the audio over radio frequency during a meeting is subject to several technical problems that can cause audio degradation and corruption. Instead, each DRU contains an SD card where both audio channels (voice and ambient) are stored, using an audio file format compatible with the DocuMeet transcription engine (WAV, signed 16-bit PCM, 16 kHz). Once the meeting is over, the DRUs are connected to the Android tablet through the USB hub (Android tablets usually have a single USB port able to power a single device, hence the need for a USB hub supporting 8 devices or more). The tablet then extracts the recordings from all the DRUs and sends them to the DocuMeet server over WiFi. This last part has also been challenging due to the volume of data to be moved (1.8 GB for 1 hour of meeting with 8 participants). The transfer process has been optimised so that all the DRUs transfer their recordings to the tablet concurrently, reducing the time required to one third of the meeting duration, that is, 20 minutes for 1 hour of meeting. This was necessary so that the tablet did not run out of battery in the middle of the process (the first prototype needed 16 hours for 1 hour of meeting, while tablet batteries usually do not last more than 10 hours).
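The cited data volume is consistent with the recording format: two mono channels of signed 16-bit PCM at 16 kHz per DRU. A quick back-of-the-envelope check:

```python
# Uncompressed PCM size: sample_rate x bytes_per_sample x channels x duration
SAMPLE_RATE = 16_000        # Hz
BYTES_PER_SAMPLE = 2        # signed 16-bit PCM
CHANNELS_PER_DRU = 2        # voice + ambient
DRUS = 8                    # one per meeting participant
MEETING_SECONDS = 3600      # 1 hour

bytes_per_dru = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS_PER_DRU * MEETING_SECONDS
total_bytes = bytes_per_dru * DRUS
print(f"{total_bytes / 1e9:.2f} GB")   # 1.84 GB, matching the ~1.8 GB cited
```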

Each DRU contains a lithium-polymer battery with an autonomy of 7 hours, 2 hours more than the minimum requested in the DoW. The USB hub is able to charge the batteries of all 8 DRUs at the same time (it can provide 500 mA × 8 = 4 A) while allowing the recordings to be transferred from the microphones to the tablet.

The DRUs contain a Bluetooth radio for receiving commands from the tablet (e.g. start recording). However, it is not used in the final version: in the end, the Bluetooth protocol did not allow commands to be sent to all 8 DRUs synchronously enough for all the microphones to start recording at the same time (with a maximum time shift between any 2 DRUs of 30 ms or less). Instead, the interaction between the tablet and the DRUs was reimplemented over the USB interface, which is faster and more reliable. The usability of the recording kit is slightly reduced, though it is still possible to walk freely around the meeting room while recording a meeting: before the meeting starts, the DRUs must be plugged into the USB hub, and the hub into the tablet, so that the tablet can command the DRUs to start recording. Once the DRUs start recording, they can be unplugged from the hub and worn by each meeting participant. When the meeting ends, the DRUs have to be plugged into the hub again so that the tablet can send them the stop-recording command.

An Android app was developed in order to govern the DRUs and to guide the meeting chair through the meeting agenda. The app first asks the meeting chair for his/her DocuMeet username and password, which are verified against the DocuMeet database over the Internet. Then the app downloads the agendas of the future meetings organised by that meeting chair. The chair selects the meeting he/she wants to record, assigns a DRU to each one of the participants, and presses a button in order to start recording the meeting. During the meeting, the app shows the agenda items and allows the chair to annotate the timestamps of multiple events, namely:

• when an agenda item ends and the next one starts,
• when a speaker starts and stops speaking,
• when a presentation (e.g. PowerPoint) starts and ends

It is also possible to take notes, each accompanied by the timestamp of the moment of its creation.

The timestamps of the speakers are used in order to improve the quality of the automatic summaries, though they are not mandatory. If these timestamps are missing, the DocuMeet transcription engine automatically detects when each participant speaks, though the transcript will be less accurate.

The app also allows for transferring the recordings from the microphones to the tablet, and then from the tablet to the DocuMeet server, where they will be stored and processed.
Finally, the app allows for associating a DRU set with the tablet they are going to work with. Each DRU internally contains a unique serial number composed of 12 hexadecimal digits. In order to easily identify the DRUs, the app allows for associating a number from 1 to 8 with each DRU. This process is as follows:
• DRUs are connected to the hub and the hub to the tablet so that the app detects and shows the serial numbers of each DRU in a table, a serial number per line.
• For each line of the table, the user adds a number from 1 to 8 by dragging and dropping numbered icons.
• The user touches the serial numbers one by one so that the corresponding DRU starts blinking. For each DRU, the user puts a sticker on the DRU with the corresponding number from 1 to 8.

This way, when giving DRUs to meeting participants, they can be easily identified by means of the stickers on them. If a DRU stops working, its sticker can be placed on a new DRU, and the association process repeated in order to make the app aware of the change. In order to ease reassociations, the app remembers the previous association so that only the new microphones need to be identified and given a number from 1 to 8.



DocuMeet Transcription Engine:

Initially, the DocuMeet transcription engine (DTE) was supposed to target formal meetings where speaking turns are well defined and manually annotated. By limiting the domain of application of the DTE, USFD wanted to ensure the best possible accuracy of the automatically generated transcripts. However, as the project evolved, the SMEs found it difficult to make money solely on formal meetings and requested that the DTE be more flexible. Moreover, it was impractical for their meeting users to manually annotate a timestamp each time somebody started or stopped speaking, due to the spontaneous nature of the meetings. Consequently, the DTE has been redesigned to deal with any kind of meeting: it automatically detects when each speaker starts and stops talking (speaker diarisation) and identifies the speaker who is speaking at each moment, even when several speakers talk at the same time, and in spite of crosstalk (voices of other speakers being recorded by microphones other than their own, since it is impossible to completely isolate a microphone from its surrounding environment). Of course, this has not come without a significant drop in the accuracy of the transcripts, though acoustic modelling has been significantly improved by the inclusion of considerably more training data, minimum phone error (MPE) training, improved Deep Neural Network (DNN) training, and the inclusion and optimisation of Vocal Tract Length Normalisation (VTLN) and speaker adaptive training (SAT). Nevertheless, the current DTE constitutes a fully functional prototype on which additional effort can be invested in order to perfect it.
Apart from speech recognition, the DTE includes an automatic sentence-end detector and a capitalisation restorer, whose purpose is to insert punctuation symbols into the transcript and restore uppercase letters in order to produce more readable transcripts. Moreover, the DTE is able to extract prosodic features that can be exploited by automated processes downstream of speech recognition, such as automatic summarisation. For the moment, the sentence-end detector already takes advantage of them.
Automatic transcription is a complex process which requires a significant amount of computing power. In order to reduce the time required for computing the transcripts, the DTE is able to execute several subtasks concurrently on a computing grid, to the extent of being able to compute the transcript of a meeting in a time lapse equal to the duration of the meeting. This has been tested on the computing cluster of the Speech and Hearing Group of USFD’s Department of Computer Science.
The subtasks and their interdependencies are defined by means of graphs, which are then applied by means of a graph processing system. This system has been optimised in several ways in order to smooth the whole DocuMeet loop (plan a meeting, record it, transcribe, summarise, search and review it). Firstly, most modules were updated for efficiency and some bugs were removed. Secondly, new modules were added, such as personalisation and adaptation modules, diarisation modules, prosodic feature modules and confusion network output. A novel DNN-based speaker adaptation system was developed and published. Furthermore, the system graphs were updated to be able to handle multi-channel data per speaker (the voice and ambient channels recorded by each DRU).
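The graph-driven execution described above can be illustrated with a minimal sketch using Python's standard library. The task names here are hypothetical; the actual DocuMeet graph processing system is not detailed in this report:

```python
from graphlib import TopologicalSorter

# Hypothetical subtask graph: each task maps to the tasks it depends on.
graph = {
    "diarisation": {"audio_transfer"},
    "asr": {"diarisation"},
    "prosodic_features": {"audio_transfer"},
    "sentence_end_detection": {"asr", "prosodic_features"},
    "confusion_network": {"asr"},
}

# static_order() yields the tasks in an order that respects every dependency;
# tasks with no ordering constraint between them may run concurrently on a grid.
order = list(TopologicalSorter(graph).static_order())
```

A grid scheduler built on such a graph can dispatch independent branches (here, ASR and prosodic feature extraction) to different nodes at the same time, which is how transcription can keep pace with the meeting duration.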
Finally, sentence-end detection was considerably improved in several ways: Conditional Random Field based models were enhanced with more data and new features, including a novel distance feature. The new models were trained on more data, and a new way to train the models on ASR output was defined. Significant performance gains were obtained.

DocuMeet Summarization Engine:

The DocuMeet Summarization Engine (DSE) is designed to summarize any kind of text, and it can later be configured to specialize in particular domains (e.g. meeting transcripts). It was first developed as a stand-alone application (independent of the DocuMeet system), and later integrated into the DocuMeet system by adding a layer on top that converts the output of the transcription engine into the format accepted by the summarization engine. Therefore, the DSE can be used as a component of the DocuMeet system, as a standalone application, or be easily integrated into other systems.

The DSE performs extractive summarization, meaning that it builds the summary by copying (extracting) only the most relevant sentences from the original text. Extractive summarization can be seen as a classification problem: whether a sentence should be kept for the summary or not. Several algorithms have been implemented and their performances compared using the ROUGE evaluation tool, the standard of the DUC and TAC conferences. The most sophisticated algorithm currently used by the DSE is based on the Maximal Marginal Relevance (MMR) algorithm, which according to the literature performs almost as well as algorithms based on Latent Semantic Analysis, though it is easier to implement and, above all, easier to adapt to our use case. The algorithm not only takes into account the relevance of the sentences, but also avoids redundancy in order to produce better summaries. Note that in a text focused on a particular domain, the most relevant concept is going to be mentioned multiple times throughout the text, and the same information is probably going to be repeated in several sentences. If only relevance were taken into account, the summary would contain several sentences providing almost the same information. By taking redundancy into account as well, we avoid this repetition and provide summaries that better cover the spectrum of information in the text to summarize.

Ranking of sentences is based on the computation of vectors of terms. These vectors represent the kind of information provided by a piece of text by means of a list of natural numbers (the frequencies of the words in the text). The DSE computes the vectors of terms of
• the sentences in the text (sentence information)
• the whole text,
• and the summary each time a new sentence is added to it

The DSE then uses the cosine similarity function in order to compute the similarity between the sentences and the text, and between the sentences and the summary. The former provides a measure of sentence relevance with respect to the document to summarize, and the latter provides a measure of sentence redundancy with respect to the summary that is being built. The DSE combines both measures in order to compute a sentence ranking, which it uses as the criterion for extracting the most relevant and non-redundant sentences.
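This relevance-versus-redundancy ranking can be sketched with raw term-frequency vectors and cosine similarity. This is a simplified stand-in for the actual DSE implementation; the example sentences and the λ weight are illustrative:

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def mmr_summary(sentences: list[str], k: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR: pick k sentences balancing relevance to the whole text
    against redundancy with the summary built so far."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = Counter()
    for v in vecs:
        doc.update(v)
    chosen: list[int] = []
    while len(chosen) < k and len(chosen) < len(sentences):
        def score(i: int) -> float:
            relevance = cosine(vecs[i], doc)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in chosen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max((i for i in range(len(sentences)) if i not in chosen), key=score)
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]

sentences = [
    "The budget was approved by all partners.",
    "The budget was approved by the partners.",
    "Testing of the recording units starts next month.",
]
summary = mmr_summary(sentences, k=2)
```

Because the first two sentences are near-duplicates, the redundancy term steers the second pick towards the testing sentence instead of the other budget sentence.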

The DSE was later modified in order to take into account that some words have a higher “semantic weight” than others, that is, some words are considered to be keywords while others are not. When computing vectors of terms, terms that are keywords are given a higher score than others, and hence have a greater impact on the computation of the cosine similarity. The system does not simply decide whether a word is a keyword or not, but computes a weight (IDF, or inverse document frequency) representing the semantic weight of each word. Of course, depending on the target domain, the IDF table will be different. Therefore, by computing IDF tables from corpora of different domains, it is possible to adapt the DSE to compute better summaries for each domain. Indeed, a first IDF table was computed from free books of Project Gutenberg, and another IDF table was later computed from a database of meeting transcripts.
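How such an IDF table could be computed from a corpus can be sketched as follows. The mini-corpus here is hypothetical; the real tables were built from Project Gutenberg books and meeting transcripts:

```python
import math
from collections import Counter

def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency: words that appear in few documents
    get a high weight, words that appear everywhere get a weight near zero."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))          # count each term once per document
    return {term: math.log(n / df[term]) for term in df}

# Hypothetical mini-corpus of tokenized meeting transcripts.
corpus = [
    ["budget", "approved", "meeting"],
    ["meeting", "agenda", "budget"],
    ["meeting", "prototype", "testing"],
]
idf = idf_table(corpus)
```

A word like "meeting" that occurs in every transcript carries no discriminative weight, while a word confined to one document scores highest.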

Apart from summarization, the DSE also includes a keyword extractor. This extractor is based on the same technology as the summarizer: vectors of terms. Indeed, the part of the DSE for the computation and comparison of vectors of terms was implemented as a separate module in order to be reused by both the summarizer and the keyword extractor. Moreover, the computation of IDF measures is also based on vectors of terms, and is included in the library for the management of vectors of terms. The keyword extractor basically computes the vector of terms of the whole text, weighting the term frequencies by means of a given IDF table, and then selects the top-ranked terms in the vector.
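The TF × IDF ranking behind the keyword extractor can be sketched like this (the IDF values and the sample text are illustrative):

```python
from collections import Counter

def extract_keywords(tokens: list[str], idf: dict[str, float], k: int = 3) -> list[str]:
    """Rank terms by term frequency x IDF and return the top k."""
    tf = Counter(tokens)
    scored = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical IDF table: frequent function words weigh ~0, domain terms more.
idf = {"the": 0.0, "prototype": 2.0, "microphone": 1.5, "meeting": 0.2}
tokens = "the prototype microphone records the meeting the prototype works".split()
keywords = extract_keywords(tokens, idf, k=2)
```

"the" occurs most often but its IDF of zero removes it from the ranking, so the domain terms surface as keywords.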

The summarizer and keyword extractor also share some common natural language processes for text preprocessing, which have likewise been implemented as separate modules inside a general natural language processing library in order to facilitate their reuse. Before the vectors of terms can be computed, the texts must undergo the following treatments:
• split the text into sentences,
• split the sentences into tokens (simple words),
• compute the part-of-speech of each word in the sentence (whether it is a verb, noun, adjective, etc.),
• lemmatize each word (e.g. transform the verbs into their infinitive forms, transform nouns into their singular form), taking into account their parts-of-speech (e.g. lemma of verb ‘rose’ is ‘rise’, while lemma of noun ‘rose’ is ‘rose’), and
• remove stop words (words that have no semantic weight at all, such as prepositions and conjunctions, and therefore do not need to be taken into account when building the vectors of terms)
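The steps above can be sketched as a simplified pipeline. A toy regex tokenizer, a tiny lemma table and a short stop-word list stand in for the real part-of-speech-aware components:

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "by"}
# Toy lemma table; the real engine lemmatizes using part-of-speech tags,
# e.g. the verb "rose" maps to "rise" while the noun "rose" stays "rose".
LEMMAS = {"meetings": "meeting", "approved": "approve", "rose": "rise"}

def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def preprocess(sentence: str) -> list[str]:
    tokens = re.findall(r"[a-z']+", sentence.lower())      # tokenize
    lemmas = [LEMMAS.get(t, t) for t in tokens]            # lemmatize
    return [t for t in lemmas if t not in STOP_WORDS]      # drop stop words

text = "The partners approved the budget. Future meetings start in May."
terms = [preprocess(s) for s in split_sentences(text)]
```

The resulting term lists are what feed the vector-of-terms computations in the summarizer and keyword extractor.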

WordNet has also been integrated into the natural language processing in an attempt to take synonyms into account, though our results show that it slightly degrades the quality of the summaries rather than improving it. Future refinements of the exploitation of WordNet may achieve better results.

Finally, interjections produced by the DocuMeet Transcription Engine have also been added to the list of stop words used by the DocuMeet Summarization Engine in order not to mistake them for unknown keywords.


Meeting Browser:

The Meeting Browser is a web interface that allows users to plan meetings, later view their transcripts, summaries and statistics, and edit the automatically generated transcripts. The Meeting Browser includes a home page where future and past meetings are presented. One can navigate through these lists, or use a search box in order to look for specific meetings. The meeting search engine is based on Apache Lucene and looks for the entered words in the meeting transcripts, participant lists and meeting descriptions.
The interface for planning new meetings consists of a wizard that guides the user through 4 steps:
• Provide the basic meeting information (title, description, time and location)
• Select the meeting participants
• Write the meeting agenda
• Attach documents or presentations to share with the meeting participants
The selection of meeting participants also includes a user search box, likewise based on Apache Lucene, which supports partial matching: one can type “ateknea” in order to obtain the list of all users whose email contains the word “ateknea” (e.g. jm.sastre@ateknea.com or ferran.candela@ateknea.com). One can also type just the first letters of a name or surname and possible candidates will be presented. The user then chooses from the candidates found those that are to participate in the meeting.
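The partial-matching behaviour can be illustrated with a plain-Python stand-in. The real search uses Apache Lucene's index and query facilities; the email list below is taken from the example above:

```python
def match_users(query: str, emails: list[str]) -> list[str]:
    """Return emails containing the query anywhere, or whose local-part
    fragments (split on '.') start with it, case-insensitively."""
    q = query.lower()
    hits = []
    for email in emails:
        local = email.split("@")[0].lower()
        fragments = local.replace(".", " ").split()
        if q in email.lower() or any(f.startswith(q) for f in fragments):
            hits.append(email)
    return hits

emails = ["jm.sastre@ateknea.com", "ferran.candela@ateknea.com", "alice@example.org"]
```

Typing the domain word matches every account at that domain, while typing the first letters of a name narrows the list to the matching candidate.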
The Meeting Browser includes an administration panel where DocuMeet administrators can create, modify and disable/re-enable user accounts. The administration panel also includes a user search box.
A user profile editor has been included so that each user can update their own personal data. The profile editor allows for uploading a photograph, which is later displayed in the meeting planner and the meeting viewer for facilitating the identification of meeting participants.
A meeting plan editor has been included so that meeting plans can be modified.
Once a meeting is recorded and sent to the DocuMeet server, the server must compute the transcript, summary and keywords, and generate the compressed audio files for streaming the meeting recording over the Internet. These processes take time, above all the computation of the transcript. An interface has been included in the Meeting Browser for tracking the progress of these processes.
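One way such progress tracking might be computed is sketched below; the stage names and states are invented for illustration and do not reflect the actual server implementation:

```python
# Illustrative sketch of tracking server-side post-processing stages
# (transcription, summarisation, keyword extraction, audio compression).
# Stage names and states are hypothetical.
STAGES = ["transcription", "summary", "keywords", "audio_compression"]

def progress(status):
    """Return the fraction of completed stages, as a progress
    interface might display it."""
    done = sum(1 for s in STAGES if status.get(s) == "done")
    return done / len(STAGES)

status = {"transcription": "running", "summary": "pending",
          "keywords": "pending", "audio_compression": "done"}
print(progress(status))  # → 0.25
```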
An interface has been implemented for viewing all the meeting data. It includes a page we have called the “meeting graphical summary”, where one can view the meeting keywords, a pie chart showing the participation rate of each speaker, and a Gantt chart showing the duration of each agenda item. Moreover, one can navigate through the different agenda items to see the participation rate and keywords per item. Finally, a page for viewing the transcript and summary and for listening to the meeting has also been implemented. The meeting player is synchronised with the transcript viewer: as the meeting recording is played, the corresponding sentence is highlighted in the transcript. The transcript viewer includes a search box for looking up specific parts of the transcript, as well as a box with the list of agenda items and another with the list of keywords. These boxes are linked with the transcript, so one can click on an agenda item to navigate automatically to that part of the transcript, or click on a keyword to highlight its occurrences in the transcript.
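The participation rates shown in the pie chart can be derived from a transcript in which each sentence carries a speaker and start/end timestamps; the data layout below is a hypothetical sketch, not the actual transcript format:

```python
# Illustrative computation of per-speaker participation rates for the
# "meeting graphical summary" pie chart, from a transcript whose
# sentences carry a speaker and start/end timestamps in seconds.
def participation(transcript):
    totals = {}
    for sentence in transcript:
        spoken = sentence["end"] - sentence["start"]
        totals[sentence["speaker"]] = totals.get(sentence["speaker"], 0.0) + spoken
    grand = sum(totals.values())
    return {spk: t / grand for spk, t in totals.items()}

transcript = [
    {"speaker": "chair", "start": 0.0, "end": 30.0},
    {"speaker": "alice", "start": 30.0, "end": 45.0},
    {"speaker": "chair", "start": 45.0, "end": 60.0},
]
print(participation(transcript))  # → {'chair': 0.75, 'alice': 0.25}
```

The same per-sentence timestamps also support the player/transcript synchronisation: the sentence whose time span contains the current playback position is the one to highlight.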
One of the work package objectives was to develop a format conversion tool so that an existing transcript editor could be used to correct the transcript. However, given the particularities of the transcripts generated by the DocuMeet Transcription Engine (words aligned with their timestamps, sentences aligned with speakers, etc.), and for usability reasons, it was preferred to develop a custom transcript editor as part of the Meeting Browser itself. The transcript editor supports the most important corrections identified by CPSL: deleting sentences duplicated due to crosstalk, and correcting words that were not properly transcribed.
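One plausible heuristic for the crosstalk case is to treat identical sentence texts with overlapping time spans as duplicates; the sketch below is purely illustrative and is not the editor's actual logic:

```python
# Illustrative heuristic for one correction the custom transcript
# editor supports: removing a sentence duplicated across overlapping
# channels (crosstalk). Duplicates are detected here as identical texts
# with overlapping time spans; this is an assumed heuristic only.
def remove_crosstalk_duplicates(transcript):
    kept = []
    for s in transcript:
        duplicate = any(k["text"] == s["text"]
                        and s["start"] < k["end"] and k["start"] < s["end"]
                        for k in kept)
        if not duplicate:
            kept.append(s)
    return kept

transcript = [
    {"speaker": "alice", "text": "let us move on", "start": 10.0, "end": 12.0},
    {"speaker": "bob", "text": "let us move on", "start": 10.5, "end": 12.5},
    {"speaker": "bob", "text": "agreed", "start": 13.0, "end": 14.0},
]
print(len(remove_crosstalk_duplicates(transcript)))  # → 2
```

In the real editor this deletion is a manual action by the reviewer; an automatic heuristic like the above could at most pre-select candidates.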

DocuMeet Server/DocuMeet Data Repository

The DocuMeet system stores all the information in a central repository called the DocuMeet Data Repository (DDR). Though the information is indeed centralised on a single server, the different kinds of data are stored in three logical containers:
• meeting recordings, transcripts, summaries and keywords are stored as files in the server file system
• user information and meeting management data is stored in a MySQL database, and
• user information and meeting metadata and transcripts are also stored in an Apache Lucene index in order to allow for efficient retrieval of users and meetings by the meeting and user search engines.
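The three-container split can be sketched as follows, with sqlite3 standing in for MySQL, a plain dict standing in for the Lucene index, and an invented directory layout; none of these reflect the actual DDR schema:

```python
import json
import os
import sqlite3
import tempfile

# Sketch of the DDR storage split. sqlite3 stands in for MySQL and a
# dict for the Lucene index; the schema and layout are hypothetical.
root = tempfile.mkdtemp()

# 1) Meeting artifacts (recordings, transcripts, summaries, keywords)
#    as files in the server file system.
meeting_dir = os.path.join(root, "meetings", "42")
os.makedirs(meeting_dir)
with open(os.path.join(meeting_dir, "transcript.json"), "w") as f:
    json.dump({"sentences": []}, f)

# 2) User and meeting management data in a relational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE meeting (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO meeting VALUES (42, 'Kickoff')")

# 3) A searchable full-text index over users and meeting metadata,
#    so the search engines can retrieve them efficiently.
index = {"42": "kickoff transcript text"}

title = db.execute("SELECT title FROM meeting WHERE id = 42").fetchone()[0]
print(title)  # → Kickoff
```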
Apart from the DDR, the DocuMeet Server also hosts a set of web services that interconnect the different DocuMeet modules, except for the Meeting Browser, which has direct access to the DDR (indeed, the Meeting Browser has also been hosted on the DocuMeet Server for the sake of simplicity, though it could have been hosted on a separate machine). Following the DocuMeet workflow, the integration points provided by the DocuMeet Server are as follows:
• When planning a new meeting, the Meeting Browser fills the corresponding database tables directly.
• When the meeting chair logs into the system using the Android tablet, the tablet accesses the database through a login web service.
• The tablet downloads the pending meetings of the meeting chair from the DDR database through another web service.
• The tablet uploads the meeting annotations taken during the meeting and the meeting recordings to the DDR through two further web services.
• Once the tablet has uploaded all the information to the DDR, it uses another web service on the DocuMeet server to signal that post-processing of the meeting should start (computing the transcript, summaries, keywords, etc.).
• The DocuMeet server invokes another web service hosted by USFD in order to obtain the transcript, and then stores it in the DDR. Note that the DocuMeet Transcription Engine requires a complex computing grid, which was supposed to be provided by Koemei, though in the end it was provided by USFD.
• The DocuMeet server invokes two more web services in order to compute the summary and keywords, and then stores them in the DDR.
• Finally, when searching for a meeting, listening to it and viewing the transcript, summary, keywords, etc. the Meeting Browser extracts the needed information directly from the DDR and sends it to the client’s web browser.
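The tablet-side part of this workflow can be sketched as a sequence of service calls; the endpoint names below are invented for illustration and do not reflect the actual DocuMeet API:

```python
# Hypothetical sketch of the tablet-to-server workflow described above.
# Endpoint names are invented; `call` stands in for an HTTP client.
def run_meeting_workflow(call):
    """Log in, fetch pending meetings, upload the meeting data and
    trigger post-processing; returns the processed meeting id."""
    session = call("login", {"user": "chair@example.com"})
    pending = call("pending_meetings", {"session": session})
    meeting = pending[0]
    call("upload_annotations", {"meeting": meeting, "notes": []})
    call("upload_recording", {"meeting": meeting, "audio": b""})
    call("start_postprocessing", {"meeting": meeting})
    return meeting

log = []
def fake_call(endpoint, payload):
    # Stand-in server: record the call order and return canned data.
    log.append(endpoint)
    return "session-1" if endpoint == "login" else ["meeting-42"]

print(run_meeting_workflow(fake_call))  # → meeting-42
print(log[0], log[-1])  # → login start_postprocessing
```

Passing the client function in as a parameter keeps the workflow testable without a real server, which is how the fake call above exercises it.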


Meeting corpora

Finally, during the DocuMeet project several meetings were recorded, and a subset of them were manually transcribed, in order to build a domain-specific corpus for training and evaluating the transcription and summarisation engines. Recording meetings proved very difficult due to technical problems, first with the commercial microphones that were bought to obtain early recordings, and later with the DocuMeet Recording System. The consortium managed to obtain a total of 5 hours of recordings free of any kind of error (e.g. missing channels, interferences), around 10% of the total recorded time (implying roughly 50 hours recorded in total).

Potential Impact:
During the first period of the project, RTD-to-SME technology transfer took place at the quarterly project meetings, where RTD activities for the period were presented and discussed. A training plan was outlined. According to the agreed plan, the periodic consortium meetings would be used to carry out technology transfer sessions where the RTD partners transfer the generated foreground to the SMEs.
The SMEs also contributed to the technology transfer sessions by presenting their background knowledge, to aid in defining alternative and complementary development paths towards the intended foreground.
Partners submitted a draft version of the Plan for the Use and Dissemination of Foreground (PUDF) describing dissemination results to date and planned dissemination activities.



The DOCUMEET website was produced and has been available online at www.documeet.eu since Month 3.

A brochure for use at trade fairs and other dissemination events was designed, as well as a poster, providing an attractive marketing image to help disseminate the results to a wider audience during the second part of the project. The SMEs held discussions in the periodic project meetings regarding the exploitation potential of the technology. An exploitation plan was outlined, to be refined further during the following months of the project.
The training plan designed in the first period was executed to ensure that the lead-user and end-user SMEs assimilated the project results, including the technical, socio-economic and commercial benefits. A user tutorial was produced to assist in the training and technology transfer actions.
In the second period of the project, the Exploitation Manager, with the assistance of the Coordinator, coordinated all the major training/technology transfer actions between RTDs and SMEs under Work Package 9.
Together they defined the material to be used and the training programme, and arranged for its production.
The first step in the Training Plan was to continuously promote technology transfer throughout the whole duration of the project (joint R&D efforts, meetings, sharing of documentation and information), particularly between the RTDs and the lead-user.
Specifically oriented, extended training actions were organised together with all SMEs. The SMEs were trained in two sessions. The first took place at a meeting held in London (UK), organised by the SME LAND in September 2014, attended by the RTDs ATEKNEA, UAB and USFD and, from the SMEs, by LAND. The second took place at a meeting held in Barcelona (Spain), organised by the RTD partner ATEKNEA in September 2014, attended by the RTDs ATEKNEA and UAB and, from the SMEs, by JCB, CPSL, KOEMEI and GROUPV. The RTDs ran the knowledge transfer sessions concerning the prototype and its operation, as well as the software. In these sessions the whole consortium completed training on the use of all the partial results of the project.

The public website was updated regularly and improved during the second part of the project, including information about the project execution.


All SME partners disseminated the project results and contributed to the design of the exploitation plan. Each partner was responsible for dissemination in its own country; cross-country or European-wide activities were carried out jointly by the project consortium. SMEs were primarily responsible for dissemination at trade fairs, in trade magazines, to business contacts, etc.; RTDs could also participate in events of this nature. RTDs performed dissemination to the scientific community, subject to approval by the exploitation board. All scientific publications include partner names (with consent), a reference to the DOCUMEET project, and a reference to EU funding. SMEs could also participate in events of this nature. Several dissemination materials were produced: a website, logo, poster, brochure, articles and a project video.

The SMEs, led by the Exploitation Manager, continuously performed activities under Knowledge Management, IP Protection and Exploitation Potential, and held discussions on the Exploitation and Use of the Foreground. The main results, as exploitable knowledge and exploitable products, are: 1) the complete DOCUMEET system; 2) the SRU; 3) the SMB; 4) the SPU.

In terms of IP protection, the foreground of the project is jointly owned by the SMEs, and they are discussing establishing a Joint Venture to set up a basis for operating the DocuMeet service.

Before the end of the project, the SMEs decided to subcontract an external IPR consulting agency, CURELL SUÑOL SLP, to carry out a patent search and analysis. The agency identified several patent families relating to systems closely related to the DOCUMEET technology. After analysing the documents found, it concluded that the most general concept of DOCUMEET did not seem patentable due to lack of inventive step.

After an extensive technology watch effort continuously carried out by the RTD performers, the consortium believes that no similar product currently exists on the market. The consortium partners concluded that, given the competitive situation and the results achieved by the project, commercial exploitation of the results would be viable following some improvements to the prototype. Searching for financing opportunities to support these improvements is an option being considered by the SMEs.

List of Websites:
www.documeet.eu