This site has been archived.
The Community Research and Development Information Service - CORDIS
Information & Communication Technologies

Language Technologies


Project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet, which is updated regularly with public deliverables and other material.

SAVAS - Sharing AudioVisual language resources for Automatic Subtitling

296371 - CP



At a glance


  • Duration: 24 months
  • Start date: 1 May 2012
  • End date: 30 April 2014
  • Project officer: Susan Fraser


Nowadays, given the volume of demand and the cost of the process, fully manual subtitling is no longer feasible, and broadcasters and subtitling companies are seeking more productive alternatives. In this context, the SAVAS partners aim to acquire, share and reuse the audiovisual resources of broadcasters and subtitling companies, so that high-tech European ASR (Automatic Speech Recognition) companies can use the shared data to develop domain-specific Large Vocabulary Continuous Speech Recognisers (LVCSRs) in new languages and meet the automated subtitling needs of the media industry. Within the project, data and LVCSR technology for automated subtitling will be collected, shared and developed for six languages: Basque, Spanish, Italian, French, German and Portuguese.

Objectives and Innovation

SAVAS will exploit the META-SHARE infrastructure and build a SAVAS META-SHARE repository containing the audiovisual language resources collected and annotated within the project. The licensing scheme for these resources will be discussed and negotiated during the project.
Project partners will advance the state of the art in three domains: (a) automatic data collection, transcription and annotation; (b) data sharing; and (c) LVCSR technology for automated subtitling. These objectives will be achieved by developing an advanced speech transcription and annotation methodology that combines automatic and collaborative approaches; by exploiting large amounts of already existing spoken data and transcripts; and by building and training acoustic and language models for new languages and domains.
SAVAS participants will provide a META-SHARE compliant infrastructure for the open exchange of speech-related resources and will establish the legal and licensing foundations for the sharing and reuse of audiovisual resources within the European broadcasting community.

Target group of the project

The envisaged target groups of the technology developed within the project are: the broadcast industry, including major national and international TV broadcasters, companies supplying TV products and services, and audiovisual market stakeholders; the scientific and research community; and end-users and society at large, including associations for the rights of disabled people.

Results

The main results include the SAVAS common data repository, where the collected and annotated audiovisual language resources will be shared, and the transcription and dictation systems developed for the six languages covered by the project.


The SAVAS project results will improve the market positioning of European broadcasters and subtitling companies through the development and exploitation of ASR technology. Efficient subtitle production will provide better accessibility services for deaf, hard-of-hearing and ageing audiences, promote language learning and, when combined with translation, help export locally produced content to the global market. Moreover, SAVAS will support EU content providers in producing value-added online content that can be easily distributed, accessed and searched by all citizens, thus opening broader markets to national content. Ultimately, SAVAS will contribute to the growth of the subtitling market and its demand by providing an innovative and more affordable service for the audiovisual sector.









This page is maintained by: Susan Fraser (email removed)