The Community Research and Development Information Service - CORDIS
Information & Communication Technologies

Language Technologies



Project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet, which is updated on a regular basis with public deliverables, etc.

TTC - Terminology extraction, translation tools and comparable corpora


At a glance

ICT-2007.2.2 - Cognitive Systems, Interaction, Robotics

248005 - STREP

The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) aims to leverage machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (including English, French, German and Latvian) as well as in Chinese and Russian.


To cope with the linguistic diversity found in any application domain, only a data-driven approach can produce the required language resources. The TTC project will leverage machine translation (MT) tools, computer-assisted translation (CAT) tools and multilingual content management tools by automatically generating bilingual terminologies from web-crawled data in five European languages, including one under-resourced language (Latvian), as well as in Chinese and Russian.


The project aims to compile monolingual and bilingual terminologies automatically by aligning comparable corpora. Terms in different languages are aligned based on the similarity of the words that occur next to them in the corpora (their immediate vicinity); this approach is known as lexical context analysis. The system generates candidate translations for single-word and multi-word terms. The approach relies on a one-to-one relation between terms and concepts.

Scientific Innovation

The system requires standard bilingual dictionaries to translate the words of the immediate vicinity.
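
As an illustration only, lexical context analysis can be sketched in a few lines of Python. This is not the TTC implementation: the function names, the toy corpora, the window size, the use of raw co-occurrence counts and the cosine measure are all assumptions made for the example. The sketch collects the words in the immediate vicinity of a source term, translates them with a small seed dictionary, and ranks target-language candidates by how similar their own contexts are.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, term, window=2):
    """Count the words found within `window` positions of each occurrence of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language; words missing from the dictionary are dropped."""
    out = Counter()
    for word, n in vec.items():
        for translation in seed_dict.get(word, ()):
            out[translation] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(src_tokens, tgt_tokens, term, candidates, seed_dict, window=2):
    """Return (score, candidate) pairs, best first."""
    src_vec = translate_vector(context_vector(src_tokens, term, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(tgt_tokens, c, window)), c) for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical toy data: content words only, from two comparable (not parallel) texts.
en = ["panel", "roof", "absorbs", "sunlight", "turbine", "tower", "catches", "wind"]
fr = ["toit", "panneau", "absorbe", "soleil", "tour", "turbine", "capte", "vent"]
seed = {"roof": ["toit"], "absorbs": ["absorbe"], "sunlight": ["soleil"],
        "tower": ["tour"], "catches": ["capte"], "wind": ["vent"]}

ranked = rank_candidates(en, fr, "panel", ["turbine", "panneau"], seed)
print(ranked[0][1])  # best-scoring candidate translation for "panel"
```

In this toy data "panneau" ranks first because its neighbours overlap with the translated neighbours of "panel", while the French "turbine" shares none of them. A realistic system would likely replace raw counts with an association measure and filter function words, which otherwise dominate the context vectors.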

The project will develop methods and tools for building bilingual terminologies for almost any domain. It plans to apply and test the bilingual dictionaries in:

  • CAT tools
  • MT systems, in particular MOSES
  • multilingual content management
  • terminology management

The project will consider the application domains of the aerospace industry, renewable energies and computer science.

The result

Outputs will be comparable corpora, open-source term extraction and alignment tools, an open "platform" based on web services/UIMA, and implementations in use cases.



Contact Person:

Name: Pauline Boudant

Tel: +33-2-40998493

Fax: +33-2-40998412


Organisation: University of Nantes




This page is maintained by: Susan Fraser (email removed)