Project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet, which is updated on a regular basis with public deliverables, etc.
TTC - Terminology extraction, translation tools and comparable corpora
At a glance
ICT-2007.2.2 - Cognitive Systems, Interaction, Robotics
248005 - STREP
The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) aims at leveraging machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (including English, French, German and Latvian) as well as in Chinese and Russian.
Given the linguistic diversity found in any application domain, only a data-driven approach can create the required language resources. The TTC project will leverage machine translation (MT) tools, computer-assisted translation (CAT) tools and multilingual content management tools by automatically generating bilingual terminologies from web-crawled data in five European languages, including one under-resourced language (Latvian), as well as in Chinese and Russian.
The project aims at automatically compiling mono- and bilingual terminologies by aligning comparable corpora. Terms in different languages are aligned based on the similarity of the words that occur next to them in the corpora (their immediate vicinity). This approach is known as lexical context analysis. The system generates candidate translations for single- or multi-word terms. The approach relies on the one-to-one relation between terms and concepts.
The system requires standard bilingual dictionaries to translate the words of the immediate vicinity.
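The lexical context analysis described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the TTC implementation: the function names, the two-word window, the raw co-occurrence counts, the cosine similarity measure and the toy English/French data are all hypothetical choices made for the example.

```python
from collections import Counter
from math import sqrt


def context_vectors(tokens, window=2):
    """Count the words found within `window` positions of each word."""
    vectors = {}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def translate_vector(vector, dictionary):
    """Map a source-language context vector into the target language.

    Context words missing from the bilingual dictionary are dropped.
    """
    translated = Counter()
    for word, count in vector.items():
        for target in dictionary.get(word, ()):
            translated[target] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(term, src_vectors, tgt_vectors, dictionary):
    """Rank target-language words by context similarity to `term`."""
    translated = translate_vector(src_vectors[term], dictionary)
    scores = [(cosine(translated, vec), cand) for cand, vec in tgt_vectors.items()]
    return [cand for _, cand in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy comparable "corpora" (assumed data, not taken from the project).
src_vectors = context_vectors("the blade of the rotor turns".split())
tgt_vectors = context_vectors("la pale du rotor tourne".split())

# Hypothetical seed dictionary; the term "blade" itself is not in it.
seed_dictionary = {
    "the": ["la", "le"],
    "of": ["du", "de"],
    "rotor": ["rotor"],
    "turns": ["tourne"],
}

ranking = rank_candidates("blade", src_vectors, tgt_vectors, seed_dictionary)
print(ranking[0])  # pale
```

The seed dictionary only has to cover the context words, not the term itself, which is why a standard bilingual dictionary is enough to propose translations for domain-specific terms.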
The project will develop methods and tools for building bilingual terminologies for almost any domain. It plans to apply and test the bilingual dictionaries in:
- CAT tools
- MT systems, in particular MOSES
- multilingual content management
- terminology management
The project will consider the application domains of the aerospace industry, renewable energies and computer science.
Outputs will be comparable corpora, open-source term extraction and alignment tools, an open "platform" based on web services/UIMA, and implementations in use cases.
Name: Pauline Boudant
Organisation: University of Nantes
This page is maintained by: Susan Fraser