TRANSLEARN: interactive corpus-based translation drafting tool

Translation work is very often characterized by two features: a high degree of repetition and high demands on quality. This is particularly true of the translation of technical and administrative documentation. The project addresses both by providing a computational environment, in practical terms a toolbox, that will (1) relieve translators of the repetitive part of their work by reusing existing human translations and learning from them, and (2) enhance the quality and consistency of translation by integrating ancillary translation tools. Parallel texts of about 5.5 million words in English, French, Portuguese and Greek have been processed, and a large portion has been normalized, lemmatized, tagged and aligned at sentence level. Experiments on alignment below the sentence level have been carried out, with promising results. Two matching algorithms requiring only shallow linguistic processing have been implemented in the system's text matching tool; they compute perfect and fuzzy matches between the sentences being compared. Fuzzily matching translations are post-edited and stored for future use, enabling the system to learn new translations. TRANSLEARN succeeded in combining numerical/statistical and symbolic/knowledge-based approaches to natural language processing (NLP), which are often regarded as mutually incompatible. The resulting prototype software package is a powerful tool for pattern matching and other intelligent applications. TRANSLEARN can be used as a stand-alone utility or as an integral part of a workbench of wider scope.
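
The match, post-edit and store cycle described above can be illustrated with a minimal sketch. It is not the TRANSLEARN implementation: the abstract does not specify the two matching algorithms, so token-level similarity from Python's standard `difflib` module is used here as a stand-in, and the names `tokenize`, `best_match`, `learn` and the `threshold` value of 0.7 are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    """Lowercase and split on whitespace (a stand-in for real normalization)."""
    return sentence.lower().split()


def best_match(query, memory, threshold=0.7):
    """Find the stored source sentence closest to `query`.

    `memory` maps source sentences to their human translations.
    Returns (kind, source, target, score), where kind is "perfect",
    "fuzzy", or None when no stored sentence reaches the threshold.
    """
    query_tokens = tokenize(query)
    best = (None, None, None, 0.0)
    for source, target in memory.items():
        # Similarity is computed over token lists, not characters.
        score = SequenceMatcher(None, query_tokens, tokenize(source)).ratio()
        if score > best[3]:
            kind = "perfect" if score == 1.0 else "fuzzy"
            best = (kind, source, target, score)
    if best[3] < threshold:
        return (None, None, None, best[3])
    return best


def learn(memory, source, post_edited_target):
    """Store a post-edited translation so later queries can reuse it."""
    memory[source] = post_edited_target
```

In this sketch, a sentence identical to a stored one (after normalization) yields a "perfect" match whose translation can be reused directly, while a sentence differing in a word or two yields a "fuzzy" match whose translation is offered for post-editing. Passing the post-edited result to `learn` makes the same sentence a perfect match the next time it occurs.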


