Project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet, which is updated on a regular basis with public deliverables and other project material.
MOLTO - Multilingual On-Line Translation
247914 - STREP
At a glance
ICT-2009.2.2 - Language based interaction
MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time with high quality. Languages are separate modules in the toolkit and can be varied; prototypes covering a majority of the EU's 23 official languages will be built.
Tools like Systran (Babelfish) and Google Translate are designed for consumers of information, but MOLTO will mainly serve the producers of information. The quality needs to be good enough that, for instance, an e-commerce site can translate its web pages automatically without fear that the message will change. With existing tools, a potential customer can have an e-commerce page written in French translated into Swedish just to find out whether the shop has something of interest for her. However, mistakes can arise, for instance where the system has translated a price of 100 Euros into 100 Swedish Crowns (about 10 Euros). The customer will probably realise that there is a mistake, but it could be a costly one if the company does not notice the error.
An example of a use case for MOLTO would be a multilingual wiki page, such as those seen in Wikipedia. Such a page is characterized by the following:
- many languages (currently 264 languages in Wikipedia)
- many contributors (hundreds of thousands in Wikipedia)
- frequent updates (average in Wikipedia close to 20 per article)
- synchrony between languages (the same information in different languages; updates in one language propagated to the others)
- high quality (grammatically and stylistically flawless text)
The goal of synchrony is where the need for translation comes in. Wikipedia is based on the voluntary work of human translators, but the frequency of updates and the multitude of languages make it impossible to achieve full synchrony by human translation. Consequently, a vast majority of the articles can only be found in one language: there are 2.8 million articles in English, but only 0.9 million in the second-largest Wikipedia language, German. Only 25 languages have more than 0.1 million articles. Automatic translation is the only conceivable way to maintain any kind of synchrony across languages and updates.
The above use case is of course highly relevant to the European reality, a union of countries with 23 official languages, where information from all aspects of life needs to be freely exchanged for mutual benefit.
The goal is to develop technology for content providers to create publishing-quality translations automatically. The tools will apply to specific domains and will require an initial adaptation of the translation system to each domain. MOLTO does not promise to scale up to the dimensions of the entire Wikipedia, but aims to produce, as one demonstration of MOLTO technology, a set of articles in the domain of cultural heritage. Ultimately the goal would be to make this adaptation feasible for programmers and translators without specific training in the system, and as automatic as possible. The number of languages initially aimed at is 15, which will include 12 of the 23 official languages of the European Union. The 12 EU languages are Bulgarian, Danish, Dutch, English, Finnish, French, German, Italian, Polish, Romanian, Spanish, and Swedish, and the 3 non-EU languages are Catalan, Norwegian, and Russian.
The single most important S&T innovation of MOLTO will be a mature system for multilingual on-line translation, scalable to new languages and new application domains. MOLTO will use domain-specific semantic grammars and ontology-based interlinguas. These components will be implemented in Grammatical Framework (GF), a grammar formalism in which multiple languages are related by a common abstract syntax. GF has been applied in several small-to-medium size domains, typically targeting up to 10 languages, but MOLTO will scale this up in terms of productivity and applicability.
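The idea of relating several languages through a common abstract syntax can be illustrated with a small sketch. The Python below is not GF and not MOLTO code; it is a toy under stated assumptions. The names `Price`, `Costs`, `LEXICON`, `VERB`, `parse`, `linearize` and `translate` are hypothetical, and the three-word lexicon is invented for illustration. Translation goes from source text to a language-independent tree and from that tree to target text, so the currency in a price is part of the meaning and cannot silently change.

```python
import re
from dataclasses import dataclass

# Abstract syntax: language-independent meaning (a toy "interlingua").
@dataclass(frozen=True)
class Price:
    amount: int
    currency: str  # currency code such as "EUR" or "SEK"

@dataclass(frozen=True)
class Costs:
    item: str      # key into each language's lexicon
    price: Price

# Concrete syntax: one lexicon and one verb per language (illustrative only).
LEXICON = {
    "eng": {"book": "the book", "EUR": "euros", "SEK": "Swedish crowns"},
    "fre": {"book": "le livre", "EUR": "euros", "SEK": "couronnes suédoises"},
    "swe": {"book": "boken", "EUR": "euro", "SEK": "kronor"},
}
VERB = {"eng": "costs", "fre": "coûte", "swe": "kostar"}

def linearize(tree: Costs, lang: str) -> str:
    """Render an abstract tree as text in the given language."""
    lex = LEXICON[lang]
    return f"{lex[tree.item]} {VERB[lang]} {tree.price.amount} {lex[tree.price.currency]}"

def parse(text: str, lang: str) -> Costs:
    """Recover the abstract tree from text in the given language."""
    reverse = {words: key for key, words in LEXICON[lang].items()}
    pattern = rf"(?P<item>.+) {re.escape(VERB[lang])} (?P<amount>\d+) (?P<currency>.+)"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"not covered by the {lang} grammar: {text!r}")
    return Costs(
        item=reverse[match["item"]],
        price=Price(int(match["amount"]), reverse[match["currency"]]),
    )

def translate(text: str, source: str, target: str) -> str:
    """Translate by parsing to the abstract tree, then linearizing it."""
    return linearize(parse(text, source), target)

print(translate("le livre coûte 100 euros", "fre", "swe"))
```

In this sketch, adding a language means adding one entry to `LEXICON` and one to `VERB`; the abstract tree and every existing language stay untouched. Input outside the tiny grammar is rejected rather than guessed at, which reflects the domain-specific trade-off described above.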
The result will be a tool for creating domain-specific translation systems; a set of tools for translators and the general public to translate documents; and three extensive case studies:
- mathematical exercises in 15 languages
- biomedical patent data in at least 3 languages
- museum object descriptions in 15 languages
The patent translator will be exploited by one of the partner companies, but all other results will be freely available as open source software licensed under LGPL.
The generic tools developed by MOLTO will moreover make it possible for third parties to create such translation systems with very little effort. Creating a translation system for a new language covering an unlimited set of documents in a domain will be as smooth (in terms of skill and effort) as creating an individual translation of one document.
The main impact is expected to be on how the possibilities of translation are viewed in general. The field is currently dominated by open-domain browsing-quality tools (Google Translate and Systran), and domain-specific high-quality translation is considered expensive and cumbersome. MOLTO will change this view by making it radically easier to provide high-quality translation within its scope of application (that is, where the content has enough semantic structure), and it will also widen this scope to new domains. As a result, it should be possible for a producer of web documents to generate them automatically in many languages. This technique should apply to a wide range of web documents within sufficiently well-specified domains. Socio-economically, this will make web content, including interactive web pages, more widely available in different languages.
Name: Emilia Rung
Organisation: University of Gothenburg
This page is maintained by: Susan Fraser (email removed)