Community Research and Development Information Service - CORDIS

No tall tale translation techniques

Automatic translators, also known as machine translators, are a relatively new technology. While they have come a long way, their accuracy and versatility remain questionable. A Spanish university has now developed a machine translator that offers far greater scope in both respects.
Technical terms and words with several meanings are a common difficulty in translation. Moreover, in businesses that need large volumes of text translated, speed also becomes an issue.

Human language processing technologies are therefore under pressure to provide accurate translations in record time. The new translator handles roughly ten thousand words per second with eighty-five to ninety percent accuracy.

To achieve this, the developers built the system from modular engines, each working independently and each based on finite-state techniques. The finite-state design gives the translator its speed, which is attainable on a regular office computer. Part of the stated accuracy is due to Unicode, on which the system is built. Unicode support allows it to process rich text format (RTF) documents containing specialized character sets, such as those used in Polish, and so minimizes incompatibilities.
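
As a rough illustration of why finite-state techniques are fast, the sketch below stores a small bilingual dictionary as a deterministic state machine over characters, so looking up a word costs one transition per letter regardless of dictionary size. It is not the developers' code: the class name, the Spanish to Catalan entries and the convention of marking unknown words with an asterisk are all assumptions made for this example. Because Python strings are Unicode, accented characters need no special handling.

```python
from typing import Dict, List, Optional


class LetterTransducer:
    """Toy dictionary stored as a deterministic finite-state machine (a trie)."""

    def __init__(self) -> None:
        # transitions[state] maps an input character to the next state.
        self.transitions: List[Dict[str, int]] = [{}]
        # outputs[state] holds the translation if the state is final, else None.
        self.outputs: List[Optional[str]] = [None]

    def add(self, source: str, target: str) -> None:
        """Add one source-word -> target-word entry, creating states as needed."""
        state = 0
        for ch in source:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.outputs.append(None)
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs[state] = target

    def lookup(self, word: str) -> Optional[str]:
        """Follow one transition per character; None if the word is unknown."""
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return None
            state = nxt
        return self.outputs[state]

    def translate(self, sentence: str) -> str:
        """Word-for-word transfer; unknown words pass through marked with '*'."""
        result = []
        for word in sentence.split():
            target = self.lookup(word)
            result.append(target if target is not None else "*" + word)
        return " ".join(result)


# Hypothetical Spanish -> Catalan entries, chosen only for illustration.
transducer = LetterTransducer()
for es, ca in [("el", "el"), ("gato", "gat"), ("negro", "negre"),
               ("niño", "nen"), ("año", "any")]:
    transducer.add(es, ca)

print(transducer.translate("el gato negro"))
print(transducer.translate("el niño"))
```

A real system would chain several such machines (for analysis, transfer and generation), which is one plausible reading of the "modular engines" described above.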

Additional benefits, such as its open-source code and easy integration into Internet technologies that lack machine translation of their own, make this a commercially viable product. Open-source code lets users adapt the technology to their needs and makes debugging easier.

With special emphasis on Romance languages (those descended from Latin), the translator can handle specific, simple syntactic structures. This includes features such as number and gender agreement, the treatment of prepositions and the correct placement of syntactic elements.
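
To make number and gender agreement concrete, the following sketch builds a Spanish noun phrase in which the article and the adjective take their form from the noun, and the adjective is placed after it. It is only an assumed, heavily simplified illustration, not the translator's actual rules: the function names are invented here, and the pluralization and feminine rules ignore many exceptions (for example nouns ending in "z", or "el agua").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    lemma: str   # singular form
    gender: str  # "m" or "f"


# Spanish definite articles indexed by (gender, number).
ARTICLES = {
    ("m", "sg"): "el", ("f", "sg"): "la",
    ("m", "pl"): "los", ("f", "pl"): "las",
}


def pluralize(word: str) -> str:
    """Simplified rule: add 's' after a plain vowel, 'es' otherwise."""
    return word + ("s" if word[-1] in "aeiou" else "es")


def inflect_adjective(masc_sg: str, gender: str, number: str) -> str:
    """Make an adjective agree with its noun in gender and number."""
    form = masc_sg
    # Only adjectives ending in -o change for feminine in this sketch.
    if gender == "f" and masc_sg.endswith("o"):
        form = masc_sg[:-1] + "a"
    if number == "pl":
        form = pluralize(form)
    return form


def noun_phrase(noun: Noun, adjective: str, number: str) -> str:
    """Build 'article noun adjective' with all three words in agreement."""
    article = ARTICLES[(noun.gender, number)]
    head = pluralize(noun.lemma) if number == "pl" else noun.lemma
    return f"{article} {head} {inflect_adjective(adjective, noun.gender, number)}"


print(noun_phrase(Noun("casa", "f"), "blanco", "pl"))
print(noun_phrase(Noun("gato", "m"), "negro", "sg"))
```

The point of the design is that gender and number are stored once, on the noun, and every dependent word is derived from them, so a change of number only has to be made in one place.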

Moreover, the machine translator also offers scope for language pairs involving minority languages such as Occitan, Corsican and Sardinian. A prototype is currently available for testing, and the developers are looking for partners in industry, universities and technological centres.
Record Number: 82154 / Last updated on: 2005-09-18
Domain: IT, Telecommunications