Multiword expressions are generally hard to translate, posing a challenge for interpreters and translators in an increasingly globalised world. The EU-funded LATEST (Advanced language technology platform for translators) project developed a new approach to automatically identify multiword expressions (collocations) and translate them. To achieve this, the project team treated the translation of multiword expressions as a two-stage process. The first stage involves extracting multiword expressions in each of the languages in question, and the second involves applying a matching procedure to propose translation equivalents. The approach avoided translation resources such as dictionaries, which take too much time, relying instead on comparable corpora that have been compiled cost-effectively. LATEST tested the approach on English and Spanish, focusing on verb-noun expressions. During the extraction phase, the project team used statistical association measures to estimate the strength of the relationship between two words and to propose word combinations on that basis. In the translation phase, it used distributional similarity methods, on the premise that such expressions occur in the same or similar contexts as their translation equivalents. Texts used to test the translations were gathered from news articles, an approach that will be applied to other language combinations after the project ends. The project team also found that the quality of comparable corpora matters more than their size for accurate automatic translation. The project's outcomes open novel research avenues in phraseology, and specifically in computational phraseology. The outcomes were disseminated through events and publications to relevant audiences, opening new areas of discussion in automatic translation.
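The two stages described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the article does not say which association measure or similarity metric LATEST used, so pointwise mutual information (PMI) and cosine similarity stand in here as common choices. The function names, the toy verb-noun pairs and the context vectors are all hypothetical. The sketch also assumes that context vectors for both languages are already expressed in a shared feature space, a step the article does not describe.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Stage 1 (extraction): score each observed (verb, noun) pair with PMI.

    PMI is an assumed stand-in for the unspecified "statistical
    association measures" mentioned in the article.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)
    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_pair = count / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        # Positive PMI: the words co-occur more often than chance predicts.
        scores[(verb, noun)] = math.log2(p_pair / (p_verb * p_noun))
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(source_vector, candidates):
    """Stage 2 (matching): pick the candidate with the most similar contexts.

    Assumes all vectors share one feature space (hypothetical; the
    article does not explain how contexts are aligned across languages).
    """
    return max(candidates, key=lambda name: cosine(source_vector, candidates[name]))


# Toy data, invented purely for illustration.
english_pairs = (
    [("take", "decision")] * 3
    + [("take", "book")]
    + [("read", "book")] * 3
    + [("read", "decision")]
)
scores = pmi_scores(english_pairs)

english_context = {"government": 2, "meeting": 1}
spanish_candidates = {
    "tomar una decisión": {"government": 3, "meeting": 1},
    "leer un libro": {"library": 2, "meeting": 1},
}
proposal = best_match(english_context, spanish_candidates)
```

In this toy setting, pairs with a high PMI would be kept as candidate expressions, and each would then be paired with the target-language candidate whose contexts are most similar. With real comparable corpora, both the counts and the context vectors would be derived from the texts themselves.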
As Europe, and indeed the world, comes together, quicker and more natural translation will certainly be considered an asset.
Automatic translation, translation, language technology, multiword expressions, phraseology