CORDIS - EU research results

Content archived on 2023-03-02

EU project seeks to remove language barriers

More than half of Europeans can only hold a conversation in their own language. Yet many of us find ourselves working in a multilingual environment. For the most part, we rely on professional or online translation services to help us understand documents in other languages, but these often produce inaccurate results. The Statistical Multilingual Analysis for Retrieval and Translation (SMART) project, a recently launched EU-funded initiative, aims to help reduce such language barriers by applying statistical machine learning techniques to translation.
Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical and information-theoretic models. A word or phrase is translated to one of a number of possibilities based on the probability that it would occur in the current context. These techniques are particularly promising for translation purposes, in that they achieve performances equivalent or superior to those of rule-based translation systems, which require the manual entry of large numbers of 'rules' by trained linguists, at a fraction of the development effort.
There are, however, some identified shortcomings in these methods. For example, even though translations produced by SMT systems tend to be more lexically accurate than those of their rule-based counterparts, the text they produce tends to be less fluent. Also, SMT systems are trained in batch mode and are not adaptive to user feedback.
'There have been lots of applications of machine learning techniques to machine translation in the past,' says Dr Craig Saunders, the project partner at the University of Southampton's School of Electronics & Computer Science (ECS). 'The project aims to extend the more traditional methods based on log-linear models, and also apply recent developments in machine learning for structured prediction which have led to many new powerful techniques that show great potential in this area.'
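As a rough illustration of the statistical approach described above, the following minimal Python sketch picks a translation for a word by combining two probabilities in a log-linear score. It is not the SMART project's method: the probability tables, the feature weights `w_tm` and `w_lm`, and the function names are all hypothetical and invented for this example.

```python
import math

# Hypothetical toy probabilities, invented purely for illustration.
# TRANSLATION_PROB[source][target] approximates P(target | source).
TRANSLATION_PROB = {
    "banque": {"bank": 0.9, "bench": 0.1},
    "avocat": {"lawyer": 0.6, "avocado": 0.4},
}

# CONTEXT_PROB[(previous_word, target)] approximates P(target | previous_word).
CONTEXT_PROB = {
    ("the", "lawyer"): 0.05,
    ("the", "avocado"): 0.02,
    ("ripe", "lawyer"): 0.001,
    ("ripe", "avocado"): 0.2,
}

# Small floor probability for word pairs missing from the toy context table.
UNSEEN = 1e-6


def score(source, candidate, previous, w_tm=1.0, w_lm=1.0):
    """Log-linear score: a weighted sum of log feature values."""
    tm = TRANSLATION_PROB[source][candidate]
    lm = CONTEXT_PROB.get((previous, candidate), UNSEEN)
    return w_tm * math.log(tm) + w_lm * math.log(lm)


def translate_word(source, previous):
    """Return the candidate translation with the highest score in context."""
    candidates = TRANSLATION_PROB[source]
    return max(candidates, key=lambda c: score(source, c, previous))


if __name__ == "__main__":
    print(translate_word("avocat", "the"))
    print(translate_word("avocat", "ripe"))
```

With these made-up numbers, the French word 'avocat' comes out as 'lawyer' after 'the' but as 'avocado' after 'ripe', which shows how the surrounding context can change the chosen translation. Real systems estimate such probabilities from large collections of translated text and tune the weights automatically.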
Over the next three years, the SMART consortium, which is being led by Xerox's European Research Centre in France, will apply improved statistical machine learning techniques to three user scenarios involving English, French, Spanish and Slovenian.
The first scenario will focus on improving the systems used by professional translators. Currently, these systems store a lot of stock phrases, but if a word is translated badly, the system cannot correct itself, explained Dr Saunders. 'We will be looking at how we can make these systems adaptive,' he said.
The second scenario looks at the situation faced by customer-support analysts working in call centres. 'It could be that a technician is a native speaker in one language, consulting a manual in another language and talking to a customer in a third language,' noted Dr Saunders. In the case where an analyst is an English speaker with only a smattering of German, an interface could be designed based on machine learning to allow the analyst to type in a search in English to find a document in German. Further development of such a system could even highlight the relevant passages of a text or key words in the returned results.
Finally, the third user scenario involves enabling a user to access portions of the multilingual Wikipedia in languages of which they have limited command.
The scenarios will be applied to real business environments, involving user groups from innovation-oriented small and medium-sized enterprises (SMEs) and Xerox. 'This is the first time that new machine learning techniques are being used in this way,' said Dr Saunders. 'Xerox works across lots of different languages and cross-language information access could be very useful in this context; the possibility of posing a query in one language and getting documents back in another is useful in a wide variety of applications.
'We are really trying to develop techniques that will help EU citizens in general, but if we want to try to evaluate the improvement in a quantitative manner, then it is easier to do this in an industrial setting,' Dr Saunders told CORDIS News. 'At the end of the project if the techniques are successful we really want to put up some web demos that the general public can use.'
