Please note that the project factsheets will no longer be updated. All information relevant to the projects can be found on the CORDIS factsheet. This is updated on a regular basis with public deliverables, etc.
ACCEPT - Automated Community Content Editing PorTal
288769 - STREP
At a glance
FP7-ICT-2011-7 - Language technologies
ACCEPT is a Collaborative Project – STREP aimed at developing new methods and techniques to make machine translation (MT) work better in environments where internet communities share specific information. Today, anyone can in principle create information and make it available to anyone in the world with internet access. Yet the language barrier remains: however accessible information is, it is still only available to those who speak the language it is written in. ACCEPT’s mission is to help communities share information more effectively across the language barrier, by improving the quality of machine-translated community content.
Objectives and Innovation
Existing Machine Translation engines cannot produce acceptable results for community content. There are a number of reasons for this: community users are not professional writers and generally do not pay heed to spelling, punctuation, or grammar rules; moreover, they use a rather informal tone, often making use of technical jargon and abbreviations; last but not least, they may create content in a language which is not their own. All these factors have a negative impact on the translation performance, not only of traditional MT systems but also of statistical systems which are deemed to be more robust. Another related problem is that the resources needed for building statistical systems are currently limited.
The ACCEPT project aims at increasing the reach of community information by increasing the number of people who can benefit from that information. High quality is essential when translating critical information, for instance in the health domain where poor quality may be life-threatening. The ACCEPT project proposes a new approach to help MT work better for community content, in order to ensure that the result is comprehensible and correct.
The approach consists of the following main axes of research and development:
• Development of user-friendly (minimally intrusive) strategies for pre-editing the content for statistical machine translation. The project will identify the most important types of corrections that need to be applied to the source content in order to attain a higher translation quality.
• Development of strategies for post-editing. Ideally, post-editing of the translation results is done by skilled bilingual experts, but the lack of such experts is a major bottleneck. To overcome this bottleneck, the project will develop post-editing strategies which do not require proficiency in the source language, but only in the target language, thus enlarging the pool of (volunteer) skilled experts.
• Improvement of learning and development of feedback loops to improve Statistical Machine Translation (SMT) for community data. SMT systems can learn well from large amounts of similar content in a single domain, but improvement is necessary in areas where resources (parallel data) are sparse and heterogeneous. The project will develop innovative domain adaptation methods and will use linguistic information to cope with these issues. Moreover, it will take into account feedback from the post-editing process to automate corrections whenever possible. Another novel research topic that will be addressed is the use of text analytics for SMT. The project will try to determine if what we know about the content can help produce better translations (for instance, translations that preserve sentiment polarity).
For the first time, pre-editing, MT and post-editing will be linked together not just in a process, or workflow, but by connecting the software components together and by developing new linguistic software components specifically optimised for community content translation.
The target group of the project
The ACCEPT project will address the challenge of removing the language barrier in two slightly different scenarios: firstly, for content in a typical commercial product forum relating to Symantec network security products, and secondly, for content in the community of volunteer translators Traducteurs sans Frontières, an NGO that creates medical, educational and nutritional information for use by people in areas of need. Enabling effective MT for information which is often created by subject-matter experts rather than professional writers will significantly increase the reach of this information, and in particular help Traducteurs sans Frontières better achieve its mission of saving lives by delivering critical information in the right language at the right time.
While the main expected result is the improvement of technologies for translating community content, the effects will be useful for all those who need information instantly and reliably translated into their own language, despite linguistic imperfections.
The ACCEPT project will create sophisticated components for machine translation. The technology developed throughout the ACCEPT project will be made available in the form of demonstrators, which will be used by forum and NGO community members. These demonstrators can be divided into three categories: online source content checking, online MT editing (in both a monolingual and bilingual context) and online content evaluation. This technology will contribute to the take-up of MT in the burgeoning area of user generated content.
The European aspects of the project are as follows: the project will allow citizens across the EU better access to communities in both commercial and non-profit environments. It will also make companies in Europe better able to engage with their customers across the language barrier, making them more competitive and allowing them to expand more quickly across Europe. This will bring appreciable industrial and commercial impact – which can be easily exploited by the project partners.
Scientific and technical innovation involves the removal of bottlenecks in the areas of editing for SMT and in SMT itself. With its specific focus on dealing with non-professional community content, this will help drive take-up of MT in the new Web 2.0 paradigm.
The social impact of the project is highlighted in the participation of Traducteurs sans Frontières - whose main focus is helping European NGOs provide better services in multilingual theatres of operations.
Name: Prof. Bouillon Pierrette
Tel: +41 22 379 8679
Organisation: Universite de Geneve, Switzerland
This page is maintained by: Susan Fraser (email removed)