Project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet. This is updated on a regular basis with public deliverables, etc.
PRESEMT - Pattern REcognition-based Statistically Enhanced MT
248307 - STREP
At a glance
ICT-2007.2.2 - Cognitive Systems, Interaction, Robotics
The need for cross-border communication among European citizens remains important in a multitude of applications, such as information retrieval involving cultural or touristic information, as well as technical documents (e.g. a manual or specification sheet that a professional needs to interpret). Thus, effective Machine Translation remains a main concern in the modern EU environment, in particular as the number of official European languages has increased substantially with the most recent EU enlargement.
The MT systems available over the web currently generate rather poor translations. Users submitting a text for translation are often provided with a low-quality text, close to incomprehensible. What is required is a translation that, while of draft quality, is comprehensible to the average user. However, due to the complexity of natural language, it would probably be too complex a task to design a system that can produce translations of an appropriate quality for each and every domain. What is probably of more interest is to: (i) design and make available a system that can be rapidly developed to cover a new language pair, even by a relatively novice user, and (ii) allow the user to extensively modify an existing language pair so that it better matches his/her requirements. In addition, the system will be characterised by limited resource requirements, inherent language independence and the ability to modify the language resources used.
The main requirements for the PRESEMT system are to generate translations fast (giving a real-time or near-real-time response) and to allow new language pairs to be developed in a simple manner, without requiring specialised linguistic tools. In the modern multilingual environment of the European Union, as well as beyond the Union, there is an increased need to create translation systems even for language pairs for which the availability of the essential linguistic tools is limited.
The PRESEMT project is intended to lead to a flexible and adaptable MT system, based on a language-independent method, whose principles ensure easy portability to new language pairs. This method attempts to overcome well-known problems of other MT approaches, e.g. compilation of extensive bilingual corpora or creation of new rules per language pair. PRESEMT will address the issue of effectively managing multilingual content and is expected to suggest a language-independent machine-learning-based methodology.
The PRESEMT project proposes a novel approach to the problem of Machine Translation by introducing cross-disciplinary techniques, mainly borrowed from the machine learning and computational intelligence domains, into the MT paradigm. To this end, a flexible MT system will be developed, which will be enhanced with (a) pattern recognition approaches (such as extended clustering or neural networks) towards the development of a language-independent analysis and (b) evolutionary algorithms (such as genetic algorithms or swarm intelligence) for system optimisation.
The PRESEMT project will result in a fully functional system prototype, available both as a stand-alone application and as a web-based service. Furthermore, it will provide a language-independent methodology for effectively handling new language pairs.
The work carried out with PRESEMT is intended to lead to an MT system which is readily portable to new language pairs and which can also be customised by the user. This is expected to address the needs of a wide spectrum of potential users who wish to perform machine translation tasks.
Where will the project be present?
Since PRESEMT is cross-disciplinary in nature, the results of the project will be widely disseminated at relevant scientific conferences and workshops, such as events that focus on Machine Translation and on computational linguistics in general (including the MT Summit series, TMI, the ACL conferences, COLING and LREC). Furthermore, specialist conferences will be attended to present advances in specific fields, such as the ICPR series (for achievements in the pattern recognition field) or the WSOM and/or IJCNN series.
More complete scientific results will be submitted to scientific journals. This involves on the one hand pursuing publications in established journals within the area of computational linguistics, while on the other hand presenting the advances achieved in more specific aspects of the system to specialist journals in the areas of parallelisation, machine learning, and evolutionary computation.
Name: George Tambouratzis
Organisation: Institute for Language and Speech Processing, Greece
This page is maintained by: Susan Fraser (email removed)