Objective
Original research objectives
The MATCHPAD project aims to develop high-quality, practically usable machine translation (MT) systems for Hungarian and Polish: from English into these two languages, and from these two languages into French. In particular, the aim is to facilitate communication between the official bodies of the EU Member States and the administrations of these two countries, and to prepare their integration. An additional aim is to stimulate the development of commercial products that may help industry address foreign markets. The project intends to demonstrate and bring to market two new MT systems covering Hungarian and Polish, from English and into French, and so give the administrations, other public bodies and private companies of Europe access to information and facilitate contacts between administrations, citizens and third parties. The research aspect is particularly important in this project because, while Polish is somewhat similar to the main EU languages, Hungarian is a very different kind of language. Analysis and synthesis of Hungarian are quite different because of its agglutinative structure, which is far more complicated than that of the languages handled by existing MT systems. The MATCHPAD project is intended to develop a new family of analysis tools for East European languages.
The main tasks, to be completed in cooperation between the software company and the universities specialised in terminology research, each working in its own language, are:
- Hungarian and Polish analysis
- Hungarian and Polish terminology
- Hungarian and Polish synthesis
- Assembly and debugging of the machine translation systems
- Software localisation
The participation of administrative institutions in each country ensures the high quality of the final product and the wide dissemination of the results.
More specifically, the main objectives of the project are:
- To create four operational language-pair prototypes for administrative texts, with dictionaries of about 25 000 to 30 000 terms:
  - Hungarian to French
  - Polish to French
  - English to Hungarian
  - English to Polish
Expected deliverables
The objectives of the MATCHPAD project concern multilinguality in digital content and services: we create multilingual tools for machine translation (and indexing) based on powerful language analysis, transfer and generation (synthesis) modules. These modules are already used by large administrations for European languages, and their extension to agglutinative languages will be a very important aspect of our research. The expected deliverables are:
- Localization of Systran MT products in Hungarian and Polish;
- Two dictionaries of Hungarian and Polish terminology for administrative texts (about 50 000 terms), with translations into French and English;
- English to Hungarian and Polish to French machine translation prototypes;
- Hungarian and Polish to French machine translation prototypes;
- Four machine translation systems (Hungarian and Polish, from English and into French) for administrative texts.
The main result of this project is, first, four translation engines belonging to two main categories that correspond to two different technical implementations (A and B). Side products include morphological analysers for Hungarian and Polish, and derived technologies such as spell-checking.
A. EN-based engines: these engines are based on a fully mature generic English analysis module, relying on a dictionary of 250 000 source forms. The two translation systems obtained (ENPL and ENHU) inherit the very high quality of the English analysis (developed and shared among all language pairs), but the associated dictionary is quite limited (about 30 000 entries). Both engines reach demonstration-level quality.
B. PL&HU->FR engines: these are based on a new generation of translator, whose development was initiated by this project. They are presumably far more modular and easier to evolve. However, since they do not reuse any previously developed linguistic component, the quality obtained is still low; in addition, the associated dictionaries are also small. The principal effort went into developing the analysis modules, which are the basis of the translation engine, with less effort spent on synthesis. These engines are still prototypes.
C. PL&HU->EN: these engines do not exist, but since most of the development for B focused on PL/HU analysis, the cost of producing such a system would not be very high (see linguistic development).
D. Morphological analysers and a spell-checking application.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques.
Call for proposal
Data not available
Funding Scheme
CSC - Cost-sharing contracts
Coordinator
92044 PARIS LA DEFENSE
France