Interactive corpus-based translation drafting tool

Project Information

TRANSLEARN

Grant agreement ID: LRE61016

Project closed

Start date 1 January 1993

End date 1 July 1995

Funded under

Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Total cost

No data

EU contribution

No data

Coordinated by

Institute for Language and Speech Processing (ILSP)
Greece

Objective

The aim of the project is to provide a computational methodology and, in more practical terms, a toolbox which will aid the human translator working in a particular subset of general language (a sublanguage) in the following two ways:

to relieve him of the repetitive part of his work, which mostly involves specialised types of text;
to enhance productivity and translation quality by proposing alternative solutions and providing sophisticated ancillary tools.

The project will produce a prototype application that demonstrates the validity of the approach and allows it to be evaluated in terms of translator productivity. The project will initially consider four languages: English, French, Greek and Portuguese.

TRANSLEARN is based on sophisticated pattern-matching techniques, involving both linguistic and statistical processing, which identify the longest coherent part of the source text that has already been translated and stored in a text database in both source and translated form. When a piece of source text fully matches a database entry, the corresponding translation can be output automatically. Statistically ranked alternative translations can also be provided where they exist. If no full match is found, all partial matches are recombined and optimally evaluated, and the resulting reconstruction is presented to the translator together with a confidence measure. Fragments of source text for which no translation above a certain confidence threshold exists are presented to the translator for translation, and the new translation is then incorporated into the database for future use. Existing field-proven techniques and utilities will be used to create the database of parallel texts.
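The lookup strategy described above can be illustrated with a minimal Python sketch. It is not TRANSLEARN's actual algorithm: the class `TranslationMemory`, the function `draft`, the whitespace tokenisation, the greedy longest-fragment cover and the use of token coverage as the confidence measure are all assumptions made for illustration, with `threshold` standing in for the project's unspecified confidence threshold.

```python
from collections import Counter


class TranslationMemory:
    """Hypothetical store of aligned (source, target) segments with usage counts."""

    def __init__(self):
        # Source segment as a tuple of lower-cased tokens -> Counter of target strings.
        self.entries = {}

    @staticmethod
    def _key(text):
        return tuple(text.lower().split())

    def add(self, source, target):
        """Store a translation; a translator's new translation is fed back the same way."""
        self.entries.setdefault(self._key(source), Counter())[target] += 1

    def exact(self, source):
        """Full match: all stored translations, most frequent first (empty if none)."""
        counts = self.entries.get(self._key(source))
        return [t for t, _ in counts.most_common()] if counts else []

    def cover(self, source, min_len=2):
        """Greedy left-to-right cover of the source by the longest stored fragments.

        Returns (pieces, confidence): each piece is (tokens, translation-or-None),
        and confidence is the fraction of source tokens covered by stored fragments.
        """
        tokens = self._key(source)
        pieces, i, covered = [], 0, 0
        while i < len(tokens):
            # Try spans starting at i from longest to shortest (at least min_len tokens).
            for j in range(len(tokens), i + min_len - 1, -1):
                span = tokens[i:j]
                if span in self.entries:
                    pieces.append((span, self.entries[span].most_common(1)[0][0]))
                    covered += len(span)
                    i = j
                    break
            else:
                # No stored fragment starts here: leave this token for the translator.
                pieces.append((tokens[i:i + 1], None))
                i += 1
        return pieces, (covered / len(tokens) if tokens else 0.0)


def draft(tm, source, threshold=0.75):
    """Full match if possible; otherwise a reconstruction from partial matches."""
    full = tm.exact(source)
    if full:
        return {"alternatives": full, "confidence": 1.0, "to_translate": []}
    pieces, confidence = tm.cover(source)
    # Untranslated tokens are kept in brackets so the translator can see the gaps.
    text = " ".join(tr if tr is not None else "[" + " ".join(toks) + "]"
                    for toks, tr in pieces)
    to_translate = [" ".join(toks) for toks, tr in pieces if tr is None]
    if confidence < threshold:
        # Too little reuse: hand the whole segment to the translator.
        to_translate = [source]
    return {"alternatives": [text], "confidence": confidence, "to_translate": to_translate}
```

For example, after storing "This regulation shall enter into force" and "on the day of its publication" with their French translations, `draft(tm, "This regulation shall enter into force tomorrow")` yields a reconstruction with the unknown word "tomorrow" left in brackets and listed in `to_translate`. Joining target fragments in source order ignores word-order differences between languages, which is one reason a reconstruction like this is offered to the translator with a confidence value rather than output automatically.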

TRANSLEARN will collect and investigate a large body of translated texts within a well-defined sublanguage and text type, including the EC CELEX database; select the most coherent and homogeneous set of standard texts; and store these in an appropriately designed text database using existing text-handling and alignment software. A linguistically and statistically based pattern-matching mechanism, triggered by a source text, will then be developed. The most frequently used fixed locutions and syntactic structures of the sublanguage will be stored in a separate database, as will statistical data about the text database.
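As an illustration of how frequently used fixed expressions might be harvested from the stored texts, the sketch below counts recurring word n-grams and drops any shorter n-gram whose count equals that of a longer frequent n-gram containing it, so that fragments of a longer recurring phrase are not reported separately. The function `frequent_locutions` and its parameters (`n_min`, `n_max`, `min_count`) are hypothetical; the project description does not say which statistics were actually used.

```python
from collections import Counter


def frequent_locutions(sentences, n_min=2, n_max=4, min_count=2):
    """Return recurring word n-grams (n_min..n_max words) mapped to their counts."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    frequent = {g: c for g, c in counts.items() if c >= min_count}

    def contained(short, long):
        # True if `short` occurs as a contiguous run inside `long`.
        k = len(short)
        return any(long[i:i + k] == short for i in range(len(long) - k + 1))

    # Drop an n-gram when a longer frequent n-gram containing it has the same count.
    return {
        " ".join(g): c
        for g, c in frequent.items()
        if not any(len(h) > len(g) and ch == c and contained(g, h)
                   for h, ch in frequent.items())
    }
```

The pairwise filter is quadratic in the number of frequent n-grams, which is acceptable for a sketch; a corpus on the scale of CELEX would call for an indexed implementation.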

Maximum use will be made of existing products and software techniques. The sublanguages used for the prototype will be drawn from administrative texts (EC regulations, etc.) and technical texts (software documentation). The prototype will be limited to fairly simple morphological and syntactic processing, and to known statistical techniques for clustering fixed locutions and deriving taxonomies from them.
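One well-known statistical technique for clustering such expressions and deriving a taxonomy from them is agglomerative hierarchical clustering, whose sequence of merges forms a tree. The sketch below uses single-link clustering with word-overlap (Jaccard) similarity; the similarity measure, the stopping `threshold` and the function names `jaccard` and `single_link_clusters` are assumptions, not the project's documented method.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap between two expressions, from 0.0 to 1.0."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def single_link_clusters(locutions, threshold=0.3):
    """Agglomerative single-link clustering; returns (merge history, final clusters)."""
    clusters = [frozenset([loc]) for loc in locutions]
    merges = []
    while len(clusters) > 1:
        # Single link: cluster similarity is the best similarity between any two members.
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            sim = max(jaccard(x, y) for x in a for y in b)
            if best is None or sim > best[0]:
                best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break  # No remaining pair is similar enough to merge.
        merges.append((sorted(clusters[i]), sorted(clusters[j]), sim))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges, clusters
```

Each entry in the merge history records which two groups were joined and at what similarity, so reading it in order gives the levels of a simple taxonomy.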

TRANSLEARN attempts to combine the statistical and linguistic/AI approaches, which are often regarded as mutually incompatible, in a synergistic way, and to produce a large database of appropriately organized, indexed parallel texts in two sublanguages in an easily accessible form. The resulting prototype software package will be a powerful tool for pattern matching and other intelligent applications. Tools of this kind are expected to become highly marketable products, and TRANSLEARN will be marketed both as a stand-alone utility and as an integral part of a toolbox with wider scope. The prototype is intended to be extended to the remaining EC official languages, and feedback on its functionality will be sought from translation services dealing with the types of text covered by the project. It may also be ported to the DOS and Macintosh platforms.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

Data not available

Coordinator

Institute for Language and Speech Processing (ILSP)

EU contribution

No data

Address

22 Margari Street
11525 Athens
Greece

Total cost

No data

Participants (4)

Instituto de Linguistica Teorica e Computacional

Portugal

EU contribution

No data

Address

74 5/6 rua Conde de Redondo
1100 Lisboa

Total cost

No data

Knowledge AE

Greece

EU contribution

No data

Address

Gounari 35 & Kanakari 184
26221 Patras

Total cost

No data

Sonovision ITEP Technologies

France

EU contribution

No data

Address

12 rue de Reims
94701 Maisons-Alfort

Total cost

No data

Birkbeck College, University of London

United Kingdom

EU contribution

No data

Address

Malet Street, Bloomsbury
WC1E 7HX London

Total cost

No data
