Interactive corpus-based translation drafting tool

Project information

TRANSLEARN

Grant agreement ID: LRE61016

Closed project

Start date 1 January 1993

End date 1 July 1995

Funded under

Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Total cost

No data

EU contribution

No data

Coordinated by

Institute for Language and Speech Processing (ILSP)
Greece

Objective

The aim of the project is to provide a computational methodology and, in more practical terms, a toolbox to aid human translators working in a particular subset of general language (a sublanguage) in the following two ways:

to relieve them of the repetitive part of their work, which mostly involves specialised types of text;
to enhance productivity and translation quality by proposing alternative solutions and providing sophisticated ancillary tools.

A prototype application demonstrating the validity of the approach and allowing it to be evaluated in terms of translator productivity will be produced as a result of the project. The project will initially consider four languages: English, French, Greek and Portuguese.

TRANSLEARN is based on sophisticated pattern-matching techniques, involving both linguistic and statistical processing, which are used to identify the longest coherent part of the source text that has already been translated and stored in a text database in both source and translated form. In the case of a full match between a piece of source text and a database entry, the corresponding translated text can be output automatically. Statistically ranked alternative translations can also be provided, if they exist. If no full match is detected, a reconstruction and optimal evaluation of all the partial matches is performed, and the result is presented to the translator together with a confidence measure. Fragments of source text for which no translation above a certain confidence threshold exists will be presented to the translator to translate. The translation is then incorporated into the database for future use. Existing field-proven techniques and utilities will be used for the creation of the database of parallel texts.
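The lookup flow described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the class `TranslationMemory`, the function `draft`, the greedy longest-match strategy, the coverage-based confidence score and the default threshold are all assumptions made for illustration, and whitespace tokenisation stands in for the linguistic processing the project describes.

```python
from collections import Counter, defaultdict


class TranslationMemory:
    """Toy store of source fragments and the translations seen for them."""

    def __init__(self):
        # source token tuple -> Counter of target strings (hypothetical layout)
        self._entries = defaultdict(Counter)

    def add(self, source, target):
        """Record one source/target pair; repeated pairs raise the rank."""
        self._entries[tuple(source.lower().split())][target] += 1

    def alternatives(self, tokens):
        """Translations stored for exactly these tokens, most frequent first."""
        counts = self._entries.get(tuple(tokens), Counter())
        return [target for target, _ in counts.most_common()]


def draft(memory, sentence, threshold=0.5):
    """Propose a draft translation for one sentence.

    Full match: return all ranked alternatives.
    Otherwise: greedily cover the sentence with the longest stored
    fragments and score the draft by the fraction of tokens covered.
    """
    tokens = sentence.lower().split()
    full = memory.alternatives(tokens)
    if full:
        return {"kind": "full", "proposals": full,
                "confidence": 1.0, "untranslated": []}

    pieces, untranslated, covered = [], [], 0
    i = 0
    while i < len(tokens):
        # Try the longest fragment starting at position i first.
        for j in range(len(tokens), i, -1):
            hits = memory.alternatives(tokens[i:j])
            if hits:
                pieces.append(hits[0])
                covered += j - i
                i = j
                break
        else:
            # Nothing stored starts here: leave the token for the translator.
            untranslated.append(tokens[i])
            pieces.append("[" + tokens[i] + "]")
            i += 1

    confidence = covered / len(tokens) if tokens else 0.0
    kind = "partial" if confidence >= threshold else "manual"
    return {"kind": kind, "proposals": [" ".join(pieces)],
            "confidence": confidence, "untranslated": untranslated}


def confirm(memory, sentence, translation):
    """Store the translator's final version for future use."""
    memory.add(sentence, translation)
```

In this sketch a draft whose confidence falls below `threshold` is labelled `"manual"`, meaning the whole sentence goes to the translator, and `confirm` feeds the accepted translation back into the store so that the same sentence yields a full match next time.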

TRANSLEARN will collect and investigate a large body of translated texts within a well-defined sublanguage and text type, including the EC CELEX database, select the most coherent and homogeneous set of standard texts, and store these in an appropriately designed text database using existing software text handling and alignment tools. A linguistically and statistically-based pattern-matching mechanism, to be triggered by a source text, will then be developed. The most frequently used fixed locations and syntactic structures in the sublanguage considered will be stored in a separate database, as will statistical data concerning the text database.
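As a rough illustration of the kind of statistical data that could be derived from such a text database, the following Python sketch counts word sequences that recur across stored segments. The function name `frequent_ngrams`, its parameters and the use of plain word n-grams are assumptions for illustration only; the project description does not specify how recurring sequences are identified.

```python
from collections import Counter


def frequent_ngrams(segments, n=3, min_count=2):
    """Return (n-gram, count) pairs seen at least min_count times.

    segments: iterable of text segments from one side of the corpus.
    Tokenisation is plain lower-cased whitespace splitting.
    """
    counts = Counter()
    for segment in segments:
        tokens = segment.lower().split()
        # Slide a window of n tokens across the segment.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return [(" ".join(gram), count)
            for gram, count in counts.most_common()
            if count >= min_count]
```

Applied to a homogeneous set of administrative texts, a count like this would surface formulaic sequences that are candidates for storage in a separate database of recurring expressions.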

Maximum use will be made of existing products and software techniques, and the sublanguages used for the prototype will be drawn from administrative (EC regulations etc.) and technical (software documentation) texts. The prototype will be limited to fairly simple morphological and syntactic processing, and to known statistical techniques for clustering and taxonomy derivation for fixed locations.

TRANSLEARN attempts to combine the statistical and linguistic/AI approaches (which are often regarded as mutually incompatible) in a synergistic way, and to produce a large database of appropriately organized, indexed parallel texts in two sublanguages in an easily accessible form. The prototype software package produced will be a powerful tool for pattern-matching and other intelligent applications. Tools of this kind are expected to turn into highly marketable products, and TRANSLEARN will be marketed both as a stand-alone utility and as an integral part of a toolbox with wider scope. It is intended to extend the prototype to cover the remaining EC official languages, and to get feedback on its functionality from translation services dealing with the types of text covered by the project. The prototype may also be ported to the DOS and Macintosh platforms.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on natural language processing techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU's priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposals

Procedure for inviting applicants to submit project proposals with the aim of obtaining EU funding.

Data not available

Funding scheme

Funding scheme (or "Type of action") within a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; the specific evaluation criteria to qualify for funding; and the use of simplified forms of costs such as lump sums.

Data not available

Coordinator

Institute for Language and Speech Processing (ILSP)

EU contribution

No data

Address

22 Margari Street
11525 Athens
Greece

Total cost

No data

Participants (4)

Instituto de Linguistica Teorica e Computacional

Portugal

EU contribution

No data

Address

74 5/6 rua Conde de Redondo
1100 Lisboa

Total cost

No data

Knowledge AE

Greece

EU contribution

No data

Address

Gounari 35 & Kanakari 184
26221 Patras

Total cost

No data

Sonovision ITEP Technologies

France

EU contribution

No data

Address

12 rue de Reims
94701 Maisons-Alfort

Total cost

No data

Birkbeck College, University of London

United Kingdom

EU contribution

No data

Address

Malet Street, Bloomsbury
WC1E 7HX London

Total cost

No data
