Skip to main content
Ir a la página de inicio de la Comisión Europea (se abrirá en una nueva ventana)
español español
CORDIS - Resultados de investigaciones de la UE
CORDIS
Contenido archivado el 2024-04-19

Creation, Reuse, Normalisation and Integration of Terminologies in Natural Language Processing Systems

Objetivo

Transterm addresses the problems of enriching terminologies and integrating them into the application dictionaries of NLP systems. It also deals with automatic and semi-automatic construction of application terminologies from corpora.The main objective is to facilitate the use of terminological data in NLP systems thus tackling the critical issue of real site customisation of this type of software. Two classes of users are foreseen: application developers and terminology builders/administrators.

There are three major lines of action:

The elaboration of a standardised generic representation of terminological data enriched with linguistic information, and application specific knowledge derived from terminological resources.
The implementation of a modular portable toolbox allowing a) the assembly and customisation of terminological resources in order to characterize and enrich these resources, check their coherence and merge them with lexical data to create machine-processable lexico-terminological objects and b) semi-automatic terminology extraction from text.
The validation of the tools, methods and formats developed within the project by means of three real site tests involving corporate data and two smaller-scale experiments covering altogether five languages (French, Italian, English, Greek and Portuguese).

Approach and Methodology

The project is based on methods and tools already existing within the consortium, or under development. Results from related EC sponsored projects and from the EUREKA projects GRAAL and GENELEX will be used. It is complementary to GRAAL and GENELEX, which deal with the generic grammatical and lexical components of NLP systems.

The TRANSTERM toolbox will also take into account the known document description means (such as SGML) in order to facilitate both the acquisition and reuse of terminological data. Existing international norms in the field of terminology will be taken into account and links will be established with ongoing standardisation efforts in this field (like LISA TIF) and neighbouring areas (eg. the Knowledge Interchange Format). The software will be developed on a UNIX platform considering emerging standards such as OSF/Motif.

Exploitation and Future Prospects

The project is very much user driven. The industrial consortium members expect to improve the productivity of their applications, especially in the area of automatic indexing. The software toolbox will allow the construction of application specific disambiguation heuristics and descriptions of transformations of identified grammatical constructs into objects conforming to the characteristics of a terminology.

Semi-automatic construction of terminological resources in languages such as Greek and Portuguese will be supported by providing tools usable in these environments. .SP 1 TRANSTERM is expected to lead to pre-industrial prototypes which lend themselves to rapid exploitation by industrial system developers leading to marketable products. Associated services will become more cost-effective. The results of work on standardisation will be made available to the scientific and industrial communities.

The close cooperation of TRANSTERM with the related Eureka projects GRAAL and GENELEX will have a synergetic effect on Community sponsered efforts in Natural Language Processing.

Ámbito científico (EuroSciVoc)

CORDIS clasifica los proyectos con EuroSciVoc, una taxonomía plurilingüe de ámbitos científicos, mediante un proceso semiautomático basado en técnicas de procesamiento del lenguaje natural. Véas: El vocabulario científico europeo..

Para utilizar esta función, debe iniciar sesión o registrarse

Programa(s)

Programas de financiación plurianuales que definen las prioridades de la UE en materia de investigación e innovación.

Tema(s)

Las convocatorias de propuestas se dividen en temas. Un tema define una materia o área específica para la que los solicitantes pueden presentar propuestas. La descripción de un tema comprende su alcance específico y la repercusión prevista del proyecto financiado.

Datos no disponibles

Convocatoria de propuestas

Procedimiento para invitar a los solicitantes a presentar propuestas de proyectos con el objetivo de obtener financiación de la UE.

Datos no disponibles

Régimen de financiación

Régimen de financiación (o «Tipo de acción») dentro de un programa con características comunes. Especifica: el alcance de lo que se financia; el porcentaje de reembolso; los criterios específicos de evaluación para optar a la financiación; y el uso de formas simplificadas de costes como los importes a tanto alzado.

Datos no disponibles

Coordinador

GSI-ERLI
Aportación de la UE
Sin datos
Dirección
1 place des Marseillais
94227 Charenton
Francia

Ver en el mapa

Coste total

Los costes totales en que ha incurrido esta organización para participar en el proyecto, incluidos los costes directos e indirectos. Este importe es un subconjunto del presupuesto total del proyecto.

Sin datos

Participantes (8)

Mi folleto 0 0