TRANSTERM addresses the problems of enriching terminologies and integrating them into the application dictionaries of NLP systems. It also deals with the automatic and semi-automatic construction of application terminologies from corpora. The main objective is to facilitate the use of terminological data in NLP systems, thus tackling the critical issue of real site customisation of this type of software. Two classes of users are foreseen: application developers and terminology builders/administrators.
There are three major lines of action:
The elaboration of a standardised generic representation of terminological data enriched with linguistic information and with application-specific knowledge derived from terminological resources.
The implementation of a modular, portable toolbox allowing (a) the assembly and customisation of terminological resources, in order to characterise and enrich these resources, check their coherence and merge them with lexical data to create machine-processable lexico-terminological objects, and (b) semi-automatic terminology extraction from text.
The validation of the tools, methods and formats developed within the project by means of three real site tests involving corporate data and two smaller-scale experiments covering altogether five languages (French, Italian, English, Greek and Portuguese).
Approach and Methodology
The project is based on methods and tools already existing within the consortium, or under development. Results from related EC-sponsored projects and from the EUREKA projects GRAAL and GENELEX will be used. TRANSTERM is complementary to GRAAL and GENELEX, which deal with the generic grammatical and lexical components of NLP systems.
The TRANSTERM toolbox will also take into account established document description standards (such as SGML) in order to facilitate both the acquisition and reuse of terminological data. Existing international norms in the field of terminology will be taken into account, and links will be established with ongoing standardisation efforts in this field (like LISA TIF) and in neighbouring areas (e.g. the Knowledge Interchange Format). The software will be developed on a UNIX platform, taking into account emerging standards such as OSF/Motif.
Exploitation and Future Prospects
The project is very much user-driven. The industrial consortium members expect to improve the productivity of their applications, especially in the area of automatic indexing. The software toolbox will allow the construction of application-specific disambiguation heuristics and of descriptions of transformations of identified grammatical constructs into objects conforming to the characteristics of a terminology.
Semi-automatic construction of terminological resources in languages such as Greek and Portuguese will be supported by providing tools usable in these environments.
TRANSTERM is expected to lead to pre-industrial prototypes which lend themselves to rapid exploitation by industrial system developers, leading to marketable products. Associated services will become more cost-effective. The results of work on standardisation will be made available to the scientific and industrial communities.
The close cooperation of TRANSTERM with the related EUREKA projects GRAAL and GENELEX will have a synergetic effect on Community-sponsored efforts in Natural Language Processing.