Large-scale Grammars for EC languages


The project aims to develop extensible, well-designed, documented and tested lingware for nine EC languages based on a common mainstream software platform (ALEP) by re-using linguistic knowledge embedded in existing grammatical descriptions. LS-GRAM is to act as the kernel of an ALEP User Group, providing feedback to the system developers, highlighting formal and computational shortcomings of the system and producing a rule coding manual for lingware developers. The lingware and its extensive documentation will become part of an ALEP starter kit, thus improving the conditions for wide distribution of ALEP.

Approach and Methodology

The project will adopt a staged approach in the execution of the language-specific work. A core consortium of four partners covering three languages (DE, EN, ES) will start in the last quarter of 1993. In the first phase of the project they will carry out definition work -- e.g. on the coverage of the core grammars and of the end-of-project demonstrator, and on documentation standards -- and methodological work -- e.g. on the re-use of linguistic (i.e. grammatical and lexical) knowledge embedded in existing large-scale grammars. The results will serve as input for the other participants, who will join the project some 6-7 months later.

The scope of the project is somewhat larger for the core consortium than for the other participants. For the three languages German, English and Spanish, coverage will be determined on the basis of corpus analysis, which means that phenomena that are not normally the focus of theoretical linguists (e.g. dates, parentheticals, appositions) will receive due attention. For the six other languages a more limited scope is envisaged: the aim here is to develop core grammars, which may serve as boot-strapping material for more ambitious future initiatives.

Furthermore, the project intends to contribute to grammar engineering methodology by addressing issues of modularity, extensibility and maintainability. Designing the documentation so that the resources are easy to re-use will be a priority.

Exploitation and Future Prospects

The creation of well-documented grammatical resources developed in a mainstream formalism covering nine languages will provide an attractive basis for training, research and application-oriented projects to build on. The availability of running grammars is expected to boost the dissemination of the ALEP platform within the NLP scientific community. It is hoped that this larger-scale multilingual effort will reinforce the standardisation of tools and methods in NLP and that it will mark a milestone in the establishment of an EC-wide NLP infrastructure. Ultimately, the results of this project should encourage industrial product developers to adopt mainstream linguistic descriptions for their NLP applications.


