The project aims to extend standard unification grammar formalisms on the basis of linguistic and computational considerations, in order to improve compatibility with current practice in linguistics and computational linguistics and hence to improve the prospects for developing reusable grammatical resources. The extensions can be thought of as a library of datatypes. A datatype is a type of object used to represent information or knowledge, combined with the operations on that type of object that manipulate the information. Both the syntax used to refer to objects and operations and the computational implementation of those objects and operations are important aspects of the datatype library for this project.
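The notion of a datatype as an object type plus its operations can be illustrated with a minimal sketch. It assumes a deliberately simplified feature structure (an atomic string or a nested mapping from feature names to values) with unification as its single operation. The names `FS` and `unify` are hypothetical; the sketch ignores typing and reentrancy (structure sharing) and is not the ALEP formalism or its implementation.

```python
from typing import Optional, Union

# Assumption for illustration: a feature structure is either an atom
# (a string) or a dict mapping feature names to feature structures.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they are incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # a clash on any feature makes the whole unification fail
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # At least one side is an atom: atoms unify only with an identical atom.
    return a if a == b else None


noun_phrase = {"cat": "np", "agr": {"num": "sg"}}
third_person = {"agr": {"num": "sg", "per": "3"}}
plural = {"agr": {"num": "pl"}}

print(unify(noun_phrase, third_person))
print(unify(noun_phrase, plural))
```

The first call merges the compatible agreement information, while the second fails because the two values for `num` clash. A library datatype in the sense described above would pair such a representation with a fixed set of operations and a syntax for referring to them.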
Beginning with an inventory of the datatypes that best serve linguistic descriptions in mainstream frameworks such as HPSG, GB, LFG and CG, the project will produce a shortlist of the most prominent of these to be included in the library, and will provide specifications for implementing these datatypes within the core of an implementation of the formalism of the ALEP platform, or of extensions to that formalism. (The ALEP platform is an environment for lingware development and testing being developed by BIM with Community funding.) The datatype library will be accompanied by extensive documentation and made freely available.
The lack of appropriate tools for language engineering constitutes a serious impediment to the wide-scale commercial exploitation of NL research for product development and limits the value of research systems that could be used for developing, testing and improving linguistic theories. To overcome this problem it is important that the appropriate formal concepts and a representation language for the abstract specification of grammatical knowledge are developed. Recent computer science approaches to abstract datatypes have demonstrated that a simple algebraic semantics can be combined with powerful computational techniques, and many of these results transfer fruitfully to computational linguistics. In addition to this, recent developments in theoretical linguistics have reduced the distance between linguistic theory and language engineering.
The project will result in a library of datatypes to accommodate linguistic descriptions, together with extensive documentation. The library will facilitate the writing of large-scale grammatical descriptions with realistic coverage, and it will improve the transparent representation of linguistic knowledge and hence the possibilities of employing these grammars in various applications. The library will also reduce the cost of developing large-scale grammatical descriptions. Using such a library, grammar writers do not have to develop datatype definitions themselves, although they will still be free in their choice of linguistic approach. The results of the project are therefore expected to be of interest to companies that engage in the development of grammatical resources for commercial or research purposes. By expanding the set of datatypes to include those used in the major linguistic frameworks, the project will open the computational field to mainstream linguistics research; in particular it will make the ALEP formalism and its implementations an attractive option to both the scientific and the commercial computational linguistics communities. The project aims at widespread acceptance of the library and the related ALEP formalism. To achieve this it is of primary importance that:
the project results have the right properties (e.g. they are efficient, relevant and easy to use),
the library is accompanied by extensive documentation that is clearly written, is accessible to a wide range of users (from grammar writers to computational linguists to theoretical linguists) and presents a wide range of examples of its use,
a number of promotional activities are undertaken to publicise the project results, to distribute the prototype library and to promote actual use of that library.
The first aim of the promotional activities is to inform both the academic and the commercial computational linguistics communities about the results of the project by reporting on the project at international conferences in the form of scientific papers as well as in the form of demonstrations.
The second aim is to distribute the prototype library as widely as possible by making the prototype library and the scientific reports freely available (if possible through ELSNET). This will allow any interested party to obtain the practical results of the project for research purposes at no cost.
The third aim is to promote actual use of the prototype library. For this purpose a workshop will be organised at the end of the project. Part of this workshop will be aimed at demonstrating to the participants the practical use of the datatype library for linguistic description and at giving them an opportunity for 'hands-on' experience. CEC representatives, parties that have shown an interest in the ALEP implementation and leading figures from the academic and commercial NLP communities will all be invited to attend.