Descriptive lexical specifications and tools for corpus-based lexicon-building

Objective

DELIS is a multidisciplinary project with three broad objectives: to contribute to a methodology of dictionary development based on corpus evidence; to produce parallel dictionary fragments in five languages; and to produce software tools supporting this kind of lexicographic work.

Its methodological goal is to use syntactic phenomena found in corpus evidence to define properties of lexical semantic classes, of the individual lexemes belonging to these classes, and of the readings of such items. Its descriptive goal is to produce a set of parallel dictionary fragments for English, French, Italian, Danish and Dutch, covering selected lexical semantic classes. In parallel with this work, software tools will be specified, implemented and integrated in a common user environment, providing computational support for the lexicographic work and the underlying methodology. These will include tools for corpus exploration, for the manual acquisition and management of lexical knowledge, for populating previously defined typed-feature-based models, and for the eventual (SGML-based) export and presentation of that knowledge in dictionary articles.

DELIS is a concrete, albeit incomplete, example of the corpus-based design of multifunctional dictionaries as developed and discussed in Eurotra-7. It is based on the assumptions that:

the criteria according to which lexical items are classified must be made as explicit, communicable and thus reproducible as possible by binding them to observable linguistic phenomena;
a single representation formalism, adequately supported by computational tools and leading to a consistent descriptive specification, is required in order to use corpus evidence as raw material for the linguistic description of lexical items (TFL, as an emerging standard, will be used for this purpose);
tools designed for the handling of descriptive linguistic specifications need to be generic with respect to the linguistic container (i.e. independent of the contents), but they must also accommodate initial user requirements and subsequently be tailored according to the results of live testing.

The project software will be produced with the assistance of professional users from a dictionary publishing house and a translation/documentation company, in the form of requirements definition, feedback on specifications and field tests of early prototypes.

DELIS is an interdisciplinary technology-transfer project, making technologies that have been developed and are now beginning to be used in NLP available for lexicographic work in translation/documentation and publishing. It will also make a significant contribution to the research areas of linguistic (particularly semantic) description and the integration of typed feature systems and user interfaces.

In particular, DELIS will contribute to a methodology of structuring semantic and syntactic information so that it is independent of the editorial tools used to manage formal, typographical and other characteristics of lexical information. Ultimately, the project envisages the creation of product-independent lexical databases that can be used for more than just traditional paper dictionaries.

The DELIS prototype will be parameterizable and thus adaptable to the systems and databases used by the various project participants.

Coordinator

Universität Stuttgart
Address
Azenbergstraße 12
7000 Stuttgart
Germany

Participants (7)

Centre for Sprogteknologi (CST), Copenhagen
Denmark
Address
80 Njalsgade
2300 København S
Consiglio Nazionale delle Ricerche (CNR)
Italy
Address
Via Della Faggiola 32
56100 Pisa
Lingsoft Inc
Finland
Linguacubun Ltd
United Kingdom
Address
17 Oakley Road
N1 3LL London
Sonovision ITEP Technologies
France
Address
12 Rue De Reims
94701 Maisons-Alfort
Van Dale Lexicografie BV
Netherlands
Address
21C Mariaplaats Po Box 19232
3511 LK Utrecht
Vrije Universiteit Amsterdam
Netherlands
Address
1105 De Boelelaan
1081 HV Amsterdam