This project aims to implement two bilingual dictionaries (English-French and German-English) in an on-line, context-sensitive comprehension dictionary. It presupposes that a user has a text on an electronic medium (e.g. CD-ROM) that s/he wants to read. Clicking on a word will display a context-dependent translation and, on request, background information (up to the full dictionary entry) in the user's mother tongue. The system will reveal whether the word is part of a multi-word idiom and will select the appropriate translation depending on the syntactic context.
The classical bilingual dictionaries will be turned into comprehension dictionaries designed for English students of German, French students of English, or anyone with a limited understanding of the foreign language. Bilingual dictionaries have traditionally been dominated by the requirements of people who either compose in the foreign language or translate into or out of it. The emphasis of the project is therefore on adapting existing bilingual dictionaries for foreign language comprehension, on developing a user interface, and on evaluating users' responses to comprehension tools that are integrated into their normal working environment (e.g. a word processor).
A prototype of the envisaged system, called LOCOLEX, is already under development by the coordinator. The Compass project will improve this prototype by tuning its performance, adding the German-English language pair, adapting it to the specific needs of comprehending a foreign language, and implementing a user interface that integrates LOCOLEX into the user's environment. Hence the consortium is aiming to attain the following objectives:
1. Specify the features necessary in bilingual comprehension dictionaries,
2. Develop methods to analyse and evaluate existing bilingual on-line dictionaries and to adapt them,
3. Validate the methods by applying them to existing dictionaries, for English-French (Oxford-Hachette) and for German-English (Oxford-Duden),
4. Design and implement a user interface that integrates LOCOLEX into the user's working environment,
5. Evaluate and test the system with users reading foreign language texts.
Approach and Methodology
The project is founded on the insight that recent advances in parsing technology may have made it possible for the look-up device itself to detect relevant features of the syntactic context of a word or phrase. At the same time, dictionaries of significant size can now be stored in a hand-held or laptop device. Such a device could therefore support both a display of the text being read and a context-sensitive system for looking up unknown words and phrases. The system could also keep a record of what the reader needed to look up, and hence may wish to review or memorise.
The starting points of the project are the LOCOLEX prototype, the SGML-marked, machine-readable English-French Oxford-Hachette dictionary and the typesetting tape of the Oxford-Duden dictionary. The LOCOLEX prototype carries out a morphological analysis of the sentence in which the selected word occurs and a stochastic disambiguation of the word class information. This information is then matched against the dictionary. Each entry is structured as a tree: when a word with several meanings is used in a context that offers no exploitable features for selecting the appropriate sense, the information associated with the most general node is displayed, and the user can zoom into the appropriate sub-sense.
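The fallback to the most general node can be illustrated with a minimal Python sketch. It is not the LOCOLEX implementation: the `SenseNode` class, the `select_sense` function, the feature names and the sample entry are all hypothetical, and the context features are assumed to come from the morphological analysis and word class disambiguation described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SenseNode:
    """One node of a hypothetical sense tree for a dictionary entry."""
    gloss: str  # translation shown to the reader
    # Context features this sense requires (assumed representation).
    features: Dict[str, str] = field(default_factory=dict)
    children: List["SenseNode"] = field(default_factory=list)


def select_sense(node: SenseNode, context: Dict[str, str]) -> SenseNode:
    """Descend while exactly one child matches the context.

    If no child, or more than one child, matches, the current (more
    general) node is returned so the reader can zoom in manually.
    """
    while True:
        matching = [
            child for child in node.children
            if child.features
            and all(context.get(k) == v for k, v in child.features.items())
        ]
        if len(matching) != 1:
            return node
        node = matching[0]


# Illustrative entry only; not taken from the Oxford-Hachette dictionary.
light = SenseNode("light", children=[
    SenseNode("lumière", {"pos": "noun"}),
    SenseNode("léger", {"pos": "adj"}),
    SenseNode("allumer", {"pos": "verb"}),
])

noun_reading = select_sense(light, {"pos": "noun"})
no_context = select_sense(light, {})
```

With a usable word class the sketch returns the specific sub-sense; with no exploitable features it stays at the root node, which corresponds to displaying the most general information.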
The dictionaries will be adapted to comprehension needs by filtering out irrelevant information and many contextualising indicators, by reducing the metalanguage and by reinforcing the treatment of multi-word lexemes. The hierarchical structure of the dictionaries will be made explicit by transforming the source text of both dictionaries into lexical databases. The conversions starting from SGML and from typesetting tape will be compared and conversion guidelines will be drawn up. Lexical gaps, i.e. missing words or collocations detected by the statistical analysis of text corpora, will be filled. The human look-up process will be analysed in order to design a user-friendly human-computer interface.
The adequacy criteria will be defined, and the testing carried out, in a user environment by the Universities of Lyon 2 and Bournemouth, so that the prototype matches real users' needs.
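As a rough illustration of the filtering step, the following Python sketch keeps only the fields of an entry that a reader needs for comprehension. The field names, the `COMPREHENSION_FIELDS` set and the sample entry are assumptions made for the example; the text does not describe the actual markup of either dictionary.

```python
from typing import Any, Dict

# Hypothetical field names; the real dictionaries' markup is not given here.
COMPREHENSION_FIELDS = {"headword", "pos", "translation", "multiword", "senses"}


def filter_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the entry with only comprehension-relevant fields.

    Nested senses are filtered recursively, so the hierarchical
    structure of the entry is preserved.
    """
    kept: Dict[str, Any] = {}
    for key, value in entry.items():
        if key not in COMPREHENSION_FIELDS:
            continue  # drop fields assumed to serve production or translation only
        if key == "senses":
            kept[key] = [filter_entry(sense) for sense in value]
        else:
            kept[key] = value
    return kept


# Illustrative entry only; not taken from the Oxford-Duden dictionary.
sample = {
    "headword": "Haus",
    "pos": "noun",
    "usage_note": "note aimed at writers of German",
    "senses": [{"translation": "house", "register": "neutral"}],
}
filtered = filter_entry(sample)
```

Working on a copy leaves the source entry intact, which matters if the full dictionary entry is still to be shown on request.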
Exploitation and Future Prospects
The project concerns the large number of people who have some knowledge of a foreign language but not enough to read it efficiently. Since texts on electronic media are becoming more and more popular (CD-ROM, on-line newspapers, electronic mail), the number of potential users of this type of device is growing rapidly. Hence the coordinator, Xerox, may integrate a further development of the prototype into one of its commercial products.
The project will provide methods and tools aimed at facilitating the reuse of existing lexica and at creating machine-processable lexical resources. It differs from other existing projects that convert printed dictionaries into computer-tractable ones in that the dictionaries are developed to meet a specific purpose: foreign language comprehension. Secondary results will be an improvement of the University of Tübingen German tagger, a contrastive study of the two dictionary conversions starting from SGML and from typesetting tape, and encoding guidelines for such conversions.
The Compass consortium intends to collaborate with the EAGLES lexicon committee and will develop contacts with partners of the ACQUILEX 2 project.