Catalogue with Multilingual Natural Language Access / Linguistic Server


The main objective is to develop a system providing multilingual natural language access to library catalogues. It will let users enter queries in different European languages (initially German, English, French and Spanish). It will also analyse bibliographic information expressed in these languages, extracting keywords to be translated into a language selected by the user. Work on this project may also set future standards for the linguistic treatment of library data.
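
The query path described above (analyse a query, extract keywords, translate them into the user's chosen language) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stopword lists, the tiny bilingual table and the function names are hypothetical and do not come from the project, whose linguistic analysis would be far richer than simple word filtering.

```python
import re

# Hypothetical stopword lists; the project's real linguistic resources
# are not described in this document.
STOPWORDS = {
    "en": {"the", "of", "on", "in", "a", "an", "and", "about", "books"},
    "de": {"der", "die", "das", "und", "von", "buch"},
}

# Hypothetical bilingual dictionary keyed by (source, target) language pair.
BILINGUAL = {
    ("en", "de"): {"history": "geschichte", "railways": "eisenbahnen"},
}


def extract_keywords(query, lang):
    """Lowercase the query, split it into words and drop stopwords."""
    tokens = re.findall(r"\w+", query.lower())
    return [t for t in tokens if t not in STOPWORDS.get(lang, set())]


def translate_keywords(keywords, source, target):
    """Look each keyword up in the bilingual table; keep unknown words as-is."""
    table = BILINGUAL.get((source, target), {})
    return [table.get(k, k) for k in keywords]


keywords = extract_keywords("Books about the history of railways", "en")
translated = translate_keywords(keywords, "en", "de")
print(keywords, translated)
```

Unknown words are passed through untranslated here, which is only one possible design choice; a real system might instead flag them or fall back to another dictionary.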
Impact and expected results

CANAL/LS will considerably enhance accessibility of European multilingual library catalogues, using linguistic techniques to ensure that enquiries will result in all applicable responses regardless of language.
It will also help set future standards for linguistic treatment of library data.

Deliverables:

The fully integrated software package prototype is the main deliverable of the project.

The following will be made publicly available as the project progresses:

Publications on project results;
A report on domain selection and user requirements;
The interface definition;
Design and architecture report;
Monolingual and bilingual dictionary report;
An evaluation of the prototype;
Final report.
Technical approach:

The central component of the system will be a "linguistic server" communicating with library automation systems ("clients") via a protocol to be defined. Messages exchanged between the library systems and the linguistic server will be in SGML format. The linguistic server may be based on different "linguistic engines" of different capabilities and employing different dictionaries. Two versions will be prototyped initially. The server will be "open" in so far as it allows for accommodation of additional languages and additional library systems.
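
As a rough illustration of this architecture, the sketch below shows a server that accepts an SGML-style request, hands the query to a registered engine and returns an SGML-style reply. The protocol is still to be defined, so the tag names (`request`, `query`, `response`, `keyword`), the attributes and the class and function names are all hypothetical. Python's `html.parser` is used here only as a convenient tolerant tag parser, not as a full SGML implementation, and the toy engine is a stand-in that performs no real linguistic analysis.

```python
from html.parser import HTMLParser


class RequestParser(HTMLParser):
    """Collects <request> attributes and the text inside <query>.

    The tag and attribute names are hypothetical placeholders.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_query = False

    def handle_starttag(self, tag, attrs):
        if tag == "request":
            self.fields.update(dict(attrs))
        elif tag == "query":
            self._in_query = True

    def handle_endtag(self, tag):
        if tag == "query":
            self._in_query = False

    def handle_data(self, data):
        if self._in_query:
            self.fields["query"] = self.fields.get("query", "") + data


class LinguisticServer:
    """Dispatches parsed requests to pluggable engines registered by name."""

    def __init__(self):
        self.engines = {}

    def register(self, name, engine):
        self.engines[name] = engine

    def handle(self, message):
        parser = RequestParser()
        parser.feed(message)
        parser.close()
        fields = parser.fields
        # Fall back to the engine registered as "default" if none is named.
        engine = self.engines[fields.get("engine", "default")]
        keywords = engine(fields["query"], fields["lang"], fields["target"])
        items = "".join(f"<keyword>{k}</keyword>" for k in keywords)
        return f'<response lang="{fields["target"]}">{items}</response>'


def toy_engine(query, lang, target):
    """Stand-in engine: keeps lowercased words longer than three letters."""
    return [w.lower() for w in query.split() if len(w) > 3]


server = LinguisticServer()
server.register("default", toy_engine)
reply = server.handle(
    '<request lang="en" target="de"><query>History of railways</query></request>'
)
print(reply)
```

Registering engines by name mirrors the idea that the server may sit on top of different engines with different capabilities; adding a further engine or language would then mean one more `register` call rather than a change to the client-facing message handling.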

The project has nine workpackages, focusing on software, linguistic and library problems, as follows:

User requirements and domain selection: to ascertain user needs for multilingual bibliographic record retrieval and to choose a topic area that keeps test dictionaries within practical limits;
Library system/linguistic server interface: definition of the interface between library systems and the linguistic server;
Linguistic server design: design and architecture of the linguistic server, including specifications of all functions and the environment;
Linguistic server development/library system interface: modifying two linguistic systems, SX and EXTRAKT, to interpret SGML-encoded messages from the library system and perform the corresponding actions;
Library system adaptation: adapting the participating library systems, BABSY and SABINE, with interfaces and expanded OPACs;
Dictionary generation: monolingual and bilingual dictionaries for French, Spanish, German and English;
Software integration: integration into prototypes for BABSY and SABINE;
System installation and indexing: installation at two sites, linking to OPACs and re-indexing domain records;
Evaluation: evaluation against the initial user requirements.

The dissemination of information will be via conferences and library-specific events, trade fairs, electronic articles, scientific and technical press and at a specially convened workshop.

Key issues:

The primary element of the project is a linguistic server, developed as a "black box" to communicate with library systems. Key issues include:

A client-server protocol (for standardisation) for message exchange between library client systems and the linguistic server;
A data exchange format based on SGML (Standard Generalized Markup Language, ISO 8879) for client-server communications;
Preparation of linguistic data (mono and bilingual dictionaries) and suitable tools for linguistic analysis of bibliographic data;
Inclusion of linguistic software and data collections from various European (ESPRIT) and national (DFG, Germany) programmes;
Integration of the linguistic server into library environments.


