This site has been archived on
This is the Telematics for Libraries ARCHIVE.
These projects were part of Framework Programme 4.
The pages are no longer maintained

Telematics for Libraries - Project

BIBLIOTECA

Updated: 09 JUN 98
Project Number and Title: 2023 - Bibliographic Texts Compositional Analysis
Programme/Action line: FP 3 / IV
Call Topic(s): Theme 17
Start: January 1994
End: December 1995
Project Duration in Months: 24
Keywords
OCR/ICR; intelligent document recognition; compositional analysis; keyword indexing; natural language processing; SGML


Additional information can be obtained from the BIBLIOTECA website (http://www.ucm.es/info/VerbaLogica/biblio.htm).


Theme
New bibliographic record products and services applying internationally recognised standards (Theme: 17)
Project description
This project attempts to integrate work in the fields of:
  • automatic retrospective conversion of library catalogues;
  • OCR/ICR technology;
  • natural language processing.
This project attempts to define and implement tools for intelligent recognition, analysis and transformation of information contained in a variety of library-type documents.
The project developed a toolbox that analyses the field/subfield structure underlying bibliographic references, dictionary entries, indexes of scientific periodicals etc. An intelligent document recognition system has been developed to enhance existing OCR/ICR technology through better image pre-processing, segmentation and incremental feedback from analysis of the documents. A structured analysis procedure breaks texts down into progressively shorter units until individual informational elements are made explicit. These elements are then transferred to SGML, which enables a normalised structure model to be developed.
The toolbox has been tested on indexes and references in scientific periodicals, tables of contents and catalogue cards.
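As a rough illustration of the breakdown described above, the following Python sketch splits one bibliographic reference into fields and subfields and wraps them in SGML-style tags. The reference layout ("Surname, Forename. Title. Place: Publisher, Year."), the regular expression, the function names and the tag names are all assumptions chosen for this example; they are not taken from the BIBLIOTECA toolbox.

```python
import re
from xml.sax.saxutils import escape

# Assumed layout: "Surname, Forename. Title. Place: Publisher, Year."
REFERENCE = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<place>[^:]+):\s*"
    r"(?P<publisher>[^,]+),\s*"
    r"(?P<year>[0-9]{4})\.?$"
)


def analyse(reference):
    """Break a reference into fields, then the author field into subfields.

    Returns a dict of elements, or None if the text does not fit the layout.
    """
    match = REFERENCE.match(reference.strip())
    if match is None:
        return None
    # Second pass: the author field is itself split into two subfields.
    surname, _, forename = match.group("author").partition(",")
    return {
        "surname": surname.strip(),
        "forename": forename.strip(),
        "title": match.group("title").strip(),
        "place": match.group("place").strip(),
        "publisher": match.group("publisher").strip(),
        "year": match.group("year"),
    }


def to_sgml(fields):
    """Serialise the explicit elements with hypothetical SGML-style tags."""
    f = {name: escape(value) for name, value in fields.items()}
    return (
        "<ref>"
        f"<author><surname>{f['surname']}</surname>"
        f"<forename>{f['forename']}</forename></author>"
        f"<title>{f['title']}</title>"
        f"<imprint><place>{f['place']}</place>"
        f"<publisher>{f['publisher']}</publisher>"
        f"<year>{f['year']}</year></imprint>"
        "</ref>"
    )


if __name__ == "__main__":
    parsed = analyse("Eco, Umberto. Il nome della rosa. Milano: Bompiani, 1980.")
    print(to_sgml(parsed))
```

A single fixed pattern like this only covers one document class; a text that does not fit returns None rather than a partial guess, which is where feedback to the recognition stage would come in.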
Technical approach
The project was organised into seven workpackages:
  • Corpus selection and analysis;
  • Intelligent document recognition, enhanced with pre-processing functions, high level character segmentation and incremental processing;
  • Automatic field compositional analysis - outputting field parsers for different document classes;
  • Context based error detection and correction;
  • Conversion to SGML and translation to bibliographic database, CD-ROM and MARC variant;
  • Training, testing and evaluation;
  • Dissemination of information and exploitation.
The project used a 'rapid prototyping' methodology for software development.
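The context-based error detection and correction workpackage can be pictured with a small Python sketch. It assumes a field parser has already identified a string as a publication year, so the field's context (four digits, within a plausible range) can be used to spot and repair typical OCR confusions. The confusion table, the year range and the function name are illustrative assumptions, not details of the project's software.

```python
import re

# Assumed letter-for-digit confusions that OCR commonly produces.
CONFUSIONS = str.maketrans(
    {"l": "1", "I": "1", "O": "0", "o": "0", "S": "5", "B": "8"}
)


def _is_plausible_year(text, earliest, latest):
    """A year field must be exactly four ASCII digits within the range."""
    return (
        re.fullmatch(r"[0-9]{4}", text) is not None
        and earliest <= int(text) <= latest
    )


def correct_year(raw, earliest=1450, latest=1995):
    """Return (value, status), status being 'ok', 'corrected' or 'rejected'."""
    candidate = raw.strip()
    if _is_plausible_year(candidate, earliest, latest):
        return candidate, "ok"
    # Context says this should be numeric, so map look-alike letters to digits.
    repaired = candidate.translate(CONFUSIONS)
    if _is_plausible_year(repaired, earliest, latest):
        return repaired, "corrected"
    # Still implausible: flag it for feedback instead of guessing.
    return candidate, "rejected"


if __name__ == "__main__":
    for sample in ["1994", "l9B4", "19x4"]:
        print(sample, correct_year(sample))
```

Rejected values are returned unchanged so that they can be passed back to the recognition stage or to a human operator.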
Key issues
BIBLIOTECA integrated work in the fields of:
  • automatic retrospective conversion of library catalogues;
  • OCR/ICR technology;
  • natural language processing
to develop a toolbox for analysis of informal and formal field/subfield structures underlying indexes of periodicals, dictionaries, tables of contents and bibliographical references.
It then resolved the following issues:
  • Adaptation of OCR/ICR to librarian needs and upgrade of image pre-processing, segmentation and feedback from document analysis.
  • Devising a flexible system for analysis of semi-formatted documents, with contextual error detection, correction and feedback.
  • Breakdown of texts into explicit elements suitable for transformation into SGML (Standard Generalised Markup Language), which can facilitate conversion to other formats, such as MARC.
Impact and results
The BIBLIOTECA toolbox will substantially decrease the cost, time and effort involved in creating and updating bibliographic databases by replacing manual analysis and keyboard entry with intelligent document reading, using scanning, OCR/ICR and artificial intelligence techniques.
The benefits and results of this project include:
  • creation of keyword indexes from tables of contents and indexes in books;
  • creation of article databases from content pages in serials;
  • creation of citation indexes from bibliographic references;
  • investigation of the possibility of more advanced 'intelligent' systems for indexing and classification;
  • automatic transformation of card files into standard formats.
Deliverables
The main deliverable from this project is a system which can:
  • produce keyword indexes from tables of contents and book indexes;
  • create article databases from contents pages in serials;
  • generate citation indexes from bibliographic references;
  • provide bibliographic databases from printed bibliographical dictionaries;
  • automatically transform card files into standard formats.
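To make the first capability in the list above concrete, here is a minimal Python sketch that builds a keyword index from table-of-contents entries. It assumes the entries have already been recognised and segmented into (title, page) pairs; the stopword list, the minimum word length and the function name are assumptions made for the example, not features of the delivered system.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would need one per language.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def build_keyword_index(entries):
    """Map each keyword to the sorted pages of the entries that contain it.

    `entries` is an iterable of (title, page) pairs.
    """
    index = defaultdict(set)
    for title, page in entries:
        # Take runs of letters only, so digits and punctuation are ignored.
        for word in re.findall(r"[^\W\d_]+", title.lower()):
            if word not in STOPWORDS and len(word) > 2:
                index[word].add(page)
    return {word: sorted(pages) for word, pages in sorted(index.items())}


if __name__ == "__main__":
    contents = [
        ("Introduction to SGML", 1),
        ("OCR and SGML in Libraries", 15),
    ]
    for keyword, pages in build_keyword_index(contents).items():
        print(keyword, pages)
```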
Development software and certain specifications, lists and technical documents were categorised as restricted.
Deliverables in the public domain are:
  • Detailed and top level work plans;
  • Technical reports: text selection criteria, draft framework, field structure, integrity criteria;
  • Appraisal: strengths, weaknesses, costs, benefits, performance;
  • Testing, examples and results;
  • Project reports (including final).
Documentation is available from the contact below and from http://www.ucm.es/info/VerbaLogica/biblio.htm.


Coordinator

Name of Institution/Organisation: Universidad Complutense de Madrid / Verba Logica
Postal Code / City: E - 28040 MADRID
Country: ES
Title, First Name, Name: Mr. Jaime SARABIA ALVAREZ-UDE
Address: Edificio Filosofia B., Laboratorio de Inteligencia Artificial, Ciudad Universitaria
Tel: +34-9-1-394 60 54
Fax: +34-9-1-394 60 53
E-mail: (email removed)

Other Partners

Name of Institution/Organisation    Country    Role
Matra Cap Systèmes                  FR         P
Biblioteca Nazionale, Napoli        IT         P
C.BIC/CISC                          ES         P
Instituto Cervantes                 ES         P

