This is the Telematics for Libraries ARCHIVE.
These projects were part of Framework Programme 4.
The pages are no longer maintained
Telematics for Libraries - Projects

MORE

Updated: 15 JUN 99
Project Number and Title
1047 - MARC Optical Recognition
Programme/Action line: FP 3 / IV
Call Topic(s): Theme 17
Start: December 1992
End: October 1994
Project Duration in Months: 24
Keywords
OCR/ICR; retrospective conversion; library catalogues; structure recognition; character recognition


Theme
New bibliographic record products and services applying internationally recognised standards (Theme: 17)
Project description
The project's key goal was to evaluate the feasibility of OCR/ICR as an approach to the retrospective conversion of printed library catalogues, through:
  • development of a prototype tool;
  • integration of prototype into a production environment;
  • test and assessment of methods under real conditions.
The retrospective conversion of library catalogues depends equally on character conversion of the data and on coding of the data's structure. Previous work investigated OCR but with only limited automatic treatment of structure and formatting. Taking a printed national bibliography as its source of records, the project used state-of-the-art OCR/ICR tools and integrated them with an ODA-based approach to structure recognition in order to generate high-quality, UNIMARC-formatted records.
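As a rough illustration of how character data and structural coding combine, the sketch below splits one already-recognised entry into UNIMARC-tagged fields. The entry layout, the regular expression, the sample entry, and the function name `entry_to_unimarc` are all assumptions made for illustration; this is not the project's ODA-based method. Only the tags (700 for the primary author, 200 for title, 210 for publication details) follow standard UNIMARC usage.

```python
import re

# Hypothetical entry layout (an assumption, not the real bibliography's layout):
#   "Surname, Forename. Title. Place : Publisher, Year."
ENTRY_PATTERN = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<forename>[^.]+)\.\s*"
    r"(?P<title>[^.]+)\.\s*"
    r"(?P<place>[^:]+?)\s*:\s*(?P<publisher>[^,]+),\s*(?P<year>\d{4})\.$"
)


def entry_to_unimarc(entry):
    """Map one recognised text entry to UNIMARC-style tags and subfields.

    Returns None when the entry does not fit the assumed layout, so the
    caller can route it to manual treatment.
    """
    match = ENTRY_PATTERN.match(entry.strip())
    if match is None:
        return None
    return {
        # 700: personal name, primary responsibility
        "700": {"a": match["surname"].strip(), "b": match["forename"].strip()},
        # 200: title and statement of responsibility ($a = title proper)
        "200": {"a": match["title"].strip()},
        # 210: publication ($a place, $c publisher, $d date)
        "210": {
            "a": match["place"].strip(),
            "c": match["publisher"].strip(),
            "d": match["year"],
        },
    }


# Invented sample entry, for illustration only.
record = entry_to_unimarc(
    "Dupont, Jean. Histoire de la ville. Bruxelles : Editions Exemple, 1973."
)
```

A single fixed pattern like this only works when every entry follows the same layout, which is why the homogeneity of the catalogue matters for this kind of approach.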
Technical approach
MORE was divided into three phases: specification, development and evaluation. Within the phases, tasks were scheduled over seven workpackages:
  • Technical specifications;
  • Dictionaries;
  • Structure recognition;
  • Character recognition;
  • Testing & acceptance of software;
  • Prototype;
  • Production test.
The system directly assimilates printed catalogues into machine-readable format via OCR. The tools for character and structure recognition can be configured to process all catalogues which have a sufficiently homogeneous structure.
When errors or other exceptions occur, the image of the original document is displayed with the problem highlighted, together with the best-estimate solution and alternatives. Verified data is converted to high-quality, UNIMARC-formatted records.
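The exception handling described above can be sketched as a simple routing step: fields whose best reading is confident enough are accepted, and the rest are queued for an operator together with their ranked alternatives. The class name `RecognisedField`, the function `split_for_review`, the confidence scores, and the 0.90 threshold are hypothetical; the source does not say how the prototype scored or ranked its candidates, and the image display itself is not modelled here.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.90  # assumed cut-off, not a figure from the project


@dataclass
class RecognisedField:
    name: str
    candidates: list  # (text, confidence) pairs in any order


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Separate confidently recognised fields from those needing an operator.

    Returns (accepted, to_review), where accepted maps field name to text and
    to_review lists (field name, candidates ranked best first).
    """
    accepted = {}
    to_review = []
    for field in fields:
        ranked = sorted(field.candidates, key=lambda c: c[1], reverse=True)
        best_text, best_confidence = ranked[0]
        if best_confidence >= threshold:
            accepted[field.name] = best_text
        else:
            # Best estimate first, alternatives after it, as shown to the operator.
            to_review.append((field.name, ranked))
    return accepted, to_review


fields = [
    RecognisedField("title", [("Histoire de la ville", 0.99)]),
    RecognisedField("year", [("1978", 0.55), ("1973", 0.60)]),
]
accepted, to_review = split_for_review(fields)
```

Here the title would be accepted automatically, while the year would be presented to the operator with "1973" as the best estimate and "1978" as the alternative.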
The prototype was tested under production conditions using the 'Bibliographie de Belgique 1973', selected because its records pre-dated current layout standards. Nevertheless, the success of the tests clearly demonstrated the viability and potential of the method.
Key issues
The main technical issues explored were:
  • Role and use of dictionaries, both generic and derived from the specific application;
  • Analysis and modelling of library catalogue data structures;
  • Integration of structure and character recognition tools.
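To make the role of dictionaries concrete, the following minimal sketch corrects a misread token against a combined word list using fuzzy matching from Python's standard `difflib` module. The word lists, the 0.8 similarity cut-off, and the function name `correct_token` are assumptions for illustration; the source does not describe how the project's dictionaries were built or applied.

```python
from difflib import get_close_matches

# Hypothetical dictionaries: one generic, one derived from the catalogue itself.
GENERIC_WORDS = {"histoire", "ville", "bibliographie"}
APPLICATION_WORDS = {"Bruxelles", "Belgique"}

# Sorted so that results do not depend on set iteration order.
DICTIONARY = sorted(GENERIC_WORDS | APPLICATION_WORDS)


def correct_token(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged if none is close."""
    if token in dictionary:
        return token
    matches = get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


corrected = correct_token("Bruxe1les")  # digit "1" misread in place of "l"
unchanged = correct_token("xyzzy")      # nothing similar in the dictionary
```

An application-derived list matters because proper nouns such as place names and publishers rarely appear in a generic dictionary, yet they recur throughout a national bibliography.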
Impact and results
The project will permit the extension and application of existing techniques to other domains of document processing in library catalogues.
The results include:
  • Specifications of record structure analysis and recognition;
  • Prototype workstation for OCR/ICR and structure recognition of printed library catalogue records;
  • Sample conversions of printed national bibliographic records;
  • Report on feasibility and cost-effectiveness of the approach.
Input accuracy, targeted at 99.8%, is comparable to double-keying standards. Input speed, however, is much greater and the treatment of errors more immediate and informative, with document handling largely eliminated.
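To give the accuracy target a sense of scale, the short calculation below converts it into an expected number of residual errors. It assumes the 99.8% figure is measured per character, which the source does not state, and the 300-character record length is purely hypothetical.

```python
TARGET_ACCURACY = 0.998  # target quoted in the text; per-character basis is assumed


def expected_errors(characters, accuracy=TARGET_ACCURACY):
    """Expected number of wrong characters among `characters` recognised ones."""
    return characters * (1.0 - accuracy)


per_thousand = expected_errors(1000)  # roughly 2 errors per 1,000 characters
per_record = expected_errors(300)     # hypothetical 300-character record
```

Under these assumptions, fewer than one character per record would be expected to need correction on average.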
The method is technically and commercially feasible for a catalogue conversion system. As such it would be expected to at least halve human involvement in the process.
Deliverables
The production-tested prototype can be adopted as a commercial-grade workstation for the retrospective conversion (RECON) of printed library catalogues.
Software design and specification documents are deliverables of the project but have restricted availability.
Other published reports cover:
  • An evaluation of the prototype;
  • An evaluation of tests on the 'Bibliographie de Belgique 1973'.


Coordinator

Name of Institution/Organisation: Jouve S.I.
Postal Code / City: F - 75025 PARIS CEDEX 01
Country: FR
Title, First Name, Name: Marie-Elise Fréon
Address: 18, rue Saint Denis, BP 414-01
Tel: +33-1 44 76 86 20
Fax: +33-1 44 76 86 39

Other Partners

Name of Institution/Organisation         Country  Role
Centre de Recherche Informatique, Nancy  FR       P
Bibliothèque Royale Albert 1er           BE       P

