MORE: MARC optical recognition

The project evaluated the feasibility of optical character recognition (OCR) and intelligent character recognition (ICR) as an approach to the retrospective conversion of printed library catalogues. It did so through development of a prototype tool, integration of the prototype into a production environment, and testing and assessment of the methods under real conditions. Retrospective conversion depends equally on character conversion of the data and on coding of the data's structure. Previous work investigated OCR, but with only limited automatic treatment of structure and formatting. Taking a printed national bibliography as its source of records, the project used state-of-the-art OCR/ICR tools and integrated them with an office document architecture (ODA)-based approach to structure recognition in order to generate high-quality records in universal machine-readable cataloguing (UNIMARC) format.
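As a rough illustration of what "coding of the data's structure" involves, the sketch below takes one line of already-recognised text and maps its parts to UNIMARC-style tags (700 for the personal name heading, 200 for title and statement of responsibility, 210 for publication details). It is not the project's ODA-based method: the entry layout, the regular expression, and the names `ENTRY_PATTERN`, `recognise_structure` and `format_record` are all assumptions made for this example, and a real bibliography would need far more tolerant rules.

```python
import re

# Hypothetical ISBD-style entry layout, assumed for illustration only:
#   "Surname, Forename. Title / responsibility. - Place : Publisher, Year."
ENTRY_PATTERN = re.compile(
    r"^(?P<surname>[^,]+), (?P<forename>[^.]+)\. "
    r"(?P<title>.+?) / (?P<responsibility>.+?)\. - "
    r"(?P<place>.+?) : (?P<publisher>.+?), (?P<year>\d{4})\.$"
)


def recognise_structure(ocr_line):
    """Map one line of OCR output to UNIMARC-style fields, or None if it does not fit."""
    match = ENTRY_PATTERN.match(ocr_line.strip())
    if match is None:
        # In a production workflow such a line would presumably go to an operator.
        return None
    g = match.groupdict()
    return {
        "700": {"a": g["surname"], "b": g["forename"]},            # personal name heading
        "200": {"a": g["title"], "f": g["responsibility"]},        # title and responsibility
        "210": {"a": g["place"], "c": g["publisher"], "d": g["year"]},  # publication details
    }


def format_record(fields):
    """Render the fields as simple 'TAG $a...$b...' lines, ordered by tag."""
    lines = []
    for tag in sorted(fields):
        subfields = "".join(f"${code}{value}" for code, value in fields[tag].items())
        lines.append(f"{tag} {subfields}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample entry, not taken from any real bibliography.
    sample = "Smith, John. Cataloguing rules / John Smith. - London : Example Press, 1990."
    print(format_record(recognise_structure(sample)))
```

A line that does not fit the assumed layout returns `None` rather than a partial record, which mirrors the idea that character conversion alone is not enough: an entry whose structure cannot be coded still needs human attention.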

The results include: specifications for record structure analysis and recognition; a prototype workstation for OCR/ICR and structure recognition of printed library catalogue records; sample conversions of printed national bibliographic records; and a report on the feasibility and cost-effectiveness of the approach. Input accuracy, targeted at 99.8%, is comparable to double-keying standards. Input speed, however, is much greater, and the treatment of errors is more immediate and informative, with document handling largely eliminated. The method is technically and commercially feasible as the basis for a catalogue conversion system, and would be expected to at least halve human involvement in the process.

