METADATA ENGINE

Project Information

META-E

Grant agreement ID: IST-1999-20021

Project closed

Start date 11 September 2000

End date 10 September 2003

Funded under

Programme for research, technological development and demonstration on a "User-friendly information society, 1998-2002"

Total cost

€ 2 954 206,00

EU contribution

€ 1 599 537,00

Coordinated by

LEOPOLD FRANZENS UNIVERSITAET INNSBRUCK
Austria

Objective

"Metadata" play a significant role in "digital preservation". First, in conjunction with emerging standards (such as XML, EAD, Dublin Core or RDF), they are among the most promising ways to keep digital material "alive" over years and decades. Second, metadata are needed for all kinds of resource discovery, i.e., for using and accessing digital collections in a user-friendly way. The METADATA ENGINE project takes up these considerations and will develop software modules that automate metadata capture by introducing layout and document analysis as a key technology for digitisation software. METAe will dramatically enhance the quality of creating and maintaining digital collections of printed material such as books and journals.

Objectives:
The METAe project will address the need for automated generation of metadata during the conversion of printed documents. It will thus make large-scale digitisation of printed material, such as books and journals, more reliable in terms of digital preservation, more cost-effective through automation, and more user-oriented with respect to future applications.
To achieve these aims, the METADATA ENGINE project will
(1) introduce layout and document analysis as a key technology in future digitisation software,
(2) develop capture and conversion tools for the automated recording and generation of administrative and descriptive metadata,
(3) develop an omnifont OCR engine specialising in old European typefaces of the 19th century,
(4) strictly follow emerging standards in the fields of digital preservation and resource description, such as XML, EAD, TEI, or ISO 12083, and
(5) develop an XML search engine capable of retrieving both the tagged full text and the images.
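Objective (2) concerns capturing administrative and technical metadata automatically. The project description does not say which fields METAe records, so the following is only a minimal sketch of the general idea, assuming each scanned page is stored as a file: it computes a fixity checksum (a common digital-preservation practice) plus a few basic file properties. The function `technical_metadata` and its field names (`filename`, `size_bytes`, `sha256`, `captured`) are hypothetical.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone

def technical_metadata(path):
    """Capture basic technical/administrative metadata for one scanned file.

    Field names are illustrative assumptions, not a METAe schema.
    """
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large page images do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            sha256.update(chunk)
    return {
        "filename": os.path.basename(path),
        "size_bytes": os.stat(path).st_size,
        "sha256": sha256.hexdigest(),  # fixity value for later integrity checks
        "captured": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }

# Usage with a throwaway file standing in for a scanned page image.
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
    tmp.write(b"placeholder image bytes")
print(technical_metadata(tmp.name))
os.remove(tmp.name)
```

Recomputing the checksum later and comparing it with the stored value is how such a record would detect silent corruption of the archived image.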

Work description:
The METAe project will develop a software package which extensively automates and improves the generation of metadata by applying new technologies for character, layout and document recognition, and converts the captured information into XML documents. These XML files will serve as a basis for a variety of applications, such as new XML search engines, navigation tools, electronic books, audio books, or the automated production of HTML, XHTML, PDF or PS files.
The METAe package consists of
(1) an input module for scanning printed material and importing existing bibliographic metadata,
(2) an omnifont character recognition module (OCR engine) specialising in typefaces of the 19th century,
(3) a document analysis module that classifies pages according to their physical and logical structure (items such as title pages and table-of-contents pages are recognised automatically),
(4) a page layout analysis module that analyses and segments page elements such as page numbers, headings, captions, footnotes, pictures, highlighted phrases, or graphical separators,
(5) a knowledge base providing a controlled vocabulary and rules for the recognition process (for example, the table of contents is in most cases titled "contents"),
(6) a conversion module that assembles an XML document containing all recognised metadata, and
(7) an export module for the XML-enriched document and the scanned image.
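The interplay of the document analysis module, the knowledge base and the conversion module can be illustrated with a minimal sketch. It assumes layout analysis has already produced a list of typed page elements; the vocabulary in `PAGE_TYPE_RULES`, the `PageElement` class, the functions `classify_page` and `page_to_xml`, and the `<page>` XML layout are all hypothetical and not taken from METAe.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical knowledge base: controlled vocabulary mapping a page heading
# to a logical page type (e.g. a page headed "contents" is usually a TOC).
PAGE_TYPE_RULES = {
    "contents": "toc",
    "table of contents": "toc",
    "index": "index",
    "preface": "preface",
}

@dataclass
class PageElement:
    kind: str  # physical element type from layout analysis, e.g. "heading"
    text: str  # recognised text of that element

def classify_page(elements):
    """Assign a logical page type by matching headings against the vocabulary."""
    for el in elements:
        if el.kind == "heading":
            page_type = PAGE_TYPE_RULES.get(el.text.strip().lower())
            if page_type is not None:
                return page_type
    return "body"  # default when no rule fires

def page_to_xml(number, elements):
    """Assemble one recognised page as an XML element (hypothetical schema)."""
    page = ET.Element("page", n=str(number), type=classify_page(elements))
    for el in elements:
        ET.SubElement(page, el.kind).text = el.text
    return page

page = page_to_xml(5, [PageElement("heading", "Contents"), PageElement("pagenumber", "V")])
print(ET.tostring(page, encoding="unicode"))
```

Keeping the rules in a plain lookup table means the vocabulary can be extended (for other languages or page types) without touching the classification code; a real knowledge base would also need layout cues, not just heading text.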
The XML documents will be generated according to emerging standards for digital preservation and the electronic interchange of information such as RDF, DC, EAD, TEI, or ISO 12083.
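As an illustration of standards-based output, the sketch below serialises unqualified Dublin Core elements with Python's standard `xml.etree.ElementTree`. The namespace URI and the fifteen element names come from the Dublin Core Metadata Element Set; the `<record>` wrapper element, the `dublin_core_record` function and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialise as dc:title etc.

def dublin_core_record(fields):
    """Serialise {element name: [values]} as dc:* children of a <record> element."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # DC elements are repeatable
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record

# Illustrative values only.
rec = dublin_core_record({"title": ["Sample volume"], "date": ["1850"], "language": ["de"]})
print(ET.tostring(rec, encoding="unicode"))
```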
To introduce a wider public to the new ways of accessing and browsing images and XML-marked full texts, a METAe search engine and web application will also be developed.
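A search engine over XML-marked full text can use the markup to report where a hit occurs, not just whether it occurs. The sketch below assumes a simplified, TEI-like structure (`<div type="chapter">`, `<head>`, numbered `<p>` elements); this structure, the sample text and the `search` function are illustrative assumptions, not the METAe search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-marked full text produced by a conversion step.
SAMPLE = """<text>
  <div type="chapter"><head>On Printing</head>
    <p n="1">First paragraph of the chapter.</p>
    <p n="2">A paragraph set in Fraktur.</p>
  </div>
  <div type="chapter"><head>On Paper</head>
    <p n="3">Another paragraph.</p>
  </div>
</text>"""

def search(xml_text, term):
    """Return (chapter heading, paragraph number) for each paragraph containing term."""
    root = ET.fromstring(xml_text)
    needle = term.lower()
    hits = []
    # ElementTree supports this limited XPath form for attribute filters.
    for div in root.iterfind(".//div[@type='chapter']"):
        head = div.findtext("head", default="")
        for p in div.iter("p"):
            if needle in "".join(p.itertext()).lower():
                hits.append((head, p.get("n")))
    return hits

print(search(SAMPLE, "fraktur"))  # [('On Printing', '2')]
```

Because each hit carries its chapter and paragraph number, a web front end could link straight to the matching passage and its page image.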

Milestones:
1. The METADATA ENGINE will be the main software package for the automated generation of descriptive, administrative and technical metadata during the digital conversion process, and for the assembly of XML documents.
2. The METAe OCR engine will be an omnifont OCR engine specialising in Fraktur and other old European typefaces of the 19th century. Historical dictionaries for five European languages will complement this OCR engine.
3. The METAe search engine and web application will be developed to demonstrate the new possibilities for retrieving and accessing digitised documents that have been processed by the METADATA ENGINE.
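Milestone 2 pairs the OCR engine with historical dictionaries. One common way to use such a dictionary is post-correction: normalise Fraktur-specific glyphs such as the long s (ſ), then replace a recognised token with the closest dictionary entry if it is similar enough. The project description does not say how METAe applies its dictionaries; this sketch uses `difflib.get_close_matches` from the Python standard library as a stand-in similarity measure, and the word list, the 0.8 cutoff and the function names are hypothetical.

```python
import difflib

# Hypothetical excerpt of a historical German word list; a real
# 19th-century dictionary would be far larger.
HISTORICAL_WORDS = ["und", "der", "die", "Theil", "seyn", "Wissenschaft"]

def normalise(token):
    """Map the long s (common in Fraktur) to a round s for dictionary lookup."""
    return token.replace("ſ", "s")

def correct(token, lexicon=HISTORICAL_WORDS, cutoff=0.8):
    """Replace an OCR token by the closest dictionary word, if one is close enough."""
    word = normalise(token)
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # leave unknown words unchanged

print(correct("Wiſſenſchaft"))  # exact match after normalisation
print(correct("Wissenfchaft"))  # typical f/ſ confusion, corrected by similarity
```

Normalising ſ only for lookup is a design choice: a preservation-oriented workflow might keep the original glyph in the XML and record the normalised form separately.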

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP5-IST - Programme for research, technological development and demonstration on a "User-friendly information society, 1998-2002"

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

1.1.2.-3.2.4 - Digital preservation of cultural heritage

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

CSC - Cost-sharing contracts

Coordinator

LEOPOLD FRANZENS UNIVERSITAET INNSBRUCK

EU contribution

No data

Address

INNRAIN 52
6020 INNSBRUCK
Austria

Total cost

No data

Participants (12)

ABBYY EUROPE GMBH

Germany

EU contribution

No data

Address

ANGLERSTRASSE 6
80339 MUENCHEN

Total cost

No data

BIBLIOTECA STATALE A. BALDINI

Italy

EU contribution

No data

Address

VIA DI VILLA SACCHETTI 5
00197 ROMA

Total cost

No data

BIBLIOTHEQUE NATIONALE DE FRANCE

France

EU contribution

No data

Address

QUAI FRANCOIS MAURIAC
75706 PARIS

Total cost

No data

CCS COMPACT COMPUTER SYSTEME GMBH

Germany

EU contribution

No data

Address

SCHWANENWIK 32
22087 HAMBURG

Total cost

No data

FRIEDRICH-EBERT-STIFTUNG E.V.

Germany

EU contribution

No data

Address

GODESBERGER ALLEE 149
53175 BONN

Total cost

No data

INTERUNIVERSITAERES INSTITUT FUER INFORMATIONSSYSTEME ZUR UNTERSTUETZUNG SEHGESCHAEDIGTER STUDIERENDER

Austria

EU contribution

No data

Address

ALTENBERGERSTRASSE 69
4040 LINZ

Total cost

No data

KARL-FRANZENS-UNIVERSITAET GRAZ

Austria

EU contribution

No data

Address

UNIVERSITAETSPLATZ 3
8010 GRAZ

Total cost

No data

NATIONAL LIBRARY OF NORWAY, RANA DIVISION

Norway

EU contribution

No data

Address

FINSETVEIEN 2
8607 MO I RANA

Total cost

No data

SCUOLA NORMALE SUPERIORE

Italy

EU contribution

No data

Address

PIAZZA DEI CAVALIERI 7
56126 PISA

Total cost

No data

THE UNIVERSITY OF HERTFORDSHIRE

United Kingdom

EU contribution

No data

Address

COLLEGE LANE
AL10 9AB HATFIELD, HERTFORDSHIRE

Total cost

No data

UNIVERSIDAD DE ALICANTE

Spain

EU contribution

No data

Address

LUGAR CAMPO RABASA 99
03690 SAN VICENTE DEL RASPEIG (ALICANTE)

Total cost

No data

UNIVERSITA DEGLI STUDI DI FIRENZE

Italy

EU contribution

No data

Address

PIAZZA SAN MARCO 4
50121 FIRENZE

Total cost

No data
