Collaboratory for Annotation, Indexing and Retrieval of Digitised Historical Archive Material


Many historic sources that constitute a major part of the European Cultural Heritage are fragile and distributed in various archives, so that full knowledge and usage of this material is severely impeded. COLLATE aims to implement a WWW-based collaborative work environment for archives, researchers and end-users concerned with digitised cultural material. The repository focuses on historic film documentation, including censorship files, photos and film fragments. Users will take an active part in evaluating sources and adding valuable information. Watermarking and (semi-)automatic segmentation, categorization and analysis support high standards in content management, such as authenticity and subject identification, while appropriate task-based interfaces support deep content indexing and annotation. The system exploits these metadata using advanced document processing and XML-based content management and retrieval functionalities.

Project COLLATE aims at the development and practical usage of a content-centric, user-driven information system for the management of surrogates of fragile historic multimedia objects. As a distributed Web-based multimedia repository, it will function as a "collaboratory" supporting distributed user groups through dedicated knowledge management facilities such as content-based access, comparison and in-depth indexing/annotation of digitised sources. Historians and domain experts will thus be enabled to provide and share valuable knowledge about the cultural, political and social contexts, which in turn allows other end-users to better interpret the historic material. The system will be evaluated in a realistic application dealing with the heritage of European film-making in the 1920s and 1930s (comprising film fragments, text documents and photos) provided by three major film archives, which can thereby intensify and extend their existing collaborations.

Work description:
Work in the COLLATE project will be carried out in three main phases. Phase 1 will analyse user and application requirements and the state-of-the-art technology to produce a system specification of the collaboratory. We will define the digital multimedia collection (consisting of a large corpus of censorship documents on historic films and, for a subset, enriched documentation including photos and film fragments, which will be comprehensively indexed, annotated and interlinked). Further, an appropriate domain model of the application databases and the necessary software technology will be developed, comprising: a module for the processing, representation and management of digitised documents (including structure analysis, mark-up, and digital watermarking of source material); appropriate knowledge processing tools as indexing aids (schemas, controlled vocabularies); and tools for image and video analysis adapted to the tasks at hand. Based on a task model for different usages of the collaboratory (e.g., annotation, indexing, search, comparison, joint edition of sources), a first prototype of the annotation and indexing interfaces will be implemented.

In Phase 2 the application partners will work on "preservation case studies", using the first prototype for indexing and annotating digitised text documents; this work will be subject to intensive empirical user studies. The technology providers will fine-tune the prototype based on these evaluation results and add more components (e.g., digital watermarking, automatic structure detection for typed documents, images and film fragments, as well as content- and context-based retrieval tools).

In the last phase, users will continue their indexing/annotation work, now also including multimedia documents, photos and films. An integrated system version will be implemented and its usage thoroughly evaluated in order to fine-tune the system and develop requirements for a generic interface for similar types of applications and for extending the collaboratory.

- Design, implementation and evaluation of a WWW-based collaboratory (collaborative online laboratory) that provides a comfortable working environment and user interfaces for supporting end-users in their annotation, indexing and retrieval of multi-format, multimedia historical archive material.
- A comprehensive digital multimedia collection on European historic films and film documentation, annotated and interpreted by a multi-national team of experts.
The COLLATE project implemented:
- A WWW-based collaboratory for archives, researchers and end-users working with digitised historic-cultural material, which provides interfaces for cataloguing, content-based in-depth indexing and annotation of documents, and content-based retrieval functions.
- Methods and tools for digital watermarking of the digitised document collection, combining copyright and integrity watermarks to protect the property rights of the document owners and ensure the originality of the material.
- Components for the automatic detection of the layout structure of the documents and for the automatic classification of the documents on the basis of their layout structure.
- A set of functions for creating, instantiating and managing metadata associated with multimedia documents. This requires two steps: designing the Document Type Definition (DTD), and implementing the basic functions needed to create and manage documents expressed in XML and stored in an XML-compliant DBMS.
- A software component that allows the creation of annotations and nested/interlinked annotations.
- Methods users can employ to query the COLLATE databases in order to discover/retrieve documents, annotations and the relationships among them.
- A multidimensional and syntactical document description scheme and a comprehensive domain-specific ontology as indexing terminology.
- A rough classification scheme for visual data which uses semantic descriptors to index visual material. On the primitive level, features are extracted by automatic algorithms operating on primitive picture characteristics and stored in the DBMS.
- A machine-learning mechanism which generates rules associating patterns on the primitive level with descriptors on the logical and abstract level. Using these rules, new visual material can be indexed on the semantic level automatically.
- An integrated working environment supporting collaboration between expert users. The user interface therefore has not only to support the task to be performed, but also to reflect the ongoing discourse context.
- A middleware component supporting COLLATE's integration methodology by connecting each module of the system architecture to a backbone.
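
To make the idea of nested/interlinked annotations stored as XML more concrete, the following is a minimal sketch in Python. It is not the COLLATE implementation or its DTD: the element and attribute names (`annotation`, `link`, `target`, `ref`), the class `Annotation`, the helper `count_annotations` and all identifiers such as `doc-0042` are hypothetical and chosen only for illustration. It assumes that an annotation may target either a document or another annotation, and may additionally point to related annotations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One annotation; `target` is a document id or another annotation's id.

    Hypothetical structure for illustration, not the COLLATE schema.
    """
    ann_id: str
    author: str
    target: str
    text: str
    replies: List["Annotation"] = field(default_factory=list)  # nested annotations
    links: List[str] = field(default_factory=list)  # ids of related annotations

    def reply(self, ann_id: str, author: str, text: str) -> "Annotation":
        """Create an annotation on this annotation and nest it below it."""
        child = Annotation(ann_id, author, self.ann_id, text)
        self.replies.append(child)
        return child

    def to_xml(self) -> ET.Element:
        """Serialise this annotation and everything nested below it."""
        elem = ET.Element("annotation", id=self.ann_id,
                          author=self.author, target=self.target)
        ET.SubElement(elem, "text").text = self.text
        for linked_id in self.links:
            ET.SubElement(elem, "link", ref=linked_id)
        for child in self.replies:
            elem.append(child.to_xml())
        return elem


def count_annotations(elem: ET.Element) -> int:
    """Count annotation elements in a tree, including the root itself."""
    return len(list(elem.iter("annotation")))


# Example data (all ids, authors and texts are invented).
root = Annotation("a1", "historian_1", "doc-0042",
                  "Censor requested cuts to reel 3.")
other = Annotation("a3", "historian_1", "doc-0043",
                   "Later re-examination of the same film.")
reply = root.reply("a2", "archivist_2",
                   "The cut scenes survive as a separate fragment.")
reply.links.append(other.ann_id)  # interlink: a2 points to a3

tree = root.to_xml()
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Nesting is modelled by containment (a reply is a child element of the annotation it comments on), while interlinking is modelled by id references, so that annotations on different documents can be related without duplicating them. A real system would validate such documents against its DTD and store them in the XML-compliant DBMS mentioned above.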

Funding Scheme

CSC - Cost-sharing contracts


Hansastrasse 27C
80686 Muenchen

Participants (6)

Schaumainkai 41
60596 Frankfurt
Obere Augartenstrasse I
1020 Wien
Malesicka 12
130 00 Praha
Frederiksborgvej 399
4000 Roskilde
Via Dello Zoosafari 9
72015 Fasano (Brindisi)
Piazza Umberto I, 1
70121 Bari