The Repository Service is one of the OpenDLib service modules, but it can also be used in other digital libraries. It stores, maintains, and preserves documents.
The structure of the documents that a digital library (DL) handles can vary considerably. For example, a digital library can contain conference proceedings that are aggregates of other documents (the preface and the articles). Each article can be disseminated in different ways, for example both as text in PostScript format (the readable content of the article) and as audio in MPEG3 format (the speaker's presentation). The same digital library can also contain project deliverables, which are likely to have a completely different structure: they may be textual reports, structured into sections, accompanied by demos of the project prototypes. A digital library can also support different metadata formats. For example, it can have both the MARC format, used by library professionals, and the RFC-1807 format, used by the general public.
In order to support this variability in DL content, the Repository Service stores and disseminates documents that conform to the Document Model for Digital Libraries (DoMDL). This model can represent a wide range of document structures and associate any number of different metadata formats with them. Given its flexibility, this model is the first mechanism implemented by the Repository Service to support the expandability of the DL content.
The Repository Service is dynamically configurable along several dimensions. Below we list some of these.
- Publishing institutions. These are the institutions that are entitled to store their documents in the Repository instance.
- Basic collections. A collection is a set of documents that satisfy some commonly established set of criteria. The documents stored in a Repository instance may be organized into basic collections. For example, the documents in a repository managed by a group of Computer Science institutions might be organized into basic collections that reflect the ACM subject schema classes.
- Metadata formats. The Repository is capable of storing multiple metadata formats. The simplest way to specify these formats is through XML configuration files that record, for each metadata format, its name and description, plus references to its DTD and to the list of namespaces it uses.
- Derived metadata formats. The Repository Service can automatically derive metadata records from other existing metadata formats. For example, it can be configured to generate a Dublin Core record each time a MARC record is submitted. This automatic generation is executed by a generic procedure whose input is a tuple indicating the source metadata format name, the target metadata format name, and a reference to an XML file that maintains the corresponding mapping. This configuration file, called the mapping table, is straightforward to define: it records the relation between source and target attributes, plus a function, or a reference to one, that maps source values into target values.
- Manifestation type. Any view of a document can have several different manifestations, i.e. formats in which the document can be disseminated. For example, a conference paper can be disseminated both as a PostScript file and as a PDF file; the video of its presentation at the conference can be disseminated as MPEG and as AVI files. Manifestations can be physically stored within the Repository, or be handled by other specialized services.
- Derived manifestation type. The Repository Service can automatically derive manifestation types from others. For example, it can be configured to generate a PDF manifestation each time a PostScript manifestation is submitted. This automatic generation is executed by appropriate procedures pre-loaded by the service. Other derivations can be added by specifying the source and target manifestation types, plus a reference to an internal procedure or to an external program.
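The derivation of metadata records through a mapping table, described in the list above, can be illustrated with a minimal Python sketch. It is not the OpenDLib implementation: the names `MappingRule`, `derive_record`, and `MARC_TO_DC` are hypothetical, the mapping table is held in memory rather than read from an XML file, and the sample record is invented for illustration. The sketch only shows the idea of a generic procedure driven by (source attribute, target attribute, value function) entries.

```python
from typing import Callable, Dict, List, NamedTuple


class MappingRule(NamedTuple):
    """One row of a hypothetical mapping table."""
    source_attr: str               # attribute name in the source format
    target_attr: str               # attribute name in the target format
    convert: Callable[[str], str]  # maps a source value into a target value


def derive_record(source: Dict[str, str], table: List[MappingRule]) -> Dict[str, str]:
    """Generic derivation: apply every rule whose source attribute is present."""
    target: Dict[str, str] = {}
    for rule in table:
        if rule.source_attr in source:
            target[rule.target_attr] = rule.convert(source[rule.source_attr])
    return target


# Illustrative table from MARC-like field codes to Dublin-Core-like names.
# The choice of fields and conversion functions is an assumption.
MARC_TO_DC = [
    MappingRule("245a", "title", str.strip),
    MappingRule("100a", "creator", str.strip),
    MappingRule("260c", "date", lambda value: value.strip(" .")),
]

# Invented sample record, used only to exercise the sketch.
marc_record = {
    "245a": " A Flexible Repository Service ",
    "100a": "Doe, J.",
    "260c": "2002.",
}

dc_record = derive_record(marc_record, MARC_TO_DC)
print(dc_record)
```

Because the procedure is generic, supporting a new derivation under these assumptions only requires a new table, not new code, which mirrors the configuration-driven approach described above.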
The Repository Service can be customized by specifying the values of a number of configuration parameters. Some of these correspond to the content configuration dimensions listed above. Others specify the values of variation features, such as the security and preservation policies, which allow additional customization of the document and metadata handling functions.
The values assigned to the parameters are constrained by consistency rules that establish both the legal configurations of each single instance (e.g. derived metadata formats must belong to the set of supported metadata formats) and the legal configurations of the whole group of instances (e.g. a publishing institution cannot be associated with more than one instance). By changing the values of the above parameters and by exploiting the flexibility of the document model, the Repository Service can adapt its behaviour to many different situations.
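The two example consistency rules above can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not the service's actual validation code: `RepositoryConfig`, `check_instance`, and `check_group` are hypothetical names, and checking the source format of a derivation in addition to the derived format is an assumption that goes slightly beyond the rule as stated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RepositoryConfig:
    """Hypothetical configuration of one Repository instance."""
    name: str
    institutions: Set[str]
    metadata_formats: Set[str]
    # Each entry is (source format, derived format).
    derived_metadata: List[Tuple[str, str]] = field(default_factory=list)


def check_instance(cfg: RepositoryConfig) -> List[str]:
    """Per-instance rule: formats used in derivations must be supported."""
    errors = []
    for source, target in cfg.derived_metadata:
        for fmt in (source, target):  # checking the source too is an assumption
            if fmt not in cfg.metadata_formats:
                errors.append(f"{cfg.name}: format {fmt!r} is not supported")
    return errors


def check_group(configs: List[RepositoryConfig]) -> List[str]:
    """Group rule: an institution may be associated with one instance only."""
    owner: Dict[str, str] = {}
    errors = []
    for cfg in configs:
        for inst in sorted(cfg.institutions):
            if inst in owner:
                errors.append(
                    f"{inst!r} is assigned to both {owner[inst]} and {cfg.name}"
                )
            else:
                owner[inst] = cfg.name
    return errors


# Invented example instances, used only to exercise the checks.
repo_a = RepositoryConfig("repo-a", {"inst-1"}, {"MARC", "DC"}, [("MARC", "DC")])
repo_b = RepositoryConfig("repo-b", {"inst-1", "inst-2"}, {"MARC"}, [("MARC", "DC")])

for problem in check_instance(repo_b) + check_group([repo_a, repo_b]):
    print(problem)
```

Separating the per-instance check from the group check follows the distinction drawn in the text between the legal configurations of a single instance and those of the whole group of instances.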