
A Digital Library Testbed to support Networked Scholarly Communities

Deliverables

The Scholnet Multimedia Document Storage and Delivery Service (SMDS) is a web-based service that supports the upload and storage of multimedia documents and their delivery across the Internet, both by real-time streaming and by download. The SMDS was implemented in C# and ASP.NET as a multi-tier distributed system running on well-established technologies such as Microsoft Windows 2000 Server, Microsoft Internet Information Server, Microsoft SQL Server 2000 and RealNetworks RealServer. The SMDS was designed to handle multimedia documents in non-composite form (MPEG-1 and MPEG-2) as well as in composite form (SMIL). Communication between the SMDS and other Scholnet services usually takes place through the OpenLib protocol, which includes verbs to administer the SMDS, but remote administration of the SMDS through a web-based administrative interface is also possible. Within Scholnet, the SMDS gives ordinary users of the system access to all the available multimedia documents and also provides upload mechanisms to expand the number of available documents. During the testing phase of the project the SMDS proved to be stable, without major problems. Owing to its generic design, it should be possible to reuse the existing system without too much effort.
Scholnet is a digital library service system to support networked scholarly communities. It provides traditional digital library services in addition to support for non-textual data types, hypermedia annotation, cross-language search and retrieval, and personalized information dissemination. This system enables members of a networked scholarly community to learn from, contribute to, and collectively build upon the community's discipline-oriented digital collections. The digital library can be used actively by the members of the community in everyday individual and/or collaborative tasks and would be regularly updated and extended. The Scholnet System extends the functionality provided by the OpenDLib digital library service system with the following services: - Multimedia Document Storage and Delivery Service. The Scholnet System extends current multimedia database technology in order to integrate a multimedia support server into the OpenDLib architecture. - Hypermedia Annotation Service. A server supports the maintenance of collective knowledge and collaborative work. - Multilingual Information Access Service. This service enables monolingual access, search and retrieval in all languages used by the community. Two simple cross-language mechanisms are supported: controlled vocabulary searching using a multilingual thesaurus, and free-text searching. - Personalised Information Dissemination Service. This service is implemented by a server built with software modules developed by the EUROgatherer Project (IE-8011) for user personalisation and information pushing. These new services, like all the other OpenDLib services, are customizable to better satisfy the needs of the user community that will use them.
The Search Service is composed of three basic OpenDLib distributed components: the Query Mediator, the Index, and the Browse Service. The Query Mediator Service dispatches queries to the appropriate Index Service instances. It adapts its behaviour to the Index Service instances that are available, and therefore exploits the potential of the Index Service to the full. The Index Service accepts queries and returns the documents matching those queries. It is parametric with respect to the metadata formats, the set of indexed fields, the set of result formats and the language of the terms. It can harvest documents using the appropriate protocol or load documents stored locally. The Browse Service supports the construction of indexes for browsing and the actual browsing of these indexes on library contents. It is parametric with respect to the metadata formats, the set of browsable fields, and the set of formats for result sets. It can harvest documents using the appropriate protocol or load documents stored locally. The three components can be instantiated on the same server or distributed worldwide, as the administrator chooses. In the first case, the Search Service is very similar to other proprietary solutions; in the second, the Query Mediator Service automatically routes each query to the best Index Service instance and can overcome the following failures: Network Partitioning (a server becomes unavailable because of a failure in the network); Server Failure (a server becomes unavailable because of a hardware or software failure); Network Latency (the response time of a server becomes abnormally slow because of network overload or other problems); Server Latency (the response time of a server becomes abnormally slow because of server overload or other problems); Protocol Failure (a server responds to protocol requests with incorrect or invalid responses).
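The failover behaviour described for the Query Mediator can be illustrated with a minimal Python sketch. It is not the OpenDLib implementation: the class and function names (IndexInstance, dispatch, IndexUnavailable, ProtocolFailure) are hypothetical, the instances are tried in a fixed order rather than ranked, and the failure classes are collapsed into two exception types plus the built-in TimeoutError.

```python
from dataclasses import dataclass
from typing import Callable, List


class IndexUnavailable(Exception):
    """Hypothetical error: network partition or server failure."""


class ProtocolFailure(Exception):
    """Hypothetical error: the instance returned an invalid response."""


@dataclass
class IndexInstance:
    name: str
    # Hypothetical callable that runs a query and returns document identifiers.
    search: Callable[[str], List[str]]
    healthy: bool = True


def dispatch(query: str, instances: List[IndexInstance]) -> List[str]:
    """Try each healthy Index instance in order; return the first valid answer."""
    errors = []
    for inst in instances:
        if not inst.healthy:
            continue
        try:
            result = inst.search(query)
            if not isinstance(result, list):
                raise ProtocolFailure("malformed response")
            return result
        except (IndexUnavailable, ProtocolFailure, TimeoutError) as exc:
            # Mark the instance as failed so later queries skip it.
            inst.healthy = False
            errors.append(f"{inst.name}: {exc}")
    raise RuntimeError("no Index instance answered: " + "; ".join(errors))
```

A real mediator would also need to re-probe instances marked unhealthy and to rank the candidates (for example by measured latency) instead of walking a fixed list.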
System integration is a challenging task when developing an advanced integrated system in a multinational, multi-partner project. Building on the SCHOLNET system architecture, which is determined by the paradigm of relatively independent service verbs, an approach has been developed and implemented to create a highly flexible and powerful user interface component that enables the dynamic integration of the various service verbs. Because the service verbs are highly flexible and dynamic, and come with their own service specification at runtime, a highly flexible user interface approach was required. This was the reason for using leading-edge user interface models that go beyond state-of-the-art form-based user interfaces. For the SCHOLNET integration, a user interface model based on XForms, the upcoming standard for form-based user interfaces of Web applications, was chosen. Its innovative features are: - Extended client-side interactions enabling the construction of dynamic forms, e.g. for retrieval support, that adapt to the current business step requirements, e.g. by dynamically adding new form elements, as well as an extended submit logic, - Client-side validation of structural as well as value-based constraints on the user interface instance data, - A clear conceptual separation of data, control, and layout following the model-view-controller approach, which enables the support of different UI agents and even of service interfaces, - The systematic support for flexibly grouping form elements, including declarative approaches for dynamic collections of instance data, as well as for the grouping of entire forms into larger, meaningful units with respect to the implemented business process (multi-form dialogs). The introduction of a powerful and flexible form-based user interface model makes the integrated SCHOLNET Web application interface more flexible, effective, and responsive.
In order to achieve full flexibility, it is combined with a Web-service-based method for activating services (verbs) according to the results of the user interaction. These features proved very useful in the complex SCHOLNET integration task. The approach is not restricted to SCHOLNET. It can be considered a case study on exploiting next-generation standard Web technology for building integrative user interfaces for autonomous distributed service environments.
DoMDL consists of a set of entities with attributes, connected by several types of relations and restricted by constraints. Each of the entities described below models a particular aspect of a document. Document: a document is an abstract entity that represents a distinct intellectual creation. A URN names a document. Version: we recognize a document through its individual instances. An instance is a given edition of a document. The first instance of a document could be, for example, a document that describes an idea in draft form; a second one could be the document reviewed after experimentation; and a final one the work published as a paper. An instance of a document is called a version. View: each view is a specific intellectual perception of a document instance. The boundaries of the view are defined so as to exclude aspects of physical form that are not integral to the intellectual perception of a document. Every document instance is perceived through one or more views. A document may be perceived by a user, for example, through its description or its summary. DoMDL distinguishes between two different classes of view: metadata and content. Each of these classes models specialized views that have different constraints and relationships. The Metadata View provides structured information about the document instance and its views. Typically, index services or other discovery or browsing services use this cataloguing information to facilitate the location of digital objects in a digital library. The Content View is the specific intellectual form that the content of the document itself takes in order to be perceived by the user. Defining a specific intellectual form as an entity allows relationships to be established between specific content views of a document. The Content View elements may be used to identify, for example, perceptions of the same text in different languages, or a silent perception of an audio-video document and its audio summary.
Manifestation: a manifestation is the physical form of an intellectual perception of the document. Each manifestation may be disseminated under the form of either a single or a set of items. The items are identified as a sequence with an ordinal attribute. The content type of the items in the set must be the same. A content type is expressed as a MIME type such as image/tiff or text/html. The entities described above are linked by the following relations. Has_version: the domain of this relation is the document entity and its range is the version entity. A specific document has at least one version and the same version cannot be the instance of two different documents. Has_view: the domain of this relation is the version entity and its range is the view entity. It is used to express how a user may perceive a document instance. Every document instance is perceived through one or more views. A book, for example, may be perceived by means of its metadata descriptions and by means of two or more content views in different languages. Two document instances cannot share the same view. Has_metadata: the domain of this relation is the content view entity and its range is the metadata view entity. It is used to associate metadata with a specific content view in order to describe it. Has_part: the domain and range of the has-part relation is the content view entity. It makes it possible to manage views that are realized as a hierarchy of parts that are slices of a single intellectual creation. It describes a view as an aggregation of a set of other views: for example, a book view of a document may be perceived as organized into a set of chapters where each of these parts, in turn, has a set of paragraphs. Sharing of parts is not possible. Is_an_image_of: the domain and range of this relation is the content view entity. It is used to indicate that one view is the image of another. 
As such, it has exactly the same metadata description, and its structure and relations are also the same. For example, if each of the articles of a journal has been registered as an independent document, the journal view may be defined as the aggregation of a set of views that are images of the content views of the journal articles. Is_a_specialization_of: the domain and range of this relation is the content view entity. This relation links a view to its specialization. A volume of a journal, for example, may be perceived as a set of articles or as a set of numbers: the two sets can be modelled as specialised views of the journal view. Has_manifestation: the domain of this relation is the view entity and its range is the manifestation entity. It is used to express the relationship between the intellectual and the physical form of a document. For example, a textual view of a document may have two different manifestations: one in pdf format and another in postscript format. Each view with no parts has at least one manifestation.
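The entities and some of the relations of DoMDL can be sketched as Python dataclasses. This is an illustrative reading of the description above, not the model's actual schema: the class and field names are assumptions, metadata views are attached only to content views (Has_metadata), the Is_an_image_of and Is_a_specialization_of relations are omitted, and only two constraints are checked (a document has at least one version; a view with no parts has at least one manifestation). The rule that all items of a manifestation share one content type holds by construction, because the MIME type is stored once per manifestation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    content_type: str                                # MIME type, e.g. "application/pdf"
    items: List[str] = field(default_factory=list)   # list position acts as the ordinal


@dataclass
class MetadataView:
    format_name: str
    record: dict


@dataclass
class ContentView:
    name: str
    metadata: List[MetadataView] = field(default_factory=list)         # Has_metadata
    parts: List["ContentView"] = field(default_factory=list)           # Has_part
    manifestations: List[Manifestation] = field(default_factory=list)  # Has_manifestation

    def validate(self) -> None:
        # Each view with no parts has at least one manifestation.
        if not self.parts and not self.manifestations:
            raise ValueError(f"view {self.name!r} has no parts and no manifestation")
        for part in self.parts:
            part.validate()


@dataclass
class Version:
    label: str
    views: List[ContentView] = field(default_factory=list)  # Has_view


@dataclass
class Document:
    urn: str
    versions: List[Version] = field(default_factory=list)   # Has_version

    def validate(self) -> None:
        # A specific document has at least one version.
        if not self.versions:
            raise ValueError("a document has at least one version")
        for version in self.versions:
            for view in version.views:
                view.validate()
```

For example, a book view made of chapter views, each with a PDF manifestation, validates, whereas a leaf view with neither parts nor manifestations is rejected.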
An ever-expandable DL may well contain an extremely large and heterogeneous content. The growth of the content space does not necessarily result in a benefit for the DL users. The heterogeneity forces them to use generic services such as, for example, generic query languages. Thus a large content may result in a loss of precision with respect to retrieval and in a degradation of performance. The solution that is usually proposed to this problem is to build a number of specialized portals that provide different views of the DL. These portals may offer partial views of both the content space and the set of services. In a highly expandable DL this solution presents a number of drawbacks if it is not appropriately implemented. In particular, the portals are the expression of the needs of the user communities, thus any time a new community is added, or a community changes its requirements, a new portal must also be developed. In order to overcome these drawbacks, we decided to design a more general solution that could provide the basis for building a dynamic set of virtual views of both the content space and the available services. This solution is based on the use of content space mediator services. The Collection Service is the first of these general utility services that we have developed. The Collection is a service that mediates between the virtual dynamic organization of the content space, built according to the requirements of the DL community of users, and the concrete organization into basic collections of documents held by publishing institutions. The virtual organization consists in a number of hierarchically structured subsets of the DL documents that we call ''collections''. Each collection is characterized by a set of criteria, the membership condition that must be satisfied by all its members. 
Examples of membership conditions are: ''all the documents published by a certain institution'', and ''all the documents on a certain subject published after a certain date''. Each community of users can dynamically define its own virtual collections by specifying the name of the collection and providing a textual description and its membership condition. The Collection Service accepts collection creation requests and processes them. In particular, it generates a set of collection descriptive metadata from information gathered by sending requests to other digital library services. The Collection Service disseminates the list of existing collections and their metadata on demand. All the application services can exploit this information when implementing customized views of their functionality. For example, the Query Mediator can make a particular search operation available on the collection ''Recent Italian Computer Science'' that accepts queries with fields extracted from the RFC 1807 schema, and terms selected from an official Italian translation of the ACM schema vocabulary. The User Interface service can exploit the same information to show a menu that lists the available collections to the users. The User Interface can then visualize the services available on a selected collection. The application services can use the collection metadata not only to provide a community with a customized view of their functionality but also to improve their efficacy. Let us illustrate this point by considering again the behaviour of the Query Mediator Service. By exploiting the list of publishing institutions that submit documents to the collection, the Query Mediator can derive the subset of the Index instances that index the documents in the collection. It can thus improve query performance by sending requests only to this restricted subset of Index instances.
Moreover, it can improve the retrieval precision by appending the filtering condition that characterizes the collection to the user query. The Collection Service is an example of a basic service tool that operates as a content space mediator.
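The two optimizations described for the Query Mediator (restricting the set of Index instances and appending the membership condition) can be sketched in Python as follows. The names Collection and plan_query, the index_coverage table and the AND-based query syntax are assumptions made for illustration; the source does not specify the query language or the format of the collection metadata.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Collection:
    name: str
    description: str
    membership_condition: str       # filter in a hypothetical query syntax
    institutions: FrozenSet[str]    # publishing institutions feeding the collection


def plan_query(user_query: str, collection: Collection,
               index_coverage: Dict[str, Set[str]]) -> Tuple[List[str], str]:
    """Return the Index instances to contact and the rewritten query."""
    # Keep only the Index instances that index at least one contributing institution.
    targets = sorted(
        index for index, covered in index_coverage.items()
        if covered & collection.institutions
    )
    # Append the membership condition to the user query to improve precision.
    rewritten = f"({user_query}) AND ({collection.membership_condition})"
    return targets, rewritten
```

Here index_coverage maps each Index instance to the institutions whose documents it indexes; in the described system this information would be derived from the collection metadata rather than supplied by hand.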
The Multilingual Thesaurus Service is one of the Scholnet services. It is an autonomous service based on the SIS Thesaurus Management System (SIS-TMS) which consists of a tool to develop multilingual thesauri and a terminology server for cataloguers and for distributed access to heterogeneous electronic collections. The communication between the Multilingual Thesaurus Service and Scholnet takes place via the OpenLib Protocol. The Multilingual Thesaurus Service provides the capability to view and/or update the SIS-TMS database through XML documents. The basic notion of SIS-TMS is a thesaurus concept, which is described by an XML document. The user is able to view and/or edit this document. The SIS-TMS database includes the English ACM classification scheme and the French (section I), Italian, Czech, and Swedish translations of the ACM classification scheme. A limited set of German and Greek terms has also been loaded.
The Repository Service is one of the service modules belonging to OpenDLib but can also be used in other digital libraries. It stores, maintains, and preserves documents. The structure of the documents that can be handled by a DL can vary considerably. For example, a digital library can contain conference proceedings that are aggregates of other documents (the preface and the articles). Each article can be disseminated in different ways, for example both as text in PostScript format (the readable content of the article) and as audio in MPEG3 format (the speaker's presentation). The same digital library can also contain project deliverables. These are likely to have a completely different structure. For example, they may be textual reports, structured into sections, and demos of the project prototypes. A digital library can also support different metadata formats. For example, it can have both a MARC format, used by library professionals, and an RFC-1807 format, used by the general public. In order to support this variability in DL content, the Repository Service stores and disseminates documents that conform to a powerful document model, the Document Model for Digital Libraries (DoMDL). This model can represent a wide range of document structures and associate any number of different metadata formats with them. Given its flexibility, this model represents the first mechanism implemented by the Repository Service to support the expandability of the DL content. The Repository Service is dynamically configurable along several dimensions. Below we list some of these. - Publishing institutions. These are the publishing institutions that are entitled to store their documents in the Repository instance. - Basic collections. A collection is a set of documents that satisfy some commonly established set of criteria. The documents stored in a Repository instance may be organized into basic collections.
For example, the documents in a repository managed by a group of Computer Science institutions might be organized into basic collections that reflect the ACM subject schema classes. - Metadata formats. The Repository is capable of storing multiple metadata formats. The simplest way to specify these formats is to describe them as simple XML configuration files that maintain (for each metadata format) the name and description, plus references to its DTD and to the list of used namespaces. - Derived metadata formats. The Repository service can automatically derive metadata records from other existing metadata formats. For example, it can be configured to generate a Dublin Core record each time a MARC record is submitted. This automatic generation is executed by a generic procedure whose input is a tuple indicating the source metadata format name, the target name, and a reference to an XML file that maintains the corresponding mapping. This configuration file, called the mapping table, is very easy to define because it maintains the relation between source and target attributes plus a function, or a reference to it, to map source values into target ones. - Manifestation type. Any view of a document can have several different manifestations, i.e. formats in which the document can be disseminated. For example, a conference paper can be disseminated both as a Postscript file and as a PDF file; the video of its presentation at the conference can be disseminated as MPEG and as AVI files. Manifestations can be physically stored within the Repository, or be handled by other specialized services. - Derived manifestation type. The Repository service can automatically derive manifestation types from others. For example, it can be configured to generate a PDF manifestation each time a Postscript is submitted. This automatic generation is executed by using appropriate procedures pre-loaded by the service. 
Other derivations can easily be added by specifying the source and target manifestation type, plus a reference to an internal procedure or to an external program. The Repository service can be customized by specifying the value of a number of configuration parameters. Some of these correspond to the content configuration dimensions listed above. Others specify the values for variation features, such as the security and the preservation policies that allow additional customization of the document and metadata handling functions. The values assigned to the parameters are constrained by consistency rules that establish both the legal configurations of each single instance, (e.g. derived metadata formats must belong to the set of metadata formats supported), and the legal configurations of the whole group of instances, (e.g. a publishing institution cannot be associated with more than one instance). By changing the value of the above parameters and by exploiting the flexibility of the document model, the Repository Service can adapt its behaviour to many different situations.
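The derived metadata mechanism (a mapping table relating source attributes to target attributes, plus a value-mapping function) and one of the consistency rules can be sketched in Python. The sketch makes several assumptions: the real mapping table is an XML file, whereas a dictionary is used here; records are flattened to attribute/value pairs; and the keys "245" and "100" stand in, in simplified form, for the MARC title and main-author fields. The function names derive and check_derivations are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Mapping table: source attribute -> (target attribute, value-mapping function).
MappingTable = Dict[str, Tuple[str, Callable[[str], str]]]

# Illustrative, heavily simplified MARC-to-Dublin-Core mapping (an assumption).
MARC_TO_DC: MappingTable = {
    "245": ("title", str.strip),
    "100": ("creator", str.strip),
}


def derive(record: Dict[str, str], table: MappingTable) -> Dict[str, str]:
    """Generate a target record from a source record using the mapping table."""
    derived = {}
    for source_attr, value in record.items():
        if source_attr in table:
            target_attr, convert = table[source_attr]
            derived[target_attr] = convert(value)
    return derived


def check_derivations(supported: List[str],
                      derivations: List[Tuple[str, str]]) -> None:
    """Consistency rule: derived formats must belong to the supported formats."""
    for source, target in derivations:
        if source not in supported or target not in supported:
            raise ValueError(f"derivation {source} -> {target} uses an unsupported format")
```

Source attributes without an entry in the table are simply dropped, which is one possible policy; the source does not say how unmapped attributes are treated.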
OpenDLib is a software toolkit that can be used to create a digital library easily, according to the requirements of a given user community, by instantiating the software appropriately and then either loading or harvesting the content to be managed. OpenDLib consists of a federation of services that implement the digital library functionality, making few assumptions about the nature of the documents to be stored and disseminated. If necessary, the system can be extended with other services to meet particular needs. OpenDLib was built as a distributed digital library, according to the notion of individually defined services located anywhere on the Internet. When combined, these services constitute a digital library. The functionality of the OpenDLib digital library includes the storage of and access to multimedia and multilingual resources, cross-language search, browsing, and user registration. A service can be distributed over different servers, replicated, or if necessary centralized. Although the presence of multiple instances of a service increases fault tolerance, reduces the load on each instance, and makes it possible to dynamically reorganize the environment when a server hosting a service instance is not reachable, the replication and distribution of the services is not mandatory, and therefore each of the supported services can be instantiated as a single instance. This means that the level of distribution and replication, and the physical location of the service instances, may be freely chosen to better satisfy the needs of the specific digital library context. The federation is instantiated when the digital library is created and can change dynamically during the digital library lifetime. For example, a server may be added or removed; a service instance can change its role from master to slave, etc. The OpenDLib model describes the OpenDLib digital library architecture.
An OpenDLib digital library architecture model uses three concepts: Service, Server, and Region. A Server is a network device that is able to provide services to the network users by managing shared resources. It can host different service instance types. A Region is an abstract notion which stands for a dynamic set of service instances which cover the complete functionality of the digital library and which represent the optimal choice with respect to some set of optimization criteria, e.g. their mutual connectivity. A region thus consists of the entire set of centralized and distributed service instances and a set of instances of the replicated services, one for each service type. For each region, the set of replicated service instances changes over time in order to always implement the best choice, e.g. according to the state of the connection. Any configuration that agrees with this model is a legal configuration of an OpenDLib digital library system, i.e. an OpenDLib digital library. For each region, more than one alternative of the same service type can be indicated. A priority value is associated with each pair (region, service instance) as a measure of the quality of participation of the service in the region with respect to the set of optimization criteria selected. The service with the highest priority belongs to the region. Note that the same service instance can belong to, and be an alternative in, more than one region. This means that the number of replications can be chosen freely and is not constrained by the number of established regions. The OpenDLib Architecture entity models digital library architecture created by instantiating the OpenDLib system. Each digital library has a name and participates in three relationships, which specify the digital library functionality and distribution. The relation HasService expresses the composition of the federation, i.e. 
the service instances that, working together, provide the digital library functionality; HasRegion models the organization of the instances into a cluster of services that satisfy optimization criteria; HasMeta identifies the services that control the architecture and are responsible over time for its consistency. Note that the only constraint on the number of participants in a relationship is that they must be sufficient to cover the digital library functionality. This means that the OpenDLib digital library architecture is completely flexible in terms of the number of instances, regions and servers. These can be freely chosen when the digital library is created and they can be modified dynamically during the digital library lifetime. Note also that this part of the OpenDLib model does not impose any constraint on the type of services implemented by the federation. In presenting the model we have assumed the service types implemented by the current version of OpenDLib; however, other choices are equally supported.
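The rule that, for each region and service type, the alternative with the highest priority belongs to the region can be sketched in Python. The tuple layout, the function name resolve_regions and the reachable set are assumptions for illustration; the source does not say how priorities are computed or how reachability is detected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One alternative: (region, service type, instance name, priority).
Alternative = Tuple[str, str, str, int]


def resolve_regions(alternatives: Iterable[Alternative],
                    reachable: Set[str]) -> Dict[str, Dict[str, str]]:
    """Pick, per region and service type, the reachable instance with the highest priority."""
    best = defaultdict(dict)
    for region, service_type, instance, priority in alternatives:
        if instance not in reachable:
            continue  # an unreachable instance cannot serve the region
        current = best[region].get(service_type)
        if current is None or priority > current[1]:
            best[region][service_type] = (instance, priority)
    return {
        region: {stype: inst for stype, (inst, _prio) in services.items()}
        for region, services in best.items()
    }
```

Because the same instance may be listed as an alternative in several regions, re-running the function when reachability changes models the dynamic reorganization described above: a region falls back to its next-best alternative when its preferred instance disappears.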
The Personalised Information Dissemination Server (PIDS) is a component of the SCHOLNET set of services. This server automatically notifies a user when a new document matching the user's interests (called topics) becomes available in the SCHOLNET digital library. Each topic is defined as a list of keywords freely chosen by the user, to which a document should be relevant. The server has been implemented by relying partially on a set of modules developed by the EUROgatherer project (IE-8011).
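As a rough illustration of topic-based notification, the Python sketch below matches a new document against user topics by simple keyword overlap. This is an assumption: the source does not describe how relevance is computed, and the EUROgatherer modules are not modelled here. The names tokenize, matching_topics and min_hits are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_topics(document_text: str, topics: Dict[str, Set[str]],
                    min_hits: int = 1) -> List[str]:
    """Return the names of the topics sharing at least min_hits keywords with the document."""
    words = tokenize(document_text)
    return sorted(
        name for name, keywords in topics.items()
        if len(words & {k.lower() for k in keywords}) >= min_hits
    )
```

A notification step would then inform the owner of every topic returned for a newly submitted document; raising min_hits trades recall for precision.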
The Hypermedia Annotation Service integrates annotation and reference linking features into the digital library infrastructure. It stores annotations on documents and makes them available to authorised users. The Hypermedia Annotation Service is an autonomous service that can be distributed on multiple servers because an Annotation server can store annotations on documents published by different authorities that are stored in different repositories. The Hypermedia Annotation Service is based on the Semantic Index System (SIS) developed by FORTH. The Semantic Index System (SIS) is a tool for describing and documenting large evolving varieties of highly interrelated data, concepts and complex relationships, as opposed to large homogeneous populations in fixed formats (handled by traditional DBMS). The communication between the Hypermedia Annotation Service and Scholnet takes place via the Open Library Protocol. The Hypermedia Annotation Service provides the capability to view and/or update the Annotation Server SIS database through XML documents. The Hypermedia Annotation Service handles two types of objects: annotations and annotation metadata. The annotation repository stores only the annotations while the annotation metadata are the result of a mapping between the annotation record and OLAP.

Exploitable results

The result of the project is a digital library testbed for networked scholarly communities. In addition to traditional digital library services, support is provided for non-textual data types, hypermedia annotation, cross-language search and retrieval, and personalised information dissemination. The testbed enables the immediate dissemination and accessibility of technical documentation (and the underlying ideas) within a globally distributed multilingual community. It contributes to the creation and diffusion of a new model for scholarly production and use by providing functionality to permit annotation on digital objects in any format by authorised users, to support personalised information dissemination, and to access federated repositories of related material. The enhanced digital library infrastructure produces significant benefits for a scholarly community, providing it with additional credibility and visibility and encouraging its expansion. The basic services provided by an existing European digital library for grey literature (ETRDL) are extended with tools that implement a new set of services to handle multimedia digital objects and to provide a collaborative working environment. ETRDL was implemented using the Dienst technology, developed by a US consortium led by Cornell University. Dienst version 4.1.9 provided functionality for archiving, access, discovery and browsing. ETRDL added capabilities for on-line document submission and withdrawal, subject classification, multiple language indexing and search, and on-line administration. A new digital library infrastructure based on Dienst is created in the project. It enhances the openness of the system and provides a powerful document model able to support multimedia documents, extending the functionality provided by the following services: - Multimedia data service. The project employs current multimedia database technology in order to integrate a multimedia support server into the ETRDL architecture.
This server will manage archives of multimedia documents, namely video documents; - Hypermedia annotation service. This service supports the maintenance of collective knowledge and collaborative work; - Multilingual information access service. This service enables monolingual access, search and retrieval in all languages used by the community. Two simple cross-language mechanisms will also be introduced: controlled vocabulary searching using a multilingual thesaurus, and free-text searching; - Personalised information dissemination service. This service is implemented using software modules developed by the EUROgatherer Project (IE-8011) for user personalisation and information pushing. The complete system is evaluated by two large IT scientific communities.
