CORDIS Archive

This page has been archived. It will no longer be updated.
 

7.1 Data Model
7.2 Subject Indexing
7.3 Recommendations for the use of CERIF 2000 guidelines
7.4 Implementation Scenarios

7.1 Data Model

The major design objectives for the CERIF 2000 data model were:

  • A full CRIS data model with flexibility to allow the majority of existing CRIS to accommodate their own database structures;
  • A base framework for data exchange.

These objectives have been met by:

  • Defining a full CRIS data model which will cover the database structures of the majority of existing CRIS;
  • Defining a set of data models that provide examples for data exchange (since the number of possible exchange data models between CRIS is unlimited). These examples also illustrate that it is not necessary to implement the full CRIS data model if only a particular subset is required;
  • Defining a metadata data model to provide a uniform summary-level view over heterogeneous information sources.

7.2 Subject Indexing

Subject indexing is essential when accessing multiple data sets from multiple sources and in multiple languages, as is the case in the CERIF 2000 environment. Specific thesauri and classifications have been recommended for the areas of research subject (Ortelius), economic activity (NACE) and products (CPA). Other indexing guidelines have been given for controlled value lists for specific attributes.

7.2.1 Ortelius Thesaurus

The Ortelius Consortium made the following proposal to the Commission for the use of the Ortelius Thesaurus as the main R&D subject indexing scheme for CERIF 2000.

"The Ortelius Consortium agrees to undertake the following actions:

  • To make the Ortelius Multilingual Thesaurus available via the CERIF Homepage (/cerif/) for browsing purposes.
  • To distribute the Multilingual Thesaurus on a CD-ROM to authorised users. Authorised users will be identified by DG XIII-D, based upon their involvement in CERIF. The Thesaurus will have an international standard format, with export and import functions into any database for indexing and retrieval purposes. Technical specifications will be included, as well as routines for printing the Thesaurus. The CD-ROM will be updated and distributed every 4 months;
  • To provide continuous management of the Thesaurus, according to the ISO guidelines for the establishment and development of a multilingual Thesaurus (ISO 5964). All terms or modifications proposed by the users through the CERIF user service via DG XIII-D will be considered for introduction into the hierarchical structure of the Thesaurus. The translation of the terms into the various languages will be a joint task.
  • The Ortelius Consortium proposes that the Commission provides them with a financial contribution to support the costs of developing the CD-ROM, updating and distributing it, and to recover (partially) the costs of the staff involved in the management activities."

Contact person:

Fiora Imberciadori
Consorzio Ortelius
Via dell'Agnolo, 87
IT-50122 Firenze
Tel. +39-055-2380418
Fax. +39-055-2341516
Email: ortelius@bdp.it

7.3 Recommendations for the use of CERIF 2000 guidelines

The following recommendations are made for the use of the CERIF 2000 findings and deliverables:
  • New CRIS should utilise appropriate components from the full CRIS Data Model. Existing CRIS should consider a development path to intersect with this data model;
  • New and existing CRIS should provide utility software, to export schema and data instances, with structure and content at least as rich as the Exchange Data model;
  • New CRIS should adopt the classification standards recommended herein. Existing CRIS should consider a development path to intersect with these standards;
  • The ERGO Working Group should consider a project to provide a single access point enabling CRIS users to reach CERIF 2000 compliant research information systems.

7.4 Implementation Scenarios

7.4.1 Introduction
7.4.2 Local CERIF implementation
7.4.3 Integrated implementation

7.4.1 Introduction

CERIF 2000 is intended to be "implementation independent". A discussion on implementation scenarios is included here to help the reader to visualise how it could be put into practical use.

Two extreme scenarios can be considered:

  • The simplest case - Local implementation of CERIF 2000 to create a research information system;
  • The most sophisticated case - Integrated implementation of CERIF 2000 for different sources accessible via a WWW-based Catalogue Metadata model, including rules for access rights, etc.

7.4.2 Local CERIF implementation

Research information providers can apply/implement CERIF 2000 in a flexible way, meeting their own needs. They should use appropriate components from the full CRIS Data Model and should provide utility software, to export schema and data instances, with structure and content at least as rich as the Exchange Data model.
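As a rough illustration of such export utility software, the sketch below dumps both the schema and the data instances of a small relational database. It assumes an SQLite database and a hypothetical `project` table; neither the table nor its columns come from the CERIF 2000 Exchange Data model, which a real utility would have to follow.

```python
import json
import sqlite3


def export_schema_and_data(conn):
    """Return the CREATE statements and all rows of every user table."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    export = {"schema": {}, "data": {}}
    for name, ddl in tables:
        export["schema"][name] = ddl
        cursor = conn.execute(f'SELECT * FROM "{name}"')
        columns = [col[0] for col in cursor.description]
        export["data"][name] = [dict(zip(columns, row)) for row in cursor]
    return export


# Hypothetical one-table CRIS, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT, country TEXT)")
conn.execute("INSERT INTO project (title, country) VALUES ('Soil mapping', 'IT')")

exported = export_schema_and_data(conn)
print(json.dumps(exported, indent=2))
```

JSON is used here only because it is convenient to print; the same dictionary could equally be serialised to XML or another exchange format.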

7.4.3 Integrated implementation

To provide one single access point allowing the CRIS user access to distributed research information sources, one could consider a web implementation of the Catalogue Metadata model.

The metadata and the data can reside in many different kinds of system. For general undisciplined web resources, it is becoming increasingly common for metadata to be in RDF/XML pointing to data in HTML. However, most of the data on the web is held in databases, and the HTML is ephemeral. The metadata is also stored in databases, in the form of data elements that could be converted to XML if required; relational systems are particularly suitable for this. The metadata and data should therefore be modelled correctly and then implemented in the most suitable systems environment. The relative strengths and weaknesses of the different implementation environments (Relational, Object-Oriented (O-O), Information Retrieval (IR) and RDF/XML) are discussed in the following section. The practicalities of metadata extraction are then reviewed, followed by a discussion of how the model might be implemented in ERGO, in RDF/XML and in an IR environment.

7.4.3.1 Systems Environments

| Criterion | Relational | O-O | IR | RDF/XML |
|---|---|---|---|---|
| Data structure | Totally flexible | Bound to methods in encapsulated object | Inflexible | Flexible (binary relational) |
| Data types | Rich on basic types | Rich on basic and constructed type | Poor | Rich on basic types |
| Schema | Logical level relatively rich but still lacks international data | Rather poor and requires programming definition | Very simple and incomplete | DTD like SGML - not as rich as relational |
| Data quality | Schema plus domain constraints good | If methods programmed can be good | Poor | Can define some constraints to control quality |
| Query language | SQL (standard) or programming language | Programming language or SQL (becoming standard) | Specific IR languages - but good for free text search with boolean connectors | Some being developed; currently prototypes |
| Transferable data to SGML | Good - DTD and documents | OK if programmed | Requires a lot of programming | Easy if DTD |
| Transferable data to XML | Good - DTD and documents | OK if programmed | Requires a lot of programming | Already there |
| Performance structured data | Very good | Good if methods programmed | Poor | Not as good as relational or O-O |
| Performance free text | Poorer as attribute length increases | Depends on methods programmed | Very good - complete inverted index | Better than relational or O-O |

The table above shows that the relational environment has advantages, whether handling data or metadata, except for large blocks of free text.

7.4.3.2 How to get the metadata?

Schema metadata has to be constructed by the database administrator, although in advanced heterogeneous systems it is possible by schema reconciliation to generate new schemas. A similar situation applies for metadata for security, access or charging purposes.

Content metadata is different: its instances are subsets of the real data, instance by instance, forming a catalogue. It can be generated by projection from the data instances, but this requires the content metadata to be a proper subset of the export or exchange schema (what the database system is willing to expose to the outside world). This is the basis for the three-layer model of CERIF 2000.
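The projection described here can be sketched in a few lines. The attribute names below are illustrative assumptions and are not taken from the CERIF 2000 schemas; the point is only that the catalogue is derived record by record, and that the derivation is refused unless the metadata attributes form a proper subset of the exchange schema.

```python
# Hypothetical attribute sets; the real ones would come from the CERIF 2000 models.
EXCHANGE_SCHEMA = {"project_id", "title", "abstract", "country", "start_date"}
CONTENT_METADATA = {"project_id", "title", "country"}


def project_to_metadata(record, metadata_attrs, exchange_attrs):
    """Project one exported record onto the content metadata attributes."""
    # `<` on Python sets tests for a proper (strict) subset.
    if not metadata_attrs < exchange_attrs:
        raise ValueError("content metadata must be a proper subset of the exchange schema")
    return {attr: record[attr] for attr in sorted(metadata_attrs)}


records = [
    {
        "project_id": "P-001",
        "title": "Soil mapping",
        "abstract": "A long free-text description of the project.",
        "country": "IT",
        "start_date": "1999-01-01",
    },
]

# The catalogue is built instance by instance from the exported data.
catalogue = [project_to_metadata(r, CONTENT_METADATA, EXCHANGE_SCHEMA) for r in records]
print(catalogue)
```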

7.4.3.3 ERGO

ERGO has already implemented a metadata-like catalogue for several European research databases. It is recommended that ERGO should consider a project to provide a single access point for the CRIS users to reach CERIF 2000 compliant research information systems.

A scenario for implementing "ERGO 2" architecture could be as follows:

(a) A web query form accessing the content metadata database (as in the ERGO pilot). There is an advantage in using RDBMS technology, as it allows XML metadata pages to be generated;

(b) A multimedia web document (XML or HTML) containing the query answers, sent back to the end-user over the web;

(c) Connection between the metadata system and multiple heterogeneous systems with:

  1. Metadata system: hit list, sorted by country / host;
  2. At each host, software to accept the hit list relevant to that host, run the query and deliver instance records compliant with the export schema.
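Step (c) can be pictured as splitting one hit list into per-host work lists. The following is a minimal sketch under assumed data: the hit fields (`country`, `host`, `record_id`) and the host names are hypothetical, and the actual transfer of each sub-list to its host is left out.

```python
from collections import defaultdict


def split_hit_list(hits):
    """Sort hits by country and host, then group the record ids per host."""
    per_host = defaultdict(list)
    for hit in sorted(hits, key=lambda h: (h["country"], h["host"])):
        per_host[(hit["country"], hit["host"])].append(hit["record_id"])
    return dict(per_host)


# Hypothetical hit list as returned by the metadata system.
hits = [
    {"country": "NL", "host": "cris.example.nl", "record_id": "N-7"},
    {"country": "IT", "host": "cris.example.it", "record_id": "I-2"},
    {"country": "NL", "host": "cris.example.nl", "record_id": "N-3"},
]

for (country, host), record_ids in split_hit_list(hits).items():
    # Each host would receive only its own record ids and answer with
    # instance records that follow its export schema.
    print(country, host, record_ids)
```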

7.4.3.4 RDF/XML metadata

Exactly as described under 7.4.3.3, but the metadata instances are generated as XML from metadata database records when viewed by the end-user. For retrieval speed they are stored and accessed as database records. One could imagine authored RDF/XML metadata and authored HTML pages, one per project or other entity, but this would take a lot of time and effort.
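Generating the XML view from a stored record might look like the sketch below. It produces plain XML rather than full RDF/XML, and the element names (`metadata`, `project_id`, `title`) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialise one metadata database record as a small XML document."""
    root = ET.Element("metadata")
    for field, value in record.items():
        # Field names are assumed to be valid XML element names.
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record as it might be read from the metadata database.
record = {"project_id": "P-001", "title": "Soil mapping"}
xml_page = record_to_xml(record)
print(xml_page)
```

The record stays in the database for fast retrieval; the XML string is built only when a user asks to see it.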

7.4.3.5 Information Retrieval Systems for metadata and / or data

Information Retrieval (IR) systems can be used to handle either data or metadata - especially when free text retrieval over long attributes is important, and especially with advanced Boolean query capability. However, IR systems cannot handle complexly structured data and are very inflexible to structural change.
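The free-text strength of IR systems comes from the inverted index, which maps each word to the set of documents containing it, so that Boolean queries reduce to set operations. The toy sketch below shows the idea; the documents are invented, and tokenisation is reduced to lower-casing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map every word to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing all of the terms (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one of the terms (Boolean OR)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Invented abstracts, for illustration only.
documents = {
    1: "soil mapping in alpine regions",
    2: "satellite mapping of coastal erosion",
    3: "soil erosion and land use",
}
index = build_inverted_index(documents)
print(search_and(index, "soil", "erosion"))
print(search_or(index, "alpine", "coastal"))
```

Adding a new attribute to such an index means rebuilding it, which hints at why IR systems are described as inflexible to structural change.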
