With the objectives defined, Chapter 5 describes what the CERIF 2000 data model should be. The formal definition of the model is included as an annex. Some downloadable examples of the data model in a selection of Database Management Systems are available in the CERIF 2000 toolkit on this Home page.
At this moment, Chapter 5 is available both in Word format and HTML:
A detailed description of the full CRIS Data Model is available in Annex 2 of the final CERIF 2000 report which is downloadable from here:
The major design objectives for the CERIF 2000 data model are to provide:
As with all information systems these objectives are not 100% compatible in terms of a single data model. For example, a data model covering all existing CRIS database structures would not be a very practical model for data exchange, as not all CRIS would be interested in every piece of data from every other CRIS.
The approach therefore in meeting the design objectives is threefold:
The relationship between these models is shown in the following diagram.
The full CRIS data model is (in terms of entities, attributes and relationships) a proper superset of CRIS B and CRIS C, but only intersects with CRIS A.
Both the exchange model and the metadata model are proper subsets of the full model.
The metadata model is a proper subset of the full model and also of the exchange model.
This means that:
In the following, the data models are explained in a series of diagrams. A full description of attribute types and lengths is provided for the full data model in Annex 2.
The full CRIS model is rather complex in its entirety, so it is presented here at five levels of abstraction to aid understanding.
The model consists of three major entities: projects, persons and organisational units. The relationships between them are many-to-many and controlled by role and time-interval. This means that a project may have relationships with many persons and organisational units. Similarly, a person may have relationships with many projects and many organisational units. Organisational unit has similar flexibility in relationships.
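As a rough illustration, a relationship controlled by role and time-interval could be sketched as follows. The class and field names (`Link`, `source_id`, `role` and so on) and the sample identifiers are assumptions made for this example; they are not the names defined in Annex 2.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical link record: one instance of a many-to-many relationship."""
    source_id: str        # e.g. a person identifier (illustrative)
    target_id: str        # e.g. a project identifier (illustrative)
    role: str             # role played by the source in the target
    start: date           # first day on which the role applies
    end: Optional[date]   # last day on which the role applies; None = open-ended

    def active_on(self, day: date) -> bool:
        """True if the role was held on the given day."""
        return self.start <= day and (self.end is None or day <= self.end)


def roles_on(links: List[Link], source_id: str, target_id: str, day: date) -> List[str]:
    """Roles the source held in the target on a given day."""
    return [
        link.role
        for link in links
        if link.source_id == source_id
        and link.target_id == target_id
        and link.active_on(day)
    ]


# One person linked to several projects, and one project linked to several persons.
person_project = [
    Link("pers-A", "proj-X", "programmer", date(1998, 1, 1), date(1998, 12, 31)),
    Link("pers-A", "proj-X", "project manager", date(1999, 1, 1), None),
    Link("pers-A", "proj-Y", "programmer", date(1999, 6, 1), None),
    Link("pers-B", "proj-X", "programmer", date(1999, 1, 1), None),
]
```

Because the role and the dates live on the link rather than on the person or the project, the same pair of entities can be related several times, once per role and period.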
Furthermore, there can be relationships between instances of persons, between instances of projects and between instances of organisational units, again controlled by role and time-interval. This means that the relationship between person A in the role "project manager" and person B in the role "programmer" can be captured, together with its relevant start and end dates. Similarly, the relationships between project X and sub-projects Y and Z, or between project X and associated project W, can be captured. In the case of organisational units, the relationship between department and group, for example, can be represented. This provides both formality and flexibility for almost any kind of CRIS. At a deeper level:
The data model is multi-lingual throughout.
The data model forms a template from which CRIS implementations may choose the entities and attributes required for their purpose. Any particular CRIS is likely to be a subset of the Ideal model; some specialised R&D Databases may have additional entities and attributes but a large proportion of the data model should intersect with the Ideal CRIS.
A simplified view of the major entities and their relationships is shown in this picture.
In this chapter, all five levels of abstraction are shown only for the entity Person, as an example to illustrate the concept. Other entities are not covered in these examples.
A "level" here means the level of detail, that is, the level of abstraction, of the data model.
The illustration of the data model concept below starts at level 1 showing the relationships of the major entities and then builds on lower levels of abstraction to show increasing levels of detail of other entities associated with the main ones. At the lowest level, level 5, the linking entities representing the many-to-many relationships between entities are shown. A complete description of attribute types and lengths for all entities is provided for the full data model in the Annexes.
The first diagram below represents the top level of the model in Entity Relationship format. Only the primary or most important entities are shown (the main things that the database deals with). These are Person, OrgUnit (Organisation Unit) and Project. Other entities (secondary entities) within the scope of CERIF 2000 (such as Results, Expertise and Equipment) are considered to be associated with these top-level entities rather than to be top-level entities in themselves.
The maximum possible relationships are assumed between entities (many-to-many in all cases). In practice, it is unlikely that all such relationships would be used in a particular CRIS.
The Level 1 entities are shown in this diagram.
Level 2 of the model includes what are referred to as secondary entities. These are the main "associative" entities linked to the top-level entities. The following diagram shows only the entities associated with the Person entity as an example. The top-level entity is shown in green and the associated entities in light blue.
The Contact entity holds the address details for a person. Telephone and Electronic Address are shown as associated entities to allow for multiple phones and electronic addresses associated with one contact point. The Contact Entity is also linked to the top-level entity OrgUnit. This entity is likely to be of great importance to all CERIF users because it provides the data to allow contact to be made.
CV and Expertise Skills are entities which can be directly associated with entity "Person". "Result Patent", "Result Product" and "Result Publication" are results more strongly related to entity "Project" but entity "Person" is also associated with "results", typically with some kind of role which is picked up in the relationship table (e.g. author for a publication). Entity "Service" is something more directly associated with a role of being offered by entity "OrgUnit" but can also be associated with entity "Person" in a role (e.g. manager, technician).
Entity "Particular Equipment" and entity "General Facility" are more strongly associated with entity "Project" (equipment and facilities used by the project) and with "OrgUnit" (e.g. the institution owning the resource). Again, entity "Person" can have a role to play with entity "Particular Equipment" and entity "General Facility" (e.g. operator, manager). The adjectives "particular" and "general" distinguish between a project using a particular piece of equipment and a project having general access to a facility, which might itself have many particular pieces of equipment.
The entity "Classification" is shown as a general subject indexing entity, based on controlled vocabularies. In other parts of the model, particular subject indexes are specified for particular data items or entities, but a CRIS may have its own additional controlled vocabularies for its own specific user needs. CERIF also provides recommendations on which Classifications should be used for subject indexing in particular subject areas (see Chapter 6).
The entity "Classification" is shown in this diagram.
Level 3 is where the translated entities are introduced. CERIF should deal with many different language sources and with users of many different languages. The entities that can appear in different languages are represented at this level. Translated entities are shown in yellow in this diagram.
For example, entity Event may have a name and a description that may appear in several languages. Separate entities are provided for the name and the description: typically an event would have a name in only one language but may have descriptions in several languages. The reader should be aware that, to clarify the presentation, only the translated entities for entities immediately associated with Person are shown in the diagram. Not all the translated entities are shown (thus Result Product, Result Publication etc. all have names and descriptions as translated entities).
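The idea of keeping translated names and descriptions apart from the entity itself can be sketched as follows. This is only an illustration: the dictionary layout, the language codes, the sample texts and the helper `pick_translation` are assumptions and are not part of the CERIF definition.

```python
from typing import Dict, Optional

# Hypothetical translated entities for one Event, keyed by language code.
# The name exists in one language only; the description exists in two.
event_name: Dict[str, str] = {"en": "Annual CRIS Workshop"}
event_description: Dict[str, str] = {
    "en": "Workshop on research information systems.",
    "fr": "Atelier sur les systèmes d'information sur la recherche.",
}


def pick_translation(
    texts: Dict[str, str], preferred: str, fallback: str = "en"
) -> Optional[str]:
    """Return the text in the preferred language, else the fallback, else None."""
    if preferred in texts:
        return texts[preferred]
    return texts.get(fallback)
```

A French-speaking user would then see the French description but the English name, since no French name has been recorded.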
Level 4 is where Lookup entities are introduced. A Lookup entity usually contains a defined set of values. The value of an attribute of an entity is checked against these values to make sure it is a valid entry, for example codes for countries or currencies, which are typically also the subject of international standards. Only those that appear as lookups for separate entities appear at this level. Lookups can also appear for attributes such as roles in link or relationship entities. These relationship entities are used to resolve many-to-many relationships between the other entities, and are dealt with at level 5 in this section.
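A minimal sketch of checking attribute values against lookup entities is given below. The lookup contents, the attribute names `country` and `currency`, and the function `validate` are assumptions chosen for illustration only.

```python
# Hypothetical lookup entities: small, fixed sets of valid codes.
COUNTRY_LOOKUP = {"BE": "Belgium", "FR": "France", "NL": "Netherlands"}
CURRENCY_LOOKUP = {"EUR": "Euro", "GBP": "Pound sterling"}

# Which lookup governs which attribute of a record (illustrative mapping).
LOOKUPS = {"country": COUNTRY_LOOKUP, "currency": CURRENCY_LOOKUP}


def validate(record: dict, lookups: dict) -> list:
    """Return the (attribute, value) pairs whose value is not in its lookup."""
    errors = []
    for attribute, lookup in lookups.items():
        value = record.get(attribute)
        # Missing attributes are not treated as errors in this sketch.
        if value is not None and value not in lookup:
            errors.append((attribute, value))
    return errors
```

For instance, `validate({"country": "XX", "currency": "EUR"}, LOOKUPS)` reports the unknown country code and accepts the currency.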
CERIF also makes recommendations for what lists of values or Classification codes should be used for some entities or attributes. See Chapter 6 for a description of these.
In the accompanying diagram the lookup tables are presented as grey entities. Note that the translation tables (Level 3 entities) are left out for clarity.
The Level 4 Person entity is shown in this diagram.
Level 5 deals with the "many-to-many" relationships between entities – the relationship entities are shown as colourless boxes in the following diagrams. This allows the introduction of detail of what role is played by an entity in terms of its relationship to another entity. It may capture further details of that role.
For example, entity Person to entity Project is a many-to-many relationship. In the entity representing this relationship one could introduce the type of role the person played in the project (e.g. project leader) and the period for which he/she carried out that role.
Due to the large number of relationships in the model, the representation here is further broken down. We present three diagrams:
The first diagram – named Primary Links – shows the relationships with primary entities (Person, Project, and OrgUnit) as well as the recursive relationship of the entity with itself.
The second diagram – named Characteristics – shows the relationships associated with the characteristics of the person.
The third diagram – named Results – shows the relationships associated with the results of the project, services, facilities and equipment and events.
The relationship between entity Person and entity Service (Pers_Service) shows the full range of the possible characteristics of a person in relation to a service (role, price, conditions, availability, start and end dates). This structure is repeated in other relationships where the possibility of commercial transactions arises (for example entity OrgUnit_Service, or OrgUnit_Product).
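To make the shape of such a relationship entity concrete, here is a sketch of Pers_Service as a relational table, using SQLite from Python. The column names, types and key choice are assumptions made for this example; the authoritative attribute names, types and lengths are those given in the annex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person  (person_id  TEXT PRIMARY KEY);
    CREATE TABLE service (service_id TEXT PRIMARY KEY);

    -- Hypothetical layout of the Pers_Service relationship entity.
    CREATE TABLE pers_service (
        person_id    TEXT NOT NULL REFERENCES person(person_id),
        service_id   TEXT NOT NULL REFERENCES service(service_id),
        role         TEXT NOT NULL,
        start_date   TEXT NOT NULL,   -- ISO 8601 date string
        end_date     TEXT,            -- NULL while the role is ongoing
        price        REAL,
        conditions   TEXT,
        availability TEXT,
        PRIMARY KEY (person_id, service_id, role, start_date)
    );
    """
)
conn.execute("INSERT INTO person VALUES ('pers-A')")
conn.execute("INSERT INTO service VALUES ('serv-1')")
conn.executemany(
    "INSERT INTO pers_service VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("pers-A", "serv-1", "technician", "1998-01-01", "1998-12-31", None, None, None),
        ("pers-A", "serv-1", "manager", "1999-01-01", None, 120.0, "by appointment", "weekdays"),
    ],
)


def current_offers(service_id, on_date):
    """Rows describing who offers the service, and on what terms, on a given date."""
    return conn.execute(
        """
        SELECT person_id, role, price, conditions, availability
        FROM pers_service
        WHERE service_id = ?
          AND start_date <= ?
          AND (end_date IS NULL OR end_date >= ?)
        ORDER BY person_id, role
        """,
        (service_id, on_date, on_date),
    ).fetchall()
```

Putting the role and start date in the key is one possible design choice: it lets the same person hold different roles for the same service over successive periods.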
As stated under 5.1, there is an infinity of exchange data models, each negotiated as a transfer between the CRIS provider and the requesting end-user. Most exchange data models will have the entity Contact as common information, and probably at least one of the entities Project, Person and OrgUnit.
The CERIF Exchange Model is shown in the following diagram.
Some commonly required groupings of entities and relationships for exchange have been identified. These are:
The preceding diagram delineates these four examples, and demonstrates their relationships with the entities Contact, Person, Project, OrgUnit.
Implementation of such an exchange data model requires choosing appropriate entities, attributes and relationships from the Full CRIS Model as specified in Annex 2.
The report does not contain a formal data model definition for each exchange model. The formal specification of any exchange data model is to be extracted from the full CRIS data model as specified in Annex 2.
As a detailed example, expertise is taken and described at all levels in the model in a separate diagram. The other examples are described in the following sections.
This includes the entities Skills and CV; these are attached to Person. The data model provides only a unique reference to CV, assuming it exists and is maintained elsewhere. Entity Skills allows for a name and description of each skill in one or more chosen languages.
The Expertise Exchange is shown in the following diagram.
The major entities are Result Patent, Result Publication and Result Product. There is the implicit assumption that external databases exist with details of the entities Patent and Publication, with the possibility of more detail for Product. For Publication, CERIF provides only a reference, via a unique identifier, to the appropriate records in those databases, with minimal additional attributes. The Patent entity assumes an external patent database but introduces it with Title, Abstract and their languages. The Product entity includes a full description so that its potential utility to the end-user is identifiable.
This example data model includes the technological and special skilled support for R&D. Some organisations do not distinguish these three entities whereas others have a clear need to treat them differently for management purposes.
This data model example consists solely of the entity Funding Programme with relationships to OrgUnit (Funding Organisation) and Project. It also has a recursive relationship to allow for sub-programmes to be referenced within a programme.
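The recursive relationship can be pictured as each programme optionally pointing at a parent programme. The sketch below assumes a simple identifier-to-parent mapping and an acyclic hierarchy; the identifiers and the function `sub_programmes` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical programme records: id -> parent programme id (None = top level).
parent_of: Dict[str, Optional[str]] = {
    "prog-1": None,
    "prog-1.1": "prog-1",
    "prog-1.2": "prog-1",
    "prog-1.1.a": "prog-1.1",
}


def sub_programmes(programme_id: str) -> List[str]:
    """All direct and indirect sub-programmes, listed depth-first.

    Assumes the hierarchy contains no cycles.
    """
    found: List[str] = []
    for child, parent in sorted(parent_of.items()):
        if parent == programme_id:
            found.append(child)
            found.extend(sub_programmes(child))
    return found
```

Calling `sub_programmes("prog-1")` walks the whole hierarchy below the top-level programme, while a leaf programme yields an empty list.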
To obtain appropriate exchange data models from the full CRIS, it is necessary for the end-user to specify the conditions for the negotiated data exchange. In its simplest form, this specification involves selection of appropriate instances (records) from the full CRIS and appropriate entities, attributes and relationships to meet the end-user request.
The CERIF Metadata Model is shown in the following diagram.
This selection requires some uniform summary representation of all full CRIS instances (records). Formally, this representation is associational descriptive content metadata; informally, it is analogous to a library card catalogue.
The definition of the data model to describe this content metadata requires finding the minimal entities, attributes and relationships necessary to allow the end-user to find (retrieve) as precisely as possible the required information.
This metadata is used as if it were a catalogue to select instances (records) of interest across multiple heterogeneous distributed CRIS. Each CRIS then exports the requested information in an exchange data model format, identified by a unique identifier that links the metadata instance (record) to the CRIS database instance (record).
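The two-step process, searching the catalogue first and then asking each CRIS for the matching records, can be sketched as follows. Everything here is assumed for illustration: the record layout, the sample data, the in-memory "databases" and the functions `search` and `export`.

```python
# Hypothetical metadata catalogue: one summary record per CRIS instance.
catalogue = [
    {"cris": "cris-A", "id": "proj-1", "title": "Soil erosion survey"},
    {"cris": "cris-B", "id": "proj-7", "title": "Coastal erosion models"},
    {"cris": "cris-B", "id": "proj-9", "title": "Protein folding"},
]

# Hypothetical full records held by each CRIS, keyed by unique identifier.
cris_databases = {
    "cris-A": {"proj-1": {"title": "Soil erosion survey", "leader": "pers-A"}},
    "cris-B": {
        "proj-7": {"title": "Coastal erosion models", "leader": "pers-B"},
        "proj-9": {"title": "Protein folding", "leader": "pers-C"},
    },
}


def search(term: str) -> list:
    """Step 1: select (cris, identifier) pairs from the metadata catalogue."""
    return [
        (entry["cris"], entry["id"])
        for entry in catalogue
        if term.lower() in entry["title"].lower()
    ]


def export(hits: list) -> list:
    """Step 2: each CRIS returns the full record behind each identifier."""
    return [cris_databases[cris][record_id] for cris, record_id in hits]
```

The unique identifier is the only thing the catalogue and the CRIS need to share, which is what allows the underlying databases to remain heterogeneous.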
The CERIF metadata model is shown in the preceding diagram. Its entities, attributes etc. are a proper subset of the full CERIF data model. The entities ProjectAdditional, OrgUnitAdditional and PersonAdditional are introduced. These signal the presence of relationships in the exchange or full CRIS models by mapping them as attributes in an additional entity attached to a prime entity. Thus ProjectAdditional is the additional entity attached to the base entity Project; it contains a set of one-character binary-valued flags, each indicating whether or not a relationship is instantiated in the exchange or ideal CRIS data model (hence [Y | N]). For example, assume the metadata values for a particular project in ProjectAdditional are: Result_Publication: Y, Result_Patent: N, Result_Product: N, Facility: Y, Equipment: Y, Service: N. This means that, for this project, one may find one or more instances of publication information in either or both of the exchange data file (if exchange data is requested) and the full CRIS database; there is no information on patents or products; there is information (one or more instances) on facility and equipment; and there is no information on service.
The metadata model is formally defined in Annex 3.
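The flag example can be read mechanically, as in the sketch below. The dictionary reproduces the values from the example in the text; the function `available_relationships` is a hypothetical helper, not part of the model.

```python
# Flag values taken from the ProjectAdditional example in the text.
project_additional = {
    "Result_Publication": "Y",
    "Result_Patent": "N",
    "Result_Product": "N",
    "Facility": "Y",
    "Equipment": "Y",
    "Service": "N",
}


def available_relationships(flags: dict) -> list:
    """Names of the relationships flagged 'Y', i.e. with at least one instance."""
    for name, value in flags.items():
        if value not in ("Y", "N"):
            raise ValueError(f"{name}: expected 'Y' or 'N', got {value!r}")
    return [name for name, value in flags.items() if value == "Y"]
```

An end-user can thus tell from the metadata alone which kinds of related information are worth requesting for a given project.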