CORDIS Archive

This page has been archived. It will no longer be updated.
 

6.1 Introduction
6.2 R&D contents related subject indexing
6.3 Indexing guidelines for other data elements

6.1 Introduction

The subject indexing guidelines apply both to the subject contents of the records and to all data elements for which a controlled value list is applicable.

Using a controlled vocabulary benefits information made available on the Internet, in that it:

  • helps with browsing;
  • enables the broadening and narrowing of searches;
  • gives context to search terms being used;
  • simplifies multilingual access to collections of information;
  • and allows the partitioning and manipulation of large databases.

For the widest interoperability, more than one type of controlled vocabulary was analysed and is recommended for CERIF 2000.

Automatic indexing techniques (e.g. using a software program to derive "automatically" subject access points from a text) are still no substitute for the conventional use of classification schemes and/or thesauri. Appropriate subject indexing is required to avoid retrieval noise and is crucial in a multilingual environment.


6.2 R&D contents related subject indexing

6.2.1 Research Subject
6.2.2 Economic activity
6.2.3 Products by activity

The existing CERIF recommendations on subject indexing covered only research areas. CERIF 2000 aims at widening the scope to include results, organisations etc. and will therefore recommend three different types of subject related indexing tools (classification schemes or thesauri) for indexing on research subject and, where appropriate, on economic activity area (market application) and product type (product as a research result).

 

6.2.1 Research Subject

A comparison of various research classification schemes and thesauri for research subject indexing led to the selection of the "ORTELIUS Thesaurus". This is an indexing and retrieval tool developed in close co-operation with the European Commission’s DG XXII to support a European information system on higher education and training. The selection of ORTELIUS was conditional on guarantees for the management of its updating, an improved R&D orientation of its terminology and clarity about the rights to use ORTELIUS in a CERIF environment. The ORTELIUS management organisation has therefore negotiated these requirements with DG XIII. Provisions were also made to map more specialised research subject schemes to the higher-level ORTELIUS classification terms.

 

6.2.1.1 Research Subject

The following widely used research classification schemes were compared:

  • CERIF 1991 science classification scheme;
  • ORTELIUS thesaurus. A browser (dated September 1998) and an alphabetical and a hierarchical list of terms are available via the CERIF Homepage (/cerif/);
  • International Patent Classification (IPC);
  • SIGLE (grey literature classification scheme);
  • SPINES (Science and Technology Policy Information Exchange System);
  • NABS (Nomenclature pour l’Analyse et la comparaison des Budgets et des programmes Scientifiques);
  • Universal Decimal Classification (UDC);
  • Dewey Decimal Classification (DDC);
  • Library of Congress Classification (LCC).

Eurodicautom, EUROVOC and the OECD macrothesaurus were also considered, mainly for their multilingual dimension.

Hierarchical structure, electronic availability, multi-linguality, reliability of the updating procedure, research domain coverage and international/European usage were considered as the most important evaluation criteria.

ORTELIUS gave good results for multilingual availability. It also has a wide coverage of research areas, owing to its intended use in higher education. Its user interface has particular strengths compared with schemes such as UDC and DDC, because thesaurus-specific features have been added to the hierarchical structure: related terms help the user to move between related areas of research, and the hierarchy makes it possible to select narrower or broader terms.
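
As a minimal sketch of the thesaurus features mentioned above (broader, narrower and related terms), the following Python fragment models a term record. The class, the helper function and the sample terms are assumptions made for illustration; they are not taken from the ORTELIUS data or its exchange format.

```python
from typing import List, Optional


class Term:
    """One thesaurus entry (hypothetical structure, not the ORTELIUS format)."""

    def __init__(self, label: str, broader: Optional["Term"] = None) -> None:
        self.label = label
        self.broader = broader             # parent term, None at the top level
        self.narrower: List["Term"] = []   # more specific terms
        self.related: List["Term"] = []    # cross-links to neighbouring areas
        if broader is not None:
            broader.narrower.append(self)

    def path(self) -> List[str]:
        """Return the labels from the top-level term down to this one."""
        labels = []
        node: Optional["Term"] = self
        while node is not None:
            labels.append(node.label)
            node = node.broader
        return labels[::-1]


def relate(a: Term, b: Term) -> None:
    """Record a symmetric 'related term' link between two entries."""
    a.related.append(b)
    b.related.append(a)


# Made-up sample terms, for illustration only.
science = Term("Natural sciences")
physics = Term("Physics", broader=science)
optics = Term("Optics", broader=physics)
engineering = Term("Engineering")
lasers = Term("Laser technology", broader=engineering)
relate(optics, lasers)

print(optics.path())                       # broader terms, top level first
print([t.label for t in optics.related])   # neighbouring research areas
```

Broadening a search then amounts to following `broader`, narrowing it to following `narrower`, and skipping to a neighbouring area to following `related`.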

In specialised R&D domains, other research classification schemes for the domain concerned may go into far deeper detail than ORTELIUS. In such cases CERIF should support those classifications, so that the subject indexing in existing research databases is not lost, and mapping should be possible at least to the top level of broad research areas in the recommended CERIF subject indexing tool, ORTELIUS.
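
One way to picture this mapping requirement is sketched below: the detailed code from the specialised scheme is kept as it is, and a broad research area is attached where a mapping exists. The scheme name, the codes, the area labels and the rule of mapping by first character are all hypothetical; none of them is a real ORTELIUS or scheme entry.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class SubjectIndex(NamedTuple):
    scheme: str                # name of the specialised scheme (hypothetical)
    code: str                  # original code, kept so that no detail is lost
    top_level: Optional[str]   # broad area in the recommended tool, if mapped


# Hypothetical mapping table: (scheme, first character of code) -> broad area.
TOP_LEVEL_MAP: Dict[Tuple[str, str], str] = {
    ("LOCAL-CHEM", "A"): "Chemistry",
    ("LOCAL-CHEM", "B"): "Materials science",
}


def index_subject(scheme: str, code: str) -> SubjectIndex:
    """Keep the detailed code and attach a broad area when one is mapped."""
    top = TOP_LEVEL_MAP.get((scheme, code[:1]))
    return SubjectIndex(scheme, code, top)


print(index_subject("LOCAL-CHEM", "A12.3"))
print(index_subject("LOCAL-CHEM", "Z9"))   # unmapped: top_level stays None
```

An unmapped code is still stored, so existing indexing survives even before the mapping table is complete.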

Useful background information was found in the Report of the DESIRE project (Development of a European Service for Information on Research and Education), mainly in the part on "Current use of classification schemes in existing search services". An executive summary can be found at:
http://www.ukoln.ac.uk/metadata/desire/classification/class_su.htm

6.2.1.2 Ortelius, the Data Base on Higher Education in Europe

In their conclusions of 25.11.1991, the Education Ministers of the European Communities, meeting within the Council, expressed their desire for a database on higher education to be created at a European level. Directorate General XXII of the EU Commission accepted this request and in February 1992 called for tenders and received answers from many European bodies. In view of the complexity of the mandate and the creative, organisational and economic effort involved, the EU Commission decided to offer an economic contribution equal to 50% of the cost of developing the project, for a period of three years. The Commission's presence was also a guarantee that the product would conform to Community standards, both in terms of infrastructures and technologies and in terms of contents and working methods.

In December 1993, after a careful assessment of the projects presented by the competitors applying for the tender, the creation of the data base was entrusted to an Italian Consortium formed of public institutions and private organisations. The Consortium comprises the University of Florence, the Education Documentation Library (BDP), the Olivetti computer company and the Giunti Multimedia publishing company.

At present, Ortelius is run only by the following two public institutions: the University of Florence and the Education Documentation Library (BDP).

The project was named Ortelius, to emphasise that its ideals are close to the spirit of the XVIth century Flemish cartographer who was the first to set down the geographical information of his time in a modern atlas.

The main objectives of Ortelius are to supply information on higher education institutions and study programmes in Europe, and thus to help promote the flow of information, co-operation among institutions and the mobility of students.

How is its structure organised? It is a network that involves the active participation - both in collecting and in updating the information - of national authorities and universities in the 15 member states of the EU.

The watchwords are therefore quality and concerted action.

For further information please contact:

Consorzio Ortelius
Offices: Via dell'Agnolo, 87
50122 Firenze
Tel. (+39)055/2380418
Fax (+39)055/2341516
E-mail: ortelius@bdp.it

6.2.1.3 Ortelius Thesaurus for indexing of higher education and training subjects

The Ortelius Thesaurus was developed for indexing and retrieving information in the Ortelius Database, particularly information on study programmes. Given the analogy between study areas and research areas in universities and other institutes of higher education, the CERIF 1991 Science Classification Scheme was taken as the starting point for its development. The CERIF classification scheme was first extended to cover all areas of higher education and training, and then developed and enriched into a hierarchical structure of up to 7 levels, including related terms and scope notes. Above all, the thesaurus contains the terms in the 11 official EU languages.

Enriching thesaurus for research and development

CERIF deals with research information. An investigation of the terms covered by Ortelius showed that some areas of research and development needed to be covered in more detail. As a test case, the terms needed for the Fifth Framework Programme were gathered and Ortelius was modified to cover these areas in detail. This exercise was a useful trial of future update and maintenance procedures.

Maintenance of the Thesaurus

Since the thesaurus is in use for applications on higher education and research, DG XIII-D prepared an agreement with DG XXII on co-operation in the updating management of the terms.

The CERIF working group proposes an updating procedure in which an update is carried out at least once a year, and possibly more frequently. Anyone who needs changes, additional terms, scope notes or related terms should therefore be given the opportunity to send a request to a central CERIF secretariat. At regular intervals these requests can be examined by a small group of experts, and proposals for modification can be submitted to the Ortelius management. The procedure also covers proposals for the terms in the other languages.

Formats in which the Ortelius Consortium offers the Thesaurus (see proposal described in chapter 7)

The Ortelius Multilingual Thesaurus can be made available for browsing on the CERIF Homepage (/cerif/). Authorised users can use the Thesaurus as an independent tool and implement it in different database management systems. To that end, the Ortelius Thesaurus may be distributed on CD-ROM in an international standard format, with import and export functions, so that it can be adapted to any database operating environment for semi-automatic indexing and retrieval.

Routines for printing may be added in the CD-ROM version of the Thesaurus.

6.2.2 Economic Activity

To indicate the "Economic Activity Area", the NACE classification has been chosen, in line with the EUROSTAT guidelines. The 2-digit level is mandatory; the 4-digit level is recommended for CERIF 2000.

This is recommended for subject indexing in the General Classification entity in the CERIF 2000 model. It is also recommended for use with the Project Keywords entity.
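
The two levels can be handled as in the following sketch, which accepts a code at either level and always yields the mandatory 2-digit part. The written forms "DD" and "DD.DD" are an assumption about how codes are entered, and the function checks only the shape of a code, not whether it is actually assigned in NACE.

```python
import re
from typing import Optional, Tuple

# Assumed written forms: "DD" (2-digit level) or "DD.DD" (4-digit level).
_NACE_SHAPE = re.compile(r"(\d{2})(?:\.(\d{2}))?")


def parse_nace(code: str) -> Tuple[str, Optional[str]]:
    """Return (two_digit, four_digit_or_None); raise ValueError if malformed."""
    match = _NACE_SHAPE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a 2- or 4-digit NACE-style code: {code!r}")
    two_digit, rest = match.groups()
    four_digit = f"{two_digit}.{rest}" if rest is not None else None
    return two_digit, four_digit


print(parse_nace("73"))      # only the mandatory level is present
print(parse_nace("73.10"))   # recommended level, mandatory level derived from it
```

Storing both parts lets records indexed at the 4-digit level be retrieved together with records indexed only at the 2-digit level.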

6.2.3 Products by Activity

Also in line with the EUROSTAT guidelines, the NACE related product classification scheme CPA (Classification of Products by Activity) is recommended for products.

In the CERIF 2000 data model, this is used for subject indexing the Product Type entity. It is also recommended for use with the Project Keywords entity.


6.3 Indexing guidelines for other data elements

Controlled attribute value lists are required so that the lookup elements of the data model can be indexed in a guided way. For certain lookup elements, the following sections propose either an existing, agreed classification standard or a recommended "list of values". The sources used to find suitable typologies and lookup tables were:

  • the DESIRE project (Development of a European Service for Information on Research and Education);
  • the Canberra, Frascati and Oslo manuals (UNESCO and OECD);
  • UNIMARC Manual – Bibliographic Format 1994 (IFLA-UBCIM), and updates 1996 and 1998 (IFLA-UBCIM);
  • the Nomenclature and Classification lists included in the "Thésaurus de l’index général du système WWW de l’UCL";
  • the harmonised information package for calls for proposals under the Fifth Framework Programme;
  • and the proposals of the subgroup on statistics for the Fifth Framework Programme.

6.3.1 Language

The two-letter code of the ISO 639 standard should be used.

This standard is applied to the entity Language in the CERIF 2000 data model.

6.3.2 Country

The two-letter code of the ISO 3166 standard should be used.

This standard is applied to the entity Country in the CERIF 2000 data model.

6.3.3 Currency

The three-letter ISO 4217 code (SWIFT code) should be used.

This is applied to the currency values in the CERIF 2000 data model.
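
A data-entry check for the three code types above might look like the sketch below. It tests only the form of a code (length and letter case, following the usual writing conventions of the three standards), not whether the code is actually assigned; the dictionary keys and the function name are illustrative assumptions.

```python
import re

# Shape of each code type; membership in the official lists is NOT checked.
_SHAPES = {
    "language": re.compile(r"[a-z]{2}"),   # ISO 639 two-letter, lower case by convention
    "country": re.compile(r"[A-Z]{2}"),    # ISO 3166 two-letter, upper case by convention
    "currency": re.compile(r"[A-Z]{3}"),   # ISO 4217 three-letter
}


def has_valid_shape(kind: str, code: str) -> bool:
    """Return True if the code has the expected form for its kind."""
    return _SHAPES[kind].fullmatch(code) is not None


print(has_valid_shape("language", "fr"))
print(has_valid_shape("country", "BE"))
print(has_valid_shape("currency", "EU"))   # too short for a currency code
```

A production system would go further and validate against the published code lists themselves.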

6.3.4 Address

The NUTS codes (Nomenclature of Territorial Units for Statistics) should be used for regions in the EU.

This is applied to the entity NUTS-Region in the CERIF 2000 data model.

6.3.5 Role of a person in an organisation

The ISCO (International Standard Classification of Occupations) should be used.

This is applied to the role attribute in the Person-OrgUnit entity of the CERIF 2000 data model.

6.3.6 Qualification of a person

The ISCED (International Standard Classification of Education) should be used.

This is applied to the entity Qualification of the CERIF 2000 data model.

6.3.7 Organisation/Company size

The values from the harmonised information package for the Fifth Framework Programme should be used.

  • S1 : 0
  • S2 : 1-9
  • S3a : 10-49
  • S3b : 50-249
  • S4 : 250-499
  • S5 : 500-1999
  • S6 : 2000+ employees

Note: An SME (small and medium-sized enterprise) is defined as an entity that has fewer than 250 full-time equivalent employees, has an annual turnover not exceeding EUR 40 million or an annual balance sheet total not exceeding EUR 27 million, and is not 25% or more owned by a non-SME.

These values are applied to the attribute Org_Headcount of the entity OrgUnit in the CERIF 2000 data model.
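
The size classes and the SME note can be expressed as in the following sketch. The function names and parameters are illustrative assumptions, and the SME test encodes one reading of the note: headcount AND (turnover OR balance sheet) AND independence.

```python
# (inclusive upper bound, class code) in ascending order; None = no upper bound.
_SIZE_CLASSES = [
    (0, "S1"), (9, "S2"), (49, "S3a"), (249, "S3b"),
    (499, "S4"), (1999, "S5"), (None, "S6"),
]


def size_class(employees: int) -> str:
    """Map a headcount to the size class listed above."""
    if employees < 0:
        raise ValueError("headcount cannot be negative")
    for upper, code in _SIZE_CLASSES:
        if upper is None or employees <= upper:
            return code
    raise AssertionError("unreachable: last class has no upper bound")


def is_sme(fte: float, turnover_meur: float, balance_meur: float,
           non_sme_share: float) -> bool:
    """Assumed reading of the note; amounts in EUR million, share as a fraction."""
    return (fte < 250
            and (turnover_meur <= 40 or balance_meur <= 27)
            and non_sme_share < 0.25)


print(size_class(120))                # falls in the 50-249 band
print(is_sme(120, 55.0, 20.0, 0.10))  # balance sheet criterion is met
```

Under this reading, every organisation in classes S1 to S3b meets the headcount criterion, while the financial and ownership criteria still have to be checked separately.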

6.3.8 Type and status of a patent

The IPC (International Patent Classification) should be used.

This is applied to the entity Patent Type in the CERIF 2000 data model.

6.3.9 Type of publication

The UNIMARC Manual – Bibliographic Format 1994 and the updates from 1996 and 1998 should be used.

This is applied to the entity Publication Type in the CERIF 2000 data model.

6.3.10 Type of event

The following list of values should be used:

  • Conference
  • Cultural event
  • Exhibition
  • Political event
  • Sport event
  • Trade fair
  • Workshop

This is applied to the entity Event Type of the CERIF 2000 data model.

6.3.11 Type of multimedia item

Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-UBCIM), and the updates from 1996 and 1998 (fields 115, 125-128) (IFLA-UBCIM).

This is applied to the entity Multimedia Type in the CERIF 2000 data model.

6.3.12 Role of an organisation in a project

The following definitions from the harmonised information package for the Fifth Framework Programme should be used:

  • CO: Co-ordinator (=scientific, administrative and financial co-ordinator)
  • CF: Only financial co-ordinator (if different from co-ordinator)
  • AC: Associate contractor
  • CR: Contractor (other than the co-ordinator)

This is applied to the attribute Proj_Org_Role of the entity Proj_Org in the CERIF 2000 data model.

6.3.13 Role of a person in a publication

Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-UBCIM), and Annex C of the 1996 and 1998 updates.

This is applied to the attribute Pers_Pub_Role of the entity Pers_Pub in the CERIF 2000 data model.

6.3.14 Role of a person as an expert

Use the following list of values for linking a person to expertise/skills:

  • Consultant
  • Evaluator
  • Referee
  • Reviewer

This is applied to the attribute Pers_Exp_Role of the entity Pers_Expert_Skill in the CERIF 2000 data model.

6.3.15 Type of organisation

Use the values from the harmonised information package of the Fifth Framework Programme:

  • BES = enterprise sector including SMEs and individual consultants
  • HES = higher education establishments
  • RPR = private/commercial research centres including SMEs
  • RPN = private non-profit research centres
  • RPU = public research centres
  • JRC = joint research centre
  • PUS = non-research public sector
  • PNP = non-research private non-profit sector
  • INO = international organisations
  • OTH = others

This is applied to the attribute Org_Type_Full of the entity OrgUnit_Type in the CERIF 2000 data model.
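
A controlled value list such as this one is naturally held as a lookup table, as in the sketch below. The codes and descriptions are those listed above; the variable and function names, and the choice to normalise case and reject unknown codes, are illustrative assumptions.

```python
# Organisation type codes as listed above.
ORG_TYPES = {
    "BES": "enterprise sector including SMEs and individual consultants",
    "HES": "higher education establishments",
    "RPR": "private/commercial research centres including SMEs",
    "RPN": "private non-profit research centres",
    "RPU": "public research centres",
    "JRC": "joint research centre",
    "PUS": "non-research public sector",
    "PNP": "non-research private non-profit",
    "INO": "international organisations",
    "OTH": "others",
}


def describe_org_type(code: str) -> str:
    """Return the description for a code, rejecting values outside the list."""
    key = code.strip().upper()
    if key not in ORG_TYPES:
        raise ValueError(f"unknown organisation type code: {code!r}")
    return ORG_TYPES[key]


print(describe_org_type("hes"))
```

The other "list of values" elements in this section (event type, expert role, equipment role, product role) could be handled with the same pattern.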

6.3.16 Role of a person related to equipment

Use the following list of values:

  • Contact person
  • Maintenance technician
  • Operator/technician

It is applied to attribute Pers_Equip_Role of entity Pers_Equip in the CERIF 2000 data model.

6.3.17 Role of an organisation related to a product

The following list of values should be used:

  • Ownership
  • Franchise
  • License
  • Purchase

This is applied to attribute Org_Prod_Role of OrgUnit_Product in the CERIF 2000 data model.

6.3.18 Type of equipment

Use class 6 of the UDC.

This is applied to entity Equipment Type of the CERIF 2000 data model.


Reference material on the Recommendation on Subject indexing within the CRIS Data Model

  • DESIRE: Development of a European Service for Information on Research and Education - report, bibliography and references.
    http://www.ub2.lu.se/desire/radar/reports
  • Essai de thésaurus de l’index général du système WWW de l’UCL.
    http://www.sri.ucl.ac.be/thesaurus.html
  • Manuals for indexing/INIS-IAEA (International Atomic Energy Agency).
  • Proposals from the Management co-ordination committee and the sub-group on statistics for the Fifth Framework Programme of the European Commission on the harmonisation of information packages for calls for proposals and for statistics.
  • The measurement of scientific and technological activities: proposed standard practice for surveys of research and experimental development - Frascati manual/Organisation for Economic Co-operation and Development. – Paris, 1994. – 261 p.
  • The measurement of scientific and technological activities: manual on the measurement of human resources devoted to S&T: "Canberra manual"/Organisation for Economic Co-operation and Development, Statistical Office of the European Communities. – Paris, 1995.
  • The measurement of scientific and technological activities: proposed guidelines for collecting and interpreting technological innovation data: Oslo manual/Organisation for Economic Co-operation and Development, Statistical Office of the European Communities. – Paris, 1997. – 122 p.
  • Thesaurus Guide: analytical directory of selected vocabularies for information retrieval/prepared by Gesellschaft für Information und Dokumentation (GID) for the Commission of the European Communities. – Amsterdam, Luxembourg: North-Holland, Office for Official Publications of the European Communities, 1985. – 749 p.
  • "Towards harmonisation of databases on research in progress – Final report of the European Working Group on Research Databases", November 1988. Published by the Liaison Committee of Rectors’ Conferences of Member States of the European Communities and Directorate General for Science, Research and Development of the Commission of the European Communities; financed by the Commission of the EC, contract PSS*0058/B, compiled by Dr. L. Van Woensel.
  • Recommendation to the Member States to use the CERIF format. In: Official Journal of the European Communities, OJ L 189, 13 July 1991.
