CORDIS Archive

This page has been archived. It will no longer be updated.

B1. Information collection

Suggestions and practical examples on how to improve quality and quantity of data

Jostein Hauge (University of Bergen)

  • references are not enough
  • CRIS is primary management tool
  • data collection at primary level
  • a recipe for involving the user

Elmar Schalück (University of Bochum)

  • ELFI: electronic research funding information
  • pilot project based on 3 components
    • automatic periodical extraction
    • categorisation of structured info
    • personalised dissemination

Bethan Hubbard (Cartermill International)

  • Data collection through the web for the Cartermill DBs - BEST, CRIB, CRES
  • cost effectiveness
  • benefits

Primary data collection and automation is the key!

B2. Standards related to CRIS

How to exchange or access CRIS information from multiple sources

Lieve Van Woensel (EC: DGXIII)

  • historical overview and present scope of CERIF: to cover different types of CRIS; review the classification tools; take account of technical evolution
  • approach: metadata structure based on Internet access to individual CRISs

Irma Pasanen-Tuomainen (Helsinki University of Technology)

  • experience of the use of CERIF at Helsinki University of Technology
  • national project - different forms of data collection
  • major problems: diacritics, financial fields, classification scheme, free keywords

Bethan Hubbard (Cartermill International)

  • EuroCRIS code of good practice
  • Topics: types of DBs, identification of users, definition of contents, structure, classification & indexing, search & navigation, acquisition & processing, quality control, publicity / distribution, marketing

Some experience with CERIF now exists; an improved new version is expected in 1999

B3. Indexing and multilingualism

How to extract useful knowledge from increasingly large and complex datasets

Gian Piero Zarri (CNRS)

  • add conceptual annotation in NL to web objects to increase accessibility
  • automatic translation into NKRL (narrative knowledge representation language)
  • translation used for retrieval

Jean Moscarola (Le Sphinx Développement)

  • commercially available lexical system (LEXICA)
  • extract from web and use lexical reduction to focus on significant content
  • statistical analysis to enrich understanding

Use of semantic knowledge annotation or syntactic lexical reduction to improve access to information

B4. Integration of CRISs

Features for integration of different data sources in various environments

Ritva Hagelin (University of Helsinki)

  • continuous decentralized CERIF-compatible update by researchers
  • 3 ways of searching
  • several single DBs linked together, email contact facilities

Hubert Pampouille (INRA)

  • Information system for agriculture
  • integrated set of data: research unit, teams, projects, results
  • static HTML pages in EN and FR
  • full text in electronic format, hardcopy on demand

Peter Baur (EC DGXIII)

  • PROSOMA Esprit; bridge gap research <--> market place
  • data collected in predefined electronic formats
  • owners pay for preparation of the material

The value of several DBs together is greater than the sum of the parts

B5. Cooperation between CRISs

Facilitate access to different heterogeneous distributed databases

Peter Finch (EC DGXIII)

  • ERGO project: facilitate access to existing CRISs
  • CERIF export format, catalog model
  • Pilot: single catalog on CORDIS for 30 project DBs on 24 hosts; launch 07/98

Wolfgang Adamczak (University of Kassel)

  • CRIS will no longer be a conversational DB but a search engine
  • Don’t need CRISs - the Web is enough!

Jostein Hauge (University of Bergen)

  • European platform for current research database producers (EuroCRIS)
  • informal forum open to representatives of CRISs
  • 30 members, 17 countries

Technical cooperation among people and their information is the KERNEL of the nut!





The three papers describe practical examples and suggestions on how to develop data acquisition procedures and improve both quality and quantity of collected data.

B.1.1 - Jostein Hauge
"The proof of the pudding is in the eating"

Starting from the example of the Research Documentation Unit of the University of Bergen, Jostein Hauge discusses ways and means of securing the participation of researchers when documenting R&D at institutional level.

Three main points arise from the presentation:

  • people are not satisfied by references only
  • CRIS is becoming a primary management tool
  • data should be collected at primary level (where the research is carried out)

Suggestions to obtain the maximum involvement by the researchers (the data providers):

  • easy access and use
  • duplication prohibited (do not ask twice for the same information)
  • rapid presentation of the results
  • high tolerance in data acquisition ("Come as you are")

B.1.2 - Elmar Schalück
"ELFI - Electronic Research Funding Information"

Elmar Schalück presents a two-year pilot project for the implementation and management of a system for collecting and providing funding information.

The system is based on three main components:

  • automatic periodical extraction of interesting data from already existing data sources (each funding source in Germany)
  • categorisation of the extracted information in order to describe different kinds of objects (e.g. programmes, funding organisations, calls for proposals, etc.)
  • personalised info dissemination based on a user profile which describes her/his main interests (e.g. agriculture, fellowships, SMI, etc.)

The categorisation activity is the key point of the system, since it makes it possible to produce "objects" instead of plain text. It is mainly performed manually (i.e. by the people who manage ELFI) to provide well-structured descriptions arranged by deadline, research area, eligibility criteria, etc.

Information is disseminated through a combination of "push" and "pull" technologies: when a new piece of information is found and loaded into the system as a new object, potentially interested users are alerted by e-mail and invited to access the ELFI database.
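
The "push" step described above can be illustrated with a minimal sketch. It is not ELFI's actual implementation: the class names, fields and the simple set-overlap matching rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FundingObject:
    """A categorised funding item (hypothetical structure, not ELFI's schema)."""
    title: str
    kind: str                                   # e.g. "programme", "call for proposals"
    research_areas: set[str] = field(default_factory=set)
    deadline: str = ""                          # illustrative ISO date string


@dataclass
class UserProfile:
    """A user's declared interests (hypothetical structure)."""
    email: str
    interests: set[str] = field(default_factory=set)


def users_to_alert(new_object: FundingObject, profiles: list[UserProfile]) -> list[str]:
    """Return the e-mail addresses of users whose interests overlap the
    research areas of a newly loaded object (assumed matching rule)."""
    return [p.email for p in profiles if p.interests & new_object.research_areas]


# Made-up example data: one new object and two user profiles.
call = FundingObject(
    title="Example call",
    kind="call for proposals",
    research_areas={"agriculture"},
    deadline="1998-09-30",
)
profiles = [
    UserProfile("a@example.org", {"agriculture", "fellowships"}),
    UserProfile("b@example.org", {"fellowships"}),
]
print(users_to_alert(call, profiles))
```

In this sketch only the first user is alerted; the "pull" half of the design is simply that user then querying the database for the full object.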


B.1.3 - Bethan Hubbard (instead of Alan Harrison)
"The future of CRIS as a publishing initiative"

The paper describes an interactive, Web-based data collection system developed by Cartermill International that allows researchers to supply and update their own information directly, in real time.

Cartermill’s data collection covers three kinds of CRIS: BEST (people), CRIB (research projects) and CRES (institutions), and involves the whole academic community in the UK. For the CRES database, data collection is extended worldwide.

The system is a part of a commercial activity and must be cost-effective.

The approach brings benefits to both sides:

  • for the contributors: it is an easy and flexible way to provide data, data become visible rapidly, and on-line help is available;
  • for Cartermill International: it reduces the cost of processing, allows an on-line procedure and saves time and storage;
  • for both: there is a single one-stop procedure to provide, collect and modify data.

Two important organisational points arose from the discussion: (1) frequent contact between data managers and data providers helps to encourage participation, and (2) a validation phase is advisable in any case to check data quality.




B.2.1 - Lieve Van Woensel
"Common European Research Information Format (CERIF)"

In a short historical overview, Lieve Van Woensel showed that the first CRISs and CRIS-related standards go back to the early seventies. The present version of CERIF dates from July ‘91 and was developed by a Group of Experts with the aim of providing a standard for exchanging records on research projects and facilitating the networking of research project databases.

CERIF now needs to be reviewed in order to:

  • widen its scope to other kinds of CRIS;
  • update indexing and classification tools;
  • take into account the technological evolution.

A new Working Group has recently been set up to update CERIF. Many topics are under discussion, including implementation aspects, multilingualism, user interface and database access.

The new standard will hopefully be available in 1999 and will:

  • follow a metadata approach;
  • consider a wide range of types of research information (programmes, projects, groups, funding, results, expertise, etc.);
  • probably include a minimal obligation to use an updated version of the classification schema arranged in 30-35 basic domains;
  • provide recommendations for classification (research area, industrial area, ...);
  • establish procedures to update itself.


B.2.2 - Irma Pasanen-Tuomainen
"Experience with CERIF"

The paper describes the experience of the Helsinki University of Technology in managing the national CRIS, which was designed at the beginning of the 1990s following a recommendation of the Ministry of Education. The CERIF standard was taken into account in the implementation of the Finnish CRIS.

The main weaknesses of CERIF are reported in the following areas:

  • financial data
  • classification schema
  • diacritics

The use of free keywords causes some difficulties in indexing documents, but the problem is more general and not directly related to CERIF.

Several managerial issues were also discussed, such as the importance of a help desk and of a data collection system able to accept different kinds of formats and media (both paper and electronic forms).

The improvement of the retrieval and browsing procedures based on a common WWW interface is the main future development of the system.


B.2.3 - Bethan Hubbard
"Code of good practice for Current Research Information Systems"

The Code of Good Practice (CGP) is the result of a study carried out by a group of international experts who are experienced in the development and maintenance of CRISs and are members of the EuroCRIS Platform. The project was managed by Cartermill International and funded by CORDIS.

The CGP has been developed as a guide for both new and existing producers of CRISs with the aim of increasing interoperability and harmonisation between CRISs. It covers the following main topics: types of CRISs and definition of contents, identification of users, structure and presentation, classification and indexing, search and navigation, data collection and processing, quality control, publishing / distribution, marketing.

The document is available on the Web at the CORDIS site.

The CGP is meant to be a living document; contributions and updates are welcome and can be addressed to the EuroCRIS Platform.




The two papers presented in this session both relate to the use and management of natural language.

B.3.1 - Gian Piero Zarri
"Indexing the Web using natural language techniques"

Starting from a project for a Web distance learning system, based on a central kernel which stores and retrieves WWW elements and on an authoring system, Gian Piero Zarri illustrates a way of "intelligently" indexing and retrieving multimedia objects.

The approach, which is also applicable to CRISs, consists of adding to each object in a database a "conceptual" annotation that describes its information content better than a list of keywords does. The annotation is indexed and used for retrieval.

Since the final "conceptual" annotation is difficult to prepare, a "middle ground" solution is suggested, based on the following steps:

  1. associate with the object a simple text in natural language which describes its informational content;
  2. convert the natural language text into a "conceptual" annotation in NKRL (Narrative Knowledge Representation Language), using tools already developed under EU-financed projects.


B.3.2 - Jean Moscarola
"Textual data analysis: the LEXICA system"

A commercially available analytical tool for lexical analysis (the LEXICA system) is presented.

The package includes facilities for lexical analysis (generation of the lexicon, search for repeated segments, etc.), syntactic analysis (lemmatisation) and statistical analysis.

The lexical approach can be applied to any sort of very large textual database (market surveys, press articles, patents, etc.) to obtain a fast interpretation and knowledge of the contents.

Through a three-step procedure:

  1. extraction of bulk data from Web or CD
  2. lexical reduction to focus on significant contents
  3. cross-analysis to enrich understanding

the system gives the user a rapid preview of the knowledge potential and reduces user "discouragement" by "allowing the treasure to be perceived before the safe is open".
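
To make the reduction and cross-analysis steps concrete, here is a minimal sketch. It does not reproduce LEXICA's algorithms: the stop-word list, the minimum word length and the function names are assumptions, and the ASCII-only tokeniser is a deliberate simplification that ignores diacritics.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on"}


def lexical_reduction(text, min_length=3):
    """Step 2 (assumed form): keep only lower-cased words that are neither
    stop words nor shorter than min_length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]


def lexicon(documents):
    """Build a frequency lexicon over a collection of texts."""
    counts = Counter()
    for doc in documents:
        counts.update(lexical_reduction(doc))
    return counts


def cross_analysis(labelled_documents):
    """Step 3 (assumed form): term frequencies broken down by a category
    label, given (category, text) pairs."""
    table = {}
    for category, text in labelled_documents:
        table.setdefault(category, Counter()).update(lexical_reduction(text))
    return table


# Made-up example texts standing in for extracted bulk data (step 1).
docs = ["The funding of research projects", "Research funding in agriculture"]
print(lexicon(docs).most_common(2))

by_source = cross_analysis([("press", docs[0]), ("patents", docs[1])])
print(by_source["patents"]["agriculture"])
```

Even this crude frequency view gives the kind of quick overview the paper describes: the dominant terms surface before any document is read in full.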




This session covered the integration of different types of research information and specific applications.

B.4.1 - Ritva Hagelin
Description of the Helsinki University CRIS-database system

Interesting points:
  • continuous decentralized updating by the researchers themselves
  • projects database is CERIF-compatible, including the CERCS classification
  • user interface in three languages: Finnish, Swedish, English
  • three different ways of searching offered: 1) predefined searches on the web (by researcher's name, department name, pull-down indexes, etc.), 2) search form on the web, 3) link to CCL search, i.e. Common Command Language for experienced searchers
  • links to the various CRIS databases, to email, to homepages of the departments, publications, full texts including maps, etc.
  • the TUHTI database family is intended to serve both external and internal university purposes; e.g. one database will hold information on scientific activities other than publications that earn merit points for researchers.


B.4.2 - Hubert Pampouille
Description of the information system of INRA, the French National Agricultural Research Institution

Interesting points:
  • external access to the system goes down to laboratory level; care is taken over confidentiality in the access mechanisms
  • structure centred on projects (called activities); content consists of text fields submitted by the project leaders
  • input is static HTML pages, with keywords in English and French
  • minimum critical mass required for a project: three researchers working on it
  • various aids to navigation between laboratories and projects, i.e. index term counts pointing to related subject areas
  • results: so far only a publications database, in SGML format; continuous electronic publishing, with hard-copy publication (extracted from the SGML-coded database) as print-on-demand
  • tool for searching for information, for document production as well as for research management.


B.4.3 - Peter Baur
Presentation of PROSOMA - Esprit

Interesting points:
  • purpose: to "bridge the gap between research and the market place", i.e. to disseminate intelligence about results in attractive ways, such as presenting them as opportunities for further exploitation
  • should be seen as a tool to promote research results, i.e. an added-value service to result owners
  • the results owner is requested to submit data in specified forms
  • PROSOMA will produce CD-ROMs, web access to constantly updated results, and printed material
  • CD-ROMs and the web database can include various kinds of multimedia presentations, interviews with researchers, etc.; the results owner pays, e.g. producing a trailer costs about 1000-2000 ECU
  • the PROSOMA system is funded by the EU (Esprit money)
  • presently there is a link from PROSOMA to CORDIS; the vision is to integrate this and results from other EU research programmes into CORDIS
  • PROSOMA is presently intended for use by intermediaries in the different Member States when giving presentations, demos, etc. for promotional purposes.

Replying to a question about the cost of PROSOMA, Peter Baur added that the whole PROSOMA infrastructure had cost 4 MECU.




How can CRISs cooperate? If one looks for keywords in the three speakers’ presentations and in the comments from the audience, they would be:

  • from the CRIS users’ point of view: one question - no more, simple search procedure, completeness of the databases, data quality, different kinds of data (research projects, expertise, results, ...), authenticity of information input;
  • from the CRIS managers’ point of view: utility of their CRIS, marketing, automated updating procedures, geographic coverage.

B.5.1 - Peter Finch
The ERGO Pilot Project (European Research Gateways on-line).

Peter M. Finch presented the ERGO Pilot Project (European Research Gateways On-line). One of the project's main objectives is certainly "one question - no more" for the end-user, who should be able to find good-quality data through a simple search procedure. For the managers, the utility of each database participating in the project increases, each one's marketing effort can be reduced because part of it can be done jointly, and geographic coverage grows through the project itself.

The main topic of the discussion was the decentralisation of the project, which is foreseen but only for the future because it would need a bigger budget. If the pilot project demonstrates the efficiency of ERGO, this would encourage the Innovation Programme Committee to finance the whole project.
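
The "one question - no more" idea of a single catalogue over several source databases can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the source names and the keyword-matching rule are hypothetical and do not describe ERGO's catalogue model or the CERIF export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One record in the shared catalogue (hypothetical layout)."""
    source: str            # name of the contributing database
    title: str
    keywords: frozenset    # lower-cased index terms


def build_catalog(sources):
    """Merge records exported by several databases into one catalogue.

    `sources` maps a source name to a list of (title, keywords) pairs."""
    return [
        CatalogEntry(name, title, frozenset(k.lower() for k in keywords))
        for name, records in sources.items()
        for title, keywords in records
    ]


def search(catalog, keyword):
    """Answer one question against all sources at once."""
    key = keyword.lower()
    return [(e.source, e.title) for e in catalog if key in e.keywords]


# Made-up example exports from two hypothetical project databases.
sources = {
    "db_a": [("Soil study", ["Agriculture", "Soil"])],
    "db_b": [("Crop yields", ["agriculture"]), ("Laser optics", ["physics"])],
}
catalog = build_catalog(sources)
print(search(catalog, "agriculture"))
```

Each hit keeps the name of its source database, so the user can follow the result back to the originating CRIS.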


B.5.2 - Wolfgang Adamczak
The future of CRIS as "LINK" systems

Professor Wolfgang Adamczak sees the future of CRISs as "LINK" systems: search engines coupled with the WWW. Using the Web directly as a search tool gives the user access to all kinds of data available on-line; since these data are authentic, their quality is enhanced. The geographic coverage is obviously global.

The discussion attempted to determine the border between the Web and databases. The WWW is not itself a database; the databases sit behind the Web.


http://www.uni-kassel.de/wiss_tr/Veranstaltungen/CRIS98.html

B.5.3 - Jostein Hauge
The EuroCRIS platform

Professor Jostein Hauge presented the EuroCRIS Platform. Since it is made up of database producers who look after both their users’ needs and their own, all the keywords above matter to the Platform. So far, the Platform has carried out, or is carrying out, several important projects to harmonise its members’ databases and facilitate communication between them. What of the future? Should the Platform complement the work done within the ERGO project, and if so, how?


http://www.fou.uib.no/cris/index.htm

Conclusion of session B.5, "Co-operation between CRISs"

The conclusion of that session could thus be that, for the CRIS 2000 conference, we could dream about a GLORGO (GLObal Research Gateways On-line) project, appearing on Yahoo’s homepage and managed by a Global CRIS platform?!