Researchers in chemistry, molecular biology, ecology and many other branches of science use materials obtained from organisms as the basis of their studies and subsequently store their research results in databases. Their source material often consists of samples taken from natural history collections, or it is vouchered in such collections to ensure proper identification of the organisms. This material includes plant, animal, or paleontological specimens in natural history collections, culture collections of microbial strains, botanical or zoological garden collections, natural product collections, etc.
In almost all of these fields, Europe holds the most extensive collections of such specimens worldwide. To facilitate access to these resources, electronic inventories are needed. However, few significant efforts in this direction have so far been undertaken in Europe, at least in comparison with the United States, where a policy towards such inventories was adopted long ago. To achieve interoperability of present and future databases, common data models and standards are needed.
At the outset of the project, the objective of CDEFD was to develop project-independent structures to be used in the design of floristic databases and databases including floristic data. In the course of the project, this was extended to include biological collections in general, because it was realized that all objects or samples obtained from organisms share the same core data structure.
Using a CASE (Computer Aided Software Engineering) program, the results were recorded in a constantly updated information model, which consists mainly of diagrams depicting the complex data structures. Accompanying text and tables further define the contents and meaning of the structural elements.
The CDEFD data model for biological collections is based upon a proposed structure of elemental, inter-related types of 'biological objects': scientific plant name, potential taxon (plant name with circumscription reference), collection or observation site, and unit (a physical object in a collection or in the field). All results of biological studies are linked to at least one of these objects.
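The four object types and the rule that every study result links to at least one of them can be illustrated with a minimal Python sketch. It is not part of the CDEFD model itself: all class and field names (`ScientificName`, `PotentialTaxon`, `Site`, `Unit`, `StudyResult`, `linked_objects`) are assumptions chosen for illustration, and the published model defines far more attributes than are shown here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ScientificName:
    """A scientific plant name (hypothetical, reduced to one field)."""
    name: str


@dataclass(frozen=True)
class PotentialTaxon:
    """A plant name together with a circumscription reference."""
    name: ScientificName
    circumscription_reference: str


@dataclass(frozen=True)
class Site:
    """A collection or observation site."""
    description: str


@dataclass(frozen=True)
class Unit:
    """A physical object in a collection or in the field."""
    unit_id: str


# Any of the four elemental 'biological objects'.
BiologicalObject = Union[ScientificName, PotentialTaxon, Site, Unit]


@dataclass
class StudyResult:
    """A result of a biological study, linked to one or more objects."""
    summary: str
    linked_objects: List[BiologicalObject]

    def __post_init__(self) -> None:
        # Enforce the rule that a result is linked to at least one object.
        if not self.linked_objects:
            raise ValueError("a study result must link to at least one biological object")


# Usage with placeholder values (the reference string is invented for the example).
name = ScientificName("Bellis perennis L.")
taxon = PotentialTaxon(name, "placeholder circumscription reference")
result = StudyResult("chromosome count", [taxon, Unit("U-0001")])
```

Rejecting an empty link list at construction time is only one way to express the constraint; in a relational implementation it would more likely be enforced through the schema or application logic.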
'Plant Name' and 'Potential Taxon Name' are objects which have been treated in detail by a separate project (IOPI GPC Taxonomic Information Model). The emphasis of the CDEFD model lies on the units, which have been analysed in full detail. Units may be derived from other units, in processes like duplication or extraction. This is modelled by means of a reflexive relationship, with an entity 'Derived Unit Creation Event' used to provide further details on the process. Unit subtypes (entities which may be added to the core unit's attributes) are defined for different classes of parameters, ranging from quantification data to chemical identification procedures and results. Collection management was analysed, including storage parameters and transaction management of units in and between biological collections, e.g. the exchange of herbarium material or the distribution of strains from a culture collection. At some point in time, any material in a collection has been obtained from a field source. This process is termed 'gathering', and it relates the unit to a system of geographic data items.
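The reflexive relationship between units, and the way a derived unit traces back to a gathering, can be sketched as follows. This is an illustrative reading of the description above, not the model's actual schema: the attribute names (`process`, `parent`, `origin`, `gathering`) and the helper methods (`derive`, `lineage`, `field_source`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gathering:
    """The event by which material was obtained from a field source."""
    site: str  # stands in for the model's geographic data items


@dataclass
class DerivedUnitCreationEvent:
    """Details of the process by which a unit was derived from another."""
    process: str    # e.g. "duplication" or "extraction"
    parent: "Unit"  # the unit the new unit was derived from


@dataclass
class Unit:
    unit_id: str
    gathering: Optional[Gathering] = None               # set on field-collected units
    origin: Optional[DerivedUnitCreationEvent] = None   # set on derived units

    def derive(self, unit_id: str, process: str) -> "Unit":
        """Create a new unit derived from this one (reflexive relationship)."""
        return Unit(unit_id, origin=DerivedUnitCreationEvent(process, self))

    def lineage(self) -> List[str]:
        """Return unit ids from this unit back to the originally gathered unit."""
        chain = [self.unit_id]
        unit = self
        while unit.origin is not None:
            unit = unit.origin.parent
            chain.append(unit.unit_id)
        return chain

    def field_source(self) -> Optional[Gathering]:
        """Walk up the derivation chain and return the root unit's gathering."""
        unit = self
        while unit.origin is not None:
            unit = unit.origin.parent
        return unit.gathering


# Usage with placeholder identifiers.
specimen = Unit("U1", gathering=Gathering("site A"))
duplicate = specimen.derive("U2", "duplication")
extract = duplicate.derive("U3", "extraction")
```

Storing the event on the derived unit keeps the relationship one-directional (child to parent), which mirrors a self-referencing foreign key in a relational table; a full implementation would also need the unit subtypes and transaction data that are omitted here.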
Problems surfaced in the attempt to define and standardize the information relating to the geographic description and location of collection sites. Experts were consulted and specialized geographic information systems were evaluated. The group came to the conclusion that commercially available geographic information systems (GIS) should be used whenever detailed coverage of geographic information is required. Such systems can be linked to relational databases, so that integration with the model for collection data can be achieved. Nevertheless, a tentative detailed data structure for geographic data relating to collection sites has been included in the model.
Links with information outside the realm of collection data were evaluated in detail by looking at the structure of chemical data (secondary metabolites), karyological data, and ecophysiological data. This information was modelled from two points of view: the data about organisms produced by research studies, and the data used during the study itself. The latter discussion led to the development of a generalized model for experimental studies. The discussion of linked information provided important feedback for the modelling of core biological collection data.
The CDEFD model is considered to be basic research, i.e. it is not meant to provide a model for direct implementation. Instead, the complex research model provides a general framework for the planning of specific databases. In addition, the model supplies guidelines for the definition of data fields and thus facilitates the discussion on data standards.
Funding Scheme: CON - Coordination of research actions