A complex carbohydrate structural database


The objective of the research programme is to create and maintain a complex carbohydrate structural database (CCSD) and a database management program (CarbBank). The database will contain all published carbohydrate structures larger than disaccharides, enabling scientists to search for carbohydrate structures systematically and rapidly. The database currently contains about 8000 records and is to be extended to about 20000 over the next few years.
The latest version of the CarbBank program and of the Complex Carbohydrate Structural Database (CCSD), with about 8000 records, has been released. The computer program CASON was developed to translate the systematic name of a compound, as used in the Chemical Abstracts (CA) registry file, into the structural representation used in CarbBank. The CA ONLINE service was used to search for specific structural elements of nitrogen (N)-linked and oxygen (O)-linked oligosaccharides. The search yielded about 2900 entries, which were extracted and then converted using the CASON program. These data were verified against the original literature.

A computer program, SUGABASE, was developed to add hydrogen-1 and carbon-13 NMR data to the structural information included in CarbBank. The database can be searched for carbohydrate structures by entering a list of chemical shift values. The resulting carbohydrate structures and NMR tables are displayed together, with the matching monosaccharide residues in the structures and the matching chemical shift values in the NMR tables highlighted. During the reporting period a carbon-13 NMR module was added to the database. The database of NMR tables of carbohydrate structures has been extended and now includes 508 hydrogen-1 NMR records and 237 carbon-13 NMR records. The corresponding database management program has been improved, and the user interface has been changed to comply with IBM SAA/CUA.
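The chemical shift search described above can be illustrated with a minimal Python sketch. It is not the SUGABASE implementation: the record layout (`NmrRecord`), the function name `search_by_shifts`, the tolerance of 0.02 ppm and all shift values are assumptions made up for illustration. The sketch ranks records by the number of query shifts they match and reports which residues carry the matching values, which is the information a display would need in order to highlight them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NmrRecord:
    """Hypothetical record: a structure label plus shifts (ppm) per residue."""
    structure: str
    shifts: Dict[str, List[float]]


def search_by_shifts(
    records: List[NmrRecord], query: List[float], tolerance: float = 0.02
) -> List[Tuple[int, str, Dict[str, List[float]]]]:
    """Return (matched query count, structure, matching shifts per residue),
    best match first. A stored shift matches if it lies within `tolerance`
    ppm of any query value."""
    hits = []
    for rec in records:
        matched: Dict[str, List[float]] = {}  # residue -> shifts to highlight
        found = set()                         # query values that were matched
        for residue, values in rec.shifts.items():
            for value in values:
                for q in query:
                    if abs(value - q) <= tolerance:
                        matched.setdefault(residue, []).append(value)
                        found.add(q)
                        break
        if found:
            hits.append((len(found), rec.structure, matched))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Invented placeholder data, not taken from the database.
records = [
    NmrRecord("structure-A", {"residue-1": [5.10, 4.05], "residue-2": [4.45, 3.60]}),
    NmrRecord("structure-B", {"residue-1": [4.90, 3.80]}),
]

for count, structure, matched in search_by_shifts(records, [5.11, 4.46, 3.80]):
    print(count, structure, matched)
```

A fixed absolute tolerance is the simplest matching rule; a real search would probably use different tolerances for hydrogen-1 and carbon-13 shifts, since the two cover very different ppm ranges.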
In cooperation with scientific groups worldwide, a survey of the existing literature is being carried out. In addition to manual collection of structural data, Chemical Abstracts (CA) services are used to retrieve carbohydrate structures, which gives easy access to the existing literature in the field. The CCSD contains information about biologically important complex carbohydrates larger than disaccharides. Each record gives the full primary structure, the reference, keywords and supplementary information. Using the management program CarbBank, the database can be edited and searched for whole and partial structures and for text entries. Because CarbBank is menu driven, a search can be performed easily and quickly.
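As a rough illustration of the record contents and search modes just described, the Python sketch below models a record with the four listed fields and offers a text search and a whole or partial structure search. Everything in it is an assumption: the `Record` class, the function names, the placeholder references and, above all, the one-line linear notation for structures, which is not CarbBank's own representation. Partial structure search is reduced here to substring matching, which cannot handle branched structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical record with the four fields named in the text."""
    structure: str                       # assumed linear notation
    reference: str
    keywords: List[str] = field(default_factory=list)
    notes: str = ""                      # supplementary information


def search_text(records: List[Record], term: str) -> List[Record]:
    """Case-insensitive search over reference, keywords and notes."""
    term = term.lower()
    return [
        r for r in records
        if term in r.reference.lower()
        or term in r.notes.lower()
        or any(term in k.lower() for k in r.keywords)
    ]


def search_structure(
    records: List[Record], fragment: str, whole: bool = False
) -> List[Record]:
    """Whole-structure search (exact match) or partial search (substring)."""
    if whole:
        return [r for r in records if r.structure == fragment]
    return [r for r in records if fragment in r.structure]


# Illustrative entries; the references are placeholders, not real citations.
demo_records = [
    Record("Gal(b1-4)GlcNAc(b1-2)Man", "placeholder reference 1", ["glycoprotein"]),
    Record("Glc(a1-4)Glc(a1-4)Glc", "placeholder reference 2", ["plant"],
           "illustrative entry"),
]

print([r.reference for r in search_structure(demo_records, "GlcNAc(b1-2)Man")])
print([r.reference for r in search_text(demo_records, "plant")])
```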

In addition to the survey of the literature, the structural database will be linked to spectroscopic information such as hydrogen-1 and carbon-13 NMR data. This has already been accomplished in a preliminary version of the program SUGABASE. The development of tools to analyse the spectroscopic data using neural network methods is planned. Links and cross references to other databases, for example protein sequence databases, will be established. Currently the program is available for IBM compatible PCs running DOS. Development of a version running under UNIX is in progress.


