AVAILABILITY OF A TOOL FOR THE SOLUTION OF PRACTICAL PROTEIN DESIGN PROBLEMS IN MOLECULAR BIOLOGY, MOLECULAR MEDICINE AND BIOTECHNOLOGY, WITH LONG TERM ECONOMIC BENEFITS IN THESE FIELDS.
Data on known protein structures and amino acid sequences have proven very useful for deriving empirical rules for protein folding and design. With the growing volume of these data, more sophisticated systems for storing and handling knowledge on macromolecular structures are urgently needed. Progress should come from better ways of exploiting sequence homology to infer structural information for the more than 10,000 proteins for which only sequence data are available.
Research was carried out to develop a database of protein knowledge containing structure and sequence information, and to extend it to include inferred 3-dimensional structures of proteins for which only sequence data are available.
SESAM, a high-performance relational database for protein structure and sequence, capable of holding data from public and private sources, was developed; it features powerful procedures for validating and cleaning up input data, and rapid data retrieval. It has been interfaced with a graphics package, BRUGEL, and specialized user-friendly interfaces have been implemented. The database on known protein structures was extended to include inferred 3-dimensional structures, grouped into structural families, by exploiting the correlation between structural homology and sequence similarity once the latter exceeds a certain threshold. A limited number of short sequence patterns that characterise local structure motifs in proteins with high accuracy can be found. These patterns do not, however, improve protein structure prediction methods, because of the limited size of the structural database and the influence of spatial interactions between residues that are distant in the sequence. Object-oriented methods and logic programming (Prolog) yield important benefits by speeding up the design, development and debugging stages.
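The threshold-based inference of structural families described above can be illustrated with a minimal sketch. It is not SESAM's procedure: the function names, the 30% identity cut-off and the assumption that sequences are already aligned to equal length are all hypothetical choices made for illustration, since the record only states that "a certain threshold" of sequence similarity is used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def infer_family(query: str, known: dict, threshold: float = 30.0):
    """Return the structural family whose representative sequence is most
    similar to `query`, or None if no similarity reaches `threshold`.

    `known` maps a family name to one pre-aligned representative sequence.
    The default threshold is an illustrative assumption, not a project value.
    """
    best_family, best_score = None, threshold
    for family, representative in known.items():
        score = percent_identity(query, representative)
        if score >= best_score:
            best_family, best_score = family, score
    return best_family


# Hypothetical representatives of two structural families.
families = {"famA": "ACDEFGHIKL", "famB": "MNPQRSTVWY"}
print(infer_family("ACDEFGHIKV", families))
print(infer_family("WWWWWWWWWW", families))
```

Returning None below the threshold reflects the point made in the text: structural inference is only attempted where sequence similarity is high enough for the correlation with structural homology to hold.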
DEVELOPMENT OF A SYSTEM FOR PROTEIN STRUCTURE PREDICTION, MERGING STATISTICAL AND INFORMATION ANALYSIS TECHNIQUES WITH ADVANCED TOOLS FROM COMPUTER SCIENCE (A.I.) AS WELL AS EXISTING METHODS IN MOLECULAR MODELLING.
IN THIS COLLABORATIVE EFFORT, THE SPECIFIC ROLE OF THIS PART OF THE PROJECT IS TO APPLY NEW TYPES OF STATISTICAL ANALYSIS TO PROTEIN DATABANKS IN ORDER TO PROVIDE HEURISTICS FOR LOCATING SEQUENCE SEGMENTS OR AMINO-ACID PROPERTY PATTERNS OF HIGH STRUCTURAL INFORMATION CONTENT.
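One way to make "structural information content" concrete is sketched below. The record does not say which statistics the project used; this example assumes a common measure, the relative entropy (in bits) between the distribution of structural states observed for segments matching a pattern and the background distribution over all segments. The function name, the data layout and the structural labels are hypothetical.

```python
import math
from collections import Counter


def structural_information(observations, pattern):
    """Relative entropy (bits) of the structural-state distribution for
    segments equal to `pattern`, measured against the background distribution.

    `observations` is a list of (segment, structural_label) pairs.
    Returns 0.0 when the pattern never occurs.
    """
    background = Counter(label for _, label in observations)
    matched = Counter(label for seg, label in observations if seg == pattern)
    n_all = sum(background.values())
    n_hit = sum(matched.values())
    if n_hit == 0:
        return 0.0
    info = 0.0
    for label, count in matched.items():
        p_cond = count / n_hit             # P(state | pattern)
        p_back = background[label] / n_all  # P(state) over all segments
        info += p_cond * math.log2(p_cond / p_back)
    return info


# Toy data: the segment "GP" always occurs in turns, "AL" always in helices.
data = [("GP", "turn"), ("GP", "turn"), ("AL", "helix"), ("AL", "helix")]
print(structural_information(data, "GP"))
```

A pattern whose structural states follow the background distribution scores zero, so ranking patterns by this value is one possible heuristic for locating segments that are informative about local structure.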
Funding Scheme: CSC - Cost-sharing contracts