Community Research and Development Information Service - CORDIS

New software tools for the storage, analysis and functional interpretation of protein and gene expression data

A portal for managing 2D gel data, Ashwin Kotwaliwale, Keith Vass and Walter Kolch

Summary: Signalling pathways lie at the core of information transmission from the extracellular environment to the nucleus, producing specific biological responses. Using 2D gels, we have attempted to identify a large number of proteins ideal for protein expression studies. The aim of our study is to manage 2D gel data so that protein interactions can be understood and results compared using computational methods. To this end, we have developed a portal coupled to a relational database for managing 2D gel data.

Introduction: The overall aim of our project is to model the MAPK pathway using high-throughput proteomic data such as 2D gels, and to understand the properties of protein interactions within the MAPK pathway. We use data from two different gel systems: (a) DiGE, which employs a three-fluorescent-dye detection system, and (b) P32-labelled gels.

DiGE detects differences in protein expression between samples by using a common master gel: each spot on the master is identified as a single protein and mapped across to all the gels. Quantitation is obtained by comparing each gel against the master, which in turn allows the gels to be compared with each other. Using the DeCyder software, the results can be saved as XML documents, which makes them compatible with various computer operating systems. These XML documents store both raw and processed values, which aids analytical work. Tools to query XML documents using XPath and XSLT exist; however, those methods are difficult to use and are not appropriate for storing and comparing results.

As an alternative to the XML-based methods, we have designed a relational database to store DiGE data and meta-information about experiments, as recommended in the PEDRo model. As proteomics experiments are expensive and time-consuming, the success of the project depends in part on efficient data management.
The user is served by a rich portal interface developed on the .NET platform. The portal is a modification of the fully content-managed, database-driven IBuySpy portal, which is freely available over the internet. We have written custom modules to add, edit and retrieve experimental information and have integrated them into the portal. .NET was chosen as the platform because it offers extended support for XML documents and an object-oriented programming style for designing web pages. The web pages are implemented in ASP.NET with VB.NET as the code-behind. The basic modules within the IBuySpy portal are used for storing protocols, contact lists, discussion boards and similar material, which is often needed by collaborating labs and users in the proteomics community. These modules are useful for managing large projects in which different parts of the experiment are performed in different geographic locations.

The database is implemented in Microsoft SQL Server 2000. It consists of a basic schema and a gel-specific schema. The gel-specific schema comprises the tables used to store DiGE gel information such as spots, spotsets, spotratios and proteinIDs. Meta-information about the experiments is stored in other relevant tables. We have taken a modular programming approach and have written stored procedures to access the data.
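As a rough illustration of how a gel-specific schema of this kind might be laid out, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server 2000. All table and column names are hypothetical simplifications (spotsets are omitted for brevity), and the add_spot function only imitates the role a stored procedure plays in the real system.

```python
import sqlite3

# Hypothetical, simplified gel-specific schema. Table and column names are
# illustrative assumptions; the system described above runs on SQL Server 2000.
SCHEMA = """
CREATE TABLE gels (
    gel_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE protein_ids (
    protein_id INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE
);
CREATE TABLE spots (
    spot_id        INTEGER PRIMARY KEY,
    gel_id         INTEGER NOT NULL REFERENCES gels(gel_id),
    master_spot_no INTEGER NOT NULL,
    protein_id     INTEGER REFERENCES protein_ids(protein_id),
    UNIQUE (gel_id, master_spot_no)
);
CREATE TABLE spot_ratios (
    spot_id INTEGER PRIMARY KEY REFERENCES spots(spot_id),
    ratio   REAL NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_spot(conn, gel_id, master_spot_no, ratio, protein_id=None):
    """Insert a spot and its ratio in one transaction (stored-procedure stand-in)."""
    with conn:  # commits on success, rolls back if an insert fails
        cur = conn.execute(
            "INSERT INTO spots (gel_id, master_spot_no, protein_id) "
            "VALUES (?, ?, ?)",
            (gel_id, master_spot_no, protein_id),
        )
        spot_id = cur.lastrowid
        conn.execute(
            "INSERT INTO spot_ratios (spot_id, ratio) VALUES (?, ?)",
            (spot_id, ratio),
        )
    return spot_id
```

Routing all writes through a function such as add_spot keeps the SQL in one place, which is the same motivation the text gives for using stored procedures.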

Data flow is implemented in three distinct layers: (a) A user interface, which performs client-side validation. (b) A business object layer, which implements the classes that extract data from XML documents. For DiGE data we have written a parser that uses the input-output and file reader classes of the .NET framework. It reads each document through an XML stream reader, which gives forward-only, read-only access to the XML. In contrast, the P32 gel data is parsed within the database server using Data Transformation Services, which makes data handling more efficient and faster. (c) A data access layer, which comprises stored procedures and, in some cases, SQL commands that interact with the database.
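The forward-only, read-only style of parsing used in the business object layer can be sketched in Python with xml.etree.ElementTree.iterparse, which plays a similar role to a streaming XML reader. This is only an analogue of the .NET parser described above: the element and attribute names (gel, spot, master_no, ratio) are invented for illustration and do not reflect the actual DeCyder export format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical export fragment; real DeCyder XML uses its own element names.
SAMPLE_XML = b"""<gel name="gel A">
  <spot master_no="1" ratio="1.25"/>
  <spot master_no="2" ratio="0.80"/>
</gel>"""


def iter_spots(source):
    """Yield (master_no, ratio) pairs in a single forward pass over the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "spot":
            yield int(elem.get("master_no")), float(elem.get("ratio"))
            elem.clear()  # drop the element's contents once they are read


spots = list(iter_spots(io.BytesIO(SAMPLE_XML)))
```

Because each element is handled as soon as it has been read, the values can be passed on to the data access layer one spot at a time instead of after the whole document has been loaded.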

The database can be queried in several ways. Firstly, we have a set of standard queries built into the database; one example is a query that transforms DiGE data so that protein expression can be compared across a set of gels. Secondly, there is often a need to connect the database directly to third-party software, and our database has been tested with standard tools such as MATLAB and R. Lastly, we are working on an interactive query tool with which non-expert users can design queries and save them for later reuse.
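A standard query of the kind mentioned above, one that reshapes per-gel spot data so that expression can be compared across gels, might look roughly like the following sketch. It assumes a single flat table named spot_ratios with hypothetical columns gel, master_spot_no and ratio, and again uses sqlite3 in place of SQL Server; the sample values are made up.

```python
import sqlite3

# Hypothetical flat table: one row per spot per gel (values are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spot_ratios (gel TEXT, master_spot_no INTEGER, ratio REAL)"
)
conn.executemany(
    "INSERT INTO spot_ratios VALUES (?, ?, ?)",
    [("gel_A", 1, 1.25), ("gel_B", 1, 0.5), ("gel_A", 2, 2.0)],
)


def expression_matrix(conn):
    """Return {master_spot_no: {gel: ratio}} for side-by-side comparison."""
    rows = conn.execute(
        "SELECT master_spot_no, gel, ratio FROM spot_ratios "
        "ORDER BY master_spot_no, gel"
    )
    matrix = {}
    for master_no, gel, ratio in rows:
        # Group the ratios of one master spot under a single key.
        matrix.setdefault(master_no, {})[gel] = ratio
    return matrix


matrix = expression_matrix(conn)
```

Each key of the result is a spot number on the master gel, so a spot missing from one gel simply has no entry for that gel. The same long-to-wide reshaping could equally be done in MATLAB or R after connecting to the database.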

Reported by

Beatson Institute for Cancer Research
G61 1BD Glasgow
United Kingdom