CORDIS - EU research results

A novel bioinformatics’ platform to identify key proteasic pathways involved in kidney diseases

Final Report Summary - PROTEASIX (A novel bioinformatics’ platform to identify key proteasic pathways involved in kidney diseases)

Proteasix: a tool for automated and large-scale prediction of proteases involved in naturally-occurring peptide generation
Proteases are involved in numerous pathophysiological processes, and the peptides they produce are evidence of their activity. The human genome encodes more than 500 proteases, many of which have orthologues in other species, which indicates how important this class of enzymes is to biological processes. Altered protease activity has been associated with many pathologies, including cardiovascular, inflammatory and fibrotic diseases, cancer and neurological disorders. Understanding the protease network in health and disease is therefore a major research question, highly relevant to pathophysiology, biomarker discovery and drug development.
Body fluids (e.g. serum, urine, cerebrospinal fluid) contain thousands of naturally occurring protein-derived peptides, and many disease-associated peptides have been described for disorders such as kidney, cardiovascular, autoimmune and infectious diseases and certain types of cancer. This wealth of accumulated data contrasts sharply with how little is known about the mechanisms that generate these peptides, even though those mechanisms may hold the key to linking the observed biomarkers to pathophysiology. As a first step in this direction, we examined the links between naturally occurring peptides and the proteases that may be responsible for their generation. The underlying hypothesis is that changes in protease activity may be linked to disease pathophysiology more directly than the resulting peptides themselves.
Information about proteases and their cleavage sites is, however, scattered across publications and databases, stored in different formats, and not always suitable for automated searches. TopFIND, the MEROPS peptidase database and the CutDB database are large resources on proteolytic events: they document proteases from different organisms together with their experimentally identified or predicted substrates and the observed cleavage site sequences. Although useful, these databases have several limitations. Proteases and substrates are sometimes defined by obsolete identifiers or by free-text descriptions that do not follow annotation standards, whereas most proteomics results are annotated with stable, curated SWISS-PROT identifiers. Moreover, none of these resources allows batch searches, so cleavage sites must be queried manually one by one. Finally, each database is either protease-, cleavage site- or substrate-centric, and none supports automatic retrieval of cleavage sites from an input peptide sequence. As a result, queries have to be performed manually: each peptide must first be aligned with its full-length parent protein to identify its N- and C-terminal cleavage sites, and some queries also require specific knowledge of protein names and nomenclature.
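The manual step described above, locating a peptide in its full-length parent protein and reading off the residues around its N- and C-terminal cleavage sites, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an exact substring match stands in for a real sequence alignment, and each cleavage site is represented as an eight-residue P4-P4' window around the scissile bond (Schechter and Berger nomenclature). The function name `cleavage_sites`, the `WINDOW` constant and the toy sequence are hypothetical illustrations, not part of Proteasix.

```python
from typing import Optional

WINDOW = 4  # hypothetical: residues kept on each side of the bond (P4..P1 | P1'..P4')


def cleavage_sites(peptide: str, protein: str) -> Optional[dict]:
    """Locate `peptide` in `protein` and return its N- and C-terminal cleavage windows.

    Assumption: exact substring matching replaces a proper alignment, and
    str.find returns only the first occurrence of the peptide.
    Windows are padded with '-' where they run past the ends of the protein.
    A peptide terminus that coincides with the protein's own N- or C-terminus
    is not a cleavage site and is reported as None.
    """
    start = protein.find(peptide)
    if start == -1:
        return None  # peptide not found in this parent protein
    end = start + len(peptide)  # index just after the peptide's last residue
    padded = "-" * WINDOW + protein + "-" * WINDOW

    def window(bond: int) -> str:
        # `bond` is the 0-based index of the P1' residue in `protein`;
        # shift it by WINDOW to account for the padding.
        b = bond + WINDOW
        return padded[b - WINDOW:b] + "|" + padded[b:b + WINDOW]

    return {
        "n_term": window(start) if start > 0 else None,
        "c_term": window(end) if end < len(protein) else None,
    }
```

For example, `cleavage_sites("GHIKLM", "ACDEFGHIKLMNPQRSTVWY")` returns `{'n_term': 'CDEF|GHIK', 'c_term': 'IKLM|NPQR'}`. A production version would need a tolerant alignment and handling of peptides that occur more than once in the same protein.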
To address this, we developed Proteasix (www.proteasix.org), an open-source, peptide-centric tool for automated, large-scale in silico prediction of the proteases involved in native peptide generation.
The main component of Proteasix is its underlying curated cleavage site database, which contains 3,500 entries on human protease/cleavage site combinations. Built on top of this database, the Proteasix tool takes a list of peptides, retrieves their cleavage sites and associates each site with candidate proteases.
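The batch lookup described above, matching each peptide's cleavage sites against a curated protease/cleavage site table, can be sketched as below. This is a minimal sketch, not the Proteasix implementation: it assumes curated entries are keyed by the substrate's accession and the 1-based position of the P1 residue. The accessions, protease names, `CURATED` table and `predict_proteases` function are hypothetical placeholders, not Proteasix data.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical key: (substrate accession, 1-based P1 position in the parent protein).
CuratedKey = Tuple[str, int]

# Hypothetical curated table mapping a cleavage site to the proteases reported
# to cut the bond after its P1 residue. Placeholder values only.
CURATED: Dict[CuratedKey, Set[str]] = {
    ("SUBSTRATE1", 120): {"PROTEASE_A", "PROTEASE_B"},
    ("SUBSTRATE1", 135): {"PROTEASE_C"},
    ("SUBSTRATE2", 42): {"PROTEASE_A"},
}


def predict_proteases(
    peptides: Iterable[Tuple[str, int, int]],
    curated: Dict[CuratedKey, Set[str]],
) -> Dict[Tuple[str, int, int], Dict[str, List[str]]]:
    """Associate each peptide with candidate proteases at both termini.

    Each peptide is (accession, start, end), with 1-based inclusive positions
    in its parent protein. The N-terminal cleavage site has its P1 residue at
    start - 1 (the residue just before the peptide); the C-terminal site has
    its P1 residue at end (the peptide's last residue).
    """
    results = {}
    for acc, start, end in peptides:
        results[(acc, start, end)] = {
            "n_term": sorted(curated.get((acc, start - 1), set())),
            "c_term": sorted(curated.get((acc, end), set())),
        }
    return results
```

For instance, `predict_proteases([("SUBSTRATE1", 121, 135)], CURATED)` associates the N-terminus with `PROTEASE_A` and `PROTEASE_B` and the C-terminus with `PROTEASE_C`. Keying on an accession plus position rather than on free-text protein names is one way to address the identifier problem described earlier, and a whole peptide list is processed in a single call.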
As a proof of concept, we applied Proteasix to a list of 1,388 peptides identified in human urine samples and compared the predictions with those obtained for 1,003 randomly generated cleavage sites. Metalloproteases were predicted to be predominantly involved in urinary peptide generation, particularly for peptides derived from proteins associated with extracellular matrix remodelling, compared with peptides derived from proteins of other origins. In contrast, the random cleavage sites returned almost no results, highlighting the specificity of the prediction.
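The random-control comparison can be sketched as a hit-rate calculation: the fraction of cleavage sites with at least one curated protease match is computed for the observed sites and for a set of randomly drawn sites in the same parent proteins. This is a minimal sketch, not the analysis used in the project. It assumes sites are (accession, P1 position) pairs and that each random site is drawn by picking a protein uniformly at random and then a P1 position uniformly within it; the functions `hit_rate` and `random_sites` are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]  # hypothetical: (substrate accession, 1-based P1 position)


def hit_rate(sites: List[Site], curated: Dict[Site, Set[str]]) -> float:
    """Fraction of sites with at least one associated protease in `curated`."""
    if not sites:
        return 0.0
    hits = sum(1 for site in sites if curated.get(site))  # empty set counts as no hit
    return hits / len(sites)


def random_sites(lengths: Dict[str, int], n: int, seed: int = 0) -> List[Site]:
    """Draw n random cleavage sites from proteins of the given lengths.

    A protein is picked uniformly at random, then a P1 position uniformly
    within it. P1 may be any residue except the last, because a peptide bond
    must follow it (so each protein needs at least two residues).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    accessions = sorted(lengths)
    sites = []
    for _ in range(n):
        acc = rng.choice(accessions)
        sites.append((acc, rng.randint(1, lengths[acc] - 1)))
    return sites
```

With real data, comparing `hit_rate(observed, curated)` against `hit_rate(random_sites(lengths, len(observed)), curated)` gives the kind of contrast reported above: a much lower hit rate for random sites suggests that matches for observed sites are not simply chance coincidences.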
This project provides a unique tool that links identified peptides to predicted protease activity and, through it, to presumed disease mechanisms. These in silico predictions remain hypotheses that must be confirmed experimentally.
We believe this tool will be of great value for understanding and defining protease networks in health and disease, a major clinical objective for pathophysiology, biomarker discovery and drug development in many diseases, including cardiovascular, inflammatory and fibrotic diseases, cancer and neurological disorders.