CORDIS - EU research results
Content archived on 2024-06-18

Probabilistic Models in Pseudo-Euclidean Spaces

Fast and flexible query response systems

Advances in biotechnology, particularly in genomic techniques, have produced a wealth of sequence data. Retrieving relevant sequences from such vast databases requires a principled, formal model, a challenge that EU-funded researchers have successfully addressed.

Fundamental Research
Health

Scientists generally retrieve data by running sequence similarity searches against databases. However, public databases such as GenBank and UniProt/SwissProt contain hundreds of thousands of sequences or more, and existing bioinformatics techniques do not achieve good retrieval quality at this scale. The PROMOS (Probabilistic models in pseudo-Euclidean spaces) team set out to close this gap in bioinformatics approaches. Their goal was to devise algorithms that rapidly retrieve accurate sequence data from large-scale databases.

Researchers first used generic, non-metric similarity scores to derive and implement data-specific probabilistic relational models. They successfully developed a probabilistic framework for relational methods in pseudo-Euclidean spaces, that is, vector spaces with an indefinite inner product in which non-metric similarities can be embedded. To speed up model learning and enable fast data retrieval, they developed approximation schemes for relational data, together with a hierarchical model and retrieval scheme. This domain-specific approach is effective because it converts large dissimilarity matrices into approximate positive semi-definite kernel matrices at linear cost.

PROMOS technology was tested on several large-scale protein databases, where it ran faster than classical retrieval systems while achieving competitive model accuracy. The methods have been published in numerous highly ranked venues, with several more publications in preparation. Project activities and outcomes should considerably speed up research and development in the biotechnology and pharmaceutical sectors.
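The article does not describe the PROMOS algorithms in detail. Purely as an illustration, the sketch below shows one standard way to turn a (possibly non-metric) dissimilarity matrix into an approximately positive semi-definite kernel at a cost linear in the number of items n. It uses a Nyström approximation of the squared dissimilarities from m landmark items, double centering (as in classical multidimensional scaling) to obtain a possibly indefinite, pseudo-Euclidean similarity matrix, and clipping of that matrix's negative eigenvalues. Only the n × m dissimilarities to the landmarks are evaluated, and the work is O(nm² + m³). The Nyström scheme, the random landmark sampling, eigenvalue clipping, and the function name `nystrom_psd_kernel` are assumptions chosen for this example, not the project's published method.

```python
import numpy as np


def nystrom_psd_kernel(d_landmark_cols: np.ndarray, landmarks: np.ndarray,
                       rcond: float = 1e-10) -> np.ndarray:
    """Return a factor F (n x m) such that F @ F.T approximates a PSD kernel.

    d_landmark_cols: n x m dissimilarities between all n items and the m
    landmark items (only these n*m values are used).
    landmarks: indices (length m) of the landmark items among the n items.
    Hypothetical helper for illustration, not the PROMOS implementation.
    """
    # Work with squared dissimilarities, as in classical MDS.
    c = d_landmark_cols ** 2                     # n x m block of D^2
    w = c[landmarks, :]                          # m x m landmark block
    w_pinv = np.linalg.pinv(w, rcond=rcond)      # W may be indefinite

    # Nystroem: D^2 ~ C W^+ C^T. Double centering J D^2 J then only needs J C,
    # i.e. C with its column means (over all n rows) subtracted: O(n m).
    jc = c - c.mean(axis=0, keepdims=True)

    # Similarity S = -1/2 J D^2 J ~ -1/2 (JC) W^+ (JC)^T. Factor it through a
    # thin QR of JC so the eigendecomposition is only m x m: O(n m^2 + m^3).
    q, r = np.linalg.qr(jc)                      # q: n x m, r: m x m
    core = -0.5 * r @ w_pinv @ r.T
    core = (core + core.T) / 2                   # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(core)

    # S is generally indefinite (pseudo-Euclidean). Clipping negative
    # eigenvalues to zero gives the nearest PSD matrix (Frobenius norm)
    # to the approximated S.
    eigvals = np.clip(eigvals, 0.0, None)
    return (q @ eigvecs) * np.sqrt(eigvals)      # K ~ F @ F.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))                # hypothetical 3-D points
    landmarks = rng.choice(len(x), size=10, replace=False)
    # Only the n x m distances to the landmarks are computed.
    d_cols = np.linalg.norm(x[:, None, :] - x[None, landmarks, :], axis=-1)
    f = nystrom_psd_kernel(d_cols, landmarks)
    xc = x - x.mean(axis=0)
    # For Euclidean input with enough landmarks, F F^T recovers Xc Xc^T.
    print(np.allclose(f @ f.T, xc @ xc.T, atol=1e-6))
```

Clipping is only one possible eigenvalue correction; flipping negative eigenvalues to their absolute values is a common alternative that keeps the information carried by the negative part of the pseudo-Euclidean space.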

Keywords

Sequence data, bioinformatics, PROMOS, probabilistic models, pseudo-Euclidean spaces
