Biological sequence databases are a core source of information in the life sciences
and have grown to many thousands of entries. Classically, querying such a
database requires comparing the query sequence to each entry using an alignment algorithm
such as FASTA, Smith-Waterman, or BLAST. Many real-time and high-throughput experiments rely
on quick identification of the query to decide the next steps in the experimental pipeline
and are currently slowed down by the cost of classical retrieval systems.
The main objective of this proposal is to provide quick large-scale identification
algorithms in the non-metric spaces induced by the scoring functions of sequence alignments.
To this end, the proposal targets techniques that avoid computing the full alignment scores
during training and retrieval by employing mathematical and probabilistic approximations.
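
For illustration, the classical retrieval scheme described above can be sketched as an exhaustive scan that computes a full Smith-Waterman local alignment score between the query and every database entry. This is only a minimal baseline sketch, not the method proposed here: the function names, the toy database, and the scoring parameters (match = 2, mismatch = -1, linear gap = -2) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the proposal.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def exhaustive_query(query, database):
    """Score the query against every entry and rank entries by score.

    `database` is a hypothetical dict mapping entry names to sequences.
    Returns a list of (name, score) pairs, highest score first.
    """
    scores = {name: smith_waterman_score(query, seq)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"entry_a": "TTTTTTTT", "entry_b": "GGACGTGG", "entry_c": "ACGA"}
    for name, score in exhaustive_query("ACGT", toy_db):
        print(name, score)
```

Each query costs one full dynamic-programming pass per entry, so the running time grows with both the number of entries and their lengths. This per-entry cost is what the approximation techniques targeted by the proposal are meant to avoid.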
Field of science
- /natural sciences/computer and information sciences/databases