Scalable Graph Algorithms for Bioinformatics using Structure, Parameterization and Dynamic Updates

Project description

Improving graph algorithms’ scalability for bioinformatics

Advances in sequencing technologies have enabled breakthroughs such as the complete sequencing of a human genome. As data volumes grow, however, there is a need for reliable computational methods that scale. The ERC-funded SCALEBIO project aims to improve the scalability of exact graph algorithms through novel graph structures used in preprocessing and through modern algorithmic techniques. Specifically, it will introduce safety structures, which simplify a problem by identifying paths common to all optimal solutions, and variation structures, which confine a problem's hardness to graph areas rich in genetic variation. Key techniques include parameterised polynomial algorithms and dynamic algorithms that update solutions as new data arrive. These techniques will be applied to long-read RNA transcript discovery and to indexing large genomic databases.

Objective

Sequencing technologies have become cheap and accurate, leading to major breakthroughs such as the complete sequence of a human genome, the creation of nationwide population gene banks, and the discovery of novel viruses. As the amount of data produced grows exponentially and its applications become broader and more complex, the community needs accurate computational methods that scale.

At the core of many algorithmic methods for processing sequencing data is the basic primitive of finding a set of paths or walks in graphs of various kinds. Under different formulations and objective functions, the resulting problems can be NP-hard (e.g. flow decompositions) or solvable in polynomial time (e.g. path covers); in either case, exact algorithms are impractical on large graphs. Thus, many practical tools prefer fast heuristics to exact algorithms. While these heuristics may be optimized for specific inputs, they may not be reliable or accurate in general, which is a highly relevant issue in, e.g., medical and life-science research.
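
As an illustration of the polynomial-time side of this primitive, the following is a minimal sketch of the textbook reduction from minimum vertex-disjoint path cover in a DAG to bipartite matching. It is not the project's method: the function name `min_path_cover`, the integer node labels and the example graph are assumptions made for illustration, and path covers used in sequencing tools often allow paths to share nodes, which requires a different formulation.

```python
from collections import defaultdict


def min_path_cover(n, edges):
    """Minimum vertex-disjoint path cover of a DAG with nodes 0..n-1.

    Textbook reduction: every matched edge u->v glues u and v into the
    same path, so the number of paths equals n minus the matching size.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    match_left = [-1] * n   # match_left[u] = successor of u in its path
    match_right = [-1] * n  # match_right[v] = predecessor of v in its path

    def try_augment(u, seen):
        # Standard augmenting-path search (Kuhn's algorithm).
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                match_left[u] = v
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Each node without a predecessor starts one path of the cover.
    paths = []
    for start in range(n):
        if match_right[start] != -1:
            continue
        path = [start]
        while match_left[path[-1]] != -1:
            path.append(match_left[path[-1]])
        paths.append(path)
    return paths


# Hypothetical example graph: two branches leaving node 0.
example_edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
print(min_path_cover(5, example_edges))  # [[0, 1, 2], [3, 4]]
```

This simple version runs in O(V·E) time, which is polynomial yet already too slow for graphs with millions of nodes, in line with the scalability concern raised above.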

This project will develop general methods to massively scale such exact graph algorithms. First, via novel graph structures usable in a preprocessing step: safety structures, e.g. sets of paths that can be quickly found and that appear in all optimal solutions, thus simplifying the problem; and variation structures, which confine the hardness of a problem to graph areas rich in genetic variation. Second, via modern algorithmic techniques: parameterized polynomial algorithms that run in time linear in the graph size and superlinear only in a small parameter; and dynamic algorithms that, as the input grows, update solutions based only on the new data.
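
To make the idea of a safe path concrete, here is a minimal sketch of one of the simplest classical notions, maximal non-branching paths. It assumes a DAG in which every solution path runs from a source to a sink and every edge must be covered; under those assumptions, any solution path that uses one edge of a non-branching path must traverse all of it, so the path appears in every solution. This is an illustrative stand-in, not the safety structures the project proposes, and the function name `non_branching_paths` and the example graph are hypothetical.

```python
from collections import defaultdict


def non_branching_paths(edges):
    """Return maximal paths whose internal nodes have in-degree 1 and out-degree 1."""
    out_adj = defaultdict(list)
    in_deg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)
        in_deg[v] += 1
        nodes.update((u, v))

    def is_internal(x):
        # Exactly one way in and one way out: no choice is possible at x.
        return in_deg[x] == 1 and len(out_adj[x]) == 1

    paths = []
    for u in sorted(nodes):
        if is_internal(u):
            continue  # paths start only at sources, sinks or branching nodes
        for v in out_adj[u]:
            path = [u, v]
            # Extend while the last node offers no choice (terminates on a DAG).
            while is_internal(path[-1]):
                path.append(out_adj[path[-1]][0])
            paths.append(path)
    return paths


# Hypothetical example: two routes from s merge at c before reaching t.
example_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("s", "d"), ("d", "c"), ("c", "t")]
print(non_branching_paths(example_edges))
# [['c', 't'], ['s', 'a', 'b', 'c'], ['s', 'd', 'c']]
```

A preprocessing step could contract each such path into a single edge, shrinking the graph handed to an exact algorithm; this runs in time linear in the number of edges.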

We will apply these methods in two high-impact applications: long-read RNA transcript discovery, and indexing massive and rapidly growing genomic databases.

This project paves the way for exact graph algorithms usable independently of the problem complexity or of the input size, applicable to real-world problems.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.
This project's classification has been human-validated.

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

HORIZON-ERC - HORIZON ERC Grants

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

ERC-2024-COG

Host institution

HELSINGIN YLIOPISTO
Net EU contribution

Net EU financial contribution. The sum of money that the participant receives, deducted by the EU contribution to its linked third party. It considers the distribution of the EU financial contribution between direct beneficiaries of the project and other types of participants, like third-party participants.

€ 1 999 868,00
Address
FABIANINKATU 33
00014 HELSINGIN YLIOPISTO
Finland

Activity type
Higher or Secondary Education Establishments
Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

€ 1 999 868,00

Beneficiaries (1)