Since the start of the project, we have made substantial progress in both theoretical and practical directions, developing algorithms that are safe, complete, and efficient. The results span several research directions in computer science and bioinformatics, summarized below.
We developed safe and complete algorithms for walks in directed graphs, including linear-time safe and complete algorithms for Eulerian cycles and edge-covering closed walks. These algorithms are the first to address these fundamental reachability problems from the perspective of safety, and they uncover structural properties of the graphs that can benefit both further theoretical research and applications.
We next focused on modeling the genome assembly problem, introducing novel concepts such as "cut paths" and "remainder structure" as tools for obtaining safe and complete algorithms. These algorithms have been integrated into popular genome assembly tools, where they improved assembly contiguity, especially on metagenomic datasets.
Then, we studied safe walks for path-finding problems with objective functions, such as Minimum Path Cover (MPC) and network flow decomposition. Our results include the first linear-time parameterized algorithm for MPC and the first efficient solutions for (minimum) flow decomposition problems via integer linear programming.
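To make the flow decomposition problem concrete, the following minimal sketch decomposes a flow on a directed acyclic graph into weighted source-to-sink paths by repeatedly peeling off one path. It is a simple greedy heuristic for illustration only, not the integer linear programming approach developed in the project, and it does not guarantee a minimum number of paths. The function name `peel_paths` and the input format are assumptions made for this example.

```python
from collections import defaultdict


def peel_paths(flow, source, sink):
    """Greedily decompose a flow on a DAG into weighted source-to-sink paths.

    flow: dict mapping edges (u, v) to positive integers. Assumed to satisfy
    flow conservation at every node other than source and sink.
    Returns a list of (path, weight) pairs. Illustrative heuristic only:
    the number of paths is not guaranteed to be minimum.
    """
    residual = dict(flow)
    successors = defaultdict(list)
    for u, v in flow:
        successors[u].append(v)

    paths = []
    while True:
        # Walk from the source, always taking the out-edge with most flow left.
        path = [source]
        node = source
        while node != sink:
            candidates = [v for v in successors[node] if residual[(node, v)] > 0]
            if not candidates:
                break
            nxt = max(candidates, key=lambda v, u=node: residual[(u, v)])
            path.append(nxt)
            node = nxt
        if node != sink:
            break  # no flow is left leaving the source

        # The path carries the smallest residual value among its edges.
        path_edges = list(zip(path, path[1:]))
        weight = min(residual[e] for e in path_edges)
        for e in path_edges:
            residual[e] -= weight
        paths.append((path, weight))
    return paths


example_flow = {
    ("s", "a"): 5,
    ("a", "b"): 3,
    ("a", "c"): 2,
    ("b", "t"): 3,
    ("c", "t"): 2,
}
decomposition = peel_paths(example_flow, "s", "t")
```

Each iteration saturates at least one edge, so the loop runs at most once per edge. Finding a decomposition with the fewest possible paths is the harder optimization problem referred to above.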
The next set of results concerns pangenome graphs. We developed the tool GraphChainer, a read-to-pangenome graph aligner, and, more generally, explored faster solutions for string-to-graph problems. Our work on safe partial alignments for protein sequences offers promising applications in predicting stable protein structures.
Finally, we established lower bounds showing that our algorithms are optimal, and explored variants of safety in graph and string matching problems. We also proved fine-grained complexity lower bounds for the problem of finding an occurrence of a string in a labeled graph.
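For reference, the problem of finding an occurrence of a string in a labeled graph can be stated through a straightforward dynamic programming baseline, sketched below. The sketch assumes a node-labeled directed graph with one character per node and defines an occurrence as a walk whose node labels spell the pattern. The function name and input format are hypothetical and chosen only for this illustration; the sketch is not one of the project's algorithms.

```python
def occurs_in_labeled_graph(labels, edges, pattern):
    """Return True if some walk in the graph spells `pattern`.

    labels: dict mapping each node to a single character.
    edges: iterable of directed edges (u, v).
    Assumption for this sketch: an occurrence is a walk (nodes may repeat)
    whose consecutive node labels equal the characters of `pattern`.
    """
    if not pattern:
        return True

    successors = {node: [] for node in labels}
    for u, v in edges:
        successors[u].append(v)

    # frontier holds the nodes at which a walk spelling the current
    # prefix of the pattern can end.
    frontier = {node for node, ch in labels.items() if ch == pattern[0]}
    for ch in pattern[1:]:
        frontier = {
            v for u in frontier for v in successors[u] if labels[v] == ch
        }
        if not frontier:
            return False
    return bool(frontier)


example_labels = {1: "A", 2: "C", 3: "G", 4: "T"}
example_edges = [(1, 2), (2, 3), (2, 4), (4, 1)]
found = occurs_in_labeled_graph(example_labels, example_edges, "ACTACG")
```

Each pattern character triggers one pass over the out-edges of the current frontier, so the running time grows with the product of the pattern length and the number of edges. Fine-grained lower bounds address whether substantially faster algorithms can exist for this problem.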
Overall, the project's core methodology of outputting "safe partial solutions", that is, partial solutions common to all solutions of a problem, has provided novel insights into algorithmic problems with multiple solutions. This approach has led to deep theoretical results and to practical applications, particularly in genome assembly, where integrating safe algorithms improved assembly contiguity and made it possible to incorporate data features such as abundances.