CORDIS - EU research results

Pan-genome Graph Algorithms and Data Integration

Periodic Reporting for period 2 - PANGAIA (Pan-genome Graph Algorithms and Data Integration)

Reporting period: 2023-07-01 to 2025-10-31

PANGAIA is a network of researchers laying the algorithmic foundations of graph pangenomics, a new computational field that aims to renew genome informatics by replacing the notion of a linear reference genome with a much richer graph representation.
The traditional view of a human genome as a linear sequence fails to capture human diversity and misses the structure of a population genome, in which common and rare genomic variations, such as polymorphisms, duplications, insertions, and deletions, become visible. The notion of graph pangenomes was introduced to overcome this limitation, facilitating the comparison of thousands (or even millions) of genomes.
The main goal of PANGAIA is to leverage the notion of graph pangenomes to address numerous questions arising from the unprecedented availability of sequencing data. This data, for the first time in human history, has the potential to serve as a valuable source of information for advancing human health.
In pursuit of this objective, PANGAIA addresses questions such as: How can we produce a better representation of viral genomes? How can we support the investigation of genetic causes of diseases? How can we contribute to understanding human genomic diversity? How can we support research on antibiotic resistance?
PANGAIA focuses on three scientific topics:
1) Developing methods for constructing pangenomes from vertebrate, viral, and bacterial genomes.
2) Studying similarity and dissimilarity measures for comparing pangenome graphs, including developing new algorithms that compare pangenomes to detect significant genomic variants.
3) Translating the results of the previous two topics into actual human health advances.
International collaboration was at the heart of the PANGAIA project’s success. Through researcher exchanges (known as secondments), the project brought together experts from different countries and disciplines to tackle major challenges in genomic research and precision medicine.

By working together, PANGAIA researchers addressed key questions related to human health. These included understanding and combating antimicrobial resistance, describing genetic diversity across human populations, identifying genetic variants linked to disease risk, improving non-invasive diagnostic methods based on DNA sequencing, and understanding how tumors evolve over time in patients.

One of the main outcomes of the project was the development of software tools that are now widely used by the scientific community. These tools can:

Represent the genetic diversity of many related genomes using compact and efficient graph-based models

Identify previously hidden genetic variations, particularly in complex regions of the human genome that may be linked to rare diseases

Use machine learning to discover new genes associated with specific diseases

Detect and classify plasmids responsible for antimicrobial resistance

Reconstruct the evolutionary history of tumors

Together, this work has significantly advanced our understanding of genome analysis and has resulted in 118 peer-reviewed open-access publications (74 journal articles and 44 conference papers). Below, we highlight some representative studies.

In a study published in Nature Methods, PANGAIA researchers applied a novel algorithm to analyze “long” variants, which are particularly difficult to detect and can span several hundred consecutive nucleotides. This method achieved a major advance by detecting 10% more long variants than existing approaches.

In a study published in Genome Research, the PANGAIA team developed a novel method aimed at reconstructing viral genome sequences with greater precision. This is particularly important because viral variants—such as those of SARS-CoV-2—often differ only in minute details that can nonetheless significantly affect viral behavior.

In a study published in Nature Machine Intelligence, PANGAIA researchers advanced the understanding of the genetic architecture of Amyotrophic Lateral Sclerosis (ALS). Using disease capsule networks, research groups identified genes relevant to ALS.

In a study published in Bioinformatics, PANGAIA researchers developed a tool for analyzing bacterial isolates to detect plasmids that play a key role in the spread of antimicrobial resistance.

Another major goal of PANGAIA was to build a strong international research network and to raise awareness of the computer science challenges underlying computational pan-genomics. Equally important was training the next generation of researchers in this emerging field and supporting their career development.

These objectives were achieved through extensive secondments involving both early-career and senior researchers. The project connected nine European partners from academia and industry—including a small enterprise and a leading sequencing technology company—with six institutions outside the EU, in countries such as China, Japan, and the United States. Seconded researchers worked closely with international experts, gaining invaluable experience in both academic and industrial environments. As a result, they developed a unique and versatile skill set spanning algorithms and data structures, software development, big-data analysis, statistics, machine learning, and genomics.

Overall, more than 60 researchers from European institutions spent a combined total of 248 months working in partner institutions outside the EU, strengthening international collaboration and leaving a lasting impact on the field of computational pan-genomics. In addition, PANGAIA organized four PhD schools on pangenomics, attended by PhD students from around the world, with several of the field’s most influential researchers serving as lecturers.
The PANGAIA project advances computational pangenomics through the development of novel algorithms and software tools based on graph-based models, enabling the efficient representation and analysis of large-scale genomic data beyond traditional linear, sequence-based approaches. These advances address key challenges in understanding genetic diversity, structural variation, and complex genomic regions relevant to human health. Given the novelty of graph-based pangenomics, PANGAIA’s training activities are expected to have a strong societal impact, particularly as the field matures and its capacity to analyze multiple genomes simultaneously continues to expand.

A major contribution of PANGAIA is the development of new algorithms for the analysis of bacterial plasmids, which play a central role in the spread of antibiotic resistance. By jointly analyzing data from multiple samples using pangenomic graphs, the project provides new insights into antimicrobial resistance, one of the most critical challenges in modern medicine.

PANGAIA also delivers software tools for the efficient detection of genetic variants in medically relevant genomic regions, supporting improved disease risk assessment, drug response prediction, and personalized diagnostics.

Overall, PANGAIA’s outcomes have significant implications for personalized medicine, diagnostics, and public health. By reducing the cost and increasing the accuracy of genomic analyses, the project supports efforts to combat infectious diseases, advances cancer genomics, and strengthens Europe’s leadership in genomics through innovation, collaboration, and training.
PANGAIA interviews from the International School on Evolutionary Pangenomics
From genomes to a pangenome
PANGAIA list of interviews on YouTube
PANGAIA trailer on interviews from the first International School on Pangenomics