
Pan-genome Graph Algorithms and Data Integration

Periodic Reporting for period 1 - PANGAIA (Pan-genome Graph Algorithms and Data Integration)

Reporting period: 2020-01-01 to 2023-06-30

PANGAIA is a network of researchers laying down the algorithmic foundations of graph pangenomics, a new computational field that aims to renew genome informatics by replacing the notion of a linear reference genome with a much richer graph representation.
The traditional view of a human genome as a linear sequence fails to capture human diversity and misses the structure of a population genome, where common and rare genomic variations, such as polymorphisms, duplications, insertions, and deletions, are manifest.
A graph pangenome effectively facilitates the comparison of thousands (or even millions) of sequences.
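As a minimal illustration of the idea, the sketch below stores three short sequences as paths through a shared graph instead of as three separate linear strings. It is not taken from any PANGAIA tool: the class `VariationGraph`, its methods, the sample names, and the toy sequences are all hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy pangenome graph (hypothetical): nodes are sequence segments,
    edges are observed adjacencies, and each genome is a path of node ids."""

    segments: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_path(self, name: str, labelled_nodes: list[tuple[int, str]]) -> None:
        """Register a genome as an ordered list of (node id, segment) pairs."""
        node_ids = []
        for node_id, sequence in labelled_nodes:
            # A segment shared by several genomes is stored only once.
            self.segments.setdefault(node_id, sequence)
            node_ids.append(node_id)
        # Consecutive nodes on the path become edges of the graph.
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = node_ids

    def spell(self, name: str) -> str:
        """Reconstruct the linear sequence of one genome from its path."""
        return "".join(self.segments[n] for n in self.paths[name])

    def shared_nodes(self) -> set[int]:
        """Return the nodes traversed by every genome in the graph."""
        node_sets = [set(path) for path in self.paths.values()]
        return set.intersection(*node_sets) if node_sets else set()


graph = VariationGraph()
graph.add_path("sample_A", [(1, "ACGT"), (2, "A"), (4, "TTGC")])
graph.add_path("sample_B", [(1, "ACGT"), (3, "G"), (4, "TTGC")])  # single-base variant
graph.add_path("sample_C", [(1, "ACGT"), (4, "TTGC")])            # deletion

print(graph.spell("sample_B"))
print(sorted(graph.shared_nodes()))
```

In this toy graph the two flanking segments are stored once and shared by all three samples, while the variant site shows up as alternative branches. Real pangenome tools rely on far more compact data structures and indexes than this dictionary-based sketch.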
The main goal of PANGAIA is to leverage the notion of graph pangenomes to address numerous questions arising from the unprecedented availability of sequencing data. This data, for the first time in human history, has the potential to serve as a valuable source of information for advancing human health.
In pursuit of this objective, PANGAIA addresses questions such as: How can we produce a better representation of viral genomes? How can we support the investigation of genetic causes of diseases? How can we contribute to understanding human genomic diversity? How can we support research on antibiotic resistance?
PANGAIA focuses on three scientific topics:
1) Developing methods for constructing pangenomes from vertebrate, viral, and bacterial genomes.
In particular, PANGAIA is studying novel, efficient, and inexpensive approaches for the storage and indexing of large collections of data, especially those stored in exponentially growing biobanks (such as the UK Biobank). Given the rate of data production achieved by some ongoing sequencing projects, this topic is essential to the success of personalized medicine.
2) Studying similarity and dissimilarity measures for comparing pangenome graphs, including developing new algorithms that compare pangenomes to detect significant genomic variants.
In particular, PANGAIA has developed various tools for responding to the SARS-CoV-2 pandemic, and PANGAIA members have played an important role, at both national and international levels, in monitoring the evolution and spread of SARS-CoV-2 variants.
3) Translating the results from the previous topics into actual human health advances.
This includes exploring deep learning approaches for predicting which genes cause diseases.
PANGAIA has actively worked to promote the dissemination of graph pangenomics. It organized the first international PhD school on the Introduction of Pangenomics, which was attended by PhD students from all over the world. Training was delivered by some of the top researchers in the field.
PANGAIA has been addressing fundamental questions that will shape the contribution that computer science research can make to improving human health.

PANGAIA has produced new graph representations of viral genomes, which are capable of distinguishing different strains. In a study published in Genome Research in March 2022, the PANGAIA team developed a novel method called Strainline, aimed at reconstructing genome sequences of viruses with greater precision. This is particularly important as variants of viruses, such as SARS-CoV-2, often differ only in minute details that can significantly impact the behavior of the virus.
Furthermore, PANGAIA has made advancements in understanding the genetic architecture of Amyotrophic Lateral Sclerosis (ALS). Through the use of disease capsule networks, research groups have identified relevant genes involved in ALS.
PANGAIA has advanced the understanding of human diversity arising from structural variations.
In a study published on December 22, 2022 in Nature Methods, PANGAIA researchers applied a novel algorithm to analyze “long” variants, which comprise several hundred consecutive nucleotides and are particularly difficult to detect. The major advance of this method is that it detects 10% more long variants than all other methods.
PANGAIA has developed new software tools to study bacteria that are resistant to drug treatments (i.e. antibiotic resistance).
In a study published on June 23 in Bioinformatics, PANGAIA researchers developed PlasBin-Flow, a tool that analyzes bacterial isolates to detect plasmids, which play a relevant role in the propagation of antimicrobial resistance.
The results achieved by the project have significantly advanced the state-of-the-art in the field of computational pangenomics. This progress has primarily been made through the development of new algorithms and data structures that are implemented in software tools. These tools have great relevance for future applications in human health.

These tools can analyze sequencing data and entire genomes at once, which enables the extraction of biological information that is important for medical science.

One of the main contributions of PANGAIA is the development of new algorithms that improve the investigation of plasmids in bacteria, which can be responsible for antibiotic resistance. Antibiotic resistance is a significant concern in medicine, as bacteria are constantly evolving resistance to common antibiotic treatments. By using graph pangenomics, PANGAIA's research aims to produce new results by jointly analyzing data collected from different samples.

Additionally, PANGAIA has already developed software that enables the identification of different microbial species using novel algorithms for genome construction from sequencing data. This software can assist in accurately identifying and understanding various microbial species, which is crucial for fields like microbiology.

PANGAIA's focus on making the analysis of sequencing pangenome data more efficient and accurate is a major achievement with significant economic implications for computational biologists. By reducing the cost and improving the effectiveness of processing genomic data, PANGAIA's contributions have the potential to advance personalized medicine and diagnostics for various diseases.

One significant contribution of PANGAIA is the development of software that can detect new genetic variations in regions of medical interest. By analyzing sequencing samples from an individual, this software can identify important genomic variations that may have implications for disease susceptibility, drug response, or other medical factors. This information can provide valuable insights for personalized medicine, leading to more accurate diagnostics and targeted treatments.

Furthermore, PANGAIA will work on extending this method to investigate cancer-specific variations. Genomic variations play a crucial role in the development and progression of cancer, and by applying its expertise in pangenomics, PANGAIA aims to contribute to the identification and understanding of these cancer-specific variations. Such advances in cancer genomics can potentially lead to improved diagnosis, treatment selection, and therapeutic strategies.

Overall, PANGAIA's efforts to enhance the analysis of pangenome data have promising applications in personalized medicine, diagnostics, and cancer research, thereby extending its impact beyond computational biology.
PANGAIA on social media
PANGAIA list of interviews on YouTube
PANGAIA trailer of interviews from the first International School on Pangenomics