ALgorithms for PAngenome Computational Analysis

Project description

From sequence to graph-based representations: a paradigm shift in genomics

Genome sequencing aims to determine the order of the As, Cs, Gs and Ts that represent the DNA nucleotides in an organism's genome. Advances in this field have generated a constantly growing volume of data worldwide. Funded by the Marie Skłodowska-Curie Actions programme, the ALPACA project will develop graph-based genome representations that combine individual variations in an evolutionarily meaningful way. This will allow sequencing data to be processed and analyzed more efficiently than with traditional approaches, which rely on ordinary sequence-based genome representations. This paradigm shift will play an essential role in personalized medicine and pathogen analysis.

Objective

Genomes are strings over the letters A, C, G and T, which represent nucleotides, the building blocks of DNA. Ever more numerous and rapidly advancing sequencing devices are producing ultra-large amounts of genome sequence data, with accrued volumes now reaching the exabyte scale. The driving, urgent question is: how can we arrange and analyze these data masses in a formally rigorous, computationally efficient and biomedically rewarding manner?
Graph-based data structures have been reported to offer disruptive benefits over traditional sequence-based structures when representing pan-genomes, that is, sufficiently large, evolutionarily coherent collections of genomes. This idea is directly justified by the laws of genetics: evolutionarily closely related genomes differ in relatively few letters while sharing the majority of their sequence content. Graph-based pan-genome representations, which remove redundancies without discarding individual differences, therefore make eminent sense. In this project, we will put this paradigm shift, from sequence-based to graph-based representations of genomes, into full effect. As a result, we expect a wealth of practically relevant advantages, most fundamentally in the arrangement, analysis, compression, integration and exploitation of genome data. In addition, we will open up a significant source of inspiration for computer science itself.
To realize our goals, our network will (i) decisively strengthen existing ties and form new ones in the emerging community of computational pan-genomics, (ii) perform research on all relevant frontiers, aiming at significant computational advances at the level of important breakthroughs, and (iii) boost relevant knowledge exchange between academia and industry. Last but not least, in doing so, we will train a new, “paradigm-shift-aware” generation of computational genomics researchers.

Coordinator

UNIVERSITAET BIELEFELD
Net EU contribution
€ 505 576,80
Address
UNIVERSITAETSSTRASSE 25
33615 Bielefeld
Germany

Region
Nordrhein-Westfalen Detmold Bielefeld, Kreisfreie Stadt
Activity type
Higher or Secondary Education Establishments
Total cost
€ 505 576,80

Participants (12)