Periodic Reporting for period 1 - PopMet (Investigating bacterial strain evolution through metagenomic genome assemblies)
Reporting period: 2015-03-01 to 2017-02-28
In this project, we exploited the information contained in single nucleotide variants (SNVs) to understand bacteria and their functional capacities across a diverse set of environments. This differs from classical metagenomics in two ways. First, we have a higher resolution to differentiate between individual strains. Second, we can recover partial genomes and link them to the taxonomic identity of a bacterium, which tells us more about who is doing what in a given environment. This is important for understanding how pathogenic bacteria are distributed in the environment, where they normally occur, and when they turn from a “bystander” into an “aggressor” that tries to damage the host. Such a switch is known for many bacteria. E. coli, for example, is commonly found in the gut ecosystem but can also cause diarrhoea under opportunistic circumstances. Understanding bacterial distribution, and what can be considered normal, was thus an important research goal of this project.
A second goal was to investigate how bacterial genomes evolve over time, as this is important for understanding how the phenotype of a bacterium can change. Normally these phenotypic changes are not harmful to the host, but in rare cases they can lead to a switch from a bystander to an aggressor strain.
For the PopMet project, we developed an extensive software pipeline for assembling genomes from metagenomes, called MATAFILER (https://github.com/hildebra/MATAFILER/). With the help of this pipeline, we analysed the core samples of the PopMet project, a gut microbial time series, as well as other microbial samples.
In a first step, the genomes of selected species were reconstructed and assembled. Subsequently, the same species from different patients were compared to estimate their global genetic divergence. The genetic diversity within a patient’s time series was in all cases extremely low. The first research question was how to define and quantify bacterial strains. One established approach is to use the 16S rRNA gene as a phylogenetic marker, but its resolution is too low even to reliably identify a bacterial species. Instead, we investigated the use of SNVs to resolve species at the sub-species level. Relying on stable, single-copy marker genes in bacterial genomes, I extended the mOTU approach to sets of 40 and 100 stable core genes and compared the marker genes of a given species between samples. This was further extended to comparing whole genomes between samples, where they could be recovered. These latter two approaches reliably identified a species found in different samples from the same patient as the same bacterial strain.
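The report does not specify how marker genes were compared between samples. The sketch below makes one assumption: the marker gene sequences of one species have already been extracted and aligned per gene for each sample. Under that assumption, a simple way to compare two samples is to count differing nucleotide sites over all shared markers. Two samples are then called the same strain when that divergence falls below a cutoff. The function names, the gene IDs in the usage example and the cutoff value are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical cutoff for illustration only; not a value reported by the project.
SAME_STRAIN_MAX_DIVERGENCE = 0.005


def marker_divergence(sample_a: Dict[str, str], sample_b: Dict[str, str]) -> float:
    """Fraction of differing sites across marker genes present in both samples.

    Each dict maps a marker gene ID to that gene's sequence in one sample.
    Sequences of the same gene are assumed to be aligned already (equal length).
    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    compared = 0
    differing = 0
    for gene in sample_a.keys() & sample_b.keys():
        seq_a, seq_b = sample_a[gene].upper(), sample_b[gene].upper()
        if len(seq_a) != len(seq_b):
            raise ValueError(f"marker {gene} is not aligned")
        for x, y in zip(seq_a, seq_b):
            if x not in "ACGT" or y not in "ACGT":
                continue  # ignore gaps and ambiguous bases
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two samples")
    return differing / compared


def same_strain(sample_a: Dict[str, str], sample_b: Dict[str, str],
                cutoff: float = SAME_STRAIN_MAX_DIVERGENCE) -> bool:
    """Hypothetical decision rule: same strain if divergence is at most `cutoff`."""
    return marker_divergence(sample_a, sample_b) <= cutoff


if __name__ == "__main__":
    t0 = {"marker_1": "ACGTACGTAC", "marker_2": "GGCCTTAA"}
    t1 = {"marker_1": "ACGTACGTAC", "marker_2": "GGCCTTAT"}
    print(marker_divergence(t0, t1), same_strain(t0, t1))
```

In the usage example, 18 sites are compared and 1 differs, so the divergence is 1/18. That exceeds the illustrative cutoff. With 40 or 100 core genes, the number of compared sites is far larger, which is what makes such a fine cutoff meaningful.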
The second part of my analysis was to reconstruct bacterial genomes from metagenomes. For this, a new algorithm was implemented, with which we discovered a new bacterial species in a sample that coincided with an antibiotic treatment. This new species probably represents a new family of Clostridiaceae. Using the reconstructed genome, we could show its presence in samples from the same patient before and after the antibiotic treatment. Further, the genomes of four co-occurring species in the same sample were also assembled.
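The report does not state how the presence of the new species in other samples was established. One common way, assumed here, is to map each sample's reads to the reconstructed genome and compute the breadth of coverage. Breadth of coverage is the fraction of genome positions covered by at least one read. Per-position depths including zero-depth positions could come, for example, from `samtools depth -a`. The function names and the minimum-breadth threshold below are hypothetical.

```python
from typing import Dict, Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of genome positions covered by at least `min_depth` reads.

    `depths` must contain one entry per genome position, including zeros.
    """
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def detect_presence(per_sample_depths: Dict[str, Sequence[int]],
                    min_breadth: float = 0.5) -> Dict[str, bool]:
    """Call the genome present in a sample if its breadth reaches `min_breadth`.

    The 0.5 default is an illustrative assumption, not a project parameter.
    """
    return {sample: breadth_of_coverage(depths) >= min_breadth
            for sample, depths in per_sample_depths.items()}


if __name__ == "__main__":
    profiles = {
        "before_antibiotics": [0, 2, 3, 1, 0, 4, 2, 1],
        "during_antibiotics": [5, 6, 7, 8, 6, 5, 7, 9],
        "unrelated_sample": [0, 0, 0, 1, 0, 0, 0, 0],
    }
    print(detect_presence(profiles))
```

Breadth is used here rather than mean depth because a few reads from a related organism can pile up on conserved regions. That inflates depth locally while leaving most of the genome uncovered.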
The third question was to estimate genetic variability and population behaviour over time. For this, the reconstructed genomes were essential, enabling a precise mapping of metagenomic reads to the assembled bacterial genomes of a specific patient. This also raised the question of which algorithms are suitable for calling SNVs in a potentially highly diverse and heterogeneous strain mix within metagenomes. To call genetic variants within a time series, a novel SNV calling pipeline is being developed. Using this pipeline, we identified SNVs that are undergoing fixation over a four-year time course.
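The SNV calling pipeline itself is not described in this report, so the following sketches only the downstream idea. The input is the reference and alternative allele counts at each site for each time point, in chronological order. From these, the code computes allele frequencies and flags sites where either allele rises from low frequency at the first time point to near fixation at the last. The class and function names and the depth and frequency thresholds are hypothetical. Real data would additionally need filtering for sequencing errors and a statistical test rather than fixed cutoffs.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SiteCounts:
    """Allele counts at one genome position across a time series (hypothetical type)."""
    position: int
    # (ref_count, alt_count) per time point, in chronological order
    counts: Sequence[Tuple[int, int]]


def alt_frequencies(site: SiteCounts, min_depth: int = 10) -> Optional[List[float]]:
    """Alternative-allele frequency per time point, or None if any time point is too shallow."""
    freqs = []
    for ref, alt in site.counts:
        depth = ref + alt
        if depth < min_depth:
            return None  # not enough reads to estimate a frequency reliably
        freqs.append(alt / depth)
    return freqs


def is_fixing(site: SiteCounts, low: float = 0.2, high: float = 0.9,
              min_depth: int = 10) -> bool:
    """True if either allele goes from <= `low` at the first to >= `high` at the last time point."""
    freqs = alt_frequencies(site, min_depth)
    if not freqs or len(freqs) < 2:
        return False
    first, last = freqs[0], freqs[-1]
    alt_sweeps = first <= low and last >= high
    ref_sweeps = (1 - first) <= low and (1 - last) >= high
    return alt_sweeps or ref_sweeps


def fixing_sites(sites: Sequence[SiteCounts], **thresholds) -> List[int]:
    """Positions of all sites flagged by `is_fixing`."""
    return [s.position for s in sites if is_fixing(s, **thresholds)]


if __name__ == "__main__":
    sites = [
        SiteCounts(101, [(48, 2), (30, 20), (3, 47)]),   # alt allele sweeps up
        SiteCounts(202, [(25, 25), (24, 26), (26, 24)]), # stable polymorphism
        SiteCounts(303, [(5, 1), (0, 40), (0, 50)]),     # first time point too shallow
        SiteCounts(404, [(2, 48), (20, 30), (47, 3)]),   # ref allele sweeps up
    ]
    print(fixing_sites(sites))
```

Comparing only the first and last time points keeps the sketch simple. A closer approximation to tracking fixation over a multi-year series would also check that the trajectory is consistent across intermediate time points.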
To disseminate my knowledge and experience, I taught a visiting school class about evolution and introduced metagenomics to participants of the Biology Olympiad. I was also responsible for a one-day training course in metagenomics (bioinformatics) in Leuven, Belgium. So far, the project has generated six peer-reviewed publications in high-ranking journals. Through these activities I have encouraged young citizens to pursue scientific careers and strengthened Europe as a knowledge-based economy.