Exploring the human gut microbiome at strain resolution

Periodic Reporting for period 3 - MicrobioS (Exploring the human gut microbiome at strain resolution)

Reporting period: 2019-07-01 to 2020-12-31

With the genome sequencing of hundreds of bacterial isolates per day and a vast and growing number of metagenomic sequencing projects on gut microbiomes in healthy and diseased people all over the world, it has become feasible to explore the microbial diversity within us not only at the level of genera and species, but also at the level of strains. As two different strains of a prokaryotic species might share only 40% of their genes and can also differ vastly in single nucleotide variation (SNV), many aspects of the microbial communities we host in the gut might only be revealed at this high resolution. This proposal aims (i) to develop a robust methodology to characterize the SNV and gene content landscape from metagenomic shotgun data; (ii) to explore patterns of variation in the human population, stratified geographically but also by subpopulations such as families, and to understand the dispersal and evolution of microbial strains; and (iii) to work towards medical applications, for example by monitoring fecal microbiota transplantation (FMT) at strain resolution or by monitoring particular strains of interest in the population.
In the reporting period, we delivered many of the tools and applications envisioned in the proposal. Regarding tool development, we have published a single nucleotide variation analyser, metaSNV (Costea et al., PLoS ONE 2017), devised methods for gene content analyses of conspecific strains and protocols for subspecies delineation (Costea et al., Mol. Sys. Biol. 2017), and produced many other analysis tools and resources that are required to zoom in to strain resolution (e.g. the bacterial genome resource proGenomes: Mende et al., NAR 2017; an update of our eggNOG (meta)genome annotation resource: Huerta-Cepas et al., NAR 2017; and a tool to annotate newly sequenced (meta)genomes: Huerta-Cepas et al., Mol. Biol. Evol. 2017). Regarding the identification of known strains in metagenomic data, we have developed a pipeline that we are still testing, and we have investigated the transmission of oral strains into the gut. We analysed longitudinal data using SNV markers (Li et al., Science 2016; Korpela et al., Genome Res. 2018, in press), although we continue to refine the results. Finally, we have identified and quantified subspecies in the vast majority of abundant and prevalent human gut microbes; these subspecies have a distinct biogeography, seem to be exclusive, and stay with the host for a long time (Costea et al., Mol. Sys. Biol. 2017). We also zoomed in beyond the species level in disease applications (e.g. in colon cancer, unpublished, and in Parkinson’s disease: Bedarf et al., Genome Med. 2017) and pointed to the need for strain-level resolution in metagenomics (Schmidt et al., Cell 2018).
We have already demonstrated that exploring metagenomic data below the species level is very powerful: it can yield basic knowledge and can also be used in clinical applications. We intend to extend our analysis of longitudinal data from healthy individuals to understand strain influxes and mutational processes, including gene content changes. We expect that strain transmission between individuals and between body sites can be quantified, and that known pathogenic strains can be reliably identified above a technically required abundance level. We might also be able to derive an operational strain definition that fits the data we observe, to guide further research projects.