CORDIS - EU research results

Assembly-based discovery of uncharacterized human microbiome members and their tracking across individuals and time

Periodic Reporting for period 1 - DiMeTrack (Assembly-based discovery of uncharacterized human microbiome members and their tracking across individuals and time)

Reporting period: 2016-10-01 to 2018-09-30

The microbiome - the large set of micro-organisms that live in and interact with a given environment - has received growing interest in the last few years due to its fundamental role in human health, in the environment, and in different natural ecosystems. Although microbial communities have been investigated for decades, significant advances have come with the introduction of high-throughput next-generation DNA sequencing technologies, and in particular shotgun metagenomics. Despite extensive recent studies using shotgun approaches, most characterization of these ecosystems has focused on microbes that are easily cultivable, while suggesting the presence of a large fraction of still unexplored diversity. New approaches and experimental analyses that scale to large cohorts are needed to increase our knowledge of the microbiome from metagenomic data.

The MSCA DiMeTrack project (led by Dr. Edoardo Pasolli and supervised by Prof. Nicola Segata at the University of Trento, Italy) aimed to expose and characterize previously unknown members of the human microbiome. Through the application of existing and newly developed methodologies, we analysed around 10,000 metagenomes acquired from public repositories or newly generated in the host laboratory. We reconstructed and characterized several thousand genomes and identified thousands of new species without reference genomes in public databases. The genomes were analysed in multiple projects in order to address different biologically relevant problems. All the data and software generated were made freely available to the community and therefore represent an important resource for future studies on the human microbiome.
The project was organised around three main Aims: i) we developed and implemented an integrated assembly-based computational pipeline to reconstruct genomes of unknown organisms from large-scale metagenomic datasets. This also included the screening and curation of around 10,000 metagenomes from publicly available repositories and their integration with new cohorts acquired in the host laboratory. This effort also resulted in the development of the curatedMetagenomicData package (Pasolli et al., Nature Methods, 2017); ii) the panel of extracted genomes was characterized phylogenetically and taxonomically. This was done with an improved version of the PhyloPhlAn software, which is able to place the reconstructed genomes within the known microbial tree of life and to infer their phylogenetic relationships; iii) we started to integrate the large set of reconstructed genomes with the already available reference genomes. This task is still ongoing and aims to improve microbiome-based host condition prediction approaches and strain-level profiling methods. For the latter, we also developed the novel metagenomic strain identification tool StrainPhlAn (Truong et al., Genome Research, 2017).

The developed pipeline was applied in multiple projects to address different biologically relevant questions: i) we reconstructed bacterial and eukaryotic genomes from a newly acquired cohort spanning 97 skin microbiomes associated with patients affected by psoriasis. We showed heterogeneity in strain colonisation and functional variability, supporting the hypothesis of psoriatic niche-specific strain adaptation or selection (Tett, Pasolli, et al., npj Biofilms and Microbiomes, 2017); ii) we reconstructed genomes from a newly acquired cohort of mothers and infants sampled at multiple body sites, for a total of 216 metagenomes, providing evidence of vertical strain transmission at birth (Ferretti, Pasolli, et al., Cell Host & Microbe, 2018); iii) we conducted a large-scale metagenomic analysis of 1689 subjects to characterize Blastocystis in the human gut microbiome (Beghini*, Pasolli*, et al., ISME Journal, 2017); iv) we conducted a whole-genome epidemiological study, characterisation, and phylogenetic reconstruction of 184 newly acquired Staphylococcus aureus strains in a paediatric hospital in Italy (Manara*, Pasolli*, Dolce* et al., Genome Medicine, 2018); v) we conducted a very large-scale analysis spanning body sites, ages, countries, and lifestyles. We reconstructed and characterized 154,723 genomes from a total of 9,428 metagenomes from 46 publicly available datasets in addition to a newly sequenced cohort from Madagascar. This resulted in the largest microbial catalogue of the human microbiome. We recapitulated 4,930 species-level genome bins (SGBs), 77% without genomes in public repositories, and annotated 2.85 M genes in SGBs, many associated with conditions including infant development or Westernization (Pasolli et al., Cell, 2019); vi) we extended the approach to non-human environments (Pinto et al., Genome Ann., 2017; Pinto et al., Genome Ann., 2018).
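To put the catalogue figures from point v) in proportion, the short sketch below derives a few ratios from the numbers quoted in the text. It is only illustrative arithmetic: the function name and dictionary keys are made up for this example, and because the 77% share is a rounded value, the resulting count of SGBs without public genomes is approximate rather than the exact figure from the paper.

```python
def summarise_catalogue(genomes=154_723, metagenomes=9_428,
                        sgbs=4_930, unknown_fraction=0.77):
    """Derive simple ratios from the catalogue figures quoted in the report.

    All default values are taken from the text; unknown_fraction is the
    rounded 77% share, so counts derived from it are approximate.
    """
    # Approximate number of SGBs with no genome in public repositories.
    unknown_sgbs = round(sgbs * unknown_fraction)
    return {
        "genomes_per_metagenome": genomes / metagenomes,
        "genomes_per_sgb": genomes / sgbs,
        "unknown_sgbs_approx": unknown_sgbs,
        "known_sgbs_approx": sgbs - unknown_sgbs,
    }


if __name__ == "__main__":
    for name, value in summarise_catalogue().items():
        print(f"{name}: {value:,.2f}")
```

With the quoted figures this gives roughly 16 reconstructed genomes per metagenome, about 31 genomes per SGB, and close to 3,800 SGBs without genomes in public repositories.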

The objectives of the MSCA action have been fulfilled and exceeded from different points of view: i) the researcher, Dr. Edoardo Pasolli, has co-authored ten papers (in addition to other papers under review or in preparation) in international journals, including four papers as (co-)first author in top journals (Cell, Nature Methods, ISME Journal, and Genome Medicine); ii) the researcher has successfully transitioned from developing machine learning methodologies to research more focused on computational biology; iii) the researcher has strengthened his dissemination and communication skills thanks to his participation in multiple conferences and events; iv) the researcher has successfully transitioned from a postdoctoral to an independent position (as tenure-track assistant professor in Italy) in the EU system.
The project advanced the state of the art in different disciplines including computational biology, metagenomics, microbiology, and computer science. We developed new computational tools, generated data, and answered relevant biological and biomedical questions. All the software and data generated in this action were made freely available.

We developed curatedMetagenomicData, the largest resource of accessible and curated metagenomic datasets, and StrainPhlAn, a state-of-the-art tool for metagenomic strain-level analysis. We reconstructed, characterized, and released genomes from multiple projects. Of particular importance was the building of the largest genome catalogue of the human microbiome (Pasolli et al., Cell, 2019). We more than doubled the existing collections of microbial genomes associated with the human microbiome, in the process recovering hidden functional and phylogenetic diversity associated with global populations. Moreover, the metagenomic-assembly strategies employed here represent a scalable methodology for very large-scale integration of metagenomes that could be fruitfully applied to additional or non-human-associated metagenomes. The study's results themselves emphasize the phylogenetic and functional diversity of the human microbiome that remains to be captured from rare organisms, especially for sample types other than stool, global human populations, and varied lifestyles. These results helped to pinpoint microbes unique to a particular population, environment, or exposure; most importantly, future work may then be able to more easily capture specific strains or microbial molecular mechanisms that are causal in microbiome-associated human health conditions.
4,930 species-level genome bins assembled from 9,428 metagenomes (Pasolli et al., Cell, 2019)