Reconstructing microbial genomes from the environment
Microbial communities are everywhere, from the environment to our own bodies. Human and environmental microbiomes are diverse and play a range of key roles in human health and in the functioning of healthy ecosystems. Thanks to the recent advent of metagenomics, an efficient and cost-effective approach that sequences DNA directly from environmental samples, our knowledge of these communities has grown significantly in recent years. In fact, the pace of microbiome research has accelerated (learn more in this recent episode of the CORDIScovery podcast, ‘The wonderful world of the gut microbiome’).
Metagenome-assembled genomes (MAGs) reconstructed with these techniques are hugely valuable for furthering our understanding of the diverse ecological niches of microbes, with a wide range of potential applications in biotechnology, medicine and even climate science. Yet the quality of reconstructed MAGs depends on a technique known as binning, in which assembled nucleotide sequences (contigs) are grouped into bins, each intended to represent a single organism, based on how their abundance varies across different samples. When only a few samples are available, this grouping becomes more difficult and genome reconstruction suffers.
In the EU-funded Metagenome binning project, undertaken with the support of the Marie Skłodowska-Curie Actions programme, researchers aimed to address these challenges by developing a new algorithm that improves binning when samples are scarce. The work will help scientists gain a better understanding of human and environmental microbiomes. “Improved binning leads to better reconstruction of high-quality microbial genomes from environmental samples, for example from the human gut,” says Yazhini Arangasamy, Marie Curie postdoctoral fellow at the Max Planck Institute for Multidisciplinary Sciences.
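To make the idea of binning more concrete, here is a minimal Python sketch of coverage-based binning, assuming each contig has already been assigned a coverage (abundance) value in every sample. It is not the project's algorithm: the greedy threshold clustering, the `bin_by_coverage` function and the `threshold` value are hypothetical choices for illustration, and real binners typically combine coverage with sequence composition (such as tetranucleotide frequencies). The last example shows why scarce samples hurt: with a single sample, every normalised coverage profile is identical, so coverage alone cannot tell genomes apart.

```python
import math


def normalize(profile):
    """Scale a coverage profile to unit length, so contigs are compared by the
    *pattern* of abundance across samples rather than by absolute depth."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm > 0 else list(profile)


def distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_by_coverage(coverage, threshold=0.1):
    """Hypothetical greedy binning: each contig joins the first bin whose seed
    profile lies within `threshold`; otherwise it seeds a new bin."""
    bins = []  # list of (seed_profile, member_contig_names)
    for name, profile in coverage.items():
        p = normalize(profile)
        for seed, members in bins:
            if distance(seed, p) <= threshold:
                members.append(name)
                break
        else:
            bins.append((p, [name]))
    return [members for _, members in bins]


# Toy coverage values for four contigs across three samples (made-up data).
coverage = {
    "contig_1": [10.0, 2.0, 30.0],
    "contig_2": [11.0, 2.2, 29.0],
    "contig_3": [5.0, 40.0, 1.0],
    "contig_4": [5.5, 38.0, 1.2],
}
print(bin_by_coverage(coverage))
# With one sample, very different depths still normalise to [1.0]:
print(bin_by_coverage({"a": [10.0], "b": [40.0]}))
```

With three samples, contigs sharing an abundance pattern fall into the same bin; with one sample, the two contigs cannot be separated by coverage at all.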
Creating a new metagenome binning algorithm
Through the project, Arangasamy developed a new metagenome binning algorithm to address these challenges and carried out extensive benchmarking of binning methods. The algorithm draws on recent deep learning techniques. The results demonstrate that deep learning tools based on contrastive models represent the state of the art for metagenome binning. These models extract meaningful information by contrasting pairs of nucleotide sequences against each other, so that sequences likely to come from the same genome end up close together and unrelated sequences end up far apart. Importantly, the researchers also found that the choice of binning strategy has the greatest impact on recovering high-quality genomes of low-abundance microbes and of closely related strains. “The choice of binning strategy should depend on sequencing depth of samples, strain-complexity, and read coverage of genomes in the metagenome samples,” explains Arangasamy. A new tool developed in the project, MAGmax, improves the recovery of high-quality genomes by integrating bins from multiple samples.
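The contrastive idea can be illustrated with InfoNCE, one widely used contrastive loss. The article does not say which objective the project's algorithm uses, so this is an illustrative assumption: each anchor embedding is paired with a positive (hypothetically, an embedding of another fragment of the same contig), and the other positives in the batch act as negatives. The embeddings are fixed toy vectors standing in for the output of a neural encoder, and `info_nce_loss` is a hypothetical helper, not part of any binning tool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch: positives[i] is the true partner of
    anchors[i]; every positives[j] with j != i serves as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching pair (index i) as the correct class.
        total += log_sum_exp - logits[i]
    return total / n


# Toy embeddings: each positive points roughly the same way as its anchor.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce_loss(anchors, positives)           # correct pairs: low loss
mismatched = info_nce_loss(anchors, positives[::-1])  # swapped pairs: high loss
print(aligned, mismatched)
```

Training a contrastive binner would typically adjust the encoder to lower this loss, after which the learned embeddings can be clustered into bins.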
Helping with the discovery of new microbial strains
Arangasamy and the team hope the results will benefit the wider scientific community by helping metagenome scientists optimise bioinformatics pipelines for large-scale microbiome studies. Ultimately, this will lead to a better understanding of microbiomes and their links to human health and ecosystem functioning. “Complete genome information of metagenomes is crucial for understanding metabolic influence of microorganisms on human health and ecosystem functioning,” adds Arangasamy. “It enables the discovery of novel proteins from unculturable microbes, unique gene clusters, and new strains with potential for biotechnological intervention.” Arangasamy will now apply lessons learned in the project to generate high-quality genomes from human metagenome samples, discover new microbial proteins and investigate their functions using wet-lab experiments.