CORDIS - EU research results

Reconstructing a dated tree of life using phylogenetic incongruence

Periodic Reporting for period 4 - GENECLOCKS (Reconstructing a dated tree of life using phylogenetic incongruence)

Reporting period: 2022-01-01 to 2023-06-30

The GENECLOCKS project addresses a significant open problem in molecular phylogenetics: the challenge of reconciling gene trees with species trees. This problem arises because gene trees, which are reconstructed from the sequences of related genes, are each a unique result of a series of evolutionary events, such as gene duplications, transfers, and losses, and do not directly correspond to the evolutionary history of species. This discrepancy has led to renewed debates instead of resolving existing questions in the field. The importance of this issue extends beyond academic interest; understanding the evolutionary relationships among species is crucial for various aspects of biology and medicine, including the study of disease, biodiversity conservation, and the development of new drugs and therapies.

The project's primary objective is to develop new methods that systematically extract information on the pattern and timing of genomic evolution by explaining the differences between gene trees that have evolved along a shared species tree. This innovative approach aims to reconstruct a dated tree of life using genome-scale data, a significant advancement over traditional methods. The project leverages parallel programming and computer science algorithms to analyze a vast number of genomes efficiently and access qualitatively new sources of phylogenetic and timing information.

Going beyond method development, the GENECLOCKS project seeks to apply these methods to unresolved questions, furthering our understanding of the pattern and process of evolution. This endeavour is vital for society as understanding evolutionary history is of fundamental importance to our understanding of life's diversity with profound implications for various scientific disciplines and practical applications.

The methodology we developed also has novel applications: reconciliation methods developed to explain how genes evolve in genomes over hundreds of millions of years have recently been applied to explain another type of coevolution, the evolution of the human gut microbiome. This application is particularly significant as it sheds light on the complex evolutionary dynamics between humans and their gut microbes, revealing how this crucial host-symbiont system has adapted and evolved over time. Understanding these dynamics is essential for advancing our knowledge of human health and disease, as the gut microbiome plays a critical role in various physiological processes and has been linked to numerous health conditions.
We successfully developed and applied hierarchical probabilistic gene tree-species tree reconciliation algorithms that compare the sequences of different genes from different species and assess how likely it is that the differences between them result from gene transfers, duplications, or losses. Finding transfers is particularly useful because they carry information about relative timing: for any given transfer, the ancestors of the gene donor must be older than the descendants of the recipient. By pinpointing many such transfers we can sort out the relative order of the speciation events involved.
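
The way transfers constrain relative timing can be illustrated with a minimal sketch. It is not the project's reconciliation software: the toy species tree `PARENT`, the node names, and the helper functions are hypothetical, and each transfer is assumed to be given as a pair of species-tree branches identified by the node at their lower end. Every transfer adds one "older than" constraint (the speciation at the start of the donor branch precedes the node at the end of the recipient branch), and a topological sort then yields one relative order of speciation events compatible with all constraints.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical toy species tree, stored as child node -> parent node.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "R", "Y": "R"}


def older_than_constraints(parent, transfers):
    """Return a set of (older, younger) node pairs.

    Each transfer is (donor, recipient), naming the node at the lower end
    of the donor branch and of the recipient branch.
    """
    # Vertical descent: every parent node is older than its children.
    constraints = {(p, c) for c, p in parent.items()}
    for donor, recipient in transfers:
        if donor in parent:  # the root has no branch above it
            # The donor branch begins at parent[donor]; that speciation
            # must predate the node that ends the recipient branch.
            constraints.add((parent[donor], recipient))
    return constraints


def relative_order(parent, transfers):
    """Return nodes from oldest to youngest, consistent with all constraints.

    Raises CycleError when the transfers are mutually incompatible.
    """
    sorter = TopologicalSorter()
    for older, younger in older_than_constraints(parent, transfers):
        sorter.add(younger, older)  # 'older' must come before 'younger'
    return list(sorter.static_order())


if __name__ == "__main__":
    # One transfer from branch X->A into branch R->Y: X must predate Y.
    print(relative_order(PARENT, [("A", "Y")]))
    try:
        # Adding a transfer from branch Y->C into branch R->X contradicts it.
        relative_order(PARENT, [("A", "Y"), ("C", "X")])
    except CycleError:
        print("conflicting transfers")
```

In this sketch, conflicting transfers surface as a cycle. A probabilistic method would instead weigh the support for each inferred transfer, which this toy example does not attempt.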

Applying these methods, in collaboration with an international team of experts, including Tom Williams from the University of Bristol, Anja Spang from the Royal Netherlands Institute for Sea Research, and Phil Hugenholtz from the University of Queensland, we made significant strides in elucidating bacterial evolution.

We reconstructed and rooted the bacterial tree of life using information derived from gene duplications, losses, and transfers, without the need for an archaeal outgroup. Analyzing data from 11,272 gene families, we modeled both the vertical and horizontal components of bacterial evolution. We showed that the root of the bacterial tree lies between the Gracilicutes and Terrabacteria clades, with the Candidate Phyla Radiation (CPR) being part of the Terrabacteria. Our reconstruction indicates that the last common bacterial ancestor was a complex, motile cell with a double membrane and a CRISPR-Cas system. Despite extensive horizontal gene transfer, about two-thirds of gene transmissions in bacteria have been vertical, highlighting the importance of vertical evolution in understanding bacterial phylogeny and diversification. This understanding aids in exploring bacterial evolution, metabolic changes, and shifts in cell architecture. Our results, published in Science in 2021 in the since highly cited paper “A rooted phylogeny resolves early bacterial evolution” (https://www.science.org/doi/10.1126/science.abe0511), were also highlighted in the perspective “Illuminating the first bacteria” (https://www.science.org/doi/10.1126/science.abh2814). See attached figure.

In collaboration with Iñaki Ruiz-Trillo's group at the Institut de Biologia Evolutiva (CSIC-UPF) in Barcelona, we made substantial progress towards understanding the evolutionary origins of animals and fungi, published in the Nature article “Divergent genomic trajectories predate the origin of animals and fungi” (https://www.nature.com/articles/s41586-022-05110-4). We sequenced four new protist opisthokont genomes and compared them, using the gene tree-species tree reconciliation methods developed as part of the GENECLOCKS project, with extensive genomic data from a variety of animal and fungal species. This research enabled us to identify and analyze the distinct evolutionary trajectories that led to the emergence of animals and fungi, revealing a gradual functional change and net gain in metazoan genes, contrasted with net functional losses in fungi.

Most recently, we undertook to advance our understanding of the early evolution of microbial life with a focus on the emergence of aerobic metabolism. Microbial life has shaped Earth's biosphere since its inception but has left a scarce fossil record, making molecular dating difficult before the rise of more readily fossilizing multicellular life.

By combining machine learning and gene tree-species tree reconciliation, we identified ancestral transitions to aerobic lifestyles in Bacteria. Linking these transitions to the Great Oxidation Event (GOE), circa 2.33 billion years ago, allowed us to ameliorate the lack of fossil calibrations that hinders the dating of prokaryotes, with the GOE providing a maximum constraint on aerobic lineages. The resulting geological timescale for bacterial evolution and oxygen adaptation indicated that bacterial phyla are ancestrally anaerobic, with most transitioning to an aerobic lifestyle only after the GOE, whereas the earliest aerobes pre-dated the GOE, possibly facilitating the evolution of oxygenic photosynthesis.
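
As a rough illustration of what a maximum constraint means in practice, the sketch below caps the upper age bound of nodes inferred to be aerobic at the GOE. It rests on several assumptions: the function name, the node labels, and the example age intervals are hypothetical, and the bound is treated as hard for simplicity, since how the constraint enters the actual dating analysis is not specified here.

```python
GOE_GA = 2.33  # age of the Great Oxidation Event in billions of years, as given in the text


def apply_goe_maximum(age_bounds, aerobic_nodes, goe=GOE_GA):
    """Cap the maximum age of aerobic nodes at the GOE.

    age_bounds: dict mapping node -> (min_age, max_age), ages in Ga.
    aerobic_nodes: set of nodes inferred to be aerobic (assumed input).
    Returns a new dict; the input is left unchanged.
    """
    constrained = {}
    for node, (min_age, max_age) in age_bounds.items():
        if node in aerobic_nodes:
            max_age = min(max_age, goe)
            if min_age > max_age:
                raise ValueError(f"{node}: minimum age exceeds the GOE bound")
        constrained[node] = (min_age, max_age)
    return constrained


if __name__ == "__main__":
    # Hypothetical prior age intervals for two nodes.
    bounds = {"n1": (0.5, 3.5), "n2": (1.0, 4.0)}
    print(apply_goe_maximum(bounds, {"n1"}))
```

Only the node flagged as aerobic is affected; nodes without an aerobic assignment keep their original interval.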
The GENECLOCKS project, with its ambitious objectives and innovative methodologies, aimed to advance beyond the state of the art in the field of evolutionary genomics by integrating evolutionary theory, molecular phylogenetics, and systematics. By its completion, the project has managed to progress beyond the state of the art not only in developing new phylogenetic methods but also in applying these across the diversity of life to better resolve deep evolution. Funding from the ERC has provided a dedicated team and necessary purpose-built computational resources, positioning us to make meaningful contributions to our understanding of the evolutionary processes that have shaped the diversity of life on our planet.
lbca.png