Reconstructing a dated tree of life using phylogenetic incongruence

Periodic Reporting for period 3 - GENECLOCKS (Reconstructing a dated tree of life using phylogenetic incongruence)

Reporting period: 2020-07-01 to 2021-12-31

With the advent of genome-scale sequencing, molecular phylogeny, which reconstructs gene trees from related sequences, has reached an impasse. Instead of answering open questions, new genomes have reignited old debates. The problem is clear: gene trees are not species trees, but each is the unique result of a series of evolutionary events. If, however, we model these differences in the context of a common species tree, we can access a wealth of information on the pattern and process of evolution that is not available to traditional methods. For example, as horizontal gene transfer (HGT) can only occur between coexisting species, HGTs provide information on the order of speciations. When HGT is rare, lineage sorting can generate disagreements between gene trees. Under these conditions, we can formulate the dating problem in terms of meaningful parameters (such as population size) that are informative about the rate of evolution, which is invaluable to molecular dating.
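To illustrate how lineage sorting ties gene tree disagreement to population size, the sketch below evaluates the standard three-species result of the multispecies coalescent. It is a textbook formula, not the project's own model. The function name and arguments are hypothetical, and the code assumes a diploid population, a species tree ((A,B),C), and an internal branch length measured in generations.

```python
import math


def three_taxon_topology_probs(internal_branch_generations, effective_pop_size):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    Assumes the multispecies coalescent with a diploid effective
    population size, so the internal branch has length T / (2N) in
    coalescent units. Argument names are illustrative only.
    """
    t = internal_branch_generations / (2.0 * effective_pop_size)
    # The A and B lineages fail to coalesce on the internal branch with
    # probability exp(-t); the three topologies are then equally likely.
    discordant_each = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant_each,  # matches the species tree
        "((A,C),B)": discordant_each,
        "((B,C),A)": discordant_each,
    }


if __name__ == "__main__":
    # Short branch relative to population size: frequent disagreement.
    print(three_taxon_topology_probs(1_000, 10_000))
    # Long branch relative to population size: gene trees mostly agree.
    print(three_taxon_topology_probs(100_000, 10_000))
```

Because the probabilities depend only on the ratio of branch length to population size, observed frequencies of discordant gene trees carry information about that ratio, which is the sense in which population size becomes a meaningful dating parameter.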

The first goal of the GENECLOCKS project is to develop methods that systematically extract information on the pattern and timing of genomic evolution by explaining differences between gene trees. These methods will allow us, for the first time, to reconstruct a dated tree of life from genome-scale data. We use parallel programming and computer science algorithms to maximize the number of genomes analyzed.

The second goal of the GENECLOCKS project is to apply these methods to open problems.
We have developed and applied probabilistic algorithms that can compare DNA sequences from different species and, for example, assess how likely it is that the differences between them are the result of gene transfers, duplications or deletions. Finding transfers is a particularly useful exercise because it logically reconstructs what family relationships are possible: for any given transfer, the ancestors of the gene donor must be older than the descendants of the recipient. By pinpointing many such transfers, we can sort out the relative order of the speciations involved.
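The ordering logic can be sketched as a small constraint problem. The example below is a simplified illustration, not the project's actual algorithm: it treats each inferred transfer as an "older than" constraint between two speciation nodes, adds the parent-before-child constraints implied by the species tree, and asks for any order that satisfies them all. The function name and the node labels (`root`, `X`, `Y`) are hypothetical, and it uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def relative_order(tree_edges, transfer_constraints):
    """Order speciation nodes from oldest to youngest.

    tree_edges: (parent, child) pairs of speciation nodes in the species tree.
    transfer_constraints: (older, younger) pairs, one per inferred transfer.
    Returns one order compatible with every constraint, or None if the
    constraints contradict each other.
    """
    predecessors = {}
    for older, younger in list(tree_edges) + list(transfer_constraints):
        predecessors.setdefault(older, set())
        predecessors.setdefault(younger, set()).add(older)
    try:
        # static_order() yields each node only after all of its predecessors.
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    tree = [("root", "X"), ("root", "Y")]
    # A single transfer implying that speciation X predates speciation Y.
    print(relative_order(tree, [("X", "Y")]))
    # Two contradictory transfers cannot both be placed on one timeline.
    print(relative_order(tree, [("X", "Y"), ("Y", "X")]))
```

In this toy version, conflicting constraints simply make the ordering fail. With real data, inferred transfers are noisy, so a practical method has to weigh the support for each constraint rather than require that all of them hold.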

In our 2018 publication "Gene Transfers can date the Tree of Life" we applied our method to thousands of gene families from a diverse set of organisms: 40 species of the oxygen-producing photosynthetic bacteria called cyanobacteria, 60 species of single-celled microorganisms called archaea, and 60 species of fungi.

For more details, see our publication and the article "Chronological Clues to Life’s Early History Lurk in Gene Transfers" describing it in Quanta Magazine.

As shown in the attached figure, one of the dates we were able to better resolve was the emergence of the Asgard group of Archaea, which we believe gave rise to the Eukaryotes, from pine trees to human beings.

We also published two important papers in collaboration with Tom Williams at the University of Bristol that help to resolve the evolutionary history of Archaea and the position of Eukaryotes among them. In our paper published in PNAS, we applied a new approach that harnesses the information in patterns of gene family evolution to find the root of the archaeal tree and to resolve the metabolism of the earliest archaeal cells, which lived over 3 billion years ago. Our approach robustly distinguished between published rooting hypotheses, suggested that the first Archaea were anaerobes that may have fixed carbon via the Wood–Ljungdahl pathway, and quantified the cumulative impact of horizontal transfer on archaeal genome evolution. In a second paper, we showed that eukaryotes consistently originate from within the archaea in a two-domains tree when due consideration is given to the fit between model and data. Our analyses support a close relationship between eukaryotes and Asgard archaea and identify the Heimdallarchaeota as the current best candidate for the closest archaeal relatives of the eukaryotic nuclear lineage.

We also developed and made available GeneRax, a parallel tool for species tree-aware maximum likelihood-based gene tree inference under gene duplication, transfer, and loss. It infers gene trees from their aligned sequences, the mapping between genes and species, and a rooted undated species tree.
The project develops and uses novel methods that extract bona fide conflicts among gene trees and interpret them in biological terms. Exploiting this novel source of information offers great hope of resolving issues that traditional methods have left pending.