Final Report Summary - COUNTERING LBA (Countering Confounding Heterogeneity in Phylogenetics through Non-Parametric Analyses of Quartet Split Patterns)
The main objective of this project is to develop and evaluate new divide-and-conquer tree reconstruction algorithms based on specific split patterns (relationships supported by site patterns of nucleotides or amino acids), and to take the logic of phylogenetic inference into account in order to successfully overcome long-branch attraction (LBA).
In pursuit of this objective, we developed a new quartet-based algorithm, PhyQuart, which combines Hennigian logic with Maximum Likelihood (ML) estimation. PhyQuart considers two alternative directions of character evolution along the internal branch of a quartet tree to distinguish between potentially apomorphic and plesiomorphic split-supporting site patterns, and uses ML to estimate the expected number of convergent split-supporting site patterns. This combination of Hennigian logic and ML estimation represents a completely new strategy for the evaluation of sequence data.
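To make the notion of split-supporting site patterns concrete, the following minimal Python sketch counts, for four aligned sequences, how many alignment columns support each of the three possible quartet splits. It illustrates only the raw pattern counting. It is not the PhyQuart algorithm, which additionally separates apomorphic from plesiomorphic support and subtracts the ML-estimated number of convergent patterns. The function names and the toy sequences are assumptions made for illustration.

```python
# Illustrative sketch only: raw counting of split-supporting site patterns.
# Function names and example data are hypothetical, not taken from PENGUIN.

SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def split_of_site(a, b, c, d):
    """Return the quartet split supported by one alignment column, or None."""
    if a == b and c == d and a != c:
        return "AB|CD"  # pattern xxyy
    if a == c and b == d and a != b:
        return "AC|BD"  # pattern xyxy
    if a == d and b == c and a != b:
        return "AD|BC"  # pattern xyyx
    return None  # constant, singleton, or otherwise uninformative column


def count_split_support(seq_a, seq_b, seq_c, seq_d):
    """Count the columns supporting each of the three quartet splits."""
    counts = {split: 0 for split in SPLITS}
    for column in zip(seq_a, seq_b, seq_c, seq_d):
        # Skip columns containing gaps or ambiguity codes.
        if any(state not in "ACGT" for state in column):
            continue
        split = split_of_site(*column)
        if split is not None:
            counts[split] += 1
    return counts


example_counts = count_split_support("AAGTCA", "AAGTTC", "GGGACA", "GGCATC")
print(example_counts)
```

In this toy alignment, three columns show the pattern xxyy and two show xyxy, so the raw counts favour AB|CD while still revealing conflicting signal for AC|BD.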
Through extensive quartet simulations, including cases with strong branch length differences, we demonstrated the efficiency of the new approach in detecting phylogenetically informative and conflicting signal. Using 172,800 single-quartet simulations, we also compared its performance with that of ML alone under a small degree of model misspecification. PhyQuart was successful in the majority of simulated cases, even when internal branches were kept very short. The simulations show that the reconstruction success of ML decreases with increasing branch length differences even under very minor model misspecification, whereas the performance of PhyQuart is only slightly affected by more extreme branch length conditions.
The PhyQuart algorithm is implemented in a command-line-driven software script (PENGUIN) that runs on Windows, Mac OS, and Linux and can easily be integrated into automated processing pipelines. PENGUIN writes information on split support for each possible quartet relationship between four taxa or clans to plain text files. Discrepancies in split support among the three possible quartet topologies of a set of four clans are also presented as split networks and triangle graphs. A further vector network shows the distribution of best, second-best, and third-best resolved quartet trees. The software script, the corresponding manual, and example files can be downloaded from https://github.com/PatrickKueck/Penguin.
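The ranking of the three quartet topologies into best, second best, and third best can be sketched as follows. This is a minimal illustration under stated assumptions: the support values, the topology labels, and the function names are hypothetical, and the code does not read or reproduce PENGUIN's actual output format.

```python
# Illustrative sketch only: rank the three quartet topologies by support and
# tally how often each one ranks best, second best, or third best.
# All names and numbers are hypothetical and not taken from PENGUIN output.
from collections import Counter


def rank_topologies(support):
    """Return topology labels ordered from highest to lowest support.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(support, key=lambda topology: (-support[topology], topology))


def tally_ranks(quartet_supports):
    """Count, per rank position, how often each topology occupies it."""
    tallies = [Counter(), Counter(), Counter()]
    for support in quartet_supports:
        for position, topology in enumerate(rank_topologies(support)):
            tallies[position][topology] += 1
    return tallies


# Hypothetical support values for three analysed quartets of the same clans.
quartets = [
    {"AB|CD": 0.70, "AC|BD": 0.10, "AD|BC": 0.20},
    {"AB|CD": 0.55, "AC|BD": 0.30, "AD|BC": 0.15},
    {"AB|CD": 0.25, "AC|BD": 0.60, "AD|BC": 0.15},
]
best, second, third = tally_ranks(quartets)
print(best, second, third)
```

A topology that ranks best in most quartets but second best in a substantial minority is exactly the kind of conflict that the graphical outputs are meant to expose.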
In a further study we show that PhyQuart allows the analysis of all quartets of taxa in larger trees, or of defined quartets of multiple-taxon clans, and therefore provides a new tool for evaluating contradicting signal, which can be used to assess the robustness of relationships within a more complex tree. In comprehensive simulation and empirical tests, PhyQuart identifies signal in the data where ML is misled by substitution rate heterogeneity, in some simulated cases even showing high support for the correct clan relationship where ML fails due to LBA. Our performance tests indicate that the stronger the contradicting signal that PhyQuart detects among the possible clan relationships, the less reliable the resolved tree, its branch support, or a given a priori assumption should be considered. Regardless of whether the clans are defined from an already reconstructed tree (a posteriori) or from a priori assumptions, PENGUIN can analyse all quartets of multiple-taxon clans to assess the robustness of a given hypothesis or of relationships within a more complex tree.
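The two ways of forming quartets described above can be illustrated with a short Python sketch: all four-taxon subsets of a taxon set, and all quartets that draw exactly one taxon from each of four predefined clans. The taxon and clan names are invented for illustration, and the sketch makes no claim about how PENGUIN itself enumerates or samples quartets.

```python
# Illustrative sketch only: enumerating quartets of taxa.
# Taxon and clan names are hypothetical.
from itertools import combinations, product


def all_taxon_quartets(taxa):
    """Return every unordered set of four taxa from a larger taxon set."""
    return list(combinations(sorted(taxa), 4))


def clan_quartets(clan_1, clan_2, clan_3, clan_4):
    """Return every quartet that takes exactly one taxon from each clan."""
    return list(product(clan_1, clan_2, clan_3, clan_4))


taxa = ["t1", "t2", "t3", "t4", "t5", "t6"]
taxon_quartets = all_taxon_quartets(taxa)  # C(6, 4) = 15 quartets

clans = (["a1", "a2"], ["b1", "b2", "b3"], ["c1"], ["d1", "d2"])
four_clan_quartets = clan_quartets(*clans)  # 2 * 3 * 1 * 2 = 12 quartets

print(len(taxon_quartets), len(four_clan_quartets))
```

Because the number of quartets grows as the product of the clan sizes (or as the binomial coefficient C(n, 4) for n taxa), the per-quartet analysis has to be cheap for this kind of exhaustive evaluation to remain practical.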
A new supertree algorithm based on the single-quartet split support values of the newly developed PhyQuart algorithm has been developed and implemented in a new software environment called 4BaSAl. The software is command-line-driven and written in Julia. Starting from a triplet of sequences, 4BaSAl analyses a set of quartet trees, each of which is assumed, on the basis of the analysis, to be the best candidate for the true tree. Besides reconstructing complete trees, 4BaSAl can also be used as an evaluation tool to analyse the split robustness of internal branch relationships, or to identify and re-analyse only suspicious long-branched, and therefore potentially unreliable, taxon relationships of given topologies. 4BaSAl has been comprehensively tested on simulated 5-, 6-, and 8-taxon trees. According to first results, 4BaSAl is more efficient in finding correct long-branch relationships with the PhyQuart split algorithm than with ML. Additional extensive testing and publication of the approach and its performance results are planned for the second half of 2017.