## Final Report Summary - LIFENET (New Algorithmic and Mathematical Tools to Construct a Net of Life)

Phylogenetics, the study of evolutionary relationships among groups of species, is a well-established interdisciplinary scientific field. To date, phylogeneticists have put considerable effort into the precise reconstruction of evolutionary trees. In recent years, however, the research focus has been shifting from reconstructing phylogenetic trees towards reconstructing phylogenetic networks, because not all processes of evolution can be properly represented by a tree. For example, a comparison of several trees that all represent a given set of present-day species and have been reconstructed for different genetic loci often reveals conflicting tree topologies. These discrepancies are not always due to sampling error or uncertainty in the tree reconstruction method; they can instead result from reticulation (e.g. lateral gene transfer, hybridization, or recombination).

Since phylogenetic networks are far more complex than phylogenetic trees, new mathematical and computational tools are needed. The aim of the project LifeNet is to develop new tools to reconstruct and analyze phylogenetic networks that are meaningful from a biological viewpoint and computationally feasible, and that will ultimately be used to analyze biological data sets. Particular emphasis is placed on a tool to quantify reticulation and on the reconstruction of phylogenetic networks from sequence data by studying the parsimony problem on networks. Furthermore, due to the inherent complexity of phylogenetic networks, the first step in approaching a problem in this area of research is often to settle its computational complexity (i.e. to investigate whether the problem at hand is solvable in a reasonable amount of time).

The following results have been established and published or submitted for publication in peer-reviewed international journals:

(1) We have developed the first characterization and a fixed-parameter algorithm to quantify hybridization in an arbitrarily large set of phylogenetic trees by simultaneously embedding them into a phylogenetic network. Prior to our work, this approach was restricted to two input trees.

(2) We showed that tree reconstruction based on three-taxon trees is a statistically consistent estimator of the species tree under a random lateral gene transfer model if the expected number of lateral gene transfer events per gene is ‘not too high’. However, we also established a zone of inconsistency that arises under rampant lateral gene transfer when majority-rule three-taxon gene trees are used to reconstruct a species tree.

(3) We showed that the two parsimony frameworks for reconstructing phylogenetic networks from sequence data (previously published by other research groups) can produce most parsimonious networks whose biological relevance is questionable. Furthermore, we have proposed a new definition of parsimony on networks that, we believe, has desirable properties and that factors additional biological knowledge into a parsimony analysis.

(4) We settled various complexity questions related to the study of phylogenetic networks. For example, we showed that counting the number of phylogenetic trees that are embedded in a phylogenetic network is a computationally hard problem. Moreover, we related the problem of approximating the minimum number of reticulation events necessary to simultaneously explain two phylogenetic trees to the graph-theoretic problem Directed Feedback Vertex Set, which is one of the problems in Karp’s seminal 1972 list of 21 NP-complete problems. Since it remains unknown to this day whether a constant-factor polynomial-time approximation algorithm exists for Directed Feedback Vertex Set, our result places the (in)approximability of the reticulation number in a much broader complexity context.

(5) We have taken a first step towards generalizing the popular nearest neighbor interchange (NNI) operation on phylogenetic trees to (rooted and unrooted) phylogenetic networks and have described properties of this operation when applied to a relatively simple class of networks.

(6) We have analyzed the computational complexity of partitioning a set of rooted triplets, the atomic building blocks of larger evolutionary histories, into the smallest number of blocks such that the triplets of each block can simultaneously be explained by a single phylogenetic tree. Computing this minimum number turned out to be an NP-hard optimization problem. Moreover, we gave extremal results showing that the minimum number of blocks needed is, in general, not bounded, and established a connection to so-called ternary permutation constraint satisfaction problems, which are of interest to the approximation, parameterized complexity, and algebra communities.
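Result (6) builds on the question of whether one set of rooted triplets can be explained by a single tree. As a rough illustration only, and not the method developed in LifeNet, the sketch below applies the classic BUILD procedure of Aho, Sagiv, Szymanski and Ullman to test this for a single block. The function names `build` and `is_consistent`, and the encoding of a triplet ab|c as the tuple `(a, b, c)`, are assumptions made for this example.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree that displays every triplet, or None.

    Hypothetical encoding: (a, b, c) stands for the rooted triplet ab|c,
    i.e. a and b are more closely related to each other than to c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Only triplets whose three leaves are all present matter here.
    relevant = [t for t in triplets if set(t) <= leaves]

    # Union-find: merge a and b for every relevant triplet ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means no root split is possible: inconsistent.
    if len(components) == 1:
        return None

    subtrees = []
    for block in components.values():
        subtree = build(block, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


def is_consistent(triplets):
    """True if a single rooted tree displays all given triplets."""
    leaves = {x for t in triplets for x in t}
    return build(leaves, triplets) is not None


print(is_consistent([("a", "b", "c"), ("c", "d", "a")]))  # True
print(is_consistent([("a", "b", "c"), ("b", "c", "a")]))  # False
```

The second call fails because ab|c and bc|a cannot both hold in one tree, so that set would need at least two blocks. The hard problem identified in (6) is choosing how to split such a set into as few consistent blocks as possible, which this check alone does not solve.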

Additionally, LifeNet has established new collaborations and fostered existing ones among researchers in New Zealand, Germany, the US, and the Netherlands. Moreover, a PhD student and a Master's student, who are currently jointly supervised by the LifeNet fellow and researchers at the University of Canterbury, New Zealand, are working on projects related to LifeNet.
