Reconstruction Algorithms for Biological Networks

Periodic Report Summary 1 - BIONETRECON (Reconstruction Algorithms for Biological Networks)

Summary Report

Network models are of increasing importance in the biological sciences. Tree networks, also called phylogenies, are fundamental in the study of evolution; pedigrees play a key role in population genetics and linkage analysis; and cellular and metabolic networks are studied in cell biology. The reconstruction of such models is a major goal of the biological sciences, and many heuristic algorithms for reconstructing networks have been developed. The project made important contributions to understanding the reconstruction of such networks.
Reconstructing Trees. The accurate reconstruction of phylogenies from short molecular sequences is an important problem in computational biology. Recent work has highlighted deep connections between the sequence-length requirements for high-probability phylogeny reconstruction and the related problem of estimating ancestral sequences. Building on the work of [Mossel'04], [Daskalakis et al.'09] obtained a tight sequence-length requirement for the simple CFN model of substitution, that is, the case of a two-state symmetric rate matrix $Q$. In particular, the required sequence length for high-probability reconstruction was shown to undergo a sharp transition, from $O(\log n)$ to polynomial in the number of leaves $n$, at a critical branch length. In joint work with Roch and Sly, we consider a more general evolutionary model, the so-called GTR model, defined by a reversible rate matrix $Q$. For this model, recent results of [Roch'09] show that the tree can be accurately reconstructed from sequences of length $O(\log n)$ when the branch lengths are below the Kesten-Stigum (KS) bound, up to which ancestral sequences can be accurately estimated using simple linear estimators. It is known that for the more general GTR models, ancestral sequences can be reconstructed even for branch lengths above the Kesten-Stigum bound. In our work we show that this phenomenon also holds for phylogenetic reconstruction (adapted from the paper’s abstract).
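The transition described above can be illustrated in the simplest case. For the CFN model on a complete binary tree, let each edge flip its parent's state with probability $p$ and write $\theta = 1 - 2p$; the Kesten-Stigum condition is then $2\theta^2 > 1$, i.e. $p < (1 - 1/\sqrt{2})/2 \approx 0.146$. The sketch below is a minimal simulation under these assumptions (uniform flip probability, ±1 states, complete binary tree): it propagates a random root state to the leaves and estimates it by the sign of the leaf sum, the simplest linear estimator. The function names and the rate-matrix normalization used in `cfn_flip_probability` are our own illustrative choices, not code from the project.

```python
import math
import random


def cfn_flip_probability(t: float) -> float:
    """Flip probability of a CFN edge of length t.

    Illustrative assumption: rate matrix Q = [[-1, 1], [1, -1]], for which
    P(t) = exp(tQ) has off-diagonal entries (1 - exp(-2t)) / 2.
    """
    return (1.0 - math.exp(-2.0 * t)) / 2.0


def ks_condition_holds(p: float, branching: int = 2) -> bool:
    """Kesten-Stigum condition b * theta**2 > 1, with theta = 1 - 2p."""
    theta = 1.0 - 2.0 * p
    return branching * theta * theta > 1.0


def simulate_leaves(depth: int, p: float, rng: random.Random, root: int = 1) -> list[int]:
    """Propagate a +/-1 root state down a complete binary tree of the given depth."""
    level = [root]
    for _ in range(depth):
        children = []
        for state in level:
            for _child in range(2):
                # Each edge independently flips its parent's state with probability p.
                children.append(-state if rng.random() < p else state)
        level = children
    return level


def majority_estimate(leaves: list[int]) -> int:
    """Sign of the leaf sum, a uniform-weight linear estimator (ties go to +1)."""
    return 1 if sum(leaves) >= 0 else -1


def root_recovery_rate(depth: int, p: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which majority recovers a uniformly random root state."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        root = rng.choice((-1, 1))
        leaves = simulate_leaves(depth, p, rng, root)
        correct += majority_estimate(leaves) == root
    return correct / trials
```

For a flip probability inside the KS regime, such as `root_recovery_rate(10, 0.05, 500)`, the recovery rate should stay clearly above 1/2 as `depth` grows, while for `p = 0.3`, which violates the condition, it should drift towards 1/2; the exact values depend on the seed and number of trials.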
Co-Evolution and the Markov Property on Trees. In joint work with T. Tuller, we showed that two basic assumptions in molecular evolution are contradictory. The first assumption is that the evolution of genetic information on species trees satisfies a Markov property. The second is the assumption of co-evolution between interacting species.

General Work in Combinatorial Statistics. Markov chain Monte Carlo (MCMC) is a very popular method for the statistical analysis of high-dimensional data. For many Markov chains used in practice, the convergence time of the chain is not known. Even in cases where the mixing time is known to be polynomial, the bounds are often too crude to be practical. This has led to the development of convergence diagnostics, which practitioners of MCMC use to assess convergence. With Bhatnagar and Bogdanov, we study the computational complexity of testing convergence in several settings and prove that the problem is computationally hard even for chains with strong guarantees on their properties.
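As a hedged illustration of the quantity a convergence test is meant to certify: for a chain small enough that its transition matrix can be written down, the worst-case total variation distance $d(t) = \max_x \|P^t(x,\cdot) - \pi\|_{TV}$ and the mixing time $t_{mix}(\varepsilon) = \min\{t : d(t) \le \varepsilon\}$ can be computed by brute force. The lazy cycle walk, the default $\varepsilon = 1/4$ and the helper names below are illustrative assumptions, and the stationary distribution `pi` is passed in rather than computed (it is uniform here because the lazy cycle's transition matrix is doubly stochastic). This is not the project's method; the computation needs time polynomial in the number of states, so it offers no help for chains used in MCMC practice, whose state spaces are far too large to enumerate.

```python
def lazy_cycle(n: int) -> list[list[float]]:
    """Lazy random walk on a cycle of n states: stay w.p. 1/2, step left/right w.p. 1/4."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


def step(dist: list[float], P: list[list[float]]) -> list[float]:
    """One step of the chain: returns the row vector dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_distance(mu: list[float], nu: list[float]) -> float:
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def mixing_time(P: list[list[float]], pi: list[float],
                eps: float = 0.25, t_max: int = 100_000) -> int:
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps, found by brute force."""
    n = len(P)
    # rows[x] is the distribution after t steps started from state x (row x of P^t).
    rows = [[1.0 if i == x else 0.0 for i in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv_distance(row, pi) for row in rows) <= eps:
            return t
        rows = [step(row, P) for row in rows]
    raise ValueError("chain did not reach eps within t_max steps")
```

For the three-state cycle, `mixing_time(lazy_cycle(3), [1/3] * 3)` returns 1: after one step from any state the distribution is (1/2, 1/4, 1/4) up to relabeling, at distance 1/6 from uniform.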