Periodic Reporting for period 2 - NonnegativeRank (Geometry of Nonnegative Rank)
Reporting period: 2018-09-01 to 2019-08-31
One direction in the study of nonnegative rank is motivated by linear relaxations of hard combinatorial optimization problems. The nonnegative rank of the slack matrix of the feasible region of a relaxation measures the complexity of the linear relaxation and hence provides lower bounds on the complexity of the original optimization problem within this model. In a joint article with Per Austrin and Petteri Kaski, we study another restricted model of computation, given by tensor networks for evaluating multilinear maps. We show that this model of computation captures the best known algorithms for several problems, such as matrix multiplication, the discrete Fourier transform, (3t)-clique counting and computing the permanent of a matrix. For counting homomorphisms of a general pattern graph P into a host graph on n vertices, we obtain an upper bound that essentially matches the bound for counting cliques and yields small improvements over previous algorithms for many choices of P. These results are published in the Proceedings of the 10th Innovations in Theoretical Computer Science Conference.
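To illustrate how a tensor network evaluates a multilinear map, the sketch below contracts a star-shaped network whose three factor matrices encode Strassen's rank-7 decomposition of the 2x2 matrix multiplication tensor. This is a standard textbook example chosen for illustration, not the construction from the article; the row-major flattening and the names U, V, W and the helper functions are assumptions of the sketch.

```python
from itertools import product

# Hypothetical factor matrices of Strassen's rank-7 decomposition of the 2x2
# matrix multiplication tensor. Entries of A, B, C are flattened row-major:
# index 0 = (1,1), 1 = (1,2), 2 = (2,1), 3 = (2,2).
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]   # bond r x input leg a
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]     # bond r x input leg b
W = [[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]]  # output leg c x bond r

def contract_network(a, b):
    """Evaluate the bilinear map (a, b) -> c by contracting the network leg by leg."""
    ua = [sum(U[r][i] * a[i] for i in range(4)) for r in range(7)]  # contract leg a
    vb = [sum(V[r][j] * b[j] for j in range(4)) for r in range(7)]  # contract leg b
    # Multiply along the shared bond r (7 scalar products), then contract with W.
    return [sum(W[k][r] * ua[r] * vb[r] for r in range(7)) for k in range(4)]

def network_tensor_entry(i, j, k):
    """Entry (i, j, k) of the 3-tensor represented by the network."""
    return sum(U[r][i] * V[r][j] * W[k][r] for r in range(7))

def matmul_tensor_entry(i, j, k):
    """1 if A[p][q] * B[q][s] contributes to C[p][s], else 0."""
    p, q = divmod(i, 2)
    q2, s = divmod(j, 2)
    p2, s2 = divmod(k, 2)
    return int(p == p2 and q == q2 and s == s2)

# Sanity check: the network represents exactly the 2x2 matrix multiplication tensor.
assert all(network_tensor_entry(i, j, k) == matmul_tensor_entry(i, j, k)
           for i, j, k in product(range(4), repeat=3))

c = contract_network([1, 2, 3, 4], [5, 6, 7, 8])  # A = [[1,2],[3,4]], B = [[5,6],[7,8]]
# -> [19, 22, 43, 50]
```

The bond dimension (7 rather than the naive 8) is what makes the network cheaper; in the tensor network model, costs of this kind are what is being measured.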
In a joint article with Elizabeth Allman, Hector Banos Cervantes, Robin Evans, Serkan Hosten, Daniel Lemke, John Rhodes and Piotr Zwiernik, we study binary tensors of nonnegative rank at most two and three. We give the boundary stratification of binary tensors of nonnegative rank at most two. For small tensors, we show that this stratification can be used for the exact computation of maximum likelihood estimates (MLEs). This method is guaranteed to find the MLE, whereas the EM algorithm is not. We also show how the EM fixed point ideal provides an alternative method for obtaining the boundary decomposition and for computing MLEs. These results are published in the Journal of Algebraic Statistics.
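The EM algorithm mentioned above can be sketched for the smallest case, a 2x2x2 binary tensor of nonnegative rank at most two, viewed as a mixture of two rank-one probability tensors (a latent class model with one hidden binary variable). This is a minimal illustration assuming hypothetical counts and starting values; it is not the boundary-stratification method of the article, and all function names are hypothetical.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))  # the 8 cells of a 2x2x2 binary tensor

def class_cell_prob(theta_h, x):
    """P(x | hidden class h) for a product (rank-one) distribution on three binary variables."""
    return math.prod(theta_h[v] if x[v] else 1 - theta_h[v] for v in range(3))

def model_tensor(w, theta):
    """Nonnegative rank-two tensor: mixture of two rank-one probability tensors."""
    return {x: sum(w[h] * class_cell_prob(theta[h], x) for h in range(2)) for x in CELLS}

def log_likelihood(counts, w, theta):
    P = model_tensor(w, theta)
    return sum(c * math.log(P[x]) for x, c in counts.items() if c > 0)

def em_step(counts, w, theta):
    n = sum(counts.values())
    # E-step: posterior probability of each hidden class given the observed cell.
    resp = {}
    for x in CELLS:
        joint = [w[h] * class_cell_prob(theta[h], x) for h in range(2)]
        resp[x] = [j / sum(joint) for j in joint]
    # M-step: re-estimate mixing weights and per-class marginals from expected counts.
    new_w, new_theta = [], []
    for h in range(2):
        n_h = sum(counts[x] * resp[x][h] for x in CELLS)
        new_w.append(n_h / n)
        new_theta.append([sum(counts[x] * resp[x][h] for x in CELLS if x[v]) / n_h
                          for v in range(3)])
    return new_w, new_theta

def run_em(counts, w, theta, iters=200):
    lls = [log_likelihood(counts, w, theta)]
    for _ in range(iters):
        w, theta = em_step(counts, w, theta)
        lls.append(log_likelihood(counts, w, theta))
    return w, theta, lls

# Hypothetical counts for the 8 cells, listed in the order of CELLS.
counts = dict(zip(CELLS, [30, 8, 9, 6, 7, 5, 6, 29]))
w, theta, lls = run_em(counts, [0.4, 0.6], [[0.2, 0.3, 0.25], [0.7, 0.8, 0.75]])
```

The sequence `lls` is nondecreasing, as EM guarantees, but in general EM only reaches a fixed point of the iteration, which need not be the global MLE; this is the gap that exact computation via the boundary stratification closes.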
In a joint paper with Anastasiya Belyaeva, Lawrence J. Sun and Caroline Uhler, we study the problem of reconstructing the 3D organization of the genome from whole-genome contact frequencies measured by Hi-C. We prove that, for diploid organisms, the 3D organization of the DNA is not identifiable from the pairwise distance measurements derived from Hi-C; in fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant constraints and prove identifiability under these conditions. Finally, we provide semidefinite programming (SDP) formulations for computing the 3D embedding of the DNA under these additional constraints and show that they recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. These formulations minimize the trace of a Gram matrix as a convex surrogate for rank minimization and then use an eigendecomposition to obtain a rank-three approximation of the Gram matrix. These results are in the final stages of preparation and will be submitted in the near future.
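The final step of this pipeline, turning a Gram matrix into 3D coordinates via its top three eigenpairs, can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the SDP itself is omitted, a complete noise-free distance matrix is assumed so that the Gram matrix can be formed directly by classical multidimensional scaling, the eigendecomposition uses a plain cyclic Jacobi method, and the points and function names are hypothetical.

```python
import math

def jacobi_eigh(A, tol=1e-18, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                th = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
                for k in range(n):  # A <- P^T A (rows p, q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
    return [A[i][i] for i in range(n)], V

def gram_from_sq_distances(D2):
    """Classical MDS: centered Gram matrix G = -1/2 J D2 J from squared distances."""
    n = len(D2)
    mean = [sum(row) / n for row in D2]
    total = sum(mean) / n
    return [[-0.5 * (D2[i][j] - mean[i] - mean[j] + total) for j in range(n)] for i in range(n)]

def rank3_embedding(G):
    """Keep the three largest eigenpairs of G; return coordinates X with X X^T close to G."""
    vals, V = jacobi_eigh(G)
    top = sorted(range(len(G)), key=lambda k: vals[k], reverse=True)[:3]
    return [[math.sqrt(max(vals[k], 0.0)) * V[i][k] for k in top] for i in range(len(G))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 3D points standing in for genomic loci (illustration only).
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1), (2, -1, 0.5)]
D2 = [[sq_dist(p, q) for q in points] for p in points]
X = rank3_embedding(gram_from_sq_distances(D2))
```

The recovered coordinates agree with the original points only up to a rigid motion, so the natural check is that pairwise distances are reproduced. In the diploid Hi-C setting the distance matrix is not fully determined by the data, which is why the article needs the SDP and additional constraints before this step.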
In a joint article with Carlos Amendola and Dimitra Kosta, we study the maximum likelihood estimation problem for several classes of toric Fano models. We start by exploring the maximum likelihood (ML) degree of all 2-dimensional Gorenstein toric Fano varieties. We then investigate the reasons for drops in the ML degree using A-discriminants and intersection theory. Finally, we show that the toric Fano varieties associated to 3-valent phylogenetic trees have ML degree one, and we provide a formula for their maximum likelihood estimate. We prove this as a corollary of a more general result about the multiplicativity of ML degrees of codimension-zero toric fiber products; it also follows from a connection to a recent result about staged trees. These results appear in a preprint on arXiv and have been submitted for publication.
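ML degree one means that the MLE is a rational function of the data. As a minimal sketch of what such a closed form looks like, the example below uses the independence model of a two-way contingency table, a standard toric model with ML degree one; it is chosen only for illustration, is not one of the Fano models from the article, and the formula for 3-valent phylogenetic trees is not reproduced here. The function names are hypothetical.

```python
from fractions import Fraction

def independence_mle(u):
    """Closed-form MLE p_ij = u_{i+} u_{+j} / n^2 for the independence model of a two-way table."""
    n = sum(map(sum, u))
    rows = [sum(r) for r in u]
    cols = [sum(c) for c in zip(*u)]
    return [[Fraction(r * c, n * n) for c in cols] for r in rows]

def matches_sufficient_statistics(u, p):
    """Birch's theorem check for toric models: n times the margins of p equal the observed margins."""
    n = sum(map(sum, u))
    rows_ok = all(n * sum(pr) == sum(ur) for pr, ur in zip(p, u))
    cols_ok = all(n * sum(pc) == sum(uc) for pc, uc in zip(zip(*p), zip(*u)))
    return rows_ok and cols_ok

u = [[10, 20], [30, 40]]   # hypothetical counts
p = independence_mle(u)    # exact rational MLE, no numerical optimization needed
```

Exact `Fraction` arithmetic is used to emphasize that the estimate is rational in the counts; for a model of higher ML degree, the MLE would instead be one of several roots of a polynomial system.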
These results have been presented at international seminars, workshops and conferences.
It is very difficult to prove complexity lower bounds for canonical NP-complete problems, and restricted models of computation provide an alternative route to lower bounds in restricted settings. Two well-known examples are the linear and semidefinite models of computation, whose complexity measures are the nonnegative rank and the positive semidefinite rank, respectively. Tensor networks provide another model of computation that is general enough to capture the best known algorithms for several problems.
As the spatial organization of DNA plays an important role in gene regulation, DNA replication and genomic integrity, we hope that our theoretical results will have an impact on how biologists reconstruct the 3D genome of diploid organisms from whole-genome contact frequencies.
The multiplicative behavior of maximum likelihood estimation on codimension-0 toric fiber products generalizes similar results for undirected graphical models and staged trees.