Spectral and Optimization Techniques for Robust Recovery, Combinatorial Constructions, and Distributed Algorithms

Periodic Reporting for period 3 - SO-ReCoDi (Spectral and Optimization Techniques for Robust Recovery, Combinatorial Constructions, and Distributed Algorithms)

Reporting period: 2022-09-01 to 2024-02-29

This project aims to discover new methodologies to model and solve problems of a fundamental nature in unsupervised machine learning and distributed computing, and to build new bridges between pure mathematics and computer science. The project locates itself in the area of theoretical computer science, and its objectives are new insights, new algorithms and new analytical methods, backed by rigorous mathematical proofs.

More specifically, the project develops new ways to apply mathematical methods from linear algebra and from convex optimization to approach three families of application domains:

1) The solution of combinatorial problems arising in unsupervised machine learning in which one wants to discover structure in seemingly unstructured data, even in the presence of outliers and noise. Here it is the "robustness" of the algorithm to outliers (incorrect data points) which is most challenging and interesting.

2) The computational construction of sparse approximations to networks and of other objects that have certain "pseudorandom" properties. This has been a fertile area of collaboration between pure mathematics and computer science, and, by the study of problems of this nature, powerful methods from pure mathematics have been transferred to computer science (and then spread out to achieve a major impact in other areas of computer science) and vice versa. Recent results in pure mathematics establish the existence of certain sparse approximations, but a corresponding algorithmic theory is still lacking.

3) The analysis of probabilistic processes in networks, motivated by distributed computing, computational social sciences, and network models of biological processes. Here the project aims to introduce novel modeling and analytical methods.

An innovation of the project is to treat these three very different application domains in a unified way, with the unification provided by the fact that similar methods from linear algebra and from convex optimization apply to all three. One of the aims of the project is to take inspiration from the way such methods are used in one domain to develop innovative ways to apply them to others.

We discuss progress along the three main aims of the project.

1) Convex optimization and spectral techniques for discrete problems

The main theme in this direction has been the comparison of the power of different methodologies. Techniques from linear algebra, also called "spectral" techniques, are broadly used in practice, and have certain known limitations, especially when it comes to robustness to outliers. A much more robust methodology comes from the area of convex optimization, and is called semidefinite programming. It generalizes spectral techniques and it also generalizes a broadly used convex optimization problem called linear programming.
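As a small illustration of what a "spectral" technique looks like in practice, the sketch below splits a network into two groups using the sign pattern of the eigenvector for the second-smallest eigenvalue of the graph Laplacian (often called the Fiedler vector). This is a generic textbook method, not an algorithm taken from the project. The function name `fiedler_partition`, the power-iteration approach, and the toy graph are assumptions made for this example only.

```python
def fiedler_partition(n, edges, iterations=500):
    """Split nodes 0..n-1 into two groups by the sign of an approximate Fiedler vector.

    Illustrative sketch only: uses plain power iteration on a shifted Laplacian.
    """
    # Build the graph Laplacian L = D - A as a dense matrix.
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0

    # Shift: the small eigenvalues of L become the large eigenvalues of M = c*I - L.
    # All eigenvalues of L are at most 2 * (max degree), so M has no negative eigenvalues.
    c = 2.0 * max(lap[i][i] for i in range(n))

    x = [float(i) for i in range(n)]  # deterministic starting vector (an arbitrary choice)
    for _ in range(iterations):
        # Project out the all-ones direction, which is the eigenvector of L for eigenvalue 0.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # One power-iteration step with M, followed by normalization.
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    side_a = {i for i in range(n) if x[i] >= 0}
    side_b = set(range(n)) - side_a
    return side_a, side_b


# Toy input: two 4-node cliques joined by the single edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
example_edges = clique_a + clique_b + [(3, 4)]
side_a, side_b = fiedler_partition(8, example_edges)
```

On this toy input the two sides are the two cliques. A sign-based split of this kind depends on a single eigenvector, which is one way to see why a few adversarially placed edges or data points can mislead spectral methods.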

The PI and his collaborators showed how to apply semidefinite programming to a robust recovery problem in networks, improving a previous result based on spectral techniques. This work was presented at SODA 2020. The PI and his collaborators analyzed the worst-case integrality ratio of linear programming relaxations of the Maximum Cut problem. Their result showed that linear programming can achieve a quality of approximation previously thought possible only via spectral or semidefinite programming methods. This work appeared in FOCS 2020 and is perhaps the most significant contribution so far of this project.

2) Constructions of sparse approximations

The proposal conjectured the existence of certain new types of sparse approximations of networks and of higher-dimensional analogs of networks called hypergraphs. The proposal provided a hypothetical approach to such conjectures. Most of the conjectures were proved by the PI and his collaborators, and the results were presented by the PI at the FOCS 2019 conference. The most difficult conjecture, about the existence of "hyper-sparsifiers with O(|V|) hyperedges", remains open.

The PI and his collaborators also studied the difference between two previously studied definitions of approximation for sparse approximations of networks, known respectively as "cut sparsifiers" and "spectral sparsifiers". They rigorously proved that one notion provides a worse trade-off between approximation and sparsity than the other. These results were presented at SODA 2022.
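For reference, the two notions can be stated as follows. These are the standard textbook definitions, not formulas quoted from the report. The symbols G (the original network on node set V), H (the sparse approximation), ε (the approximation error), cut_G(S) (the total weight of edges of G leaving the set S), and L_G (the Laplacian matrix of G) are notation introduced here.

```latex
% H is an \varepsilon-cut sparsifier of G if, for every set of nodes S \subseteq V,
(1-\varepsilon)\,\mathrm{cut}_G(S) \;\le\; \mathrm{cut}_H(S) \;\le\; (1+\varepsilon)\,\mathrm{cut}_G(S).

% H is an \varepsilon-spectral sparsifier of G if, for every vector x \in \mathbb{R}^V,
(1-\varepsilon)\, x^\top L_G\, x \;\le\; x^\top L_H\, x \;\le\; (1+\varepsilon)\, x^\top L_G\, x.

% Link between the two: for the 0/1 indicator vector \mathbf{1}_S of a set S,
\mathbf{1}_S^\top L_G\, \mathbf{1}_S \;=\; \mathrm{cut}_G(S).
```

Because of the last identity, every spectral sparsifier is also a cut sparsifier with the same ε, so the spectral condition is the more demanding of the two.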

3) Distributed algorithms

The PI and his collaborators analyzed a completely decentralized process that sparsifies a dense network to create a network in which every node has a bounded constant number of connections, while preserving the good connectivity of the overall network. The result is that a simple protocol, which is similar to how virtual networks are created in peer-to-peer protocols, achieves remarkable performance. This result was presented at SODA 2020.
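To make the idea of a decentralized sparsification process concrete, here is a minimal simulation of one possible request/accept rule. The report does not spell out the protocol, so everything below is an assumption for illustration: the dense network is taken to be a complete graph, each node tries to obtain `d` outgoing links by asking random other nodes, and a node accepts a round's incoming requests only if its total number of incoming links stays within `c * d`. The function name `request_accept_sparsify` and all parameters are hypothetical, and the rule is not claimed to be the one analyzed in the cited work.

```python
import random


def request_accept_sparsify(n, d, c, rounds, seed=0):
    """Simulate a hypothetical request/accept link-formation rule on n nodes.

    Returns (out_links, in_degree). Parallel links are possible in this sketch.
    """
    rng = random.Random(seed)
    out_links = [[] for _ in range(n)]  # out_links[u] = targets that accepted u
    in_degree = [0] * n

    for _round in range(rounds):
        # Phase 1: every node still missing out-links sends that many requests,
        # each to a uniformly random other node (assumption: complete graph).
        requests = [[] for _ in range(n)]  # requests[v] = list of requesting nodes
        for u in range(n):
            for _slot in range(d - len(out_links[u])):
                v = rng.randrange(n - 1)
                if v >= u:
                    v += 1  # shift to skip u itself
                requests[v].append(u)

        if not any(requests):
            break  # every node already has d out-links

        # Phase 2: each node decides locally, using only its own counters.
        for v in range(n):
            if in_degree[v] + len(requests[v]) <= c * d:
                # Accept the whole batch of this round.
                for u in requests[v]:
                    out_links[u].append(v)
                in_degree[v] += len(requests[v])
            # Otherwise reject the whole batch; requesters retry next round.

    return out_links, in_degree


out_links, in_degree = request_accept_sparsify(n=50, d=3, c=2, rounds=100)
```

By construction no node ever holds more than `d` outgoing or `c * d` incoming links, so every degree is bounded by a constant independent of `n`. Whether the resulting sparse network keeps good connectivity is the kind of question that requires the rigorous analysis the report refers to; the simulation itself does not establish it.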

The PI and his collaborators then focused on the question that currently remains their main concern in this direction: is there a methodology analogous to spectral methods that can be applied to networks that change over time, to understand information diffusion? Some preliminary results, in which the authors analyze broadcast processes in networks where nodes are added and removed over time, and which use a notion of expansion for time-changing networks, appeared in ICDCS 2021.

It is always difficult to anticipate progress on theoretical problems, but the directions where significant further progress can be expected in the second half of the project are:

1) On the use of semidefinite programming to study tensors and hypergraphs, particularly in connection to problems of robust recovery.

Semidefinite programming is more powerful than other methods in dealing with outliers and in finding structure in seemingly unstructured data. Analyzing such algorithms is, however, rather difficult and relies each time on relatively ad hoc analyses in which we seem to discover, each time, a small piece of a bigger theory. The project aims to uncover such a larger theory. General-purpose algorithms for semidefinite programming are not efficient in practice, and the project aims to develop faster solvers for the specific types of semidefinite programs that arise in unsupervised learning applications.

2) On the construction of sparse approximations and other pseudorandom objects.

The PI has a conjectural path to proving the conjecture that there are "hyper-sparsifiers with O(|V|) hyperedges". This question faces difficulties similar to those of long-standing open problems in discrepancy theory, and it could lead to significant new methodologies at the intersection of theoretical computer science and pure mathematics. Because of certain significant difficulties that would need to be overcome, this is the most high-risk, high-reward direction discussed in this section.

3) On the study of networks that change over time

The PI and his collaborators are studying processes and notions in time-changing networks (sparsification, broadcast, diameter) that, in the static case, have spectral estimates. The goal will be to find analyses in the time-changing setting, to discover commonalities, and to discover general methods for the study of properties of time-changing networks and of processes that run on them.

A network that changes over time
