
Time-Data Trade-Offs in Resource-Constrained Information and Inference Systems

Periodic Reporting for period 2 - time-data (Time-Data Trade-Offs in Resource-Constrained Information and Inference Systems)

Reporting period: 2019-03-01 to 2020-08-31

Machine learning systems face a deluge of data in scientific and industrial applications under the promise of potentially huge technological and societal benefits. Massive data, however, presents a fundamental challenge to the back-end learning algorithms, which is captured by the following computational dogma: Running time of a learning algorithm increases with the size of the data. Since the available computational power is growing slowly relative to data sizes, large-scale problems of practical interest require increasingly more time to solve.

Our recent research has led us to mathematically demonstrate that this dogma is false in general, and supports an emerging perspective: Data should be treated as a resource that can be traded off with other resources, such as running time. For data acquisition and communications, we have now shown related sampling, power, latency, and circuit area trade-offs in hardware. A detailed understanding of time-data and other analogous trade-offs, however, requires our interdisciplinary approach, which is taken by our ERC project (time-data).

Our goal of systematically understanding and expanding on this emerging perspective is ambitious, but promises potentially huge impacts within and beyond the data sciences. To this end, we will pursue three closely interrelated research objectives:

1. Fundamental trade-offs in convex optimization: This thrust proposes scalable and universal convex optimization algorithms that not only rigorously trade off their convergence rates with their per-iteration time for these templates, but also match the theoretical lower bounds on their runtime efficiency.

2. Theory and methods for information and computation trade-offs: This thrust rethinks how we formulate learning problems by providing a new hierarchy of estimators and dimensionality reduction techniques for non-linear models, and characterizing their sample complexities and computational trade-offs.

3. Time-data trade-offs in scientific discovery: This thrust demonstrates our rigorous theory by applying it to real massive and complex data problems, such as super-resolved fluorescence microscopy to understand how cells respond to diseases, materials science in automating discovery, and neuroscience to develop energy efficient neural interfaces for bypassing spinal cord injuries.
WP1:
Task 1 studies the fundamental trade-offs in primal-dual optimization, resulting in new algorithms that achieve unparalleled scaling. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As described in Tasks 2 and 4, we studied different notions of smoothness and developed an extension of the famous mirror descent method that requires only differentiability. We then focused on the structure of the constraints as promised by Task 3, in particular simplex constraints, which also allowed us to resolve an open problem in sampling via the Langevin dynamics. Our optimization framework combines ideas from the Mirror Descent algorithm for optimization and the theory of Optimal Transport. Furthermore, we developed a primal-dual method to handle infinitely many constraints.
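As a point of reference for the mirror descent and simplex constraints mentioned above, the following is a minimal sketch of classical entropic mirror descent (the exponentiated gradient update) on the probability simplex. It is the textbook method, not the project's extension, and the function name, step size, and linear test objective are assumptions chosen purely for illustration.

```python
import math

def entropic_mirror_descent(grad, x0, step, iters):
    """Textbook mirror descent with the negative-entropy mirror map.

    `grad` returns the gradient at x; `x0` must lie in the simplex.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Multiplicative update, then normalization: x stays in the simplex.
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x

# Hypothetical objective f(x) = <c, x>; its minimizer over the simplex
# is the vertex at the smallest entry of c (index 1 here).
c = [0.9, 0.2, 0.5]
x = entropic_mirror_descent(lambda _x: c, [1 / 3, 1 / 3, 1 / 3], step=0.5, iters=200)
```

The multiplicative form means no explicit projection onto the simplex is needed, which is the usual reason this mirror map is paired with simplex constraints.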

WP2:
Task 1 develops a novel method for convex unconstrained optimization that, without any modifications, ensures: (i) accelerated convergence rate for smooth objectives, (ii) standard convergence rate in the general (non-smooth) setting, and (iii) standard convergence rate in the stochastic optimization setting. Task 2 derives a non-Euclidean optimization framework in the non-convex setting that takes nonlinear gradient steps on matrix factorization problems. We also developed a conditional gradient framework for a composite convex minimization template with broad applications, including semidefinite programming. We have also developed new algorithms that can solve convex optimization problems in the space required to specify the problem and its solution. For scalability questions in Task 4, we initially took a sketching-based approach. As a result, we obtained optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. This particular result also relates to WP3 below.
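To illustrate what a conditional gradient step looks like, here is a minimal sketch of the classical Frank-Wolfe method over the probability simplex. It is the standard algorithm with the usual 2/(k+2) step size, not the composite framework developed in the project; the function name, the quadratic objective, and the target vector b are assumptions made for the example.

```python
def frank_wolfe_simplex(grad, x0, iters):
    """Classical Frank-Wolfe over the probability simplex."""
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex at the smallest gradient entry.
        i = min(range(len(g)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)
        # Convex combination of the iterate and the chosen vertex.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# Hypothetical objective f(x) = ||x - b||^2 with b inside the simplex,
# so the constrained minimizer is b itself.
b = [0.1, 0.6, 0.3]
grad = lambda x: [2.0 * (xj - bj) for xj, bj in zip(x, b)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], iters=2000)
```

Each iteration only needs a linear minimization over the feasible set rather than a projection, which is what makes this family of methods attractive for large problems such as semidefinite programs.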

WP3:
Task 1 focuses on the fundamental effects of regularized learning formulations, where we investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We then obtained some new results for Tasks 2 and 3 along with non-linear models with group testing. In addition, we looked into maximizing an unknown function that has a particular graph structure that we can exploit. This approach enables us to exploit the underlying structures in non-linear and non-convex models and perform statistical inference tasks, and it even applies to general automatic machine learning tuning tasks.

WP4:
As promised in Task 1, we developed a sampling design framework for non-linear decoders and have demonstrated it with real magnetic resonance imaging data along with statistical generalization guarantees. These results are the first of their kind in handling discrete optimization problems with nearly submodular structure, not only in the deterministic setting but also in the stochastic setting, where there is a clear memory-performance trade-off. We have also obtained new sketching results that support solutions of semidefinite programming in small space.

WP5:
Task 1 develops a test bed for a super-resolution imaging system, for which we have obtained preliminary sampling results. For Task 2, we collaborated with materials scientists and published in a chemistry journal. For WP5, we have also obtained strong initial demonstration results for data sampling. In particular, our trained sampling operators provide great area, power, and performance trade-offs for neural signal acquisition as well as for automating machine learning systems, and we have collaborated with MRI engineers to demonstrate results with real data.
Estimation:
We develop scalable optimization methods that exploit dimensionality reduction, adaptivity, and stochastic approximation at their core to overcome bottlenecks in the machine learning pipeline. In particular, for semidefinite programs, we developed new algorithms that can solve convex optimization problems in the space required to specify the problem and its solution. In this setting, we have obtained additional results, such as the first convergence result for composite convex minimization as well as the Frank-Wolfe method for non-Lipschitz objectives, the first stochastic forward Douglas-Rachford splitting framework, the first coordinate descent framework for three-composite minimization, and non-Euclidean training methods for neural networks.

Decisions:
We also developed the first algorithm-independent lower bounds on the performance using information theory for Bayesian Optimization (BO), and closed a prominent gap in the literature on the optimality of the existing upper bounds. We have also studied BO with the added twist of robustness. We also studied another robustness twist in set function maximization, where an adversary removes elements of our chosen set after we select them. We resolved an open problem by allowing removals that are proportional to the size of our choices while retaining the approximation guarantees. We extended this result by focusing also on model mismatches.
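To make the removal setting concrete, the sketch below runs the classical greedy algorithm for a coverage function (a standard monotone submodular objective) and then lets a hypothetical adversary delete one chosen element. This is the non-robust baseline, not the robust algorithm from the project; the function name and the toy sets are assumptions for illustration only.

```python
def greedy_max_coverage(sets, k):
    """Classical greedy: repeatedly add the set with the largest marginal gain."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: len(sets[name] - covered),
        )
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Hypothetical ground set of candidate sets.
sets = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}
chosen, covered = greedy_max_coverage(sets, 2)

# Hypothetical adversary: remove the most valuable chosen set afterwards.
remaining = [name for name in chosen if name != "a"]
left = set().union(*(sets[name] for name in remaining))
```

In this toy instance the greedy choice covers all six items, but a single removal leaves only two covered. This fragility is what robust variants of set function maximization are designed to guard against.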

Theory:
We developed an information-theoretic framework for studying sparse model selection with general probabilistic models, thereby covering a broad range of important data models in a unified fashion -- from linear to non-linear, and from discrete to continuous. We are also developing learning-based approaches to exploiting arbitrary structure in data, by using training data to directly optimize sampling patterns specifically for the task at hand. By doing so, we seek to overcome a limitation of existing techniques for sampling structured signals.

Outlook:
Our preliminary research results extend to non-convex optimization via a novel Augmented Lagrangian Framework with optimal rates that can solve large-scale SDPs, increase neural network robustness to adversarial attacks, and provide interpretability for decision making with non-linear models. We have also made strides in obtaining first-of-their-kind results with the Langevin dynamics, not only improving the existing rates but also providing the first rates for the Wasserstein distance.
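For readers unfamiliar with sampling via the Langevin dynamics, the following is a minimal sketch of the standard unadjusted Langevin algorithm in one dimension. It targets a standard Gaussian, whose potential U(x) = x^2/2 has gradient x, and is not the constrained or mirrored scheme studied in the project; the function name, step size, burn-in length, and seed are assumptions for illustration.

```python
import math
import random

def unadjusted_langevin(grad_potential, x0, step, iters, seed=0):
    """Discretized Langevin dynamics: gradient step plus Gaussian noise."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(iters):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples

# Target: standard Gaussian, so the gradient of the potential is x itself.
samples = unadjusted_langevin(lambda x: x, x0=3.0, step=0.1, iters=100000)
burned = samples[1000:]  # discard an (assumed) burn-in period
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
```

Because the discretization is not corrected by an accept/reject step, the empirical variance is slightly biased away from 1 for a fixed step size. Convergence rates, such as those in the Wasserstein distance, quantify this kind of discretization error.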

In particular, we would like to extend our homotopy conditional gradient framework to stochastic objectives so that it can handle stochastic semidefinite programs. We would like to expand on our primal-dual framework to tackle stochastic constraints that are streamed to us so that the constraints are never stored. We would also like to further develop the game-theoretic perspective using our optimization tools, and see if we can carve out key applications in neural networks, including generative adversarial networks (which are closely related to Objective 2).

We believe that expanding the sharp-operator as well as our primal-dual analysis to non-convex cases will also open additional avenues for research, including neural networks. In addition to keeping key practical questions in mind, such as robustness, we would also like to pursue further connections between optimization and sampling (i.e. Langevin dynamics) and seek applications in emerging topics, such as reinforcement learning.