Time-Data Trade-Offs in Resource-Constrained Information and Inference Systems

Periodic Reporting for period 4 - time-data (Time-Data Trade-Offs in Resource-Constrained Information and Inference Systems)

Reporting period: 2022-03-01 to 2023-02-28

Machine learning systems face a deluge of data in scientific and industrial applications under the promise of potentially huge technological and societal benefits. In this setting, our ERC project supports an emerging perspective: data should be treated as a resource that can be traded off against other resources, such as running time. For data acquisition and communications, we have now shown related sampling, power, latency, and circuit area trade-offs in hardware. A detailed understanding of time-data and other analogous trade-offs, however, requires an interdisciplinary approach, which our ERC project (time-data) takes.

Our goal of systematically understanding and expanding on this emerging perspective is ambitious, but promises potentially huge impacts within and beyond the data sciences. To this end, we pursue three closely interrelated research objectives:

1. We propose scalable and universal convex optimization algorithms that not only rigorously trade off their convergence rates against their per-iteration time, but also match the theoretical lower bounds on their runtime efficiency.

2. We rethink how we formulate learning problems by providing a new hierarchy of estimators and dimensionality reduction techniques for non-linear models, and characterizing their sample complexities and computational trade-offs.

3. We demonstrate our rigorous theory by applying it to real massive and complex data problems, such as super-resolved fluorescence microscopy to understand how cells respond to diseases, materials science in automating discovery, and neuroscience to develop energy efficient neural interfaces for bypassing spinal cord injuries.

WP1:
Task 1 studies the fundamental trade-offs in primal-dual optimization, resulting in new algorithms that achieve unparalleled scaling. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As described in Tasks 2 and 4, we studied different notions of smoothness and developed an extension of the well-known mirror descent methods that requires only differentiability. As promised in Task 3, we then focused on the structure of the constraints, which also allowed us to resolve an open problem in sampling via Langevin dynamics. Furthermore, we developed a primal-dual method to handle infinitely many constraints.
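To make the mirror descent reference concrete, the following is a minimal sketch of the classical entropic mirror descent method on the probability simplex. It is not the extension developed in the project; the linear objective, the cost vector `c`, the step size, and the function name are illustrative assumptions.

```python
import math

def entropic_mirror_descent(grad, x0, step, iters):
    """Classical mirror descent on the probability simplex (entropy mirror map)."""
    x = list(x0)
    avg = [0.0] * len(x)
    for _ in range(iters):
        g = grad(x)
        # Mirror step for the negative-entropy map: a multiplicative update...
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        # ...followed by renormalization back onto the simplex.
        total = sum(w)
        x = [wi / total for wi in w]
        # Running average of the iterates.
        avg = [a + xi / iters for a, xi in zip(avg, x)]
    return x, avg

# Hypothetical objective f(x) = <c, x>; its gradient is the constant vector c,
# and the minimizer over the simplex is the vertex with the smallest cost.
c = [0.9, 0.2, 0.5]
x_last, x_avg = entropic_mirror_descent(lambda x: c, [1 / 3, 1 / 3, 1 / 3],
                                        step=0.5, iters=200)
```

In this toy run the last iterate concentrates on the second coordinate, which carries the smallest cost. The multiplicative form of the update is what distinguishes the entropic geometry from a Euclidean projected gradient step.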

WP2:
Task 1 develops a novel method for unconstrained convex optimization that, without any modifications, ensures: (i) an accelerated convergence rate for smooth objectives, (ii) the standard convergence rate in the general (non-smooth) setting, and (iii) the standard convergence rate in the stochastic optimization setting. Task 2 derives a non-Euclidean optimization framework in the non-convex setting that takes nonlinear gradient steps on matrix factorization problems. We also developed a conditional gradient framework for a composite convex minimization template with broad applications, including semidefinite programming. We have also developed new algorithms that can solve convex optimization problems in the space required to specify the problem and its solution. For the scalability questions in Task 4, we initially took a sketching-based approach. As a result, we obtained optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. This particular result also relates to WP3 below.
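As an illustration of the conditional gradient (Frank-Wolfe) idea mentioned above, the sketch below runs the textbook method on the probability simplex. It is not the project's composite framework; the quadratic objective, the target vector `b`, and the function name are assumptions chosen only to keep the example self-contained.

```python
def frank_wolfe_simplex(grad, x0, iters):
    """Textbook Frank-Wolfe on the probability simplex with step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle over the simplex: the minimizing vertex is
        # the coordinate with the smallest gradient entry.
        i = min(range(len(g)), key=g.__getitem__)
        gamma = 2.0 / (k + 2.0)
        # Convex combination of the current point and the chosen vertex, so the
        # iterate stays feasible without any projection.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# Hypothetical objective f(x) = 0.5 * ||x - b||^2 with b inside the simplex,
# so the constrained minimizer is b itself.
b = [0.1, 0.6, 0.3]
grad = lambda x: [xj - bj for xj, bj in zip(x, b)]
x_fw = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], iters=2000)
```

Each iteration needs only a linear minimization over the feasible set rather than a projection. For semidefinite constraints the analogous oracle reduces to an extreme eigenvector computation, which is one reason conditional gradient methods are attractive in that setting.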

WP3:
Task 1 focuses on the fundamental effects of regularized learning formulations, where we investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We then obtained new results for Tasks 2 and 3, along with results on non-linear models with group testing. In addition, we looked into maximizing an unknown function that has a particular graph structure that we can exploit. This approach enables us to exploit the underlying structures in non-linear and non-convex models and to perform statistical inference tasks; it even applies to general automatic machine learning tuning tasks.
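One common way to view ridge regression, principal component regression, and gradient methods as a single class is through the filter each applies to an eigenvalue of the empirical covariance (or kernel) operator. The sketch below writes out these standard filter functions for a scalar eigenvalue; the function names and parameter values are illustrative assumptions and are not taken from the project's papers.

```python
def ridge_filter(sigma, lam):
    """Ridge regression (Tikhonov): shrinks 1 / sigma by adding lam to the eigenvalue."""
    return 1.0 / (sigma + lam)

def pcr_filter(sigma, lam):
    """Principal component regression: inverts eigenvalues above lam, discards the rest."""
    return 1.0 / sigma if sigma >= lam else 0.0

def gd_filter(sigma, step, t):
    """Gradient descent on least squares after t steps from zero (closed form)."""
    return (1.0 - (1.0 - step * sigma) ** t) / sigma

# The closed form of the gradient filter equals the geometric sum accumulated by
# the iterations: step * sum_{k < t} (1 - step * sigma)^k.
sigma, step, t = 0.5, 1.0, 5
geometric_sum = step * sum((1.0 - step * sigma) ** k for k in range(t))
closed_form = gd_filter(sigma, step, t)
```

In all three cases the filter approximates 1 / sigma for large eigenvalues and damps small ones. For gradient descent the number of iterations t plays the role of the regularization parameter: when 0 < step * sigma < 1, the filter approaches 1 / sigma as t grows.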

WP4:
As promised in Task 1, we developed a sampling design framework for non-linear decoders and demonstrated it on real magnetic resonance imaging data, along with statistical generalization guarantees. These results are the first of their kind in handling discrete optimization problems with nearly submodular structure, not only in the deterministic setting but also in the stochastic setting, where there is a clear memory-performance trade-off. We have also obtained new sketching results that support solving semidefinite programs in small space.
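To illustrate the kind of discrete selection problem that arises in sampling design, the sketch below applies the standard greedy rule to a toy coverage objective, which is submodular. It is not the project's framework; the candidate locations, the `coverage` table, and the function names are hypothetical.

```python
def greedy_select(candidates, budget, value):
    """Standard greedy rule: repeatedly add the candidate with the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0.0
        base = value(chosen)
        for c in candidates:
            if c in chosen:
                continue
            gain = value(chosen + [c]) - base
            # Strict inequality: ties are broken in favor of the earlier candidate.
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate improves the objective
            break
        chosen.append(best)
    return chosen

# Hypothetical sampling locations and the signal features each one covers.
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

def covered(selection):
    """Coverage objective: number of distinct features covered by the selection."""
    return len(set().union(*(coverage[s] for s in selection))) if selection else 0

picked = greedy_select(list(coverage), budget=2, value=covered)
```

With a budget of two, the rule first picks "c" (four new features) and then "a" (three new features), covering all seven. In a learning-based setting the objective would instead be an average reconstruction quality over training signals, which is generally only approximately submodular.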

WP5:
In Task 1, we developed a test bed for a super-resolution imaging system and obtained preliminary sampling results. For Task 2, we collaborated with materials scientists and published in a chemistry journal. For WP5, we have also obtained strong initial demonstration results for data sampling. In particular, our trained sampling operators, which act as automated machine learning systems, provide strong area, power, and performance trade-offs for neural signal acquisition. We have also collaborated with MRI engineers to demonstrate results with real data.

Estimation:
We develop scalable optimization methods that exploit dimensionality reduction, adaptivity, and stochastic approximation at their core to overcome bottlenecks in the machine learning pipeline. In particular, for semidefinite programs, we developed new algorithms that can solve convex optimization problems in the space required to specify the problem and its solution. In this setting, we have obtained additional results, such as the first convergence result for composite convex minimization as well as for the Frank-Wolfe method with non-Lipschitz objectives, the first stochastic forward Douglas-Rachford splitting framework, the first coordinate descent framework for three-composite minimization, and non-Euclidean training methods for neural networks.

Decisions:
Using information theory, we developed the first algorithm-independent lower bounds on the performance of Bayesian Optimization (BO), and closed a prominent gap in the literature on the optimality of the existing upper bounds. We have also studied BO with the added twist of robustness. We studied another robustness twist in set function maximization, where an adversary removes elements of our chosen set after we select them. We resolved an open problem by allowing a number of removals proportional to the size of our chosen set while retaining the approximation guarantees. We extended this result by also accounting for model mismatches. We also developed robust reinforcement learning approaches.

Theory:
We developed an information-theoretic framework for studying sparse model selection with general probabilistic models, thereby covering a broad range of important data models in a unified fashion -- from linear to non-linear, and from discrete to continuous. We are also developing learning-based approaches to exploiting arbitrary structure in data, by using training data to directly optimize sampling patterns specifically for the task at hand. By doing so, we seek to overcome a limitation of existing techniques for sampling structured signals.

Example application in WP5
