Reliable and cost-effective large scale machine learning

Project information

REAL

Grant agreement ID: 947908

DOI

10.3030/947908

EC signature date 29 September 2020

Start date 1 April 2021

End date 31 March 2027

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 498 830,00

EU contribution

€ 1 498 830,00

Coordinated by

UNIVERSITA COMMERCIALE LUIGI BOCCONI
Italy

Periodic Reporting for period 3 - REAL (Reliable and cost-effective large scale machine learning)

Reporting period: 2024-04-01 to 2024-09-30

- What is the problem/issue being addressed?

Current machine learning is not suitable for the new scenario, from either a theoretical or a practical viewpoint: (a) the lack of cost-effectiveness of the algorithms directly affects the economic and energy costs of large-scale ML; (b) the lack of reliability of the predictions critically affects the safety of the systems where ML is employed.

- Why is it important for society?

To make machine learning truly useful from a social and scientific viewpoint, we must be able to guarantee, by construction, that the algorithms perform the task they are given at minimal cost and that the results they provide are reliable, e.g. with clear confidence intervals on the predictions and guarantees on their validity.

- What are the overall objectives?

To deal with the challenges posed by the new scenario, REAL will lay the foundations of a solid theoretical and algorithmic framework for reliable and cost-effective large-scale machine learning on modern computational architectures.

Work in the domain of supervised learning with statistics and computations beyond worst-case analysis:
To set goals in designing optimal learning algorithms, the first step is to determine theoretically the maximum accuracy and reliability achievable in a learning problem. The challenge in this domain is to devise tight statistical and computational bounds, and the resulting trade-offs between accuracy and reliability, that account for a wide set of ML problems, the high dimensionality and structure of the data, and more recent learning scenarios.
In the context of WP1, I worked with the Ph.D. students Gaspard Beugnot and Rémi Jézéquel, and other collaborators, to analyze the statistical and computational behavior of important classes of supervised learning problems, with the goal of obtaining guarantees that are usable in practice, i.e. beyond worst-case analysis.

* "Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization" Gaspard Beugnot, Julien Mairal, Alessandro Rudi. NeurIPS 2021. (December 2021).
In this work, (a) we show how to achieve fast statistical convergence, in terms of learning rates, for a wide class of supervised learning problems based on generalized self-concordant losses; (b) we go beyond the worst-case rate by using a regularization technique more advanced than Tikhonov regularization, namely iterated Tikhonov regularization.
This approach could lead to a new class of learning algorithms with guarantees and reduced computational complexity.
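As a rough illustration of the idea, the sketch below applies iterated Tikhonov regularization to binary logistic regression in its textbook proximal-point form: each outer step minimizes the loss plus a penalty on the distance to the previous iterate, instead of the distance to zero. This is a minimal sketch under stated assumptions, not the algorithm analyzed in the paper: the function names, the Newton inner solver with backtracking, and the parameters `lam` and `n_outer` are all hypothetical choices made for illustration.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))


def prox_objective(w, w_prev, X, y, lam):
    """Loss plus a Tikhonov penalty centered at the previous iterate."""
    return logistic_loss(w, X, y) + 0.5 * lam * np.sum((w - w_prev) ** 2)


def prox_step(w_prev, X, y, lam, newton_iters=20):
    """Approximately solve argmin_w loss(w) + (lam / 2) * ||w - w_prev||^2.

    Uses Newton's method with a simple backtracking safeguard
    (an illustrative choice of inner solver).
    """
    n, d = X.shape
    w = w_prev.copy()
    for _ in range(newton_iters):
        margins = y * (X @ w)
        p = np.exp(-np.logaddexp(0.0, margins))  # stable sigmoid(-margins)
        grad = -(X.T @ (y * p)) / n + lam * (w - w_prev)
        hess = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(d)
        direction = np.linalg.solve(hess, grad)
        current = prox_objective(w, w_prev, X, y, lam)
        step = 1.0
        # Halve the step until the regularized objective does not increase.
        while step > 1e-8 and prox_objective(
            w - step * direction, w_prev, X, y, lam
        ) > current:
            step *= 0.5
        if step <= 1e-8:
            break  # no further progress is possible at this precision
        w = w - step * direction
    return w


def iterated_tikhonov(X, y, lam, n_outer):
    """Run n_outer proximal steps starting from w = 0.

    With n_outer = 1 this reduces to standard Tikhonov (ridge-penalized)
    logistic regression.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_outer):
        w = prox_step(w, X, y, lam)
    return w
```

In this form the number of outer iterations acts as an additional regularization parameter alongside `lam`: more iterations progressively reduce the bias introduced by the penalty.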

* "Mixability made efficient: Fast online multiclass logistic regression", Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi. NeurIPS 2021. (December 2021).
In the same spirit, we devise here a new approach that achieves optimal rates and, at the same time, reduced computational complexity for the problem of multiclass logistic regression, in the more difficult context of online learning, where the standard assumptions of the statistical i.i.d. setting cannot be made.
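To make the online setting concrete, the sketch below shows the protocol on a simple baseline: at each round the learner predicts class probabilities, then the label is revealed, the logarithmic loss is incurred, and the model is updated. The update used here is plain online gradient descent, which is a swapped-in baseline and not the method proposed in the paper; the function names and the learning rate `lr` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def online_multiclass_logistic(stream, d, k, lr=0.1):
    """Online gradient descent baseline for k-class logistic regression.

    `stream` yields (x, label) pairs with x of shape (d,) and label in
    {0, ..., k - 1}. No i.i.d. assumption is made on the stream.
    Returns the final weights and the cumulative log loss.
    """
    W = np.zeros((k, d))
    total_loss = 0.0
    for x, label in stream:
        p = softmax(W @ x)            # predict before the label is revealed
        total_loss += -np.log(p[label])
        grad = np.outer(p, x)         # gradient of the log loss: (p - e_label) x^T
        grad[label] -= x
        W -= lr * grad
    return W, total_loss
```

The quantity of interest in this setting is the regret, i.e. the gap between `total_loss` and the cumulative loss of the best fixed weight matrix chosen in hindsight.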

Work intended to extend results from supervised learning to structured prediction settings:
Instead of reinventing ad-hoc theories and algorithms for each type of data and learning problem, the challenge here is to derive a unified framework for structured prediction problems that is reliable and cost-effective. In particular, the approach taken here is to provide, for structured prediction problems, the same reliability guarantees, generality, and cost-effective algorithms that we achieved for standard supervised learning, and in particular for the results achieved in the rest of the project.
In this context, I worked with the Ph.D. student Vivien Cabannes and collaborators to extend to structured prediction the refined results that are known for standard supervised learning techniques. We did this in a theoretical framework that is purposely flexible enough to integrate easily the future results we will obtain in the domain of supervised learning with statistics and computations beyond worst-case analysis.

* "Fast Rates for Structured Prediction" Vivien Cabannes, Francis Bach, Alessandro Rudi. COLT 2021. (Aug 2021).
This work extends to the structured prediction setting the fast rates, beyond the worst-case scenario, that have been derived for supervised learning in the context of single-output prediction (i.e. for functions whose output is a real number).

* "Disambiguation of Weak Supervision leading to Exponential Convergence rates" Vivien Cabannes, Francis Bach, Alessandro Rudi. ICML 2021. (Jul 2021).
In this work, we study a very important subcase of structured prediction, namely "weak supervision", and provide an analysis that goes beyond the worst case and yields exponential convergence rates for suitable algorithms.

Progress in the domain of supervised learning with statistics and computations beyond worst-case analysis:
With the Ph.D. students Gaspard Beugnot and Rémi Jézéquel, and other collaborators, we analyzed the statistical and computational behavior of important classes of supervised learning problems.
The prototypical case that is not yet well understood from a statistical and computational viewpoint, beyond worst-case guarantees, is (multiclass) logistic regression.
With the works highlighted above, we shed light on the regularity conditions and algorithmic schemes that allow achieving fast learning rates with reduced computational complexity.
This work is at the intersection of convex optimization and advanced statistical learning theory. We expect that by generalizing our tools, in particular by including approximation and interpolation theory and elements of non-convex optimization, we will be able to extend these kinds of results to wider families of problems and real-world applications.

Progress in the extension of results from supervised learning to structured prediction settings:
With the Ph.D. student Vivien Cabannes and collaborators, we extended to structured prediction the refined results that are known for standard supervised learning techniques. The results are interesting in themselves, but also because they are the first steps of a theory that generalizes the theory of "implicit embeddings" (see "A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings", Ciliberto, Rosasco, Rudi, JMLR 2020) to take into account noise conditions à la Tsybakov, which are crucial to go beyond the worst-case scenario and to characterize real-world problems such as natural image classification.
The final goal is a unified theory for structured prediction problems that is as general as the existing theory for structured prediction and, at the same time, encompasses both statistical and computational aspects. Concretely, we expect to derive a theory that allows the results obtained in the previous work packages for supervised learning to be transposed smoothly to the context of structured prediction.
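For reference, the noise conditions à la Tsybakov mentioned above can be stated as follows in their standard form for binary classification. This is the classical textbook statement, given here only as an illustration; the symbols $\eta$, $C$, $\alpha$ and $\delta$ are notation introduced for this example, and the structured prediction generalization studied in the works above is not reproduced here.

```latex
% Regression function for binary labels Y in {0, 1}:
\eta(x) = \mathbb{P}(Y = 1 \mid X = x)

% Tsybakov noise (margin) condition with exponent \alpha \ge 0 and constant C > 0:
\mathbb{P}_X\bigl( \lvert 2\eta(X) - 1 \rvert \le t \bigr) \le C\, t^{\alpha}
\qquad \text{for all } t > 0

% Limiting "hard margin" case, for some \delta > 0:
\lvert 2\eta(X) - 1 \rvert \ge \delta \quad \text{almost surely}
```

A larger exponent $\alpha$ means that less probability mass lies near the decision boundary $\eta(x) = 1/2$, which is what allows learning rates faster than the worst-case ones; the hard-margin case is the regime typically associated with exponential convergence rates.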

Progress on other important aspects of the project:
We are working actively with the postdocs Boris Muzellec and Pierre-Cyril Aubin (paid on the ERC grant) to design cost-effective algorithms for supervised learning. In particular, the goal is to obtain, at the same time, optimal approximation properties, explicit uncertainty measures, and adaptability to hybrid architectures such as multi-GPU systems.
We obtained very interesting preliminary results that, surprisingly, apply beyond the machine learning context to other areas of applied mathematics, such as probabilistic inference, (non-convex) optimization, optimal control, and optimal transport. This could have far-reaching applications beyond the original goals of WP2 and WP3 (which are limited to the context of machine learning), and we are currently exploring it with the utmost attention.

Illustration of the improvement obtained by using logistic regression with iterated Tikhonov regularization.
