Periodic Reporting for period 2 - NN-OVEROPT (Neural Network: An Overparametrization Perspective)
Reporting period: 2023-11-01 to 2024-10-31
Importance for society: Our work does not have any direct consequence for society; however, it makes progress towards providing a theoretical foundation for modern machine learning systems. A better theoretical understanding of modern machine-learning systems would eventually lead to better algorithm design across various applications of machine learning. That would result in more interpretable systems, which can be further modified to develop bias-free algorithms.
Overall Objective: In these works, our main goal is to understand optimization and generalization when training a machine learning model with gradient descent, stochastic gradient descent, and noisy gradient descent.
SGD for the least squares objective. In later work, we show that our result can be extended to a class of non-convex problems as well. We also study the optimal control formulation of mirror descent and mirror Langevin and show that, for convex optimization tasks, mirror descent and mirror Langevin solve certain optimal control problems. In my current and future work, my goal is to show the connection between vanilla SGD and the various SDEs that we consider in our previous works. This will allow us to directly apply our results to analyzing the generalization bound for SGD on a class of non-convex problems.
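As an informal illustration of the kind of correspondence we study (not the analysis from the papers themselves), the sketch below compares a vanilla mini-batch SGD update on a least-squares objective with an Euler-Maruyama discretization of a noisy gradient-flow SDE; the dimensions, step size, and diffusion scale sigma are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, idx):
    # gradient of the least-squares loss 0.5/|idx| * ||X[idx] w - y[idx]||^2
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

eta, sigma, T, batch = 0.01, 0.05, 2000, 10
w_sgd = np.zeros(d)   # vanilla mini-batch SGD iterate
w_sde = np.zeros(d)   # Euler-Maruyama iterate of dW_t = -grad f(W_t) dt + sigma dB_t
full = np.arange(n)

for _ in range(T):
    idx = rng.choice(n, size=batch, replace=False)
    w_sgd = w_sgd - eta * grad(w_sgd, idx)
    w_sde = w_sde - eta * grad(w_sde, full) + np.sqrt(eta) * sigma * rng.normal(size=d)

print("SGD distance to w_true:", np.linalg.norm(w_sgd - w_true))
print("SDE distance to w_true:", np.linalg.norm(w_sde - w_true))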
(i) We show that the effects of various complicated explicit regularizers can be obtained simply by injecting noise into the model. Noise injection can explicitly make the solution sparse in the case of over-parametrized linear regression, minimize the nuclear norm of the solution in the case of over-parametrized matrix factorization, and drive the solution towards wider minima in the case of neural network training. We also demonstrate in our experiments that our method consistently beats vanilla gradient descent and vanilla stochastic gradient descent in training deep neural networks. (A toy sketch of the noise-injection idea is given after this list.)
(ii) In the next series of works, we provided the first algorithmic-stability-based generalization bounds for heavy-tailed SGD on convex and non-convex functions. We also prove that the interaction between the tail decay coefficient and generalization behavior is non-monotonic, and that for some choice of the tail decay coefficient we obtain the best generalization behavior. The analysis is based on results from applied probability theory for Lévy-process SDEs; these results were not known earlier in the literature. In an extension, we propose a unified theory to derive algorithmic stability bounds for discrete-time Markov chains. This analysis is not driven by the theory of SDEs; instead, it works directly on the discrete-time Markov chain, which covers vanilla SGD. (A toy sketch of heavy-tailed SGD is given after this list.)
(iii) In another work, we show a connection between optimal control and mirror descent as well as mirror Langevin. We show that running mirror descent or mirror Langevin on a class of convex problems directly solves an optimal control problem whose cost is associated with the loss function and its Fenchel dual. (A toy sketch of the mirror-map/Fenchel-dual pairing is given after this list.)
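For item (i), the following toy sketch illustrates the general flavour of noise injection in over-parametrized linear regression with a Hadamard (u * v) parametrization; the specific injection scheme, the parametrization, and all hyperparameters here are illustrative assumptions and may differ from the scheme analysed in the paper.

import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 200                           # over-parametrized regime: many more parameters than samples
X = rng.normal(size=(n, d))
w_star = np.zeros(d); w_star[:5] = 1.0   # sparse ground truth
y = X @ w_star

u = np.full(d, 0.1)                      # Hadamard parametrization of the predictor: w = u * v
v = np.full(d, 0.1)
eta, sigma, T = 0.01, 0.05, 5000         # sigma is the (assumed) injected-noise scale

for _ in range(T):
    # inject Gaussian noise into the parameters before evaluating the gradient
    un = u + sigma * rng.normal(size=d)
    vn = v + sigma * rng.normal(size=d)
    r = X @ (un * vn) - y                # residual at the perturbed parameters
    u = u - eta * (X.T @ r) * vn / n
    v = v - eta * (X.T @ r) * un / n

w = u * v
print("largest |w| coordinates:", np.sort(np.argsort(-np.abs(w))[:5]))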
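For item (ii), here is a toy sketch of an SGD-like recursion driven by symmetric alpha-stable noise, using scipy's levy_stable sampler; the tail index alpha plays the role of the tail decay coefficient, and the step-size scaling, noise scale, and objective are illustrative assumptions rather than the exact setting analysed in the papers.

import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(2)
n, d = 100, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(w):
    # full gradient of the least-squares loss 0.5/n * ||A w - b||^2
    return A.T @ (A @ w - b) / n

def heavy_tailed_sgd(alpha, eta=0.05, T=2000, scale=0.01):
    # gradient recursion driven by symmetric alpha-stable noise;
    # alpha = 2 recovers Gaussian noise, smaller alpha means heavier tails
    w = np.zeros(d)
    for _ in range(T):
        noise = levy_stable.rvs(alpha, 0, scale=scale, size=d)
        w = w - eta * grad(w) + eta ** (1.0 / alpha) * noise
    return w

for alpha in (2.0, 1.8, 1.5):
    w = heavy_tailed_sgd(alpha)
    print("alpha =", alpha, " final loss =", 0.5 * np.mean((A @ w - b) ** 2))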
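For item (iii), the snippet below is a toy illustration of the mirror-map/Fenchel-dual pairing that underlies mirror descent (entropic mirror descent on the probability simplex with a linear loss); it is not the optimal control formulation itself, and the mirror map, loss, and step size are illustrative choices.

import numpy as np

rng = np.random.default_rng(3)
d = 5
c = rng.normal(size=d)           # linear loss f(x) = <c, x> over the probability simplex

# Mirror map phi(x) = sum_i x_i log x_i (negative entropy). Its Fenchel dual is
# phi*(theta) = log sum_i exp(theta_i), and grad phi* (the softmax) maps dual
# iterates back to primal iterates on the simplex.
def grad_phi_star(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

eta, T = 0.1, 200
theta = np.zeros(d)              # dual (mirror) variable
for _ in range(T):
    x = grad_phi_star(theta)     # primal iterate on the simplex
    g = c                        # grad f(x); constant here because f is linear in x
    theta = theta - eta * g      # mirror-descent step in the dual (mirror) space
    # mirror Langevin would additionally inject Gaussian noise into this dual-space step

print("final primal iterate:", grad_phi_star(theta))   # mass concentrates on argmin_i c_i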
In the last phase of the fellowship, we would like to show that the iterates of vanilla SGD converge to an SDE that we study in our other works. We would also like to obtain direct results on training a multi-layer perceptron with SGD in a teacher-student setting and obtain a recovery guarantee for that setting. We would also try to extend our work on optimal control to a class of non-convex problems.
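As a purely illustrative indication of the planned teacher-student experiments (the widths, sample size, and learning rate below are assumptions, not settings from the project), a two-layer ReLU student trained with SGD on data generated by a fixed teacher of the same architecture could be set up as follows.

import numpy as np

rng = np.random.default_rng(4)
d, k, n = 30, 4, 1000                    # input dimension, network width, sample size (all assumed)

# Teacher: a fixed two-layer ReLU network that generates the labels
W_t = rng.normal(size=(k, d)) / np.sqrt(d)
a_t = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = np.maximum(X @ W_t.T, 0.0) @ a_t

# Student: same architecture, trained from a random initialization with mini-batch SGD
W = rng.normal(size=(k, d)) / np.sqrt(d)
a = rng.normal(size=k)
eta, batch = 0.02, 32
for _ in range(5000):
    idx = rng.choice(n, size=batch, replace=False)
    h = np.maximum(X[idx] @ W.T, 0.0)            # hidden-layer activations
    err = h @ a - y[idx]                         # residual on the mini-batch
    a = a - eta * h.T @ err / batch
    W = W - eta * ((err[:, None] * a) * (h > 0)).T @ X[idx] / batch

print("student train MSE:", np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))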
Economic Impact: No direct impact
Societal Impact: No direct impact