Periodic Reporting for period 4 - SMILE (Statistical Mechanics of Learning)
Reporting period: 2022-03-01 to 2023-02-28
In this project, we use advanced methods of statistical mechanics to develop a theoretical understanding of deep neural networks and their behaviour. We develop simplified models in which learning performance can be analyzed and predicted mathematically. The overall goal is to make these models as realistic as possible, so that they capture a wide range of the behaviour observed empirically in deep learning. Analyzing how performance depends on the various tunable parameters yields a theoretical understanding of the principles behind the empirical success of deep neural networks. The synergy between the theoretical statistical physics approach and scientific questions from machine learning enables a leap forward in our understanding of learning from data.
We significantly advanced the state of the art in rigorously establishing methods that stem from the physics of disordered systems. In particular, we proved that the replica-method results for the optimal generalization error of the single-layer perceptron are exact and, for a range of models, compared them thoroughly with the best-known algorithmic performance.
We also significantly advanced the theoretical understanding of how gradient descent algorithms perform in high-dimensional non-convex landscapes. We found a way to analyze their performance via dynamical mean-field theory and computed the exact signal-to-noise ratio needed for good performance. Interestingly, this analysis unveiled a region of parameters where spurious local minima exist and can trap the dynamics, yet randomly initialized dynamics avoid them.
Looking at the interplay between the network architecture, the training algorithm and the structure of the data, we identified cases where overparametrization reduces the number of samples that training algorithms need to achieve good performance. We showed how to generalize the analysis method to take the data structure into account, to the extent that, for simple neural networks, we can theoretically characterize the learning curves for realistic datasets.