Periodic Reporting for period 4 - DLT (Deep Learning Theory: Geometric Analysis of Capacity, Optimization, and Generalization for Improving Learning in Deep Neural Networks)
Reporting period: 2023-01-01 to 2023-12-31
(fig1.png) A mathematical theory of deep learning aims to quantify the relationships between three key elements in learning with neural networks: a) the representational power and the approximation errors of artificial neural networks as parametric sets of hypotheses, b) the properties and consequences of the training methods or optimization procedures that are used to select a hypothesis based on training data, and c) the performance of the trained neural networks at test time on new data, i.e. their generalization performance.
(fig2.png) An artificial neural network is a composition of simple parametric functions (neurons), which together can represent complex relationships. The top row illustrates how the input values (here pixel locations x, colored with a picture C(x) of Max Planck) are mapped by one layer φ1 or two layers φ2 ∘ φ1 of neurons into output values (new pixel locations). The lower row illustrates how the input space is broken into regions on which the function is linear. Such geometric-combinatorial decompositions can be used to investigate important properties of the networks (e.g. possible advantages of different architectures) and of the trained functions (e.g. decision boundaries or smoothness).
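To make this decomposition concrete, the following minimal sketch (a toy illustration in Python/NumPy, not code from the project; all sizes and seeds are illustrative assumptions) enumerates the activation regions that a small, randomly initialized ReLU network induces on a two-dimensional input square. Within each region, where a fixed subset of neurons is active, the network computes an affine function.

import numpy as np

# Minimal sketch (not the project's code): count the activation regions that a
# small, randomly initialized ReLU network induces on a grid over a 2D input
# square. Within each region (fixed on/off pattern of the neurons) the network
# computes an affine function.

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)   # layer 1: 2 -> 8
W2, b2 = rng.standard_normal((8, 8)), rng.standard_normal(8)   # layer 2: 8 -> 8

# Sample the input square [-1, 1]^2 on a fine grid.
xs = np.linspace(-1.0, 1.0, 400)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)    # shape (N, 2)

h1 = grid @ W1.T + b1
a1 = np.maximum(h1, 0.0)                                        # ReLU activations
h2 = a1 @ W2.T + b2

# The activation pattern (which neurons are active) labels the linear region.
pattern = np.concatenate([h1 > 0, h2 > 0], axis=1)
regions = np.unique(pattern, axis=0)
print(f"distinct activation patterns on the grid: {len(regions)}")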
We published 50+ articles, including 20 at the machine learning conferences ICML, ICLR, and NeurIPS; 12 at conferences such as ISIT, Allerton, MSML, and GSI; and 20 in journals such as SIAM SIAGA, JMLR, Information Geometry, and FoCM, or in books such as Mathematical Aspects of Deep Learning. We presented this research in 100+ invited talks at workshops, conferences, and seminars, including 7 keynotes and plenary lectures, in addition to 50+ presentations at workshops, conferences, and general public outreach events. These results have served as the basis for several subsequent research endeavours by us and others in theoretical deep learning, particularly those highlighting geometric and combinatorial aspects of learning with neural networks.
Within this project we created multiple platforms for research, training, and dissemination, in particular the Deep Learning Theory meeting and the Mathematical Machine Learning Seminar, which has hosted 150+ talks in the reporting period. Among the organized events we highlight the Deep Learning Theory kickoff workshop in early 2019 and the co-organized Mathematics of Machine Learning Conference at ZiF in 2021, in addition to further conference sessions and collaboration programs. The project has had a significant synergistic footprint, particularly through the close interface we maintained with the Math Machine Learning group at UCLA, the co-creation of the Math of Data Initiative at MPI MiS, and the interface with other machine learning research stakeholders such as the DFG priority programme 2298 on Theoretical Foundations of Deep Learning and the School of Embedded Composite Artificial Intelligence in Leipzig and Dresden.
Optimization theory for neural networks. Training neural networks involves non-convex optimization problems and practical methodologies for which a theoretical footing has been elusive. In this project we obtained a series of results illuminating the interplay between training data, network architectures, parameter optimization, and capacity control in neural networks. These provide a theoretical explanation for the success of some of these methodologies and capture nuanced characteristics of the optimization dynamics in training neural networks.
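As a toy illustration of the non-convexity of the training problem (an assumed setup written in PyTorch for brevity, not the project's experiments; the target function, network size, and seeds are arbitrary choices), the following sketch trains the same small ReLU network with plain gradient descent from different random initializations; the runs generally end up at different parameter configurations and possibly different final losses.

import torch

# Minimal sketch (assumed setup, not the project's experiments): train the same
# small ReLU network several times with plain gradient descent, starting from
# different random initializations. Because the loss is non-convex, the runs
# generally reach different parameter configurations and final losses.

torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 32).unsqueeze(1)
y = torch.abs(x)                       # a simple piecewise linear target

def train(seed: int, steps: int = 2000, lr: float = 0.05) -> float:
    torch.manual_seed(seed)            # the only difference between runs
    model = torch.nn.Sequential(
        torch.nn.Linear(1, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
    )
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for seed in (1, 2, 3):
    print(f"init seed {seed}: final training loss {train(seed):.5f}")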
Regularization in neural networks. One of the puzzles in deep learning is why overparametrized networks can overfit the training data and yet perform well at test time. A possible explanation is that the training procedures are biased towards solutions with good properties. In this project we obtained results describing the biases of gradient descent training of neural networks depending on various key factors, including the training time and the parameter initialization. For a wide variety of network architectures we further obtained quantitative descriptions of spectral biases, that is, how a learning algorithm implicitly decomposes a learning problem into several components that are learned at different rates.
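The mechanism behind such spectral biases can be illustrated in the simplest possible setting, gradient descent on a least-squares problem (a toy model in NumPy; the numbers are chosen for illustration and are not taken from the project's results): the error along each spectral component of the problem decays at a rate governed by the corresponding singular value, so some components are learned much faster than others.

import numpy as np

# Minimal sketch of the mechanism behind a spectral bias (an assumed toy model,
# not the project's analysis): for gradient descent on the least-squares loss
# 0.5 * ||A w - y||^2, the residual along the i-th left singular vector of A
# decays like (1 - lr * s_i^2)^t, so components with larger singular values
# (for neural networks, typically the low-frequency ones) are learned faster.

rng = np.random.default_rng(0)
n = 5
s = np.array([2.0, 1.0, 0.5, 0.25, 0.1])           # prescribed singular values
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal bases
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(s) @ V.T

y = U @ np.ones(n)          # target with equal weight on every spectral component
w = np.zeros(n)
lr = 0.2

for t in range(1, 201):
    w -= lr * A.T @ (A @ w - y)                    # plain gradient descent
    if t in (10, 50, 200):
        residual = np.abs(U.T @ (A @ w - y))       # error per spectral component
        print(f"step {t:3d}: residual per component {np.round(residual, 3)}")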