A Theory for Understanding, Designing, and Training Deep Learning Systems

Periodic Reporting for period 2 - THUNDEEP (A Theory for Understanding, Designing, and Training Deep Learning Systems)

Reporting period: 2020-03-01 to 2021-08-31

Deep learning, in the form of artificial neural networks, is one of the most rapidly evolving fields in machine learning, with wide-ranging impact on real-world applications. Neural networks can efficiently represent complex predictors, and are nowadays routinely trained successfully. Unfortunately, our scientific understanding of neural networks is quite rudimentary. Most methods used to design and train these systems are based on rules of thumb and heuristics, and there is a drastic theory-practice gap in our understanding of why these systems actually work. We believe this poses a significant risk to the long-term health of the field, as well as an obstacle to widening the applicability of deep learning beyond that achieved with current methods. The goal of this project is to develop principled tools for understanding, designing, and training deep learning systems, based on rigorous theoretical results. This is a major challenge in this rapidly evolving field, and any progress along these lines is expected to have a substantial impact on the theory and practice of creating such systems.

To do so, we focus on three inter-related sources of performance losses in neural network learning: the optimization error (how to train a given network in a computationally efficient manner); the estimation error (how to ensure that training a network on a finite training set will lead to good performance on future examples); and the approximation error (how architectural choices affect the type of functions the networks can compute).
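As an illustration (with notation introduced here for concreteness, not drawn from the project itself), these three error sources correspond to the standard learning-theoretic decomposition of the excess risk of the trained predictor:

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative notation (assumed, not taken from the project reports):
% L(h) is the expected loss of a predictor h,
% \mathcal{H} is the class of predictors realizable by the chosen architecture,
% \hat{h} is the predictor returned by the training algorithm,
% \hat{h}_{\mathrm{ERM}} minimizes the training loss over \mathcal{H},
% and h^* is the best possible predictor.
\begin{align*}
L(\hat{h}) - L(h^*)
  &= \underbrace{L(\hat{h}) - L(\hat{h}_{\mathrm{ERM}})}_{\text{optimization error}}
   + \underbrace{L(\hat{h}_{\mathrm{ERM}}) - \inf_{h \in \mathcal{H}} L(h)}_{\text{estimation error}}
   + \underbrace{\inf_{h \in \mathcal{H}} L(h) - L(h^*)}_{\text{approximation error}}.
\end{align*}
\end{document}

Controlling each of the three terms corresponds to one of the project objectives described above.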
We made significant progress on all objectives defined for the project, with over a dozen publications so far in top-tier venues. In the context of optimization error, we explored the effect of various network architectures on the training process along several fronts, and conducted a detailed study of simple non-linear networks. In addition, we published several papers dealing with optimization in non-convex settings, which include deep learning as a special case; these address the problem of finding stationary points in various settings, as well as improved sampling schemes for stochastic gradient methods. In the context of approximation error, we published several papers on the expressiveness of neural networks, including several results on the provable benefits of depth; a paper which is the first to theoretically analyze and validate the popular "Lottery Ticket" Hypothesis; and a paper on the limitations of random feature and kernel approaches for understanding the expressive and optimization abilities of neural networks. Finally, in the context of estimation error, we studied several questions at the intersection of generalization and optimization, providing both positive and negative results on how neural networks are able to learn and generalize when trained with simple gradient-based methods.
By the end of the project, we expect to transform our understanding of deep learning and place it on a much firmer theoretical footing, a process already begun by the results obtained so far. The different objectives we are currently pursuing are expected to coalesce and yield new insights into how to design and train deep learning systems in practice.