Periodic Reporting for period 1 - Understanding DL (Understanding Deep Learning)
Reporting period: 2022-09-01 to 2025-02-28
Despite the practical successes of neural networks and modern machine learning systems, our theoretical understanding of them remains incomplete. Many questions about how and why these models generalize well, optimize efficiently, or resist certain failures remain the subject of active research. This project aims to contribute to the growing body of theoretical insights that seek to explain the principles governing learning systems. By grounding these technologies in rigorous mathematical frameworks, we hope to deepen our understanding of both their capabilities and their limitations, and to inform the development of more robust and reliable AI systems.
The first question concerns the ability of neural networks to find good solutions despite optimizing non-convex objective functions. We have written three papers on this topic. In the first, we showed that "most" neural networks are learnable in almost-efficient time. By "almost," we mean a quasi-polynomial runtime of n^(ln^c(n)), which, while not polynomial, is still far faster than the worst-case exponential time required by general neural network learning algorithms. In another paper, we explored several fundamental limitations of neural network algorithms. While the main goal of our research is to understand why neural networks succeed, identifying their limitations is crucial for mapping the boundaries of what is theoretically achievable. A third paper demonstrated that presenting neural networks with carefully chosen "correct" examples can dramatically enhance their learning capabilities.
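To give a sense of where a quasi-polynomial runtime of n^(ln^c(n)) sits between polynomial and exponential time, the following sketch compares the three growth rates numerically. The function names and the choices k=3 and c=1 are illustrative assumptions, not quantities from the papers.

```python
import math

def polynomial(n, k=3):
    """Polynomial runtime n^k -- the usual benchmark for 'efficient'."""
    return float(n) ** k

def quasi_polynomial(n, c=1):
    """Quasi-polynomial runtime n^(ln^c n), the shape of the 'almost efficient'
    bound discussed above (c=1 chosen here purely for illustration)."""
    return float(n) ** (math.log(n) ** c)

def exponential(n):
    """Worst-case exponential runtime 2^n."""
    return 2.0 ** n

# Quasi-polynomial growth overtakes any fixed polynomial, but remains
# astronomically smaller than exponential growth at moderate input sizes.
for n in (16, 64, 256):
    print(n, polynomial(n), quasi_polynomial(n), exponential(n))
```

At n = 256, for instance, n^(ln n) is around 10^13 operations, whereas 2^n already exceeds 10^77, which is why a quasi-polynomial guarantee is a meaningful improvement.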
The second question this project addresses is the ability of neural networks to perform well on unseen, out-of-sample data. One well-studied approach to this problem is to show that networks with small-weight magnitudes tend to generalize better. Some of the leading theoretical results in this area apply only to sufficiently smooth functions. We extended these techniques to cover non-smooth functions, which are common in practice and more representative of the networks used in real applications.
The third question concerns the ability of neural networks to learn so-called "deep" models. (The precise definition of a deep model is somewhat technical and omitted here.) Currently, very few deep models are known to be provably learnable. We contributed a basic example of such a family of models and plan to further investigate the capacity of neural network algorithms to learn these models in future work.