Periodic Reporting for period 1 - Understanding DL (Understanding Deep Learning)
Reporting period: 2022-09-01 to 2025-02-28
Despite the practical successes of neural networks and modern machine learning systems, our theoretical understanding of them remains incomplete. Many questions about how and why these models generalize well, optimize efficiently, or resist certain failures remain the subject of active research. This project aims to contribute to the growing body of theoretical insights that seek to explain the principles governing learning systems. By grounding these technologies in rigorous mathematical frameworks, we hope to deepen our understanding of both their capabilities and their limitations, and to inform the development of more robust and reliable AI systems.
The first question concerns the ability of neural networks to find good solutions despite optimizing non-convex objective functions. We have written three papers on this topic. In the first, we showed that "most" neural networks are learnable in almost-efficient time. By "almost," we mean a quasi-polynomial runtime of n^(ln^c(n)), which, while not polynomial, is still far faster than the worst-case exponential time required by general neural network learning algorithms. In another paper, we explored several fundamental limitations of neural network algorithms. While the main goal of our research is to understand why neural networks succeed, identifying their limitations is crucial for mapping the boundaries of what is theoretically achievable. A third paper demonstrated that presenting neural networks with carefully chosen "correct" examples can dramatically enhance their learning capabilities.
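To give a sense of where a quasi-polynomial runtime of n^(ln^c(n)) sits between polynomial and exponential time, the following sketch compares the three growth rates numerically. The function names and the choices k=3 and c=1 are illustrative assumptions, not quantities from the papers.

```python
import math

def polynomial(n, k=3):
    """Polynomial runtime n^k -- the usual benchmark for 'efficient'."""
    return float(n) ** k

def quasi_polynomial(n, c=1):
    """Quasi-polynomial runtime n^(ln^c n), the shape of the 'almost efficient'
    bound discussed above (c=1 chosen here purely for illustration)."""
    return float(n) ** (math.log(n) ** c)

def exponential(n):
    """Worst-case exponential runtime 2^n."""
    return 2.0 ** n

# Quasi-polynomial growth overtakes any fixed polynomial, but remains
# astronomically smaller than exponential growth at moderate input sizes.
for n in (16, 64, 256):
    print(n, polynomial(n), quasi_polynomial(n), exponential(n))
```

At n = 256, for instance, n^(ln n) is around 10^13 operations, whereas 2^n already exceeds 10^77, which is why a quasi-polynomial guarantee is a meaningful improvement.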
The second question this project addresses is the ability of neural networks to perform well on unseen, out-of-sample data. One well-studied approach to this problem is to show that networks with small-weight magnitudes tend to generalize better. Some of the leading theoretical results in this area apply only to sufficiently smooth functions. We extended these techniques to cover non-smooth functions, which are common in practice and more representative of the networks used in real applications.
The third question concerns the ability of neural networks to learn so-called "deep" models. (The precise definition of a deep model is somewhat technical and omitted here.) Currently, very few deep models are known to be provably learnable. We contributed a basic example of such a family of models and plan to further investigate the capacity of neural network algorithms to learn these models in future work.