
Practically Relevant Theory of Deep Learning

Periodic Reporting for period 4 - TheoryDL (Practically Relevant Theory of Deep Learning)

Reporting period: 2020-08-01 to 2021-01-31

"Since the proposal of this project has been written, the impact of Deep Learning on our everyday life has been greatly increased.
Translating text, searching and organizing images based on textual content, chat-bots, self-driving cars, are all examples of technologies which heavily rely on Deep Learning.
To the general audience, new technologies tend to look like a magic.
The unique situation in deep learning is that this technology looks like a magic even to data scientists.
The goal of the TheoryDL project is to demystify deep learning, by providing a deep (pun intended) theoretical understanding of this technology, and in particular, understanding its potential but also its limitations.
The significance of this goal is two folded. First, I believe that it is dangerous to rely on technology which we do not understand. Second, a better theoretical understanding should enable to improve existing algorithms. Of particular interest is to be able to come up with faster algorithms, which are not of brute-force nature. Current algorithms contain a lot of brute-force components, and therefore the ""power"" of using deep learning is focused around few industrial companies that have the data and computing resources. A better theoretical understanding may lead to a democratization of this technology.
"
We have tackled the problem from several angles and the findings are summarized in several publications (see the publications list).
At a high level, perhaps the most important result is the coupling between deep learning and gradient-based algorithms: we have shown that analyzing neural networks independently of the algorithm used to train them is not the right approach.
On this theme, we first performed a systematic study of the failures of deep learning.
People tend to publicize success stories, but failures are even more interesting, since laying down the boundaries of a technology makes it possible to better understand why and when it works. We have identified cases in which gradient-based training of deep learning fails miserably. Interestingly, the failures are due neither to overfitting/underfitting nor to spurious local minima or a plethora of saddle points. They are rather due to more subtle issues, such as insufficient information in the gradients or bad signal-to-noise ratios.
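To make the "insufficient information in the gradients" phenomenon concrete, here is a minimal NumPy sketch (a toy illustration written for this summary, not code from the project and not the paper's construction): it takes a single randomly initialized ReLU neuron and estimates how much its population gradient actually changes when the target is switched between different random parity functions. When this relative dependence is tiny, the gradient barely distinguishes between the possible targets, which is exactly the signal-to-noise problem described above. All sizes below are hypothetical and chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
d, n, k, num_targets = 20, 20000, 5, 40   # toy sizes, for illustration only

# a single ReLU neuron with fixed random weights
w = rng.normal(size=d) / np.sqrt(d)

def mean_gradient(subset):
    # gradient (up to a constant factor) of the squared loss at w
    # when the target is the parity over `subset`
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.prod(X[:, subset], axis=1)
    pre = X @ w
    pred = np.maximum(pre, 0.0)
    per_sample = ((pred - y) * (pre > 0))[:, None] * X
    return per_sample.mean(axis=0)

grads = np.array([mean_gradient(rng.choice(d, size=k, replace=False))
                  for _ in range(num_targets)])

target_dependence = grads.var(axis=0).sum()      # how much the gradient varies with the target
overall_scale = (grads ** 2).mean(axis=0).sum()  # typical squared gradient magnitude
print(f"relative target dependence of the gradient: {target_dependence / overall_scale:.2e}")

The printed ratio should come out small, loosely echoing the formal statement that, for parity targets, the gradient carries very little information about which parity the learner is supposed to find.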
This direction led us to an important observation: weight sharing is crucial for the optimization of deep learning. We proved that without weight sharing, deep learning can essentially learn only low frequencies, and completely fails to learn mid and high frequencies. Weight sharing enables a form of coarse-to-fine training.
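For readers unfamiliar with the term, weight sharing is the constraint that defines a convolutional layer: the same few weights are reused at every position of the input instead of each position having its own. The NumPy sketch below (an illustration of the definition only, not of the proof or of the frequency result) shows that a weight-shared layer computes the same linear map as a dense layer whose matrix is constrained to repeat a small kernel along its diagonals, with far fewer free parameters.

import numpy as np

rng = np.random.default_rng(0)
sig_len, ker_len = 16, 3
x = rng.normal(size=sig_len)
kernel = rng.normal(size=ker_len)            # the ker_len shared weights

# weight-shared (convolutional) layer: the same kernel slides over the signal
conv_out = np.array([kernel @ x[i:i + ker_len]
                     for i in range(sig_len - ker_len + 1)])

# the equivalent dense layer: a full weight matrix in which weight sharing
# leaves only ker_len free parameters, repeated along the diagonals
W = np.zeros((sig_len - ker_len + 1, sig_len))
for i in range(W.shape[0]):
    W[i, i:i + ker_len] = kernel
dense_out = W @ x

print(np.allclose(conv_out, dense_out))                       # True: identical linear map
print(W.size, "unconstrained weights vs", ker_len, "shared")  # 224 vs 3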
From there, we were able to define generative hierarchical models for which provably efficient algorithms exist that also work in practice.
We continued with a series of papers that ultimately established the foundations of a general theory of deep learnability with gradient-based algorithms, phrased in the language of statistical queries.
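As background, the statistical query (SQ) framework (due to Kearns) models a learner that never sees individual examples but only noisy expectations. The sketch below is a paraphrase of that standard framework, not the project's exact formalization; $\tau$ denotes the query tolerance.

An SQ learner interacts with an oracle $\mathrm{STAT}(\tau)$ for the data
distribution $\mathcal{D}$: it submits a query $\phi:\mathcal{X}\times\mathcal{Y}\to[-1,1]$
and receives any value $v$ satisfying
\[
  \bigl|\, v - \mathbb{E}_{(x,y)\sim\mathcal{D}}[\phi(x,y)] \,\bigr| \le \tau .
\]
Gradient-based training fits this template because each coordinate of the
population gradient is such an expectation (after suitable normalization);
for the squared loss,
\[
  \frac{\partial}{\partial w_i}\,
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[(f_w(x)-y)^2\bigr]
  = \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\,2\,(f_w(x)-y)\,\partial_{w_i} f_w(x)\,\bigr],
\]
and a noisy mini-batch gradient estimate answers this query up to some tolerance.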
"We have published several results that go beyond the state of the art in the understanding of deep learning. Most recently:
- We identified the connection between approximation, depth separation and learnability in neural networks (Malach, Yehudai, S., Shamir, 2021)
- We have proved the well known ""lottery ticket hypothesis"", showing the pruning is all what you need for building a deep network (Malach, Yehudai, S., Shamir, 2021).
- We have derived Computational Separation Between Convolutional and Fully-Connected Networks (Malach & S., 2020)
- We have derived a general novel theory of deep learning connecting both hardness of approximation and hardness of learning through a new concept of the ""Variance"" of hypothesis classes (Malach & S., 2020)"
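To illustrate the intuition behind the pruning result (a toy sketch of the intuition only, not the paper's construction or proof): in a sufficiently overparameterized random layer, for every target weight there is, with high probability, some random weight that is already close to it, so a good subnetwork can be revealed by pruning alone, without any training. The widths below are hypothetical and chosen just to show the trend.

import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=10)                 # weights of a "target" linear neuron

for width in (10, 100, 1000, 10000):
    candidates = rng.normal(size=(10, width))        # overparameterized random weights
    # "pruning": for each target weight keep only the closest random candidate,
    # implicitly zeroing out all the others
    kept = candidates[np.arange(10),
                      np.abs(candidates - target[:, None]).argmin(axis=1)]
    print(width, float(np.abs(kept - target).max()))  # error shrinks as width grows

The actual theorem concerns pruning random deep ReLU networks rather than this single linear layer, so this should be read as intuition only.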
[Figure: fractals.jpg]