Data-Driven Methods for Modelling and Optimizing the Empirical Performance of Deep Neural Networks

Periodic Reporting for period 2 - BeyondBlackbox (Data-Driven Methods for Modelling and Optimizing the Empirical Performance of Deep Neural Networks)

Reporting period: 2018-07-01 to 2019-12-31

Deep neural networks (DNNs) have led to dramatic improvements in the state of the art for many important classification problems, such as object recognition from images or speech recognition from audio data.
They therefore present a key technology for further economic growth. Yet this key technology is still hard to use, e.g. for small and medium enterprises, because its performance is sensitive to hyperparameter settings and to the choice of neural architecture.
This ERC grant aims to change this by making deep learning much easier to use through automated machine learning (AutoML): automated methods for adjusting the learning method to avoid the need for manual tuning (and thus the need for expensive and often unavailable deep learning experts).

A key aspect in this work is efficiency: while previous *blackbox* hyperparameter optimization and neural architecture search methods are slow, this project goes *beyond the black box* to substantially speed up these processes.
Specifically, the project aims to develop efficient multi-fidelity Bayesian optimization approaches that reason across datasets, across training epochs of neural networks, and across subsets of large datasets, in order to enable much faster hyperparameter optimization and neural architecture search.
The action has 3 method goals (WP1-3) and 1 application goal (WP4), and in the project's first half we made progress on all of these.

The main methodology for WP 1-2 and WP 4 is based on efficient Bayesian optimization, whereas WP 3 is more concerned with reinforcement learning and stochastic optimization of neural networks. In the first half of the project, we focussed our efforts on efficient Bayesian optimization (relevant WPs: 1, 2, 4) and on stochastic optimization of neural networks (WP 3); we also started work on reinforcement learning (WP 3) and plan to focus more on it in the second half of the project. In addition, since the reviewers of the initial proposal asked prominently about optimizing the architectures of neural networks, we devoted effort to this topic as well.

In terms of efficient Bayesian optimization, we focused on (a) modelling performance and (b) exploiting these models for faster optimization.
In terms of performance modelling, we developed the first methods for extrapolating learning curves across hyperparameters (Klein et al, "Learning curve prediction with Bayesian neural networks" and Gargiani et al, "Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings"). We also developed the first methods for predicting performance across datasets (van Rijn & Hutter, "Hyperparameter Importance Across Datasets"), created a dataset to model performance across image resolution (Chrabaszcz et al, "A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets") and a dataset of datasets in order to allow learning across datasets (Bischl et al, "OpenML Benchmarking Suites and the OpenML100"). We also studied how to integrate uncertainty into deep neural networks (Ilg et al, "Uncertainty Estimates and Multi-hypotheses Networks for Optical Flow").
In terms of exploiting models for faster optimization, we introduced the first method for fast Bayesian optimization for large datasets (Klein et al, "Fast Bayesian hyperparameter optimization on large datasets") and the currently most efficient and robust general hyperparameter optimization approach, BOHB (Falkner et al, "BOHB: Robust and Efficient Hyperparameter Optimization at Scale"). We also studied fast optimization techniques for joint hyperparameter optimization and neural architecture search (Zela et al, "Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search"), as well as pure neural architecture search (Ying et al, "NAS-Bench-101: Towards Reproducible Neural Architecture Search" and Elsken et al, "Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution"). We also improved parallel Bayesian optimization through faster optimization of the acquisition function (Wilson et al, "The reparameterization trick for acquisition functions").
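
The multi-fidelity idea behind approaches such as BOHB can be illustrated with successive halving, the budget-allocation scheme used within Hyperband. The following is a minimal sketch, not the HpBandSter API: the objective `toy_loss`, the function `successive_halving`, and all parameter names are hypothetical, and configurations are sampled at random here rather than from a model as in BOHB.

```python
import math
import random


def toy_loss(learning_rate, budget):
    """Hypothetical objective: validation loss after `budget` training epochs."""
    # Lowest at learning_rate = 0.1; the 1/budget term shrinks with more epochs.
    return (math.log10(learning_rate) + 1.0) ** 2 + 1.0 / budget


def successive_halving(n_configs=27, min_budget=1, eta=3, seed=0):
    """Return the surviving configuration and a list of (budget, n_evaluated)."""
    rng = random.Random(seed)
    # Sample learning rates log-uniformly from [1e-4, 1].
    configs = [10 ** rng.uniform(-4, 0) for _ in range(n_configs)]
    budget = min_budget
    history = []
    while True:
        # Evaluate every remaining configuration at the current (cheap) budget.
        ranked = sorted(configs, key=lambda lr: toy_loss(lr, budget))
        history.append((budget, len(ranked)))
        if len(ranked) == 1:
            return ranked[0], history
        # Keep the best 1/eta fraction and give the survivors eta times more budget.
        configs = ranked[: max(1, len(ranked) // eta)]
        budget *= eta


best, history = successive_halving()
print(best, history)
```

With the defaults, 27 configurations are evaluated for 1 epoch, 9 for 3 epochs, 3 for 9 epochs, and 1 for 27 epochs, i.e. 108 epochs in total instead of the 729 needed to train all 27 configurations at the full budget. The toy objective ranks configurations identically at every budget; real learning curves can cross, so low-budget rankings are only an approximation in practice.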

In terms of stochastic optimization, we developed a new weight decay method that leads to substantially better generalization performance and also showed that adaptive gradient algorithms perform much better with a good learning rate schedule (Loshchilov & Hutter, "Decoupled Weight Decay Regularization"). We also developed automated hyperparameter optimization methods for policy gradient reinforcement learning algorithms so that they can be used out of the box. Even though we performed this work in the context of an application of reinforcement learning to a different domain (Runge et al, "Learning to Design RNA"), we can now also apply reinforcement learning in an automated fashion to other problems, with much more robust results.
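
The difference between L2 regularization and decoupled weight decay in an adaptive gradient method can be sketched for a single scalar parameter. This is an illustrative sketch only, assuming a standard Adam update and a decay term scaled by the learning rate; `adam_step`, its parameters, and the `state` dictionary are hypothetical names, and the cited paper should be consulted for the exact algorithm.

```python
import math


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One Adam update of a scalar parameter `theta`; returns the new value."""
    if not decoupled:
        # L2 regularization: the penalty gradient is added to the loss gradient
        # and is therefore rescaled by Adam's adaptive normalization.
        grad = grad + weight_decay * theta
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    update = m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled weight decay: shrink the weight directly,
        # outside the adaptive normalization.
        update = update + weight_decay * theta
    return theta - lr * update


# Compare both variants on one step with a zero loss gradient.
for decoupled in (True, False):
    state = {"t": 0, "m": 0.0, "v": 0.0}
    new_theta = adam_step(1.0, 0.0, state, lr=0.01,
                          weight_decay=0.1, decoupled=decoupled)
    print(decoupled, new_theta)
```

In this example the decoupled variant shrinks the weight by `lr * weight_decay * theta` = 0.001, whereas the L2 variant passes the penalty through Adam's normalization and moves the weight by roughly the full learning rate of 0.01, largely independent of the decay coefficient.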

In terms of open-source software packages, we developed three main ones:
- Auto-PyTorch, a package for automated deep learning (Mendoza et al, "Auto-Net: Towards Automatically-Tuned Neural Networks");
- HpBandSter, a package for multi-fidelity Bayesian optimization (Falkner et al, "BOHB: Robust and Efficient Hyperparameter Optimization at Scale");
- RoBO, a package for Bayesian optimization (Klein et al, "RoBO: A Flexible and Robust Bayesian Optimization Framework in Python").

In terms of overarching publications, we edited the first book on Automated Machine Learning (Hutter et al, "Automated Machine Learning: Methods, Systems, Challenges") and wrote survey articles on hyperparameter optimization (Feurer & Hutter, "Hyperparameter Optimization") and neural architecture search (Elsken, Metzen and Hutter, "Neural Architecture Search: A Survey").
We have dramatically improved the state of the art in hyperparameter optimization of deep neural networks. For example, our BOHB approach has achieved speedups of more than 50x compared to traditional blackbox optimization methods.

By the end of the project, we also expect substantial advances in reinforcement learning for whitebox hyperparameter control, leading to adaptive learning rate schedules and adaptive settings of other neural network hyperparameters.