The action has 3 method goals (WP1-3) and 1 application goal (WP4).
Method Goal 1: Exploiting data-driven priors in Bayesian optimization
We developed methods to effectively exploit large quantities of algorithm performance data from previous datasets [5, 9, 33], as well as tooling to access such data in the first place [28, 18, 45]. Combining these with a methodology for using dataset subsets [1, 13, 53], we won the 2nd AutoML competition [24]. Moreover, we implemented the equations given in WP 1.3 to tune support vector machines, obtaining up to 100-fold speedups [4, 22]. We also published the first work that allows the specification of free-form priors in Bayesian optimization [51] and the first work that allows learning across developer adjustments [31]. Finally, we explored meta-learning in the context of neural architecture search [37].
Method Goal 2: Graybox Bayesian optimization
We worked on methods for modeling learning curves and exploiting them in a Bayesian optimization setting [2, 13, 21, 22]. However, concurrent work demonstrated excellent performance with a simpler, bandit-based method; we therefore shifted our focus to improving that bandit method and developed model-based bandit methods that improved the state of the art in hyperparameter optimization of deep neural networks. For example, our BOHB approach has led to speedups of more than 50x compared to traditional black-box optimization methods [13]. We also developed efficient and reproducible benchmark problems to further evaluate gray-box methods [48]. As part of our work on extracting SGD state features, we found an issue with the popular Adam optimizer, and the paper proposing the fix is now my most highly cited paper [11].
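To illustrate the kind of bandit-based strategy referred to above, the following is a minimal sketch of successive halving, the building block of Hyperband-style methods. It is not the BOHB implementation: the function names, the toy loss, and the parameters `min_budget` and `eta` are assumptions chosen for illustration only.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return the configuration that survives all rungs.

    At each rung, every surviving configuration is evaluated at the
    current budget, the best 1/eta are kept, and the budget is
    multiplied by eta for the next rung.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower loss is better; cheap budgets give noisy or biased estimates.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    """Hypothetical loss: minimized at config = 0.3, improving with budget."""
    return (config - 0.3) ** 2 + 1.0 / budget


if __name__ == "__main__":
    candidates = [i / 10 for i in range(9)]
    print(successive_halving(candidates, toy_loss))
```

Most of the compute is spent on the few configurations that survive the cheap early rungs, which is where the speedup over full-budget black-box evaluation comes from. A model-based variant would replace the fixed candidate list with configurations proposed by a model fitted to earlier evaluations.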
Method Goal 3: Hyperparameter control
We worked on improving stochastic optimization with AdamW [11], with the aim of controlling this optimization with reinforcement learning (RL). To ease the application of RL, we worked on automated hyperparameter optimization for RL algorithms [26, 27], including adapting hyperparameters over time. We clearly demonstrated the benefit of hyperparameter optimization in the RL domain by finding hyperparameters so good that they allowed the agent to break the MuJoCo simulator [12].
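For readers unfamiliar with AdamW, the sketch below shows a single update for one scalar parameter with decoupled weight decay, meaning the decay shrinks the parameter directly instead of being added to the gradient. The function name and default values are assumptions for illustration, and scaling the decay by the learning rate follows a common implementation convention rather than necessarily the exact formulation in [11].

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter; returns (param, m, v).

    t is the 1-based step counter used for bias correction.
    """
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moving averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter itself, so it is
    # not rescaled by the adaptive denominator below.
    param = param - lr * weight_decay * param
    # Standard Adam step on the loss gradient only.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

With Adam plus L2 regularization, the decay term would instead be added to `grad` before the moving averages are computed, so it would be divided by the adaptive denominator along with the loss gradient.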
One part of our work that was not covered by the original work packages, but was requested by the reviewers of the original proposal, was the optimization of neural architectures. We therefore invested substantial effort in the new field of neural architecture search (NAS) and made numerous contributions to it [7, 10, 20, 25, 37, 39, 40, 42, 43, 47].
Application Goal: Computationally inexpensive auto-tuned deep learning, even for large datasets.
To improve the applicability of deep learning, we worked on the hyperparameter optimization of augmentation and regularization techniques [44, 50]. We achieved the application goal by releasing the open-source automated deep learning library Auto-PyTorch [49], which allows efficient hyperparameter optimization and training of deep networks even on large datasets. This democratizes deep learning by allowing non-ML experts to achieve state-of-the-art ML results.