Periodic Reporting for period 1 - IT-DNN (INFORMATION-THEORETIC LIMITS FOR DEEP NEURAL NETWORKS)
Reporting period: 2020-04-01 to 2022-03-31
The project has so far produced two papers, "Conditional Mutual Information-Based Generalization Bound for Meta Learning" and "A Unified View on PAC-Bayes Bounds for Meta-Learning", both described below. Two further papers, one on meta excess risk and one on f-CMI bounds for meta-learning, may be prepared for submission in the future.
To increase our theoretical understanding of DNNs, we have studied meta-learning. Meta-learning formalizes why DNNs can perform well on related problems, as in transfer learning. In fact, any learning process rests on a set of assumptions known as the inductive bias. In many machine learning problems (including those involving DNNs), it is desirable to find methods that learn the inductive bias automatically. Meta-learning formalizes this goal: it observes data from a number of inherently related tasks and uses the gained experience and knowledge to learn an appropriate bias, which can then be fine-tuned to perform well on new tasks. The meta-learner thus speeds up the learning of a new, previously unseen task.
For example, in DNNs, learning the initial weights and the learning rate of the training algorithm falls within the scope of meta-learning. As mentioned, the goal is to extract knowledge from several observed tasks, referred to as the meta-training set, and to use that knowledge to improve performance on a novel task. The meta-learner generalizes well if, after observing sufficiently many training tasks, it infers a hyperparameter that yields good solutions on novel tasks. A good solution here means that the meta-generalization loss, defined as the average loss incurred by the hyperparameter when used on a new task, is small. An illustrative sketch of this setup is given below.
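As a purely illustrative sketch of this setup, the following toy example meta-learns an initialization across related regression tasks in the spirit of Reptile; the tasks, model, and hyperparameter values are our own assumptions for illustration and do not represent the project's methods.

```python
# Minimal meta-learning sketch (illustrative only, not the project's method):
# meta-learn a shared initialization across related 1-D regression tasks.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A toy task: y = a*x with a task-specific slope a drawn from the environment."""
    a = rng.normal(loc=2.0, scale=0.5)
    xs = rng.uniform(-1.0, 1.0, size=20)
    ys = a * xs
    return xs, ys

def inner_train(w0, xs, ys, lr=0.1, steps=10):
    """Base-learner: a few gradient steps on the squared loss, starting from w0."""
    w = w0
    for _ in range(steps):
        grad = np.mean(2.0 * (w * xs - ys) * xs)
        w -= lr * grad
    return w

# Meta-learner: move the shared initialization (the "hyperparameter") toward
# each task's adapted solution, averaged over many meta-training tasks.
w_init = 0.0
meta_lr = 0.05
for _ in range(500):
    xs, ys = sample_task()
    w_task = inner_train(w_init, xs, ys)
    w_init += meta_lr * (w_task - w_init)  # Reptile-style outer update

print(f"meta-learned initialization: {w_init:.3f} (tasks have mean slope 2.0)")
```

After meta-training, the initialization sits close to the mean of the task environment, so adapting to a new task requires only a few inner gradient steps.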
However, since both the data and the task distributions are unknown, the meta-generalization loss cannot be optimized directly. Instead, the meta-learner evaluates the empirical meta-training loss of the hyperparameter on the meta-training set. The meta-generalization gap is defined as the difference between the meta-generalization loss and the meta-training loss. If the meta-generalization gap is small, the meta-training loss is a good estimate of the meta-generalization loss. One common formalization of these quantities is recalled below.
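The quantities just described can be written schematically as follows; the notation (hyperparameter u, base-learner A, N observed tasks with per-task datasets S_i of m samples each) is our own and is not a verbatim statement from the papers.

```latex
% Meta-generalization loss: average loss of the hyperparameter u on a new task
\mathcal{L}(u) = \mathbb{E}_{\tau \sim P_{\mathcal{T}}}\,
                 \mathbb{E}_{S_\tau \sim P_{Z\mid\tau}^{\otimes m}}
                 \big[\, L_\tau\!\big(A(u, S_\tau)\big) \,\big]

% Empirical meta-training loss: average training loss over the N observed tasks
\widehat{\mathcal{L}}(u) = \frac{1}{N} \sum_{i=1}^{N}
                           \widehat{L}_i\!\big(A(u, S_i) \,\big|\, S_i\big)

% Meta-generalization gap
\Delta(u) = \mathcal{L}(u) - \widehat{\mathcal{L}}(u)
```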
Thus, bounding the meta-generalization gap is a key technique for understanding how prior knowledge acquired from previous tasks may improve the performance of learning an unseen task.
The project addresses this by developing novel information-theoretic bounds on the generalization error attainable using DNNs and by demonstrating how such bounds can guide the design of such networks.
To formalize this goal, we started by deriving information-theoretic (IT) bounds for meta-learning.
We obtained IT bounds for both the average-case and the high-probability scenarios that are tighter than existing bounds; the single-task bound that results of this type extend is recalled below.
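For context, the standard single-task information-theoretic bound of this flavor is the mutual-information bound of Xu and Raginsky; the meta-learning bounds extend this style of result to the two-level setup of tasks and per-task data. The statement below is the well-known single-task form, not our new result.

```latex
% Single-task setting: training set S of n i.i.d. samples, learned hypothesis W,
% loss assumed \sigma-sub-Gaussian. The expected generalization gap satisfies
\big|\, \mathbb{E}\big[ L(W) - \widehat{L}(W \mid S) \big] \,\big|
  \;\le\; \sqrt{ \frac{2\sigma^{2}}{n}\, I(W; S) }
```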
We also developed theoretical understanding of meta-learning that can guide the design of DNNs.
We presented an information-theoretic bound on the generalization performance of any given meta-learner, building on the conditional mutual information (CMI) framework. These results were presented at the flagship conference in information theory (ISIT 2021) in the paper titled “Conditional Mutual Information-Based Generalization Bound for Meta Learning”. The main insight of this paper is that the average meta-generalization gap in meta-learning can be bounded via two conditional mutual information (CMI) terms that capture the sensitivity of the meta-learner and the base-learner to their input training sets. The resulting bound “inherits the advantage of the CMI bound for conventional learning, including its boundedness.” The conventional-learning CMI bound in question is recalled below.
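For completeness, the conventional-learning CMI bound of Steinke and Zakynthinou, which our meta-learning bound builds on, is stated below in its standard single-task form; in our result, two such CMI terms appear, one for the meta-learner and one for the base-learner.

```latex
% Conventional-learning CMI bound: \widetilde{Z} is a super-sample of 2n examples,
% U \in \{0,1\}^n selects which half forms the training set S, and the loss is in [0,1].
\big|\, \mathbb{E}\big[ L(W) - \widehat{L}(W \mid S) \big] \,\big|
  \;\le\; \sqrt{ \frac{2}{n}\, I\big(W; U \mid \widetilde{Z}\big) }

% Boundedness: since U consists of n bits, I(W; U \mid \widetilde{Z}) \le n \log 2,
% so the bound never diverges, unlike unconditional mutual-information bounds.
```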
We also derived a general framework that yields PAC-Bayes bounds on the meta-generalization gap. Under certain setups, different families of PAC-Bayes bounds, namely the classic, quadratic, and fast-rate families, can be recovered from the general framework. We also proposed new PAC-Bayes classic bounds that reduce the meta-overfitting problem. These results have been published in our second paper, titled “A Unified View on PAC-Bayes Bounds for Meta-Learning”, and were presented at ICML 2022, one of the top machine learning conferences.
The main contribution of the paper is the general framework itself, together with the new bounds it yields. In particular, we overcame the difficulty of decomposing the meta-generalization gap in the PAC-Bayes setup for meta-learning. Our technique can be applied to many other problems related to the meta-generalization gap. For reference, the classic single-task PAC-Bayes bound that the framework generalizes is recalled below.
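The classic single-task PAC-Bayes bound (McAllester's bound, in Maurer's form) reads as follows; the meta-learning framework produces analogous terms at both the task-environment level and the per-task level. The statement below is the standard single-task form, not our new result.

```latex
% Classic PAC-Bayes bound: prior \pi chosen before seeing the data, posterior \rho
% arbitrary, loss in [0,1], n training samples. With probability at least 1-\delta:
\mathbb{E}_{h \sim \rho}\big[ L(h) \big]
  \;\le\; \mathbb{E}_{h \sim \rho}\big[ \widehat{L}(h \mid S) \big]
  \;+\; \sqrt{ \frac{ \mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta} }{ 2n } }
```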
These results may have an impact on designing DNNs with better performance in many applications.