Periodic Reporting for period 1 - GenClassifier4CL (Incrementally learning new classes with generative classification)
Reporting period: 2022-10-01 to 2024-09-30
Another challenge for scaling up generative classification is that continually learning generative models of suitable quality is difficult in itself. Before the fellowship I had mostly worked with variational autoencoders. To explore the potential benefits of other types of generative models, together with visiting master's student Sergi Masip I explored the continual training of diffusion models. We found that combining distillation with generative replay can substantially improve the performance of continually trained diffusion models.
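To make this idea concrete, the sketch below illustrates one way of combining generative replay with distillation when continually training a diffusion model. It is a minimal sketch under simplifying assumptions, not the exact formulation from our paper: it assumes a noise-prediction parameterization `model(x_t, t)`, image-shaped tensors, and that `x_replay` was generated beforehand with the frozen previous model (the "teacher"); `distill_weight` is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def continual_diffusion_loss(student, teacher, x_new, x_replay, alphas_bar,
                             distill_weight=1.0):
    """Sketch of a continual-training loss for a diffusion model: the
    standard denoising loss on current-task data, plus a distillation term
    in which a frozen copy of the previous model (the teacher) supervises
    the student's noise predictions on replayed inputs."""

    def noised(x):
        B = x.size(0)
        t = torch.randint(0, len(alphas_bar), (B,), device=x.device)
        eps = torch.randn_like(x)
        a = alphas_bar[t].view(-1, 1, 1, 1)  # (B, C, H, W) inputs assumed
        return a.sqrt() * x + (1.0 - a).sqrt() * eps, t, eps

    # Standard diffusion loss on data from the task currently being learned.
    x_t, t, eps = noised(x_new)
    loss_new = F.mse_loss(student(x_t, t), eps)

    # Distillation on replayed inputs: rather than re-fitting the generated
    # samples with the ordinary diffusion loss, the student matches the
    # teacher's noise prediction on the same noised input.
    x_rt, t_r, _ = noised(x_replay)
    with torch.no_grad():
        target = teacher(x_rt, t_r)
    loss_distill = F.mse_loss(student(x_rt, t_r), target)

    return loss_new + distill_weight * loss_distill
```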
To address the computational efficiency and scalability of generative classification, I worked together with visiting PhD student Michal Zajac. Initially we looked at different ways of sharing parts of the class-specific generative models, but we ended up developing a novel approach to class-incremental learning that is closely related to generative classification yet far more scalable and efficient. This approach, which we called Prediction Error-based Classification (PEC), is based on the prediction errors of shallow, class-specific networks that are each trained to mimic the output of a random teacher network. The attached figure schematically illustrates the difference between discriminative classification, generative classification, and our newly proposed prediction error-based classification. PEC obtained remarkably strong performance on a variety of class-incremental learning benchmarks, while being substantially more computationally efficient than generative classification. An appealing additional feature is that this approach can be motivated as an approximation of a classification rule based on Gaussian Process posterior variance.
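The following is a minimal sketch of the PEC mechanism, under simplifying assumptions: the MLP architectures and their sizes are illustrative placeholders (for images, shallow convolutional students would be used instead), and the training loop is deliberately bare-bones. What it does show faithfully is the core decision rule: each class-specific student sees only its own class's data, and a test input is assigned to the class whose student best imitates the frozen random teacher.

```python
import torch
import torch.nn as nn

class PEC:
    """Sketch of Prediction Error-based Classification (PEC): one shallow
    'student' network per class is trained to reproduce the output of a
    shared, frozen, randomly initialized 'teacher' network on that class's
    data only. Classification picks the class with the smallest error."""

    def __init__(self, num_classes, in_dim, hid_dim=128, out_dim=64):
        self.teacher = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                     nn.Linear(hid_dim, out_dim))
        for p in self.teacher.parameters():
            p.requires_grad_(False)  # the teacher stays random and frozen
        self.students = [nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                       nn.Linear(hid_dim, out_dim))
                         for _ in range(num_classes)]

    def train_class(self, c, loader, epochs=1, lr=1e-3):
        # Each student is trained only on data of its own class, so classes
        # can be learned one at a time without storing or revisiting old data.
        opt = torch.optim.Adam(self.students[c].parameters(), lr=lr)
        for _ in range(epochs):
            for x in loader:  # loader yields inputs of class c only
                loss = ((self.students[c](x) - self.teacher(x)) ** 2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()

    @torch.no_grad()
    def predict(self, x):
        # Classify by the smallest teacher-imitation error across classes.
        errs = torch.stack([((s(x) - self.teacher(x)) ** 2).mean(dim=1)
                            for s in self.students], dim=1)
        return errs.argmin(dim=1)
```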
Initially as a side project, in collaboration with PhD student Matthias De Lange, I explored continual evaluation for continual learning: evaluating performance throughout training, as opposed to only at the end of training on each task, which until then had been the norm. Doing so, we discovered an intriguing phenomenon: when starting to train on a new task, state-of-the-art continual learning methods (e.g. replay, regularization) suffer temporary but substantial forgetting of previously learned tasks. We called this phenomenon the "stability gap". The stability gap is problematic for safety-critical applications, in which sudden drops in performance can be harmful, but it is also indicative of inefficient optimization.
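The sketch below illustrates the continual-evaluation protocol in its simplest form; the loop structure and parameter names are illustrative assumptions, not our exact experimental code. The key point is only that accuracy on all tasks is recorded after every training iteration, which is the per-iteration view that makes the stability gap visible.

```python
import torch

def train_with_continual_eval(model, tasks, eval_loaders, opt, loss_fn,
                              eval_every=1):
    """Sketch of continual evaluation: performance on every task is measured
    after each training iteration (or every `eval_every` iterations), rather
    than only at the end of training on each task."""
    curves = {name: [] for name in eval_loaders}  # per-task accuracy curves
    step = 0
    for task_loader in tasks:          # tasks arrive one after the other
        for x, y in task_loader:
            model.train()
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step % eval_every == 0:
                model.eval()
                with torch.no_grad():
                    for name, loader in eval_loaders.items():
                        correct = total = 0
                        for xe, ye in loader:
                            correct += (model(xe).argmax(1) == ye).sum().item()
                            total += ye.numel()
                        curves[name].append((step, correct / total))
    return curves
```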
With my PhD student Timm Hess, I dived deeper into the stability gap. Remarkably, we showed that the stability gap consistently occurs even with incremental joint training, which can be seen as an "upper bound" for most current continual learning methods (e.g. replay, regularization). Based on this finding, we argue that the field needs to develop new kinds of continual learning methods, ones that focus on changing how optimization is performed in continual learning problems.
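For clarity, the following sketch spells out what is meant by incremental joint training; the helper names and batch size are illustrative assumptions. When a new task arrives, training simply continues from the current weights on the union of all data seen so far, and it is in this setting that we still observed the stability gap.

```python
from torch.utils.data import ConcatDataset, DataLoader

def incremental_joint_training(model, task_datasets, train_fn):
    """Sketch of incremental joint training: at each task switch, training
    continues (no re-initialization) on all data observed so far.
    `train_fn` stands in for an ordinary supervised training loop."""
    seen = []
    for ds in task_datasets:
        seen.append(ds)
        loader = DataLoader(ConcatDataset(seen), batch_size=128, shuffle=True)
        train_fn(model, loader)  # continue from the current weights
    return model
```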
Finally, another outcome of this fellowship is that, in collaboration with Nicholas Soures and Dhireesha Kudithipudi from UT San Antonio, I wrote an invited chapter on continual learning and catastrophic forgetting for the major reference work "Learning and Memory: A Comprehensive Reference".
(1) Prediction Error-based Classification (PEC) is a new approach that demonstrates strong performance on a variety of class-incremental learning problems, without the need to store data and at relatively low computational cost.
(2) The discovery of the stability gap. The stability gap is the intriguing phenomenon that even successful state-of-the-art continual learning methods (e.g. replay, regularization) suffer substantial forgetting when starting to learn something new, although this forgetting is often only temporary and is recovered during continued training.