Incrementally learning new classes with generative classification

Periodic Reporting for period 1 - GenClassifier4CL (Incrementally learning new classes with generative classification)

Reporting period: 2022-10-01 to 2024-09-30

Learning continually from non-stationary streams of data is a key feature of natural intelligence, but an unsolved problem in deep learning. Particularly challenging for deep neural networks is the problem of "class-incremental learning", whereby a network must learn to distinguish classes that are not observed together. In deep learning, the default approach to classification is to learn discriminative classifiers. This works well in the i.i.d. setting, when all classes are available simultaneously, but when new classes must be learned incrementally, successful training of discriminative classifiers depends on workarounds such as storing data or generative replay.

In a radical shift of gears, here I propose to instead address class-incremental learning with generative classification. A key advantage is that generative classifiers – unlike discriminative classifiers – do not compare classes during training, but only during inference (i.e. when making a classification decision). As a proof of concept, in preliminary work I showed that a naïve implementation of a generative classifier, with a separate variational autoencoder (VAE) per class and likelihood estimation through importance sampling, outperforms comparable generative replay methods. To improve the efficiency, scalability, and performance of this generative classifier, I propose four further modifications: (1) move the generative modelling objective from the raw inputs to an intermediate network layer; (2) share the encoder network between classes, but not necessarily the decoder networks; (3) use fewer importance samples for unlikely classes; and (4) make classification decisions hierarchical. In this way, it is hoped that generative classification can be developed into a practical, efficient, and scalable state-of-the-art deep learning method for class-incremental learning.
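To make this proof of concept concrete, below is a minimal sketch (in PyTorch) of such a naïve generative classifier: one VAE per class, log-likelihoods estimated through importance sampling, and classification by the highest estimated log-likelihood. The architecture, dimensions, and helper names are illustrative assumptions rather than the project's actual code, and a Bernoulli decoder is assumed (i.e. inputs scaled to [0, 1]):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Illustrative per-class VAE; sizes are placeholder choices."""
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim))

    @torch.no_grad()
    def log_likelihood(self, x, n_samples=64):
        # Importance sampling with the encoder as proposal:
        # log p(x) ~= logsumexp_s[log p(x|z_s) + log p(z_s) - log q(z_s|x)] - log S
        h = self.enc(x)
        mu, std = self.mu(h), torch.exp(0.5 * self.logvar(h))
        z = mu + std * torch.randn(n_samples, *mu.shape)   # (S, batch, z_dim)
        log_q = torch.distributions.Normal(mu, std).log_prob(z).sum(-1)
        log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
        log_p_x_z = -F.binary_cross_entropy_with_logits(   # Bernoulli decoder
            self.dec(z), x.expand(n_samples, *x.shape), reduction="none").sum(-1)
        return (torch.logsumexp(log_p_x_z + log_prior - log_q, dim=0)
                - torch.log(torch.tensor(float(n_samples))))

@torch.no_grad()
def classify(x, vaes):
    # vaes: dict mapping each class label to the VAE trained on that class only;
    # pick the class whose model assigns the highest estimated log-likelihood.
    labels = list(vaes.keys())
    scores = torch.stack([vaes[c].log_likelihood(x) for c in labels], dim=1)
    return [labels[i] for i in scores.argmax(dim=1)]
```

Note that the classes are only compared inside `classify`, at inference time; training each class's VAE never involves any other class.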
Moving the generative modelling objective from the raw inputs to an intermediate network layer requires learning strong representations. Therefore, in collaboration with Timm Hess and Eli Verwimp, I studied the quality of continually learned representations. We found that, although forgetting in the representation (i.e. feature forgetting) can be small in absolute terms, when measured relative to how much was learned during a task, forgetting in the representation tends to be just as catastrophic as forgetting at the output level. We further showed that this feature forgetting is problematic because it substantially slows down the incremental learning of good general representations (i.e. knowledge accumulation).
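To illustrate the absolute-versus-relative distinction, the hypothetical helper below expresses the accuracy drop of a linear probe on a task's (frozen) features as a fraction of what the probe gained during that task; the exact metric used in our study may differ:

```python
# Hypothetical illustration of absolute vs. relative feature forgetting.
# The acc_* arguments are linear-probe accuracies on the SAME task's data,
# with the probe fit on frozen features: before training the task, right
# after it, and at some later point in the task sequence.
def relative_feature_forgetting(acc_before, acc_after, acc_later):
    learned = acc_after - acc_before       # probe accuracy gained during the task
    forgotten = acc_after - acc_later      # how much of that is lost afterwards
    return forgotten / max(learned, 1e-8)  # forgetting as a fraction of the gain

# A drop of just 2 percentage points can be half of everything that was learned:
print(relative_feature_forgetting(0.50, 0.54, 0.52))  # -> ~0.5
```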
Another challenge for scaling up generative classification is that continually learning generative models of suitable quality is a difficult problem in itself. Before the fellowship I had mostly worked with variational autoencoders. To investigate potential benefits of other types of generative models, together with visiting master's student Sergi Masip, I explored the continual training of diffusion models. We found that using distillation in combination with generative replay can substantially improve the performance of continually trained diffusion models.
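The sketch below illustrates, under several simplifying assumptions, one way distillation can be combined with generative replay when continually training a noise-prediction diffusion model: on noised replay samples generated by a frozen copy of the previous model, the current model is regressed onto that frozen teacher's predictions rather than onto fresh noise targets. The helper signatures (`q_sample`, the eps-prediction networks) are assumptions, and the actual method may distil differently (e.g. along the full reverse trajectory):

```python
import torch
import torch.nn.functional as F

def continual_diffusion_loss(model, teacher, x_new, x_replay, q_sample, T=1000):
    """model: current eps-prediction network; teacher: frozen copy trained on
    previous tasks; x_replay: samples generated from the teacher;
    q_sample(x, t, eps): forward-process noising. All are assumed helpers."""
    # Standard denoising loss on the new task's data.
    t = torch.randint(0, T, (x_new.size(0),))
    eps = torch.randn_like(x_new)
    loss_new = F.mse_loss(model(q_sample(x_new, t, eps), t), eps)

    # Generative replay + distillation: on noised replay samples, match the
    # frozen teacher's prediction instead of a fresh noise target.
    t_r = torch.randint(0, T, (x_replay.size(0),))
    eps_r = torch.randn_like(x_replay)
    x_t_r = q_sample(x_replay, t_r, eps_r)
    with torch.no_grad():
        target = teacher(x_t_r, t_r)
    loss_replay = F.mse_loss(model(x_t_r, t_r), target)

    return loss_new + loss_replay
```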
To address the computational efficiency and scalability of generative classification, I worked together with visiting PhD student Michal Zajac. Initially we looked at different ways of sharing parts of the class-specific generative models, but we ended up developing a novel approach to class-incremental learning that is closely related to generative classification, yet considerably more scalable and efficient. This novel approach, which we called Prediction Error-based Classification (PEC), is based on the prediction errors of shallow, class-specific networks that are trained to mimic the output of a random teacher network. The attached figure schematically illustrates the difference between discriminative classification, generative classification, and our newly proposed approach of prediction error-based classification. PEC obtained remarkably strong performance on a variety of class-incremental learning benchmarks, while being substantially more computationally efficient than generative classification. Besides its strong performance, a nice feature of this approach is that it can also be motivated as an approximation of a classification rule based on Gaussian Process posterior variance.
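A minimal sketch of the PEC idea, consistent with the description above (architectural details are illustrative choices, not the exact ones from the paper): each class gets a shallow student network that is trained, on that class's data only, to mimic a frozen random teacher, and a test input is assigned to the class whose student has the smallest prediction error:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, d_hidden=128):
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                         nn.Linear(d_hidden, d_out))

class PEC:
    def __init__(self, x_dim, out_dim=64):
        self.teacher = mlp(x_dim, out_dim)       # random teacher, never trained
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.students = {}                       # one shallow network per class

    def train_class(self, label, loader, epochs=1, lr=1e-3):
        # Train this class's student to mimic the teacher on this class only;
        # no other class is ever seen, so there is no cross-class interference.
        student = mlp(self.teacher[0].in_features, self.teacher[-1].out_features)
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for x in loader:                     # loader yields inputs of one class
                loss = ((student(x) - self.teacher(x)) ** 2).mean()
                opt.zero_grad(); loss.backward(); opt.step()
        self.students[label] = student

    @torch.no_grad()
    def predict(self, x):
        # Classify by the SMALLEST prediction error w.r.t. the teacher.
        target = self.teacher(x)
        labels = list(self.students.keys())
        errors = torch.stack(
            [((self.students[c](x) - target) ** 2).mean(dim=1) for c in labels],
            dim=1)                               # (batch, n_classes)
        return [labels[i] for i in errors.argmin(dim=1)]
```

As in generative classification, classes are only compared at inference time, but the per-class models here are shallow regression networks rather than full generative models, which is what makes the approach markedly cheaper.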

Initially as a side project, in collaboration with PhD student Matthias De Lange, I explored continual evaluation for continual learning (as opposed to evaluating performance only at the end of training each task, as had until then been the norm). In doing so, we discovered an intriguing phenomenon: when starting to train on a new task, state-of-the-art continual learning methods (e.g. replay, regularization) suffer temporary but substantial forgetting of previously learned tasks. We called this phenomenon the "stability gap". The stability gap is problematic for safety-critical applications, in which sudden drops in performance can be harmful, but it is also indicative of inefficient optimization.
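A sketch of what such a continual-evaluation protocol looks like in code (the `evaluate` helper and the data loaders are assumed): performance on every task seen so far is measured after every training iteration, rather than only at task boundaries, and it is this finer granularity that reveals the stability gap:

```python
import torch.nn.functional as F

def train_with_continual_eval(model, opt, task_loaders, eval_loaders, evaluate):
    history = []  # one dict of per-task accuracies per training iteration
    for task_id, loader in enumerate(task_loaders):
        for x, y in loader:
            loss = F.cross_entropy(model(x), y)  # plus any CL method's extra terms
            opt.zero_grad(); loss.backward(); opt.step()
            # Evaluate on all tasks seen so far, after EVERY iteration.
            history.append({t: evaluate(model, eval_loaders[t])
                            for t in range(task_id + 1)})
    return history

# Plotting the history entries for an old task over iterations reveals the
# stability gap: a sharp but often transient accuracy drop right after
# each task switch.
```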
With my PhD student Timm Hess, I dived deeper into the stability gap. Remarkably, we showed that the stability gap consistently occurs even with incremental joint training, which can be seen as an "upper bound" of most current continual learning methods (e.g. replay, regularization). Based on this finding, we argue that continual learning needs new kinds of methods that focus on changing the way optimization is done in continual learning problems.
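For concreteness, a hedged sketch of what is meant by incremental joint training: when a new task arrives, training simply continues on the union of all data observed so far, so no data is ever unavailable (hyperparameters are illustrative):

```python
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader

def incremental_joint_training(model, opt, task_datasets, epochs_per_task=5):
    seen = []
    for ds in task_datasets:
        seen.append(ds)  # data of all tasks so far remains fully available
        loader = DataLoader(ConcatDataset(seen), batch_size=128, shuffle=True)
        for _ in range(epochs_per_task):
            for x, y in loader:
                loss = F.cross_entropy(model(x), y)
                opt.zero_grad(); loss.backward(); opt.step()
```

That even this regime shows a stability gap under per-iteration evaluation suggests the gap stems from how optimization behaves at task switches, not from missing data.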

Finally, another outcome of this fellowship has been that, in collaboration with Nicholas Soures and Dhireesha Kudithipudi from UT San Antonio, I wrote an invited book chapter on continual learning and catastrophic forgetting for the major reference work "Learning and Memory: A Comprehensive Reference".
This fellowship has produced two scientific results that I consider to be clearly above the state of the art:

(1) Prediction error-based classification (abbreviated as PEC) is a new approach that demonstrates strong performance on a variety of class-incremental learning problems, without the need to store data and with relatively low computational costs.

(2) The discovery of the stability gap: the intriguing phenomenon that successful state-of-the-art continual learning methods (e.g. replay, regularization) still suffer substantial forgetting when starting to learn something new, although this forgetting is often only temporary and is recovered during continued training.
Schematic of PEC relative to discriminative and generative classification.