Periodic Reporting for period 2 - KeepOnLearning (Beyond solving static datasets: Deep learning from streaming data)
Reporting period: 2023-03-01 to 2024-08-31
With the KeepOnLearning project, we aim to build a new generation of deep learning methods that can adapt to new conditions by continuously updating their models as new training data becomes available. Learning from non-stationary streaming data is, however, still a major challenge requiring fundamental research. To reach our goal, we build on our earlier expertise in continual learning and plan to tightly link continual learning with advances in self-supervised representation learning and multimodal learning. If successful, this will lead to machine learning systems that keep on learning over time, systematically improving their skills and never getting outdated. It may also lower the barrier to applying machine learning, as it reduces the need for a skilled data scientist to carefully prepare the data beforehand. As a practical application, we plan to showcase our work’s feasibility, scalability and flexibility in the context of automatically generating audio descriptions of videos for the visually impaired.
1. Better understanding of the learning dynamics leading to catastrophic forgetting
We consolidated a principled taxonomy of continual learning setups; we studied where in a neural network forgetting is most severe; and we discovered that forgetting is most pronounced immediately after switching to a new task. The latter is important, as it means the worst-case performance of a continual learning system is much lower than one would expect based on previous, coarser evaluation metrics. This has serious implications for the use of continual learning in safety-critical applications.
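As a minimal illustration of why per-iteration evaluation matters, the sketch below (hypothetical toy code in PyTorch, not the project's evaluation suite) trains a linear classifier on two synthetic tasks in sequence and probes accuracy on the first task at every optimisation step of the second; a single end-of-task measurement would hide exactly this step-by-step trajectory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Two toy tasks: classify points by their first coordinate, with the
# decision boundary at a different location per task.
def make_task(shift):
    x = torch.randn(512, 2) + shift
    y = (x[:, 0] > shift[0]).long()
    return x, y

task1 = make_task(torch.tensor([0.0, 0.0]))
task2 = make_task(torch.tensor([4.0, 4.0]))

model = nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.5)

def accuracy(x, y):
    with torch.no_grad():
        return (model(x).argmax(1) == y).float().mean().item()

def train(x, y, steps, probe=None):
    for step in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
        if probe is not None:
            # Per-iteration probe of the OLD task: this exposes how its
            # accuracy degrades right after the task switch, which a
            # coarse end-of-task metric would miss.
            print(f"step {step:02d}: task1 accuracy = {accuracy(*probe):.2f}")

train(*task1, steps=50)
print(f"after task 1: task1 accuracy = {accuracy(*task1):.2f}")
train(*task2, steps=20, probe=task1)
```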
2. Methods to prevent catastrophic forgetting
We proposed a first method to tackle catastrophic forgetting that does not require storing old samples in a memory buffer and is especially well suited to the realistic case of a single pass through the data.
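This summary does not spell out the method itself. Purely to illustrate the memory-free, single-pass family it belongs to, the sketch below accumulates parameter-importance weights online from squared gradients and penalises deviations from an anchored copy of the weights taken at a task boundary (in the spirit of online EWC / synaptic intelligence). No old samples are stored and every sample is processed once; all names here are illustrative, not the project's actual method.

```python
import torch
import torch.nn.functional as F

# Illustrative memory-free regularizer; NOT the project's specific method.
class OnlineAnchor:
    def __init__(self, model, lmbda=100.0):
        self.lmbda = lmbda
        self.importance = {n: torch.zeros_like(p)
                           for n, p in model.named_parameters()}
        self.anchors = None  # set at the first task boundary

    def accumulate(self, model):
        # Called after loss.backward(): squared gradients approximate how
        # sensitive the loss is to each parameter, estimated on the fly.
        for n, p in model.named_parameters():
            if p.grad is not None:
                self.importance[n] += p.grad.detach() ** 2

    def consolidate(self, model):
        # At a task boundary, anchor the parameters at their current values.
        self.anchors = {n: p.detach().clone()
                        for n, p in model.named_parameters()}

    def penalty(self, model):
        if self.anchors is None:
            return torch.tensor(0.0)
        return self.lmbda * sum(
            (self.importance[n] * (p - self.anchors[n]) ** 2).sum()
            for n, p in model.named_parameters()
        )

def train_single_pass(model, opt, stream, reg):
    """One optimisation step per incoming batch: each sample is seen once,
    and the only state carried across tasks is (anchor, importance),
    never a buffer of old samples."""
    for x, y in stream:
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y) + reg.penalty(model)
        loss.backward()
        reg.accumulate(model)
        opt.step()
```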
3. Better self-supervised representation learning models
Based on our findings on the learning dynamics, it became evident that good representations are key to tackling catastrophic forgetting: networks that have learned to extract powerful representations can learn new tasks more easily, with less interference and hence less forgetting. This motivated us to look more closely into representation learning. The fact that such representation models are typically trained in a self-supervised manner, i.e. without requiring human annotations, is an extra advantage, as it makes them well suited to a continual learning setup, where labels are often hard to obtain.
In particular, we developed a method that discovers semantic concepts in images and learns local representations based on them, leading to state-of-the-art results on a variety of dense downstream tasks in a low-data regime, even without any retraining ('in-context inference').
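To make the 'in-context inference' idea concrete, the sketch below shows one generic way a frozen dense representation can solve a new dense task without retraining: each patch of a query image is labelled by its nearest neighbour among the patch features of a few labelled support images. This is an assumption-laden illustration of the principle, not the project's exact pipeline.

```python
import torch
import torch.nn.functional as F

def in_context_segmentation(query_feats, support_feats, support_labels):
    """Label each query patch with the label of its nearest support patch.

    query_feats:    (Q, D) dense patch features of the query image
    support_feats:  (S, D) dense patch features of labelled support images
    support_labels: (S,)   per-patch labels of the support images
    All features come from a frozen self-supervised backbone.
    """
    q = F.normalize(query_feats, dim=1)
    s = F.normalize(support_feats, dim=1)
    sim = q @ s.t()                 # cosine similarity, shape (Q, S)
    nearest = sim.argmax(dim=1)     # index of the best-matching support patch
    return support_labels[nearest]  # (Q,) predicted label per query patch

# Toy demo with random stand-in features: 196 query patches,
# 392 support patches, 5 hypothetical classes.
q = torch.randn(196, 384)
s = torch.randn(392, 384)
labels = torch.randint(0, 5, (392,))
pred = in_context_segmentation(q, s, labels)  # (196,) patch labels
```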
4. Better multimodal learning models
We identified co-occurrence between different modalities (e.g. vision and language) as an important cue for learning in a self-supervised or weakly supervised manner. So far, we have focused on human-object interaction detection, for which we proposed a zero-shot solution.
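As a hedged illustration of the underlying principle, the sketch below scores an image against textual descriptions of candidate interactions in a joint vision-language embedding space, using the open-source CLIP model as a stand-in: no interaction ever needs a labelled training example, because the text encoder supplies the class embeddings, which is what makes the recognition zero-shot. The image file and prompt set are hypothetical, and this is not the project's actual detector.

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical candidate interactions; none were seen with labels at
# training time -- the text encoder provides their embeddings "for free".
interactions = ["a person riding a bicycle",
                "a person repairing a bicycle",
                "a person carrying a bicycle"]

# Hypothetical image (e.g. a detected human-object region crop).
image = preprocess(Image.open("person_bicycle.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(interactions).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.t()).softmax(dim=-1)

for name, p in zip(interactions, probs[0].tolist()):
    print(f"{name}: {p:.2f}")
```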
The next major step is to scale up. Instead of focusing on short sequences of tasks learned from scratch, it makes sense to start from the best possible pretrained representation learner. Such models are often large, requiring special parameter-efficient methods for adaptation. The use case of automatic generation of audio descriptions will drive our research in that direction, integrating the advances in representation learning, multimodal learning and continual learning.
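Parameter-efficient adaptation typically means freezing the large pretrained backbone and training only a small number of extra parameters. As one widely used pattern (not necessarily the technique the project will settle on), the sketch below augments a frozen linear layer with a trainable low-rank update in the style of LoRA; because the low-rank factors start at zero, the adapted layer initially behaves exactly like the pretrained one.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + scale * (B A) x, where only A and B are trained."""
    def __init__(self, pretrained: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = pretrained
        self.base.weight.requires_grad_(False)   # freeze the big model
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d_out, d_in = pretrained.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Usage: adapt a frozen 1024-dim projection with ~1.6% of its parameters.
layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 2 * 8 * 1024 = 16384 vs ~1.05M frozen
```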