Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
- URL: http://arxiv.org/abs/2203.13381v1
- Date: Thu, 24 Mar 2022 23:06:08 GMT
- Title: Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
- Authors: MohammadReza Davari, Nader Asadi, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky
- Abstract summary: Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model.
We show that representation forgetting can lead to new insights into the effects of model capacity and the loss function used in continual learning.
- Score: 14.462797749666992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual Learning research typically focuses on tackling the phenomenon of
catastrophic forgetting in neural networks. Catastrophic forgetting is
associated with an abrupt loss of knowledge previously learned by a model when
the task, or more broadly the data distribution, being trained on changes. In
supervised learning problems this forgetting, resulting from a change in the
model's representation, is typically measured or observed by evaluating the
decrease in old task performance. However, a model's representation can change
without losing knowledge about prior tasks. In this work we consider the
concept of representation forgetting, observed by using the difference in
performance of an optimal linear classifier before and after a new task is
introduced. Using this tool we revisit a number of standard continual learning
benchmarks and observe that, through this lens, model representations trained
without any explicit control for forgetting often experience small
representation forgetting and can sometimes be comparable to methods which
explicitly control for forgetting, especially in longer task sequences. We also
show that representation forgetting can lead to new insights into the effects of
model capacity and the loss function used in continual learning. Based on our
results, we show that a simple yet competitive approach is to learn
representations continually with standard supervised contrastive learning while
constructing prototypes of class samples when queried on old samples.
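The probing protocol described above is simple enough to sketch. Below is a minimal illustration, not the authors' code: it assumes a frozen feature extractor encoder that maps one sample to a 1-D feature vector, NumPy arrays for the data splits, and scikit-learn's LogisticRegression as a stand-in for the "optimal linear classifier"; the nearest-class-mean readout corresponds to the prototype-based approach the abstract mentions.

    # Minimal sketch (not the authors' code) of the linear-probe measurement.
    # Representation forgetting on task t = probe accuracy on task t's data
    # right after learning task t, minus probe accuracy after later tasks.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def probe_accuracy(encoder, X_train, y_train, X_test, y_test):
        """Fit a linear classifier on frozen features; return test accuracy."""
        f_tr = np.stack([encoder(x) for x in X_train])
        f_te = np.stack([encoder(x) for x in X_test])
        clf = LogisticRegression(max_iter=2000).fit(f_tr, y_train)
        return clf.score(f_te, y_test)

    def prototype_accuracy(encoder, X_train, y_train, X_test, y_test):
        """Nearest-class-mean readout: classify by the closest class prototype."""
        f_tr = np.stack([encoder(x) for x in X_train])
        f_te = np.stack([encoder(x) for x in X_test])
        classes = np.unique(y_train)
        protos = np.stack([f_tr[y_train == c].mean(axis=0) for c in classes])
        dists = ((f_te[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
        return float((classes[dists.argmin(axis=1)] == y_test).mean())

    # rep_forgetting = probe_accuracy(enc_after_task_t, ...task t data...)
    #                - probe_accuracy(enc_after_later_tasks, ...task t data...)

Under this measure, low representation forgetting means an optimal linear readout can still recover old-task accuracy from the updated features, even when the model's original classifier head cannot.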
Related papers
- Premonition: Using Generative Models to Preempt Future Data Changes in
Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can provide useful premonitions of future data changes.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z) - RanDumb: A Simple Approach that Questions the Efficacy of Continual Representation Learning [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to a simple, pre-defined random transform.
We then train a simple linear classifier on top, without storing any exemplars, processing one sample at a time in an online continual learning setting.
Our study reveals significant limitations of continual representation learning, particularly in low-exemplar and online settings (a minimal sketch of this baseline appears after this list).
arXiv Detail & Related papers (2024-02-13T22:07:29Z) - Mitigating Catastrophic Forgetting in Task-Incremental Continual
Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a supervised contrastive learning framework with an adaptive classification criterion for continual learning.
Experiments show that CFL achieves state-of-the-art performance and is better at overcoming catastrophic forgetting than the classification baselines.
arXiv Detail & Related papers (2023-05-20T19:22:40Z) - The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, they put less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z) - Task-Agnostic Robust Representation Learning [31.818269301504564]
We study the problem of robust representation learning with unlabeled data in a task-agnostic manner.
We derive an upper bound on the adversarial loss of a prediction model on any downstream task, using its loss on the clean data and a robustness regularizer (a symbolic sketch of this bound appears after this list).
Our method achieves stronger adversarial performance than relevant baselines.
arXiv Detail & Related papers (2022-03-15T02:05:11Z) - New Insights on Reducing Abrupt Representation Change in Online
Continual Learning [69.05515249097208]
We focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream.
We show that applying Experience Replay causes the newly added classes' representations to overlap significantly with those of the previous classes.
We propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes.
arXiv Detail & Related papers (2022-03-08T01:37:00Z) - Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
arXiv Detail & Related papers (2021-04-11T15:19:30Z) - Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in continual learning is catastrophically forgetting previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, Learning Invariant Representation for Continual Learning (IRCL).
Disentangling the shared invariant representation helps the model to continually learn a sequence of tasks while being more robust to forgetting and having better knowledge transfer.
arXiv Detail & Related papers (2021-01-15T15:12:51Z) - Odd-One-Out Representation Learning [1.6822770693792826]
We show that a weakly-supervised downstream task based on odd-one-out observations is suitable for model selection.
We also show that a bespoke metric-learning VAE model which performs well on this task outperforms standard unsupervised models and a weakly-supervised disentanglement model.
arXiv Detail & Related papers (2020-12-14T22:01:15Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
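For the RanDumb entry above, here is a minimal sketch of the kind of baseline its summary describes: a fixed, pre-defined random transform of the raw input followed by a linear classifier trained online, one labelled sample at a time, with no stored exemplars. The specific transform (a frozen random projection with a ReLU), the SGDClassifier readout, and all dimensions are illustrative assumptions, not the paper's exact recipe.

    # Illustrative RanDumb-style baseline (assumptions noted above).
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    D_IN, D_RAND, N_CLASSES = 3072, 8192, 10             # e.g. flattened CIFAR-10
    W = rng.normal(size=(D_IN, D_RAND)) / np.sqrt(D_IN)  # frozen random projection
    CLASSES = np.arange(N_CLASSES)
    clf = SGDClassifier(loss="log_loss")                 # online linear classifier

    def observe(x, y):
        """Process one labelled sample from the stream; nothing is stored."""
        z = np.maximum(x @ W, 0.0)                       # pre-defined random features
        clf.partial_fit(z.reshape(1, -1), [y], classes=CLASSES)

    def predict(x):
        return int(clf.predict(np.maximum(x @ W, 0.0).reshape(1, -1))[0])

The point the paper argues is that such a frozen random representation with an online linear readout can rival representations learned online-continually, which questions how much useful representation learning those methods actually perform.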
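The Task-Agnostic Robust Representation Learning entry can likewise be read symbolically; the notation below is mine, not the paper's. For a representation g learned from unlabeled data and any downstream predictor h, the summary describes a bound of the shape

    L_adv(h ∘ g) <= L_clean(h ∘ g) + lambda * R(g)

where L_adv and L_clean are the adversarial and clean losses, R is a robustness regularizer on the representation, and lambda is a trade-off weight. Minimizing the clean loss plus the regularizer on unlabeled data therefore controls the adversarial loss on any downstream task.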