Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
- URL: http://arxiv.org/abs/2007.07400v1
- Date: Tue, 14 Jul 2020 23:31:14 GMT
- Title: Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
- Authors: Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu
- Abstract summary: We investigate how forgetting affects representations in neural network models.
We find that deeper layers are disproportionately the source of forgetting.
We also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.
- Score: 24.57617154267565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central challenge in developing versatile machine learning systems is
catastrophic forgetting: a model trained on tasks in sequence will suffer
significant performance drops on earlier tasks. Despite the ubiquity of
catastrophic forgetting, there is limited understanding of the underlying
process and its causes. In this paper, we address this important knowledge gap,
investigating how forgetting affects representations in neural network models.
Through representational analysis techniques, we find that deeper layers are
disproportionately the source of forgetting. Supporting this, a study of
methods to mitigate forgetting illustrates that they act to stabilize deeper
layers. These insights enable the development of an analytic argument and
empirical picture relating the degree of forgetting to representational
similarity between tasks. Consistent with this picture, we observe maximal
forgetting occurs for task sequences with intermediate similarity. We perform
empirical studies on the standard split CIFAR-10 setup and also introduce a
novel CIFAR-100 based task approximating realistic input distribution shift.
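The layer-wise analysis can be probed concretely with a representational-similarity measure such as linear CKA. The following is a minimal NumPy sketch, not the paper's pipeline: the synthetic per-layer drift values are purely illustrative of the reported pattern (deeper layers changing more after training on a second task).

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two activation matrices
    (rows = examples, columns = units), computed after centering."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Toy stand-in: compare each layer's task-A activations before vs. after
# training on task B. Lower CKA in deeper layers mirrors the paper's
# finding that forgetting is concentrated there.
rng = np.random.default_rng(0)
n_examples, n_units = 512, 128
for layer, drift in enumerate([0.1, 0.4, 1.5]):  # illustrative drift only
    before = rng.normal(size=(n_examples, n_units))
    after = before + drift * rng.normal(size=(n_examples, n_units))
    print(f"layer {layer}: CKA(before, after) = {linear_cka(before, after):.3f}")
```

In a real experiment, `before` and `after` would hold a layer's activations on held-out task-A inputs, captured before and after training on task B.

For the experimental setups, split CIFAR-10 is commonly built by partitioning the ten classes into disjoint subsets trained as sequential tasks. A hedged torchvision sketch, assuming a 5+5 class split (the paper's exact partition, and its CIFAR-100 distribution-shift task, are defined in the paper itself):

```python
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

def split_cifar10(root="./data", splits=((0, 1, 2, 3, 4), (5, 6, 7, 8, 9))):
    """Return one Subset per task, each restricted to a disjoint set of
    CIFAR-10 class labels (assumed 5+5 split; adjust `splits` as needed)."""
    train = CIFAR10(root, train=True, download=True)
    return [
        Subset(train, [i for i, y in enumerate(train.targets) if y in classes])
        for classes in splits
    ]
```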
Related papers
- CUCL: Codebook for Unsupervised Continual Learning [129.91731617718781]
The focus of this study is on Unsupervised Continual Learning (UCL), as it presents an alternative to Supervised Continual Learning.
We propose a method named Codebook for Unsupervised Continual Learning (CUCL), which encourages the model to learn discriminative features that capture class boundaries.
Our method significantly boosts the performance of both supervised and unsupervised methods.
arXiv Detail & Related papers (2023-11-25T03:08:50Z)
- Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
Accurately learning task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to preserve only the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representations for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms recent state-of-the-art methods on various datasets by significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Contrastive Supervised Distillation for Continual Representation Learning [18.864301420659217]
A neural network model is trained sequentially while alleviating catastrophic forgetting in visual search tasks.
Our method, called Contrastive Supervised Distillation (CSD), reduces feature forgetting while learning discriminative features.
arXiv Detail & Related papers (2022-05-11T13:20:47Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Hierarchical Bayesian Bandits [51.67132887113412]
We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold for many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks; a simplified sketch of the two-level posterior sampling follows after this list.
arXiv Detail & Related papers (2021-11-12T20:33:09Z)
- Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? [3.1594831736896025]
We compare a contrastive self-supervised algorithm (SimCLR) to supervised training on simple image data in a common architecture.
We find that the methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers.
Our work particularly highlights the importance of the learned intermediate representations, and raises important questions for auxiliary task design.
arXiv Detail & Related papers (2021-10-01T16:51:29Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
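To make the hierTS entry above concrete, here is a simplified Gaussian hierarchical Thompson sampling sketch for sequentially arriving tasks. It is an illustration under assumptions of my own choosing (Gaussian hyper-prior, Gaussian task-level arm means, unit-variance rewards, hyper-posterior refreshed only between tasks), not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K, tau2, rho2 = 3, 1.0, 0.25                       # arms; hyper/task variances
true_mu = rng.normal(0.0, np.sqrt(tau2), K)        # latent hyper-means (hidden)

history = [[] for _ in range(K)]                   # per arm: (pulls, mean) per past task

def sample_hyper_mean(a):
    """Sample mu_a from its Gaussian posterior, treating each past task's
    empirical mean as an observation of mu_a with variance rho2 + 1/n."""
    prec, weighted_sum = 1.0 / tau2, 0.0           # N(0, tau2) prior
    for n, ybar in history[a]:
        w = 1.0 / (rho2 + 1.0 / n)
        prec += w
        weighted_sum += w * ybar
    return rng.normal(weighted_sum / prec, np.sqrt(1.0 / prec))

for task in range(5):
    theta = rng.normal(true_mu, np.sqrt(rho2))     # this task's true arm means
    counts, sums = np.zeros(K), np.zeros(K)
    for _ in range(200):
        sampled = np.empty(K)
        for a in range(K):
            m = sample_hyper_mean(a)               # level 1: sample the hyper-mean
            prec = 1.0 / rho2 + counts[a]          # level 2: theta_a | m, task data
            mean = (m / rho2 + sums[a]) / prec
            sampled[a] = rng.normal(mean, np.sqrt(1.0 / prec))
        a = int(np.argmax(sampled))                # act on the sampled means
        counts[a] += 1.0
        sums[a] += rng.normal(theta[a], 1.0)       # unit-variance reward
    for a in range(K):
        if counts[a] > 0:
            history[a].append((counts[a], sums[a] / counts[a]))
    print(f"task {task}: pulls per arm = {counts.astype(int)}")
```

Sampling the shared hyper-mean first, then the task-level mean conditioned on it, lets evidence from earlier tasks narrow exploration on later ones, which is the knowledge-sharing effect the entry describes.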
This list is automatically generated from the titles and abstracts of the papers on this site.