Task Agnostic Representation Consolidation: a Self-supervised based
Continual Learning Approach
- URL: http://arxiv.org/abs/2207.06267v1
- Date: Wed, 13 Jul 2022 15:16:51 GMT
- Title: Task Agnostic Representation Consolidation: a Self-supervised based
Continual Learning Approach
- Authors: Prashant Bhat, Bahram Zonooz, Elahe Arani
- Abstract summary: We propose a two-stage training paradigm for CL that intertwines task-agnostic and task-specific learning.
We show that our training paradigm can be easily added to memory- or regularization-based approaches.
- Score: 14.674494335647841
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Continual learning (CL) over non-stationary data streams remains one of the
long-standing challenges in deep neural networks (DNNs) as they are prone to
catastrophic forgetting. CL models can benefit from self-supervised
pre-training as it enables learning more generalizable task-agnostic features.
However, the effect of self-supervised pre-training diminishes as the length of
task sequences increases. Furthermore, the domain shift between pre-training
data distribution and the task distribution reduces the generalizability of the
learned representations. To address these limitations, we propose Task Agnostic
Representation Consolidation (TARC), a two-stage training paradigm for CL that
intertwines task-agnostic and task-specific learning whereby self-supervised
training is followed by supervised learning for each task. To further restrict
the deviation from the learned representations in the self-supervised stage, we
employ a task-agnostic auxiliary loss during the supervised stage. We show that
our training paradigm can be easily added to memory- or regularization-based
approaches and provides consistent performance gain across more challenging CL
settings. We further show that it leads to more robust and well-calibrated
models.
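To make the two-stage paradigm concrete, below is a minimal PyTorch-style sketch of one possible per-task training loop, assuming a shared encoder, a task classifier, and a self-supervised projection head. The specific self-supervised objective, the augmentation, the MSE-based anchoring term, and all hyperparameters (epochs, the auxiliary-loss weight, the learning rate) are illustrative assumptions; the abstract only fixes the overall structure: a task-agnostic self-supervised stage per task, followed by a task-specific supervised stage in which a task-agnostic auxiliary loss restricts deviation from the stage-1 representations.

```python
# Minimal sketch of a TARC-style two-stage per-task loop.
# The SSL objective, augmentation, anchoring loss, and hyperparameters are
# illustrative assumptions; only the two-stage structure comes from the abstract.
import copy
import torch
import torch.nn.functional as F

def augment(x):
    # Placeholder augmentation: random horizontal flip plus light Gaussian noise.
    if torch.rand(1).item() < 0.5:
        x = torch.flip(x, dims=[-1])
    return x + 0.05 * torch.randn_like(x)

def ssl_loss(z1, z2):
    # Simple similarity objective between two augmented views (negative cosine
    # similarity); the paper's actual self-supervised loss is not specified here.
    return -F.cosine_similarity(z1, z2, dim=-1).mean()

def train_task(encoder, classifier, ssl_head, task_loader,
               epochs_ssl=5, epochs_sup=20, aux_weight=0.1, lr=1e-3, device="cpu"):
    # Stage 1: task-agnostic (self-supervised) training of the shared encoder.
    opt = torch.optim.SGD(list(encoder.parameters()) + list(ssl_head.parameters()), lr=lr)
    for _ in range(epochs_ssl):
        for x, _ in task_loader:  # labels are ignored in the self-supervised stage
            x = x.to(device)
            z1 = ssl_head(encoder(augment(x)))
            z2 = ssl_head(encoder(augment(x)))
            loss = ssl_loss(z1, z2)
            opt.zero_grad(); loss.backward(); opt.step()

    # Snapshot of the encoder after stage 1, used as a fixed anchor in stage 2.
    frozen_encoder = copy.deepcopy(encoder).eval()
    for p in frozen_encoder.parameters():
        p.requires_grad_(False)

    # Stage 2: task-specific (supervised) training plus a task-agnostic auxiliary
    # loss that discourages drift away from the stage-1 representations.
    opt = torch.optim.SGD(list(encoder.parameters()) + list(classifier.parameters()), lr=lr)
    for _ in range(epochs_sup):
        for x, y in task_loader:
            x, y = x.to(device), y.to(device)
            feats = encoder(x)
            ce = F.cross_entropy(classifier(feats), y)
            aux = F.mse_loss(feats, frozen_encoder(x))  # one possible anchoring choice
            loss = ce + aux_weight * aux
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder, classifier
```

In a memory-based CL method, the supervised stage would additionally draw mini-batches from a replay buffer, and a regularization-based method would add its own penalty to the stage-2 loss; the sketch omits both for brevity, since the paradigm is described as a wrapper around such approaches.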
Related papers
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning [17.236861687708096]
Attention-Guided Incremental Learning (AGILE) is a rehearsal-based CL approach that incorporates compact task attention to effectively reduce interference between tasks.
AGILE significantly improves generalization performance by mitigating task interference, and it outperforms other rehearsal-based approaches in several CL scenarios.
arXiv Detail & Related papers (2024-05-22T20:29:15Z)
- Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning [52.046037471678005]
We focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories.
We propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning.
arXiv Detail & Related papers (2023-12-27T04:40:12Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z)
- Rethinking the Representational Continuity: Towards Unsupervised Continual Learning [45.440192267157094]
Unsupervised continual learning (UCL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge.
We show that reliance on annotated data is not necessary for continual learning.
We propose Lifelong Unsupervised Mixup (LUMP) to alleviate catastrophic forgetting for unsupervised representations; an illustrative mixup-replay sketch appears after this list.
arXiv Detail & Related papers (2021-10-13T18:38:06Z)
- Posterior Meta-Replay for Continual Learning [4.319932092720977]
Continual Learning (CL) algorithms have recently received a lot of attention as they attempt to overcome the need to train with an i.i.d. sample from some unknown target data distribution.
We study principled ways to tackle the CL problem by adopting a Bayesian perspective and focus on continually learning a task-specific posterior distribution.
arXiv Detail & Related papers (2021-03-01T17:08:35Z)
- Aggregative Self-Supervised Feature Learning from a Limited Sample [12.555160911451688]
We propose two aggregation strategies, based on the complementarity of different forms of self-supervision, to boost the robustness of self-supervised learned features.
Our experiments on 2D natural image and 3D medical image classification tasks under limited data scenarios confirm that the proposed aggregation strategies successfully boost the classification accuracy.
arXiv Detail & Related papers (2020-12-14T12:49:37Z)
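As referenced in the Lifelong Unsupervised Mixup (LUMP) entry above: the summary only names the technique, so the sketch below is a hedged reading of what a mixup-style replay step could look like, interpolating current-task inputs with samples drawn from a small memory buffer before the unsupervised loss is computed. The function name, the Beta-distribution coefficient, and the buffer policy are assumptions, not details taken from the paper.

```python
# Hedged sketch of mixup-style replay for unsupervised continual learning.
# Interpolating current-task and buffer samples is an assumed reading of the
# LUMP summary above; alpha, sampling, and buffer management are illustrative.
import random
import torch

def mixup_replay_batch(current_x, buffer, alpha=0.4):
    """Interpolate current-task inputs with randomly drawn buffer samples."""
    if len(buffer) == 0:
        return current_x
    mem = torch.stack(random.choices(buffer, k=current_x.size(0)))
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * current_x + (1.0 - lam) * mem
```

The mixed batch would then be fed through the same self-supervised objective used for the current task, with the buffer updated afterwards (for example by reservoir sampling); those surrounding steps are likewise assumptions rather than details from the summary.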
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.