Continual Learners are Incremental Model Generalizers
- URL: http://arxiv.org/abs/2306.12026v1
- Date: Wed, 21 Jun 2023 05:26:28 GMT
- Title: Continual Learners are Incremental Model Generalizers
- Authors: Jaehong Yoon, Sung Ju Hwang, Yue Cao
- Abstract summary: This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
- Score: 70.34479702177988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the efficiency and rapid convergence of pre-trained models for
solving downstream tasks, this paper extensively studies the impact of
Continual Learning (CL) models as pre-trainers. In both supervised and
unsupervised CL, we find that the transfer quality of the representation often
increases gradually without noticeable degradation in fine-tuning performance.
This is because CL models learn improved task-general features while readily
forgetting task-specific knowledge. Based on this observation, we suggest a new
unsupervised CL framework with masked modeling, which aims to capture rich
task-generic representations during training. Furthermore, we propose a new
fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich
task-generic representations while solving downstream tasks. The model
fine-tuned with GLAD achieves competitive performance and can also be used as a
good pre-trained model itself. We believe this paper breaks the barriers
between pre-training and fine-tuning steps and leads to a sustainable learning
framework in which the continual learner incrementally improves model
generalization, yielding better transfer to unseen tasks.
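The abstract does not spell out the masked-modeling objective, but the general recipe is standard: hide a random fraction of the input patches and train the model to reconstruct only the hidden ones. A minimal sketch of that masking-and-loss step (plain Python, hypothetical names; a stand-in for the paper's actual framework):

```python
import random

def mask_patches(patches, mask_ratio=0.6, seed=0):
    """Randomly hide a fraction of input patches, as in masked modeling.

    Returns the visible patches, the masked indices, and the original
    values at those indices (the reconstruction targets).
    """
    rng = random.Random(seed)
    n = len(patches)
    n_masked = int(n * mask_ratio)
    masked_idx = sorted(rng.sample(range(n), n_masked))
    masked_set = set(masked_idx)
    visible = [p for i, p in enumerate(patches) if i not in masked_set]
    targets = [patches[i] for i in masked_idx]
    return visible, masked_idx, targets

def reconstruction_loss(predictions, targets):
    """Mean squared error computed over the masked positions only."""
    total, count = 0.0, 0
    for pred, tgt in zip(predictions, targets):
        for a, b in zip(pred, tgt):
            total += (a - b) ** 2
            count += 1
    return total / count

patches = [[float(i), float(i) + 1.0] for i in range(10)]
visible, masked_idx, targets = mask_patches(patches)
# A real encoder would predict the masked patches from the visible ones;
# a trivial "predict zeros" stand-in just illustrates the loss.
predictions = [[0.0, 0.0] for _ in masked_idx]
loss = reconstruction_loss(predictions, targets)
```

Because the loss is taken only over masked positions, the encoder is pushed toward features that generalize across the input rather than memorizing task-specific detail.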
Related papers
- CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning, a novel approach to mitigate forgetting while adapting a VLM.
We demonstrate the effectiveness of our framework in adapting to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization [78.61621802973262]
We introduce an Orthogonal finetuning method for efficiently updating pretrained weights.
A cross-regularization strategy is also exploited to maintain the stability in terms of zero-shot generalization.
We conduct extensive experiments to demonstrate that our method explicitly steers pretrained weight space to represent the task-specific knowledge.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning [46.80199921638615]
Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks.
However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts.
In this work, we show that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements.
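The interpolation itself is simple: maintain an exponential moving average of the weights, anchored at the zero-shot initialization, so the merged model stays close to the robust pre-trained solution while absorbing fine-tuned knowledge. A minimal sketch under those assumptions (flat weight lists and names are illustrative, not the paper's API):

```python
def momentum_interpolate(ema_weights, current_weights, momentum=0.99):
    """One EMA update step: blend the running average with the current
    fine-tuned weights. A high momentum keeps the result near the earlier
    (more zero-shot-like) weights, damping drift under fine-tuning."""
    return [momentum * e + (1.0 - momentum) * w
            for e, w in zip(ema_weights, current_weights)]

# Start the average at the zero-shot weights, then fold in the weights
# produced by each fine-tuning step.
zero_shot = [1.0, -2.0, 0.5]
ema = list(zero_shot)
for step_weights in ([1.2, -1.8, 0.7], [1.4, -1.6, 0.9]):
    ema = momentum_interpolate(ema, step_weights)
```

After two steps the averaged weights have moved only slightly from the zero-shot values, which is exactly the stabilizing behavior the summary describes.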
arXiv Detail & Related papers (2022-11-06T17:41:39Z)
- Task Agnostic Representation Consolidation: a Self-supervised based Continual Learning Approach [14.674494335647841]
We propose a two-stage training paradigm for CL that intertwines task-agnostic and task-specific learning.
We show that our training paradigm can be easily added to memory- or regularization-based approaches.
arXiv Detail & Related papers (2022-07-13T15:16:51Z)
- Self-Supervised Models are Continual Learners [79.70541692930108]
We show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for Continual Learning.
We devise a framework for Continual self-supervised visual representation Learning that significantly improves the quality of the learned representations.
arXiv Detail & Related papers (2021-12-08T10:39:13Z)
- When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? [99.4914671654374]
We propose AdvCL, a novel adversarial contrastive pretraining framework.
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
arXiv Detail & Related papers (2021-11-01T17:59:43Z)
- Continual Learning From Unlabeled Data Via Deep Clustering [7.704949298975352]
Continual learning aims to learn new tasks incrementally, using less computation and memory than retraining the model from scratch whenever a new task arrives.
We introduce a new framework that makes continual learning feasible in the unsupervised setting by using pseudo-labels obtained from cluster assignments to update the model.
arXiv Detail & Related papers (2021-04-14T23:46:17Z)
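The pseudo-labeling step in the summary above can be sketched as nearest-centroid assignment: each unlabeled feature vector receives the index of its closest cluster centroid as a label for a supervised-style update. This is a hypothetical, minimal stand-in for the paper's deep-clustering pipeline, not its actual implementation:

```python
def assign_pseudo_labels(features, centroids):
    """Assign each feature vector the index of its nearest centroid
    (squared Euclidean distance). The indices then act as pseudo-labels
    for updating the model as if they were ground-truth classes."""
    labels = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

centroids = [[0.0, 0.0], [5.0, 5.0]]
features = [[0.2, -0.1], [4.8, 5.3], [0.1, 0.4]]
pseudo_labels = assign_pseudo_labels(features, centroids)
# pseudo_labels == [0, 1, 0]
```

In a full pipeline the centroids themselves would be re-estimated from the evolving feature space (e.g. by k-means) as new tasks arrive.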
This list is automatically generated from the titles and abstracts of the papers in this site.