SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models
- URL: http://arxiv.org/abs/2411.02175v1
- Date: Mon, 04 Nov 2024 15:34:30 GMT
- Title: SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models
- Authors: Linglan Zhao, Xuerui Zhang, Ke Yan, Shouhong Ding, Weiran Huang
- Abstract summary: Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge.
With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems.
- Score: 26.484208658326857
- Abstract: Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge. With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems using these foundation models, rather than learning from scratch. Existing works often view PTMs as a strong initial point and directly apply parameter-efficient tuning (PET) in the first session for adapting to downstream tasks. In the following sessions, most methods freeze model parameters for tackling forgetting issues. However, applying PET directly to downstream data cannot fully explore the inherent knowledge in PTMs. Additionally, freezing the parameters in incremental sessions hinders models' plasticity to novel concepts not covered in the first session. To solve the above issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. In particular, to inherit general knowledge from foundation models, we include a transfer loss function by measuring the correlation between the PTM and the PET-applied model. After calibrating in the first session, the slow efficient tuning parameters can capture more informative features, improving generalization to incoming classes. Moreover, to further incorporate novel concepts, we strike a balance between stability and plasticity by fixing slow efficient tuning parameters and continuously updating the fast ones. Specifically, a cross-classification loss with feature alignment is proposed to circumvent catastrophic forgetting. During inference, we introduce an entropy-based aggregation strategy to dynamically utilize the complementarity in the slow and fast learners. Extensive experiments on seven benchmark datasets verify the effectiveness of our method by significantly surpassing the state-of-the-art.
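The entropy-based aggregation mentioned in the abstract can be illustrated with a small sketch. This is only one plausible reading, not the paper's formulation: it assumes each learner outputs class logits, that a learner with lower predictive entropy should receive a larger weight, and that the weights come from a softmax over negative entropies. The names `learner_weights`, `aggregate`, and the scaling factor `gamma` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def learner_weights(slow_logits, fast_logits, gamma=1.0):
    """Weight each learner by its confidence (assumed rule, not from the paper).

    Lower entropy means a more confident prediction, so the weights are a
    softmax over the negative, gamma-scaled entropies of the two learners.
    """
    h_slow = entropy(softmax(slow_logits))
    h_fast = entropy(softmax(fast_logits))
    w_slow, w_fast = softmax([-gamma * h_slow, -gamma * h_fast])
    return w_slow, w_fast


def aggregate(slow_logits, fast_logits, gamma=1.0):
    """Combine the two learners' class probabilities with entropy-based weights."""
    w_slow, w_fast = learner_weights(slow_logits, fast_logits, gamma)
    p_slow = softmax(slow_logits)
    p_fast = softmax(fast_logits)
    return [w_slow * a + w_fast * b for a, b in zip(p_slow, p_fast)]


if __name__ == "__main__":
    # The slow learner is confident about class 0; the fast learner is unsure.
    slow = [4.0, 0.0, 0.0]
    fast = [0.0, 0.1, 0.0]
    print(learner_weights(slow, fast))
    print(aggregate(slow, fast))
```

Under this assumed rule the weights are computed per test sample, so the more confident learner dominates each individual prediction, and the two learners receive equal weight when their entropies match.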
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Alignment (SLCA++) framework.
Our approach involves a Slow Learner that selectively reduces the learning rate of backbone parameters, and an Alignment step that aligns the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative memory-efficient transfer learning (METL) strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- FeTT: Continual Class Incremental Learning via Feature Transformation Tuning [19.765229703131876]
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios.
Recent CL models have gradually shifted towards the utilization of pre-trained models with parameter-efficient fine-tuning strategies.
This paper proposes the feature transformation tuning (FeTT) model, which non-parametrically fine-tunes backbone features across all tasks.
arXiv Detail & Related papers (2024-05-20T06:33:50Z)
- InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning [12.004172212239848]
Continual learning requires the model to learn multiple tasks sequentially.
In this work, we propose a new PEFT method for continual learning, called interference-free low-rank adaptation (InfLoRA).
arXiv Detail & Related papers (2024-03-30T03:16:37Z)
- Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation [20.62749699589017]
Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks.
We propose Test-Time Adaptation for Class-Incremental Learning (TTACIL), which first fine-tunes PTMs using Adapters on the first task.
Our TTACIL does not undergo any forgetting, while benefiting each task with the rich PTM features.
arXiv Detail & Related papers (2023-10-17T13:06:39Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce Bayesian Adaptive Moment Regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.