Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
- URL: http://arxiv.org/abs/2406.03216v1
- Date: Wed, 5 Jun 2024 12:53:37 GMT
- Title: Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
- Authors: Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella
- Abstract summary: We find that the choice of prompt tuning as a PEFT method hurts the overall performance of the CL system.
We replace prompt tuning with LoRA in two state-of-the-art continual learning methods: Learning to Prompt and S-Prompts.
- Score: 18.112632827740878
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision, which has been uncritically adopted by subsequent research, but warrants further research to understand its implications. In this paper, we conduct this research and find that the choice of prompt tuning as a PEFT method hurts the overall performance of the CL system. To illustrate this, we replace prompt tuning with LoRA in two state-of-the-art continual learning methods: Learning to Prompt and S-Prompts. These variants consistently achieve higher accuracy across a wide range of domain-incremental and class-incremental benchmarks, while being competitive in inference speed. Our work highlights a crucial argument: unexamined choices can hinder progress in the field, and rigorous ablations of design choices, such as the PEFT method, are required to drive meaningful adoption of CL techniques in real-world applications.
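As a rough illustration of the comparison in the abstract above, the sketch below contrasts the two PEFT options in PyTorch: a LoRA adapter that adds a trainable low-rank update to a frozen linear layer, and a prompt-tuning module that prepends trainable tokens to the input sequence. This is a generic, minimal sketch with illustrative hyperparameters (rank, alpha, number of prompts), not the authors' implementation of Learning to Prompt or S-Prompts.

```python
# Generic PyTorch sketch of the two PEFT options: LoRA (low-rank update on a
# frozen linear layer) versus prompt tuning (trainable tokens prepended to the
# input sequence). Hyperparameters below are illustrative, not from the paper.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep pretrained weights frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


class SoftPrompt(nn.Module):
    """Prompt tuning: prepend trainable prompt tokens to the token sequence."""

    def __init__(self, num_prompts: int = 10, dim: int = 768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.01)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) -> (batch, num_prompts + seq_len, dim)
        batch = tokens.shape[0]
        return torch.cat([self.prompts.expand(batch, -1, -1), tokens], dim=1)
```

In both cases only the added parameters are trained, which is what makes a head-to-head swap of the PEFT method, as the paper performs, straightforward.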
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTM), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem from the lens of Seq FT.
Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and a Classifier Alignment step to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
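A minimal sketch of the "Slow Learner" idea described in the entry above, assuming PyTorch optimizer parameter groups: the backbone is updated with a much smaller learning rate than the classification head. The model, learning rates, and optimizer are illustrative assumptions, and the post-hoc Classifier Alignment step is a separate procedure not shown here.

```python
# Sketch of the "Slow Learner" idea: the backbone gets a much smaller learning
# rate than the classification head via optimizer parameter groups. The model,
# learning rates, and optimizer are illustrative assumptions; the post-hoc
# Classifier Alignment step is not shown.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
classifier = nn.Linear(768, 100)  # e.g. the classes observed so far

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},    # "slow" backbone updates
        {"params": classifier.parameters(), "lr": 1e-2},  # regular head updates
    ],
    momentum=0.9,
)
```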
- Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images [18.094731760514264]
We study the effectiveness of fine-tuning methods when adapting foundation models to medical image classification tasks.
We propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels.
EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks.
arXiv Detail & Related papers (2024-07-01T06:35:53Z)
- Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models [63.11967672725459]
We show how, most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline.
arXiv Detail & Related papers (2024-06-13T17:57:10Z)
- Prompt Customization for Continual Learning [57.017987355717935]
We reformulate the prompting approach for continual learning and propose the prompt customization (PC) method.
PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM).
We evaluate our method on four benchmark datasets for three diverse settings, including the class, domain, and task-agnostic incremental learning tasks.
arXiv Detail & Related papers (2024-04-28T03:28:27Z)
- On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers [47.77328392236625]
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure, where we first optimize the task-specific parameters and then train the classifier with the same selection procedure used at inference time.
Our method achieves results that are either superior or on par with the state of the art while being computationally cheaper.
arXiv Detail & Related papers (2023-08-18T15:11:16Z)
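A minimal sketch of LayerNorm tuning as named in the entry above: freeze a pretrained ViT and leave only the LayerNorm affine parameters (plus a new classification head) trainable. The torchvision ViT-B/16 backbone and head size are illustrative assumptions, not the paper's exact setup, and the two-stage selection procedure is omitted.

```python
# Sketch of LayerNorm tuning on a frozen Vision Transformer: only LayerNorm
# affine parameters and a fresh classification head stay trainable. The
# torchvision ViT-B/16 backbone and the 10-class head are illustrative assumptions.
import torch.nn as nn
from torchvision.models import vit_b_16

model = vit_b_16(weights="IMAGENET1K_V1")

for param in model.parameters():
    param.requires_grad = False           # freeze the whole backbone first

for module in model.modules():
    if isinstance(module, nn.LayerNorm):  # re-enable LayerNorm scale/shift
        for param in module.parameters():
            param.requires_grad = True

model.heads = nn.Linear(768, 10)          # new trainable head for the current task(s)

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {num_trainable}")
```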
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer [94.23904400441957]
We introduce perturbation-based regularizers, which can smooth the loss landscape, into prompt tuning.
We design two kinds of perturbation-based regularizers: random-noise-based and adversarial-based.
Our new algorithms improve the state-of-the-art prompt tuning methods by 1.94% and 2.34% on the SuperGLUE and FewGLUE benchmarks, respectively.
arXiv Detail & Related papers (2023-05-03T20:30:51Z)
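A minimal sketch of a random-noise perturbation regularizer in the spirit of the PTP entry above: the task loss is computed on both clean and Gaussian-perturbed input embeddings, which encourages a smoother loss landscape around the learned prompt. The `model` is assumed to be any callable on embeddings; `sigma` and the loss weighting are illustrative assumptions, and the adversarial variant is omitted.

```python
# Sketch of a random-noise perturbation regularizer: combine the task loss on
# clean embeddings with the loss on a Gaussian-perturbed copy. `model` is any
# callable on embeddings; `sigma` and `weight` are illustrative assumptions.
import torch


def noise_regularized_loss(model, embeds, labels, loss_fn, sigma=1e-3, weight=1.0):
    clean_loss = loss_fn(model(embeds), labels)
    noisy_embeds = embeds + sigma * torch.randn_like(embeds)
    noisy_loss = loss_fn(model(noisy_embeds), labels)
    return clean_loss + weight * noisy_loss
```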
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.