On the Effectiveness of LayerNorm Tuning for Continual Learning in
Vision Transformers
- URL: http://arxiv.org/abs/2308.09610v1
- Date: Fri, 18 Aug 2023 15:11:16 GMT
- Title: On the Effectiveness of LayerNorm Tuning for Continual Learning in
Vision Transformers
- Authors: Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier
Alameda-Pineda, Elisa Ricci
- Abstract summary: State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure, where we first optimize the task-specific parameters and then train the classifier with the same selection procedure used at inference time.
Our method achieves results that are superior to or on par with the state of the art while being computationally cheaper.
- Score: 47.77328392236625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art rehearsal-free continual learning methods exploit the
peculiarities of Vision Transformers to learn task-specific prompts,
drastically reducing catastrophic forgetting. However, there is a tradeoff
between the number of learned parameters and the performance, making such
models computationally expensive. In this work, we aim to reduce this cost
while maintaining competitive performance. We achieve this by revisiting and
extending a simple transfer learning idea: learning task-specific normalization
layers. Specifically, we tune the scale and bias parameters of LayerNorm for
each continual learning task, selecting them at inference time based on the
similarity between task-specific keys and the output of the pre-trained model.
To make the classifier robust to incorrect selection of parameters during
inference, we introduce a two-stage training procedure, where we first optimize
the task-specific parameters and then train the classifier with the same
selection procedure used at inference time. Experiments on ImageNet-R and
CIFAR-100 show that our method achieves results that are superior to or on
par with the state of the art while being computationally cheaper.
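
To make the method described in the abstract concrete, the snippet below is a minimal sketch of per-task LayerNorm tuning with key-based task selection, assuming a frozen pre-trained ViT backbone. The names (TaskLayerNorm, select_task) and the two-stage outline in the comments are illustrative assumptions inferred from the abstract, not the authors' released code.

```python
# Minimal sketch, assuming a frozen pre-trained backbone; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskLayerNorm(nn.Module):
    """LayerNorm with one (scale, bias) pair per continual-learning task."""

    def __init__(self, dim: int, num_tasks: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_tasks, dim))   # per-task scale
        self.beta = nn.Parameter(torch.zeros(num_tasks, dim))   # per-task bias

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Normalize as usual, then apply the affine parameters of the selected task.
        x = F.layer_norm(x, x.shape[-1:])
        return x * self.gamma[task_id] + self.beta[task_id]


def select_task(features: torch.Tensor, task_keys: torch.Tensor) -> torch.Tensor:
    """Pick, for each sample, the task whose learned key is most similar to the
    output of the frozen pre-trained model (features: B x D, task_keys: T x D)."""
    sims = F.cosine_similarity(features.unsqueeze(1), task_keys.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)  # (B,) predicted task indices


# Two-stage training, as described in the abstract (outline only):
#   Stage 1: for each task t, optimize gamma[t], beta[t] and the key of task t
#            using the ground-truth task identity.
#   Stage 2: freeze the task-specific parameters and train the classifier while
#            routing each sample through select_task(), i.e. the same selection
#            procedure used at inference, so the classifier becomes robust to
#            occasional wrong task selection.
```

Under this reading, only the LayerNorm scale and bias vectors plus one key are stored per task, which is what makes the approach cheap relative to learning full prompt sets.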
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in terms of accuracy gain, memory cost, training efficiency, and robustness to task order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark (a minimal sketch of the idea appears after this list).
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
The best-performing variants impose an average cost of only a 1.7% increase in parameter memory.
arXiv Detail & Related papers (2020-11-09T21:30:31Z)
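
As referenced in the Diff Pruning entry above, the following is a minimal sketch of the diff-pruning idea under simplifying assumptions: a task is learned as a sparse difference added to frozen pre-trained weights. The class name DiffPrunedLinear and the plain L1 penalty (used here in place of the paper's relaxed L0 regularizer) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of diff pruning: task weights = frozen pre-trained weights + sparse diff.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffPrunedLinear(nn.Module):
    """Wraps a pre-trained linear layer (assumed to have a bias) with a learnable,
    sparsity-regularized task-specific difference."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pre-trained parameters (stored as buffers, so they are not trained).
        self.register_buffer("w0", pretrained.weight.detach().clone())
        self.register_buffer("b0", pretrained.bias.detach().clone())
        # Task-specific difference, initialized at zero and encouraged to stay sparse.
        self.dw = nn.Parameter(torch.zeros_like(self.w0))
        self.db = nn.Parameter(torch.zeros_like(self.b0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.w0 + self.dw, self.b0 + self.db)

    def sparsity_penalty(self) -> torch.Tensor:
        # Simple L1 surrogate; the original method uses a relaxed L0 penalty.
        return self.dw.abs().sum() + self.db.abs().sum()
```

During fine-tuning, the penalty is added to the task loss so that most entries of the diff stay at zero, and only the sparse diff needs to be stored per task on top of the shared pre-trained model.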