Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
- URL: http://arxiv.org/abs/2304.01917v1
- Date: Tue, 4 Apr 2023 16:14:39 GMT
- Title: Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
- Authors: Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi
- Abstract summary: Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
- Score: 50.83426196335385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot classification (FSC) entails learning novel classes given only a few
examples per class after a pre-training (or meta-training) phase on a set of
base classes. Recent works have shown that simply fine-tuning a pre-trained
Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage. This has
motivated the design of parameter efficient fine-tuning (PEFT) methods which
fine-tune only a fraction of the Transformer's parameters. While these methods
have shown promise, inconsistencies in experimental conditions make it
difficult to disentangle their advantage from other experimental factors
including the feature extractor architecture, pre-trained initialization and
fine-tuning algorithm, amongst others. In our paper, we conduct a large-scale,
experimentally consistent, empirical analysis to study PEFTs for few-shot image
classification. Through a battery of over 1.8k controlled experiments on
large-scale few-shot benchmarks including Meta-Dataset (MD) and ORBIT, we
uncover novel insights on PEFTs that cast light on their efficacy in
fine-tuning ViTs for few-shot classification. Through our controlled empirical
study, we have two main findings: (i) Fine-tuning just the LayerNorm parameters
(which we call LN-Tune) during few-shot adaptation is an extremely strong
baseline across ViTs pre-trained with both self-supervised and supervised
objectives, (ii) For self-supervised ViTs, we find that simply learning a set
of scaling parameters for each attention matrix (which we call AttnScale) along
with a domain-residual adapter (DRA) module leads to state-of-the-art
performance (while being $\sim\!$ 9$\times$ more parameter-efficient) on MD.
Our extensive empirical findings set strong baselines and call for rethinking
the current design of PEFT methods for FSC.
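To make the first finding concrete, below is a minimal sketch of an LN-Tune-style setup: every backbone parameter is frozen except the LayerNorm affine weights (plus a freshly added linear head for the episode's classes), together with a toy per-head attention-scaling module in the spirit of AttnScale. The torchvision vit_b_16 backbone, the linear head, and the scaling module's placement are illustrative assumptions; the paper's actual implementation and its domain-residual adapter (DRA) are not reproduced here.

```python
# Minimal LN-Tune / AttnScale-style sketch (assumptions: torchvision ViT-B/16
# backbone and a plain linear head per few-shot episode; not the paper's code).
import torch
import torch.nn as nn
from torchvision.models import vit_b_16


def build_ln_tune_model(num_classes: int) -> nn.Module:
    """LN-Tune: train only LayerNorm affine parameters (and the new head)."""
    model = vit_b_16(weights="IMAGENET1K_V1")
    model.heads = nn.Linear(model.hidden_dim, num_classes)  # episode-specific head

    # Freeze everything, then re-enable LayerNorm parameters and the head.
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
    for p in model.heads.parameters():
        p.requires_grad = True
    return model


class AttnScale(nn.Module):
    """Toy per-head scaling of an attention matrix (hypothetical wiring; the
    paper's exact AttnScale formulation and its DRA module are not shown)."""

    def __init__(self, num_heads: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, num_heads, tokens, tokens) attention matrix
        # (the exact insertion point inside the block is an assumption)
        return attn * self.scale


if __name__ == "__main__":
    model = build_ln_tune_model(num_classes=5)
    trainable = [p for p in model.parameters() if p.requires_grad]
    print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # few-shot adaptation steps go here
```

Only the re-enabled parameters are handed to the optimizer, which is what keeps the adaptation step cheap in compute and storage relative to full fine-tuning.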
Related papers
- Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models [24.62337386603331]
Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world.
To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT) has gained popularity.
This paper examines the strengths and weaknesses of each tuning strategy, shifting the focus away from the efficiency typically associated with these approaches.
arXiv Detail & Related papers (2024-10-29T07:55:50Z) - Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition [36.031972728327894]
We study representative PETL methods in the context of Vision Transformers.
Different PETL methods can obtain similar accuracy on the low-shot benchmark VTAB-1K.
PETL is also useful in many-shot regimes -- it achieves accuracy comparable to, and sometimes better than, full fine-tuning (FT).
arXiv Detail & Related papers (2024-09-24T19:57:40Z) - Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts [52.1635661239108]
We introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts.
Our experiments demonstrate state-of-the-art results on satellite imagery, even outperforming fully pre-trained and fine-tuned ViTs.
arXiv Detail & Related papers (2024-06-16T15:14:56Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages these pre-trained modules for downstream tasks.
arXiv Detail & Related papers (2023-04-26T15:55:29Z) - Incremental Few-Shot Object Detection via Simple Fine-Tuning Approach [6.808112517338073]
iFSD incrementally learns novel classes using only a few examples without revisiting base classes.
We propose a simple fine-tuning-based approach, the Incremental Two-stage Fine-tuning Approach (iTFA) for iFSD.
iTFA achieves competitive performance on COCO and shows 30% higher AP than meta-learning methods on the LVIS dataset.
arXiv Detail & Related papers (2023-02-20T05:48:46Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)