Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2402.04009v1
- Date: Tue, 6 Feb 2024 14:03:15 GMT
- Title: Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
- Authors: Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu
- Abstract summary: Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation.
- Score: 19.17362588650503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In finetuning a large pretrained model to downstream tasks,
parameter-efficient fine-tuning (PEFT) methods can effectively finetune
pretrained models with few trainable parameters, but suffer from high GPU
memory consumption and slow training speed. Because learnable parameters from
these methods are entangled with the pretrained model, gradients related to the
frozen pretrained model's parameters have to be computed and stored during
finetuning. We propose Low-rank Attention Side-Tuning (LAST), which
disentangles the trainable module from the pretrained model by freezing not
only parameters but also outputs of the pretrained network. LAST trains a
side-network composed of only low-rank self-attention modules. By viewing the
pretrained model as a frozen feature extractor, the side-network takes
intermediate output from the pretrained model and focuses on learning
task-specific knowledge. We also show that LAST can be highly parallel across
multiple optimization objectives, making it very efficient in downstream task
adaptation, for example, in finding optimal hyperparameters. LAST outperforms
previous state-of-the-art methods on VTAB-1K and other visual adaptation tasks,
using roughly 30% of the GPU memory and 60% of the training time of existing
PEFT methods while achieving significantly higher accuracy.
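The side-network idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `LowRankAttention`, the function `side_forward`, the residual accumulation across layers, the zero initialization of the output projection, and the sizes (768-dimensional features, 197 tokens, 12 layers, rank 16) are all assumptions chosen for illustration. The frozen backbone is represented only by its precomputed intermediate outputs, so nothing in the side path depends on backbone parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LowRankAttention:
    """Hypothetical low-rank self-attention: project dim -> rank, rank << dim."""

    def __init__(self, dim, rank, rng):
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, rank))
        self.w_k = rng.normal(0.0, scale, (dim, rank))
        self.w_v = rng.normal(0.0, scale, (dim, rank))
        # Assumption: zero init, so the side path outputs zeros before training.
        self.w_o = np.zeros((rank, dim))

    def num_params(self):
        return sum(w.size for w in (self.w_q, self.w_k, self.w_v, self.w_o))

    def __call__(self, x):
        q, k, v = x @ self.w_q, x @ self.w_k, x @ self.w_v  # each (tokens, rank)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))       # (tokens, tokens)
        return (attn @ v) @ self.w_o                         # (tokens, dim)


def side_forward(frozen_feats, side_blocks):
    """Run the side path over per-layer outputs of a frozen backbone.

    frozen_feats: list of (tokens, dim) arrays, treated as constants.
    """
    h = np.zeros_like(frozen_feats[0])
    for feat, block in zip(frozen_feats, side_blocks):
        # Assumed wiring: each block sees the frozen feature plus the running
        # side state, and its output is accumulated residually.
        h = h + block(feat + h)
    return h


rng = np.random.default_rng(0)
dim, rank, tokens, layers = 768, 16, 197, 12

# Stand-ins for intermediate outputs of a frozen pretrained model.
frozen_feats = [rng.normal(size=(tokens, dim)) for _ in range(layers)]
side_blocks = [LowRankAttention(dim, rank, rng) for _ in range(layers)]

out = side_forward(frozen_feats, side_blocks)

side_params = sum(b.num_params() for b in side_blocks)  # 4 * dim * rank per block
full_params = layers * 4 * dim * dim                    # full-rank Q, K, V, O per layer
```

Under these assumptions the side path holds `rank / dim` as many projection weights as full-rank attention would. Because `frozen_feats` are plain arrays, they could be computed once and reused across several side-networks, for example one per hyperparameter setting.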
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient because it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z) - PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
arXiv Detail & Related papers (2023-04-26T15:55:29Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF: researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance of full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts.
arXiv Detail & Related papers (2022-06-13T23:51:56Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
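The scale-and-shift idea mentioned in the SSF entry above can be sketched in a few lines. This is only an illustration of a per-channel affine transform on frozen features, not the SSF authors' code; the function name `scale_shift`, the parameter names `gamma` and `beta`, and the toy shapes are assumptions.

```python
import numpy as np


def scale_shift(features, gamma, beta):
    """Per-channel affine transform applied to features of a frozen model.

    features: (batch, channels); gamma, beta: (channels,), the only trainable values.
    """
    return features * gamma + beta


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # stand-in for features from a frozen backbone

gamma = np.ones(8)   # identity scale at initialization (assumed)
beta = np.zeros(8)   # zero shift at initialization (assumed)

identity_out = scale_shift(features, gamma, beta)            # equals the input
tuned_out = scale_shift(features, 2.0 * gamma, beta + 0.5)   # hypothetical tuned values

trainable = gamma.size + beta.size  # two values per channel
```

With identity initialization the transform leaves the frozen features unchanged, and the number of trainable values grows only linearly with the channel count.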
This list is automatically generated from the titles and abstracts of the papers in this site.