Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2402.04009v1
- Date: Tue, 6 Feb 2024 14:03:15 GMT
- Title: Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
- Authors: Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu
- Abstract summary: Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation.
- Score: 19.17362588650503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In finetuning a large pretrained model to downstream tasks,
parameter-efficient fine-tuning (PEFT) methods can effectively finetune
pretrained models with few trainable parameters, but suffer from high GPU
memory consumption and slow training speed. Because learnable parameters from
these methods are entangled with the pretrained model, gradients related to the
frozen pretrained model's parameters have to be computed and stored during
finetuning. We propose Low-rank Attention Side-Tuning (LAST), which
disentangles the trainable module from the pretrained model by freezing not
only parameters but also outputs of the pretrained network. LAST trains a
side-network composed of only low-rank self-attention modules. By viewing the
pretrained model as a frozen feature extractor, the side-network takes
intermediate output from the pretrained model and focuses on learning
task-specific knowledge. We also show that LAST can be highly parallel across
multiple optimization objectives, making it very efficient in downstream task
adaptation, for example, in finding optimal hyperparameters. LAST outperforms
previous state-of-the-art methods on VTAB-1K and other visual adaptation tasks
with roughly only 30% of the GPU memory footprint and 60% of the training time
of existing PEFT methods, while achieving significantly higher accuracy.
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, showing up to a 9.6% improvement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient due to its high computational cost.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
arXiv Detail & Related papers (2023-04-26T15:55:29Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, meaning that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)