Parameter-efficient is not sufficient: Exploring Parameter, Memory, and
Time Efficient Adapter Tuning for Dense Predictions
- URL: http://arxiv.org/abs/2306.09729v2
- Date: Mon, 27 Nov 2023 12:58:59 GMT
- Title: Parameter-efficient is not sufficient: Exploring Parameter, Memory, and
Time Efficient Adapter Tuning for Dense Predictions
- Authors: Dongshuo Yin and Xueting Han and Bin Li and Hao Feng and Jing Bai
- Abstract summary: Parameter-efficient transfer learning (PETL) methods have shown promising performance in adapting to downstream tasks with only a few trainable parameters.
However, existing PETL methods in computer vision (CV) can be computationally expensive and incur large memory and time costs during training.
$\mathrm{E^3VA}$ can save up to 62.2% training memory and 26.2% training time on average.
- Score: 9.068569788978854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training & fine-tuning is a prevalent paradigm in computer vision (CV).
Recently, parameter-efficient transfer learning (PETL) methods have shown
promising performance in adapting to downstream tasks with only a few trainable
parameters. Despite their success, existing PETL methods in CV can be
computationally expensive and require large amounts of memory and time
during training, which prevents low-resource users from conducting research and
applications on large models. In this work, we propose Parameter, Memory, and
Time Efficient Visual Adapter ($\mathrm{E^3VA}$) tuning to address this issue.
We provide a gradient backpropagation highway for low-rank adapters which
eliminates the need for expensive backpropagation through the frozen
pre-trained model, resulting in substantial savings of training memory and
training time. Furthermore, we optimise the $\mathrm{E^3VA}$ structure for CV
tasks to promote model performance. Extensive experiments on COCO, ADE20K, and
Pascal VOC benchmarks show that $\mathrm{E^3VA}$ can save up to 62.2% training
memory and 26.2% training time on average, while achieving comparable
performance to full fine-tuning and better performance than most PETL methods.
Note that we can even train the Swin-Large-based Cascade Mask R-CNN on GTX
1080Ti GPUs with less than 1.5% trainable parameters.
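The core idea, a "gradient backpropagation highway", can be pictured as a trainable low-rank side path that consumes detached features from the frozen backbone, so autograd never has to traverse the pre-trained blocks. The PyTorch sketch below illustrates only this general pattern; the class names, the rank, and the way side features are merged are illustrative assumptions, not the actual $\mathrm{E^3VA}$ design for dense prediction.

```python
# Illustrative sketch only (not the E^3VA code): a frozen backbone plus a
# trainable low-rank side path. Because the side path consumes *detached*
# backbone features, backpropagation never enters the frozen blocks.
import torch
import torch.nn as nn


class LowRankAdapter(nn.Module):
    """Down-project -> ReLU -> up-project with rank r << dim (assumed design)."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank)
        self.up = nn.Linear(rank, dim)
        nn.init.zeros_(self.up.weight)  # side path starts as a no-op
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))


class SideTunedBackbone(nn.Module):
    """Frozen backbone blocks with a parallel low-rank adapter path."""

    def __init__(self, blocks: nn.ModuleList, dim: int, rank: int = 8):
        super().__init__()
        self.blocks = blocks
        for p in self.blocks.parameters():
            p.requires_grad_(False)  # pre-trained weights stay frozen
        self.adapters = nn.ModuleList([LowRankAdapter(dim, rank) for _ in blocks])

    def forward(self, x):
        side = 0.0
        for block, adapter in zip(self.blocks, self.adapters):
            x = block(x)                       # frozen block: its weights get no gradients
            side = side + adapter(x.detach())  # detach: gradients flow only through adapters
        return x.detach() + side               # features handed to the task head
```

Only the adapters would be handed to the optimizer, so optimizer states and stored activations scale with the adapter rank rather than with the backbone, which is where savings of the kind reported above would come from.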
Related papers
- CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization [10.319009303849109]
Training large AI models such as deep learning recommendation systems and foundation language (or multi-modal) models consumes massive GPU resources and computing time.
CoMERA achieves end-to-end rank-adaptive tensor-compressed training via a multi-objective optimization formulation.
CoMERA is $2\times$ faster per training epoch and $9\times$ more memory-efficient than GaLore on a tested six-encoder transformer with single-batch training.
arXiv Detail & Related papers (2024-05-23T09:52:15Z)
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [133.45193150403537]
Training Large Language Models (LLMs) presents significant memory challenges due to the growing size of weights and GPU states.
In this work, we propose Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy; a minimal illustrative sketch of the projection idea appears after this list.
Our 8-bit GaLore further reduces memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline.
arXiv Detail & Related papers (2024-03-06T07:29:57Z)
- Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning [19.17362588650503]
Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show that LAST can be highly parallelized across multiple optimization objectives, making it very efficient in downstream task adaptation.
arXiv Detail & Related papers (2024-02-06T14:03:15Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- DTL: Disentangled Transfer Learning for Visual Recognition [21.549234013998255]
We introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN).
The proposed method not only reduces a large amount of GPU memory usage and trainable parameters, but also outperforms existing PETL methods by a significant margin in accuracy.
arXiv Detail & Related papers (2023-12-13T02:51:26Z)
- Towards Efficient Visual Adaption via Structural Re-parameterization [76.57083043547296]
We propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter.
RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
arXiv Detail & Related papers (2023-02-16T06:14:15Z)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by a substantially larger amount than prior methods.
arXiv Detail & Related papers (2022-06-13T23:51:56Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
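As a companion to the GaLore entry above, the following is a minimal, hedged sketch of low-rank gradient projection: the gradient of a weight matrix is compressed onto its top-r left singular vectors before the optimizer step and expanded back afterwards. The class name, the plain SGD-style update (GaLore itself keeps Adam moments in the low-rank space), and the projector refresh interval are simplifying assumptions, not the authors' implementation.

```python
# Hedged sketch of low-rank gradient projection in the spirit of GaLore.
# Simplified: SGD-style update instead of Adam, projector refreshed from the
# current gradient's top singular vectors every `update_gap` steps.
import torch


class LowRankGradProjector:
    def __init__(self, rank: int = 4, update_gap: int = 200):
        self.rank = rank
        self.update_gap = update_gap
        self.step = 0
        self.P = None  # (m, r) orthonormal projection basis

    def project(self, grad: torch.Tensor) -> torch.Tensor:
        # Periodically refresh the basis from the gradient's top-r left singular vectors.
        if self.P is None or self.step % self.update_gap == 0:
            U, _, _ = torch.linalg.svd(grad, full_matrices=False)
            self.P = U[:, : self.rank]
        self.step += 1
        return self.P.T @ grad                    # compact (r, n) gradient

    def project_back(self, low_rank_grad: torch.Tensor) -> torch.Tensor:
        return self.P @ low_rank_grad             # back to the full (m, n) shape


# Toy usage: one manual update step on a random weight matrix.
W = torch.randn(512, 512, requires_grad=True)
proj = LowRankGradProjector(rank=4)
loss = (W @ torch.randn(512, 64)).pow(2).mean()
loss.backward()
with torch.no_grad():
    compact = proj.project(W.grad)                # (4, 512) instead of (512, 512)
    W -= 1e-3 * proj.project_back(compact)        # SGD-style update in full space
```

The memory saving in a scheme like this comes from keeping optimizer state (momentum, Adam moments) in the small (r, n) space rather than the full (m, n) space of the weight.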