Towards Efficient Visual Adaption via Structural Re-parameterization
- URL: http://arxiv.org/abs/2302.08106v2
- Date: Tue, 21 Mar 2023 02:51:27 GMT
- Title: Towards Efficient Visual Adaption via Structural Re-parameterization
- Authors: Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang,
Zhiyu Wang and Rongrong Ji
- Abstract summary: We propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter.
RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
- Score: 76.57083043547296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient transfer learning (PETL) is an emerging research area
aimed at inexpensively adapting large-scale pre-trained models to downstream
tasks. Recent advances have achieved great success in saving storage costs for
various pre-trained models by updating a small number of parameters instead of
full tuning. However, we notice that most existing PETL methods still incur
non-negligible latency during inference. In this paper, we propose a
parameter-efficient and computationally friendly adapter for giant vision models,
called RepAdapter. Specifically, we first prove that common adaptation modules
can also be seamlessly integrated into most giant vision models via our
structural re-parameterization, thereby achieving zero-cost during inference.
We then investigate the sparse design and effective placement of adapter
structure, helping our RepAdapter gain further advantages in terms of parameter
efficiency and performance. To validate RepAdapter, we conduct extensive
experiments on 27 benchmark datasets spanning three vision tasks, i.e., image and
video classification and semantic segmentation. Experimental results show the
superior performance and efficiency of RepAdapter over state-of-the-art
PETL methods. For instance, RepAdapter outperforms full tuning by +7.2% on
average and saves up to 25% training time, 20% GPU memory, and 94.6% storage
cost of ViT-B/16 on VTAB-1k. The generalization ability of RepAdapter is also
well validated across a range of vision models. Our source code is released at
https://github.com/luogen1996/RepAdapter.
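To make the structural re-parameterization idea concrete, here is a minimal toy sketch (an illustration under simplifying assumptions, not the authors' actual implementation): a purely linear adapter placed before a frozen linear layer is folded into that layer's weight and bias after training, so the deployed model keeps its original structure and adds no inference latency. All tensor names and sizes below are invented for the example.

```python
import torch

# Toy sketch of structural re-parameterization (illustrative only):
# a linear adapter f(x) = x + B @ (A @ x + c) is inserted before a frozen
# linear layer y = W @ x + b; after training, the adapter is folded into
# W and b so that inference uses a single linear layer, as before.
torch.manual_seed(0)
d, r = 8, 2                                   # feature dim, adapter bottleneck
W, b = torch.randn(d, d), torch.randn(d)      # frozen layer
A, B, c = torch.randn(r, d), torch.randn(d, r), torch.randn(r)  # adapter

x = torch.randn(d)

# Training-time path: adapter followed by the frozen layer.
y_train = W @ (x + B @ (A @ x + c)) + b

# One-time merge for deployment.
W_rep = W @ (torch.eye(d) + B @ A)
b_rep = W @ (B @ c) + b

# Inference-time path: identical structure and cost to the original layer.
y_infer = W_rep @ x + b_rep
assert torch.allclose(y_train, y_infer, atol=1e-5)
```

The same algebra can be applied per linear projection inside a Transformer block, which is what makes a zero-cost merged model possible at inference time.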
Related papers
- Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition [22.615830919860777]
This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter).
We devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with an adaptive training strategy.
We reduce FLOPs during inference by 50%, while maintaining or even yielding higher recognition accuracy.
arXiv Detail & Related papers (2024-07-19T13:33:38Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient because it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone (a rough sketch of this idea follows this entry).
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
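As a rough illustration of the idea summarized above, the sketch below shows a small trainable network consuming features from a frozen backbone so that no gradients are backpropagated through the backbone. Module names and sizes are assumptions; the actual method runs its lightweight network in parallel with the backbone, while the sketch only shows the key property that gradients never enter the backbone.

```python
import torch
import torch.nn as nn

# Rough sketch (assumed sizes and module names): a small trainable network
# consumes features from a frozen backbone, and no gradients flow through
# the backbone itself.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
backbone.requires_grad_(False)      # freeze the pretrained backbone

light_net = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 10))

x = torch.randn(4, 32)
with torch.no_grad():               # backbone features, no backprop
    feats = backbone(x)

logits = light_net(feats)           # only light_net is trained
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (4,)))
loss.backward()                     # gradients touch light_net only
```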
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy for efficient adaptation of pre-trained models.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme (a hypothetical sketch follows this entry).
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
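The following is a hypothetical illustration of adapter parameter sharing, not necessarily the exact ARC scheme: all layers reuse one shared bottleneck adapter, and only a tiny layer-specific rescaling vector is learned per layer. Every name and dimension here is assumed for the example.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of adapter parameter sharing (not necessarily the
# ARC paper's exact scheme): every layer reuses one shared bottleneck adapter,
# and only a tiny per-layer rescaling vector is layer-specific.
d, r, num_layers = 768, 16, 12
shared_down = nn.Linear(d, r, bias=False)     # shared across all layers
shared_up = nn.Linear(r, d, bias=False)       # shared across all layers
layer_scale = nn.ParameterList(
    [nn.Parameter(torch.ones(r)) for _ in range(num_layers)]
)

def adapter(x, layer_idx):
    # Residual bottleneck adapter with a layer-specific rescaling.
    return x + shared_up(shared_down(x) * layer_scale[layer_idx])

x = torch.randn(2, 197, d)                    # e.g. ViT tokens (assumed shape)
y = adapter(x, layer_idx=3)
```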
- Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy [17.203320079872952]
Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models.
With the exponential growth of model sizes, conventional full fine-tuning leads to increasingly large storage and transmission overheads.
In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network.
arXiv Detail & Related papers (2023-07-31T17:22:17Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Parameter-Efficient Sparse Retrievers and Rerankers using Adapters [4.9545244468634655]
We study adapters for SPLADE, a sparse retriever, and show that adapters retain the efficiency and effectiveness otherwise achieved by fine-tuning.
We also address domain adaptation of neural retrieval using adapters on the cross-domain BEIR datasets and TripClick.
arXiv Detail & Related papers (2023-03-23T12:34:30Z)
- UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling [49.134517040512414]
This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on vision-language models.
Experiments show that UniAdapter not only outperforms state-of-the-art methods but even beats the full fine-tuning strategy.
arXiv Detail & Related papers (2023-02-13T18:59:10Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen (a sketch of this idea follows this entry).
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
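As a rough sketch of prompt tuning in the input space (shapes and names assumed for illustration, not VPT's exact implementation), a handful of learnable tokens is prepended to the patch embeddings of a frozen Transformer, and only those tokens are trained:

```python
import torch
import torch.nn as nn

# Illustrative sketch of prompt tuning in the input space (shapes assumed):
# learnable tokens are prepended to the frozen model's patch embeddings,
# and only these prompt tokens receive gradients.
batch, num_patches, dim, num_prompts = 2, 196, 768, 10
prompts = nn.Parameter(torch.zeros(num_prompts, dim))   # trainable prompts

patch_embeds = torch.randn(batch, num_patches, dim)     # from a frozen embed layer
tokens = torch.cat(
    [prompts.unsqueeze(0).expand(batch, -1, -1), patch_embeds], dim=1
)
print(tokens.shape)  # torch.Size([2, 206, 768]) -> fed to the frozen encoder
```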
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.