Revisiting the Parameter Efficiency of Adapters from the Perspective of
Precision Redundancy
- URL: http://arxiv.org/abs/2307.16867v1
- Date: Mon, 31 Jul 2023 17:22:17 GMT
- Title: Revisiting the Parameter Efficiency of Adapters from the Perspective of
Precision Redundancy
- Authors: Shibo Jie, Haoqing Wang, Zhi-Hong Deng
- Abstract summary: Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models.
With the exponential growth of model sizes, the conventional full fine-tuning leads to increasingly huge storage and transmission overhead.
In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network.
- Score: 17.203320079872952
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Current state-of-the-art results in computer vision depend in part on
fine-tuning large pre-trained vision models. However, with the exponential
growth of model sizes, the conventional full fine-tuning, which needs to store
an individual network copy for each task, leads to increasingly huge storage
and transmission overhead. Adapter-based Parameter-Efficient Tuning (PET)
methods address this challenge by tuning lightweight adapters inserted into the
frozen pre-trained models. In this paper, we investigate how to make adapters
even more efficient, reaching a new minimum size required to store a
task-specific fine-tuned network. Inspired by the observation that the
parameters of adapters converge at flat local minima, we find that adapters are
resistant to noise in parameter space, which means they are also resistant to
low numerical precision. To train low-precision adapters, we propose a
computationally efficient quantization method that minimizes the quantization
error. Through extensive experiments, we find that low-precision adapters
exhibit minimal performance degradation, and even 1-bit precision is sufficient
for adapters. The experimental results demonstrate that 1-bit adapters
outperform all other PET methods on both the VTAB-1K benchmark and few-shot
FGVC tasks, while requiring the smallest storage size. Our findings show, for
the first time, the significant potential of quantization techniques in PET,
providing a general solution to enhance the parameter efficiency of
adapter-based PET methods. Code: https://github.com/JieShibo/PETL-ViT
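As a rough illustration of the core idea, the sketch below binarizes an adapter weight matrix to 1 bit per parameter with a single scaling factor chosen to minimize the L2 quantization error (B = sign(W), alpha = mean(|W|)). This is a generic minimal-error binary quantizer, not necessarily the exact scheme used in the paper; see the linked code for the authors' implementation.

```python
# Sketch: 1-bit quantization of an adapter weight matrix with one scale.
import torch

def binarize(weight: torch.Tensor):
    """Return (sign_bits, alpha) such that alpha * sign_bits approximates weight."""
    sign_bits = torch.sign(weight)
    sign_bits[sign_bits == 0] = 1.0      # map exact zeros to +1 so every entry is +/-1
    alpha = weight.abs().mean()          # closed-form minimizer of ||W - alpha * B||_F
    return sign_bits, alpha

def dequantize(sign_bits: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    return alpha * sign_bits

w = torch.randn(768, 8)                  # e.g. a bottleneck adapter's down-projection
bits, scale = binarize(w)
w_hat = dequantize(bits, scale)
rel_err = (w - w_hat).norm() / w.norm()
print(f"1 bit per weight + one fp scale, relative error {rel_err:.3f}")
```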
Related papers
- Towards Optimal Adapter Placement for Efficient Transfer Learning [73.1149084352343]
PETL aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters.
Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections.
This paper investigates the relationship between the placement of an adapter and its performance.
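For context, below is a minimal sketch of the kind of low-rank (bottleneck) adapter this line of work studies; the hidden size, bottleneck width, and insertion point are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: a standard bottleneck adapter (down-projection, nonlinearity,
# up-projection, residual) attached to a frozen transformer sub-layer.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # low-rank down-projection
        self.up = nn.Linear(bottleneck, d_model)     # low-rank up-projection
        nn.init.zeros_(self.up.weight)               # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))  # residual low-rank update

x = torch.randn(2, 16, 768)                           # (batch, tokens, hidden)
print(BottleneckAdapter()(x).shape)                   # torch.Size([2, 16, 768])
```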
arXiv Detail & Related papers (2024-10-21T10:37:17Z)
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have a huge number of parameters, so fine-tuning them is often expensive and time-consuming.
It is necessary to adopt a parameter-efficient approach that reduces the number of tuned parameters during fine-tuning without compromising performance on downstream tasks.
In this paper, we design a novel adapter which only acts on self-attention outputs in PLMs.
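The summary only states that the adapter acts on self-attention outputs; one plausible reading of "Hadamard" is an element-wise (Hadamard-product) weight plus bias applied to that output. The sketch below follows that reading as an assumption; consult the paper for the exact formulation.

```python
# Sketch of one plausible "Hadamard" adapter: an element-wise weight and bias
# applied to the self-attention output. The design is assumed from the name.
import torch
import torch.nn as nn

class HadamardAdapter(nn.Module):
    def __init__(self, d_model: int = 768):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d_model))   # one weight per channel
        self.shift = nn.Parameter(torch.zeros(d_model))  # one bias per channel

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        # element-wise product, broadcast over batch and sequence dimensions
        return attn_out * self.scale + self.shift

attn_out = torch.randn(2, 16, 768)
print(HadamardAdapter()(attn_out).shape)   # only 2 * d_model trainable parameters
```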
arXiv Detail & Related papers (2024-07-04T18:21:28Z)
- Mini but Mighty: Finetuning ViTs with Mini Adapters [7.175668563148084]
Adapters perform poorly when their dimension is small.
We propose MiMi, a training framework that addresses this issue.
Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters.
arXiv Detail & Related papers (2023-11-07T10:41:27Z)
- Parameter-Efficient Sparse Retrievers and Rerankers using Adapters [4.9545244468634655]
We study adapters for SPLADE, a sparse retriever, and show that adapters retain the efficiency and effectiveness otherwise achieved by fine-tuning.
We also address domain adaptation of neural retrieval with adapters on the cross-domain BEIR datasets and TripClick.
arXiv Detail & Related papers (2023-03-23T12:34:30Z)
- Towards Efficient Visual Adaption via Structural Re-parameterization [76.57083043547296]
We propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter.
RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
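The sketch below illustrates the general idea of structural re-parameterization: a purely linear adapter placed next to a frozen linear layer can be folded into that layer after training, so inference pays no extra cost. The shapes and merge direction are chosen for illustration and are not RepAdapter's exact formulation.

```python
# Sketch: folding a purely linear adapter into an adjacent frozen linear layer.
import torch

d = 768
x = torch.randn(4, d)

# frozen pre-trained projection y = W_f x + b_f
W_f, b_f = torch.randn(d, d), torch.randn(d)
# trained adapter y = W_a x + b_a (no nonlinearity, so it can be merged exactly)
W_a = torch.eye(d) + 0.01 * torch.randn(d, d)
b_a = 0.01 * torch.randn(d)

# two-step forward: adapter first, then the frozen layer
y_two_step = (x @ W_a.T + b_a) @ W_f.T + b_f

# merged single layer: W' = W_f @ W_a, b' = W_f @ b_a + b_f
W_merged = W_f @ W_a
b_merged = W_f @ b_a + b_f
y_merged = x @ W_merged.T + b_merged

print(torch.allclose(y_two_step, y_merged, rtol=1e-3, atol=1e-3))  # True up to float32 rounding
```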
arXiv Detail & Related papers (2023-02-16T06:14:15Z)
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters [25.958600375299735]
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters.
In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters.
Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions.
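A minimal sketch of this idea, assuming a standard multi-head attention block with per-head dimensionality of 1 added residually to the hidden states; the head count, projections, and placement in the paper may differ.

```python
# Sketch: an attention adapter with extremely small per-head dimensionality.
import torch
import torch.nn as nn

class TinyAttentionAdapter(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, head_dim: int = 1):
        super().__init__()
        inner = n_heads * head_dim                      # 12 dims instead of 768
        self.n_heads, self.head_dim = n_heads, head_dim
        self.q = nn.Linear(d_model, inner)
        self.k = nn.Linear(d_model, inner)
        self.v = nn.Linear(d_model, inner)
        self.out = nn.Linear(inner, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        b, t, _ = h.shape
        def split(x):                                   # (b, t, inner) -> (b, heads, t, head_dim)
            return x.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(h)), split(self.k(h)), split(self.v(h))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return h + self.out(ctx)                        # each position attends to all others

h = torch.randn(2, 16, 768)
print(TinyAttentionAdapter()(h).shape)                  # torch.Size([2, 16, 768])
```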
arXiv Detail & Related papers (2022-10-18T15:20:44Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity without increasing parameters or computational cost, using two key techniques.
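The two techniques are not named in this summary; AdaMix is commonly described as stochastically routing each training pass through one of several adapter "experts" and then merging (averaging) the experts into a single adapter for serving. The sketch below follows that description under simplifying assumptions (uniform routing, no consistency regularization).

```python
# Sketch: a mixture of bottleneck adapters with random routing and weight merging.
import random
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 8, n_experts: int = 4):
        super().__init__()
        self._new_adapter = lambda: nn.Sequential(
            nn.Linear(d_model, bottleneck), nn.ReLU(), nn.Linear(bottleneck, d_model))
        self.experts = nn.ModuleList(self._new_adapter() for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training: route each forward pass through one randomly chosen expert
        # (no learned router, so no extra parameters). For deployment, call
        # merge_experts() and keep only the single merged adapter.
        expert = random.choice(self.experts) if self.training else self.experts[0]
        return x + expert(x)

    def merge_experts(self) -> nn.Module:
        """Average the experts' weights into one adapter-sized module."""
        merged = self._new_adapter()
        with torch.no_grad():
            for name, param in merged.named_parameters():
                stacked = torch.stack([dict(e.named_parameters())[name] for e in self.experts])
                param.copy_(stacked.mean(dim=0))
        return merged

moa = MixtureOfAdapters().train()
x = torch.randn(2, 16, 768)
_ = moa(x)                        # stochastic routing during training
deploy = moa.merge_experts()      # single adapter for serving
print((x + deploy(x)).shape)      # torch.Size([2, 16, 768])
```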
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
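One plausible reading of "token-dependent representation shift" is a shared shift vector scaled by a per-token scalar produced by a tiny linear layer; the sketch below follows that reading as an assumption, not as the paper's exact parameterization.

```python
# Sketch: a token-dependent shift added to the hidden states.
import torch
import torch.nn as nn

class AdapterBiasSketch(nn.Module):
    def __init__(self, d_model: int = 768):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))   # shared shift direction
        self.alpha = nn.Linear(d_model, 1)            # per-token scaling factor

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, tokens, d_model); each token gets its own scaled copy of v
        return h + self.alpha(h) * self.v

h = torch.randn(2, 16, 768)
print(AdapterBiasSketch()(h).shape)   # roughly 2 * d_model trainable parameters
```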
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and training lightweight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)