A Comprehensive Analysis of Adapter Efficiency
- URL: http://arxiv.org/abs/2305.07491v1
- Date: Fri, 12 May 2023 14:05:45 GMT
- Title: A Comprehensive Analysis of Adapter Efficiency
- Authors: Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan,
Ratish Puduppully, Mitesh M. Khapra
- Abstract summary: We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency in adapters does not translate to efficiency gains compared to full fine-tuning of models.
We recommend that for moderately sized models for NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than using adapters.
- Score: 20.63580880344425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapters have been positioned as a parameter-efficient fine-tuning (PEFT)
approach, whereby a minimal number of parameters are added to the model and
fine-tuned. However, adapters have not been sufficiently analyzed to understand
if PEFT translates to benefits in training/deployment efficiency and
maintainability/extensibility. Through extensive experiments on many adapters,
tasks, and languages in supervised and cross-lingual zero-shot settings, we
clearly show that for Natural Language Understanding (NLU) tasks, the parameter
efficiency in adapters does not translate to efficiency gains compared to full
fine-tuning of models. More precisely, adapters are relatively expensive to
train and have slightly higher deployment latency. Furthermore, the
maintainability/extensibility benefits of adapters can be achieved with simpler
approaches like multi-task training via full fine-tuning, which also provide
relatively faster training times. We, therefore, recommend that for moderately
sized models for NLU tasks, practitioners should rely on full fine-tuning or
multi-task training rather than using adapters. Our code is available at
https://github.com/AI4Bharat/adapter-efficiency.
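For context on where an adapter's extra parameters and latency come from, the sketch below shows a generic bottleneck adapter of the kind this paper benchmarks: a down-projection, a nonlinearity, an up-projection, and a residual connection inserted into a transformer layer. This is a minimal PyTorch illustration, not the exact configurations used in the paper's experiments; the hidden and bottleneck sizes are assumed values.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter (illustrative sketch, not the paper's exact setup)."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # newly added, trainable
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # newly added, trainable
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen backbone's output intact,
        # but the two extra projections run on every token at every layer,
        # which is where the extra training time and deployment latency arise.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

With the assumed sizes above, each insertion point adds roughly 2 × 768 × 64 ≈ 98K parameters, a small fraction of the backbone, which is why parameter efficiency alone does not imply faster training or serving.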
Related papers
- MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA can achieve significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
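As a rough illustration of what "incorporating pretrained adapters into a single model through model fusion" can mean, the sketch below averages the parameters of several trained adapters of identical shape. This is a simplification for intuition only; MerA's actual fusion procedure is described in its paper and may align parameters before merging.

```python
import copy
import torch
import torch.nn as nn

def fuse_adapters(adapters: list[nn.Module]) -> nn.Module:
    """Fuse identically shaped adapters by parameter averaging (illustrative only)."""
    fused = copy.deepcopy(adapters[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            # Average the corresponding parameter across all source adapters.
            stacked = torch.stack([dict(a.named_parameters())[name] for a in adapters])
            param.copy_(stacked.mean(dim=0))
    return fused
```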
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
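The SparseAdapter entry above looks at adapters through the lens of network pruning. As a hedged illustration of that idea (not the paper's exact criterion or schedule), one-shot magnitude pruning of an adapter's weight matrices to the roughly 80% sparsity mentioned there could be written as:

```python
import torch
import torch.nn as nn

def magnitude_prune_(adapter: nn.Module, sparsity: float = 0.8) -> None:
    """Zero out the smallest-magnitude adapter weights in place (illustrative sketch)."""
    with torch.no_grad():
        for param in adapter.parameters():
            if param.dim() < 2:  # leave biases and other vectors untouched
                continue
            k = int(param.numel() * sparsity)
            if k == 0:
                continue
            # The k-th smallest absolute value serves as the pruning threshold.
            threshold = param.abs().flatten().kthvalue(k).values
            param.mul_((param.abs() > threshold).to(param.dtype))
```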
- To Adapt or to Fine-tune: A Case Study on Abstractive Summarization [7.353994554197792]
Recent advances in the field of abstractive summarization leverage pre-trained language models rather than train a model from scratch.
Such models are sluggish to train and accompanied by a massive overhead.
It remains uncertain whether using adapters benefits the task of summarization, in terms of improved efficiency without an unpleasant sacrifice in performance.
arXiv Detail & Related papers (2022-08-30T22:48:28Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that uses two key techniques to improve adapter capacity without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
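The AdapterBias entry above describes a token-dependent representation shift. The sketch below captures that high-level idea as it is commonly described: a shared shift vector scaled by a per-token weight produced from the hidden state. The dimensions and details are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TokenDependentShift(nn.Module):
    """Token-dependent representation shift in the spirit of AdapterBias (sketch)."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(hidden_dim))  # shared shift direction
        self.alpha = nn.Linear(hidden_dim, 1)                # per-token scaling weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim).
        # Every token is shifted along the same direction, but by its own amount,
        # so the added parameter count stays tiny (one vector plus one linear head).
        return hidden_states + self.alpha(hidden_states) * self.shift
```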
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [71.40656211497162]
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks.
We introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5.
Our results demonstrate that training the adapter with the weight-sharing technique can match the performance of fine-tuning the entire model.
arXiv Detail & Related papers (2021-12-13T17:35:26Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
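AdapterDrop's central observation, as reported in its paper, is that adapters can be removed from the lower transformer layers at inference time with little quality loss, which recovers some of the speed lost to adapter layers. A minimal sketch of that idea is below; it assumes a hypothetical `adapter` attribute on each encoder layer, since real adapter libraries expose their own APIs for enabling and disabling adapters.

```python
import torch.nn as nn

def drop_lower_adapters(encoder_layers: nn.ModuleList, n_drop: int) -> None:
    """Disable adapters in the first n_drop layers (illustrative sketch).

    Assumes each layer holds a hypothetical ``adapter`` submodule and that the
    layer's forward pass skips the adapter when it is set to None.
    """
    for i, layer in enumerate(encoder_layers):
        if i < n_drop and hasattr(layer, "adapter"):
            layer.adapter = None  # the forward pass should skip a missing adapter
```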
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.