A Comprehensive Analysis of Adapter Efficiency
- URL: http://arxiv.org/abs/2305.07491v1
- Date: Fri, 12 May 2023 14:05:45 GMT
- Title: A Comprehensive Analysis of Adapter Efficiency
- Authors: Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan,
Ratish Puduppully, Mitesh M. Khapra
- Abstract summary: We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency in adapters does not translate to efficiency gains compared to full fine-tuning of models.
We recommend that for moderately sized models for NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than using adapters.
- Score: 20.63580880344425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapters have been positioned as a parameter-efficient fine-tuning (PEFT)
approach, whereby a minimal number of parameters are added to the model and
fine-tuned. However, adapters have not been sufficiently analyzed to understand
if PEFT translates to benefits in training/deployment efficiency and
maintainability/extensibility. Through extensive experiments on many adapters,
tasks, and languages in supervised and cross-lingual zero-shot settings, we
clearly show that for Natural Language Understanding (NLU) tasks, the parameter
efficiency in adapters does not translate to efficiency gains compared to full
fine-tuning of models. More precisely, adapters are relatively expensive to
train and have slightly higher deployment latency. Furthermore, the
maintainability/extensibility benefits of adapters can be achieved with simpler
approaches like multi-task training via full fine-tuning, which also provide
relatively faster training times. We, therefore, recommend that for moderately
sized models for NLU tasks, practitioners should rely on full fine-tuning or
multi-task training rather than using adapters. Our code is available at
https://github.com/AI4Bharat/adapter-efficiency.
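For context on where an adapter's extra parameters and latency come from, the sketch below shows a generic bottleneck adapter of the kind this paper benchmarks: a down-projection, a nonlinearity, an up-projection, and a residual connection inserted into a transformer layer. This is a minimal PyTorch illustration, not the exact configurations used in the paper's experiments; the hidden and bottleneck sizes are assumed values.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter (illustrative sketch, not the paper's exact setup)."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # newly added, trainable
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # newly added, trainable
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen backbone's output intact,
        # but the two extra projections run on every token at every layer,
        # which is where the extra training time and deployment latency arise.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

With the assumed sizes above, each insertion point adds roughly 2 × 768 × 64 ≈ 98K parameters, a small fraction of the backbone, which is why parameter efficiency alone does not imply faster training or serving.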
Related papers
- MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA can achieve significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
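As a rough illustration of what "incorporating pretrained adapters into a single model through model fusion" can mean, the sketch below averages the parameters of several trained adapters of identical shape. This is a simplification for intuition only; MerA's actual fusion procedure is described in its paper and may align parameters before merging.

```python
import copy
import torch
import torch.nn as nn

def fuse_adapters(adapters: list[nn.Module]) -> nn.Module:
    """Fuse identically shaped adapters by parameter averaging (illustrative only)."""
    fused = copy.deepcopy(adapters[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            # Average the corresponding parameter across all source adapters.
            stacked = torch.stack([dict(a.named_parameters())[name] for a in adapters])
            param.copy_(stacked.mean(dim=0))
    return fused
```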
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
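The SparseAdapter entry above looks at adapters through the lens of network pruning. As a hedged illustration of that idea (not the paper's exact criterion or schedule), one-shot magnitude pruning of an adapter's weight matrices to the roughly 80% sparsity mentioned there could be written as:

```python
import torch
import torch.nn as nn

def magnitude_prune_(adapter: nn.Module, sparsity: float = 0.8) -> None:
    """Zero out the smallest-magnitude adapter weights in place (illustrative sketch)."""
    with torch.no_grad():
        for param in adapter.parameters():
            if param.dim() < 2:  # leave biases and other vectors untouched
                continue
            k = int(param.numel() * sparsity)
            if k == 0:
                continue
            # The k-th smallest absolute value serves as the pruning threshold.
            threshold = param.abs().flatten().kthvalue(k).values
            param.mul_((param.abs() > threshold).to(param.dtype))
```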
- To Adapt or to Fine-tune: A Case Study on Abstractive Summarization [7.353994554197792]
Recent advances in the field of abstractive summarization leverage pre-trained language models rather than train a model from scratch.
Such models are sluggish to train and accompanied by a massive overhead.
It remains uncertain whether using adapters benefits the task of summarization, in terms of improved efficiency without an unpleasant sacrifice in performance.
arXiv Detail & Related papers (2022-08-30T22:48:28Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that uses two key techniques to improve adapter capacity without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
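The AdapterBias entry above describes a token-dependent representation shift. The sketch below captures that high-level idea as it is commonly described: a shared shift vector scaled by a per-token weight produced from the hidden state. The dimensions and details are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TokenDependentShift(nn.Module):
    """Token-dependent representation shift in the spirit of AdapterBias (sketch)."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(hidden_dim))  # shared shift direction
        self.alpha = nn.Linear(hidden_dim, 1)                # per-token scaling weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim).
        # Every token is shifted along the same direction, but by its own amount,
        # so the added parameter count stays tiny (one vector plus one linear head).
        return hidden_states + self.alpha(hidden_states) * self.shift
```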
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [71.40656211497162]
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks.
We introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5.
Our results demonstrate that training the adapter with the weight-sharing technique can match the performance of fine-tuning the entire model.
arXiv Detail & Related papers (2021-12-13T17:35:26Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
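AdapterDrop's central observation, as reported in its paper, is that adapters can be removed from the lower transformer layers at inference time with little quality loss, which recovers some of the speed lost to adapter layers. A minimal sketch of that idea is below; it assumes a hypothetical `adapter` attribute on each encoder layer, since real adapter libraries expose their own APIs for enabling and disabling adapters.

```python
import torch.nn as nn

def drop_lower_adapters(encoder_layers: nn.ModuleList, n_drop: int) -> None:
    """Disable adapters in the first n_drop layers (illustrative sketch).

    Assumes each layer holds a hypothetical ``adapter`` submodule and that the
    layer's forward pass skips the adapter when it is set to None.
    """
    for i, layer in enumerate(encoder_layers):
        if i < n_drop and hasattr(layer, "adapter"):
            layer.adapter = None  # the forward pass should skip a missing adapter
```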
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.