AdapterDrop: On the Efficiency of Adapters in Transformers
- URL: http://arxiv.org/abs/2010.11918v2
- Date: Tue, 5 Oct 2021 18:37:04 GMT
- Title: AdapterDrop: On the Efficiency of Adapters in Transformers
- Authors: Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas
Pfeiffer, Nils Reimers, Iryna Gurevych
- Abstract summary: Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
- Score: 53.845909603631945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Massively pre-trained transformer models are computationally expensive to
fine-tune, slow for inference, and have large storage requirements. Recent
approaches tackle these shortcomings by training smaller models, dynamically
reducing the model size, and by training light-weight adapters. In this paper,
we propose AdapterDrop, removing adapters from lower transformer layers during
training and inference, which incorporates concepts from all three directions.
We show that AdapterDrop can dynamically reduce the computational overhead when
performing inference over multiple tasks simultaneously, with minimal decrease
in task performance. We further prune adapters from AdapterFusion, which
improves inference efficiency while fully maintaining task performance.
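The core mechanism, skipping adapters in the lowest transformer layers during training and inference, can be sketched in a few lines of PyTorch. The bottleneck adapter module, the toy layer wrapper, and the `drop_below` threshold below are generic illustrations under assumed shapes, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class LayerWithAdapter(nn.Module):
    """Wraps a transformer layer and optionally applies an adapter after it."""

    def __init__(self, layer: nn.Module, hidden_size: int):
        super().__init__()
        self.layer = layer
        self.adapter = BottleneckAdapter(hidden_size)

    def forward(self, x: torch.Tensor, use_adapter: bool = True) -> torch.Tensor:
        x = self.layer(x)
        return self.adapter(x) if use_adapter else x


def forward_with_adapterdrop(layers, x, drop_below: int = 5):
    """AdapterDrop-style pass: skip adapters in the lowest `drop_below` layers only."""
    for i, layer in enumerate(layers):
        x = layer(x, use_adapter=(i >= drop_below))
    return x


# Toy usage: 12 layers, hidden size 768, adapters dropped from the first 5 layers.
hidden = 768
layers = [
    LayerWithAdapter(nn.TransformerEncoderLayer(hidden, nhead=12, batch_first=True), hidden)
    for _ in range(12)
]
out = forward_with_adapterdrop(layers, torch.randn(2, 16, hidden), drop_below=5)
```

Because the lower layers then run without any task-specific modules, their activations can be computed once and shared when serving several tasks at the same time, which is where the inference savings come from.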
Related papers
- Mini but Mighty: Finetuning ViTs with Mini Adapters [7.175668563148084]
Adapters perform poorly when their dimension is small.
We propose MiMi, a training framework that addresses this issue.
Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters.
arXiv Detail & Related papers (2023-11-07T10:41:27Z)
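To make the "adapter dimension" concrete, the sketch below counts the trainable parameters of a generic bottleneck adapter as its dimension varies; MiMi's own architecture is not reproduced here, and the hidden size of 768 is just a BERT-base-sized example.

```python
def adapter_params(hidden_size: int, bottleneck: int) -> int:
    """Trainable parameters of one generic bottleneck adapter (two linear layers with biases)."""
    down = hidden_size * bottleneck + bottleneck  # W_down and its bias
    up = bottleneck * hidden_size + hidden_size   # W_up and its bias
    return down + up


# Per-adapter parameter count for a BERT-base-sized model (hidden size 768).
for dim in (1, 8, 64, 256):
    print(f"bottleneck={dim}: {adapter_params(768, dim):,} parameters")
```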
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
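As a rough illustration of incorporating several adapters into one model, the sketch below averages the parameters of multiple adapter checkpoints; this is an assumed, simplified form of model fusion, not MerA's published merging procedure.

```python
import torch


def average_adapters(adapter_state_dicts):
    """Merge several pretrained adapters into one by element-wise parameter averaging.
    A simple stand-in for "model fusion"; MerA's actual procedure may align parameters
    before combining them."""
    merged = {}
    for key in adapter_state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in adapter_state_dicts]).mean(dim=0)
    return merged
```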
- A Comprehensive Analysis of Adapter Efficiency [20.63580880344425]
We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency of adapters does not translate into efficiency gains compared to full fine-tuning of models.
We recommend that for moderately sized models on NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than adapters.
arXiv Detail & Related papers (2023-05-12T14:05:45Z)
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
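A minimal sketch of pruning adapter weights to a target sparsity, using magnitude pruning as an assumed criterion; SparseAdapter studies pruning more broadly than this single recipe.

```python
import torch


def magnitude_prune(adapter_state_dict, sparsity: float = 0.8):
    """Zero out the smallest-magnitude weights of each adapter matrix until `sparsity` is reached.
    Magnitude pruning is one common criterion; SparseAdapter also considers others."""
    pruned = {}
    for name, w in adapter_state_dict.items():
        if w.dim() < 2:  # leave biases and other 1-D parameters dense
            pruned[name] = w.clone()
            continue
        k = int(w.numel() * sparsity)
        if k == 0:
            pruned[name] = w.clone()
            continue
        threshold = w.abs().flatten().kthvalue(k).values
        pruned[name] = torch.where(w.abs() > threshold, w, torch.zeros_like(w))
    return pruned


# Example: prune a toy adapter's weight matrices to 80% sparsity.
adapter = {"down.weight": torch.randn(64, 768), "down.bias": torch.randn(64),
           "up.weight": torch.randn(768, 64), "up.bias": torch.randn(768)}
sparse = magnitude_prune(adapter, sparsity=0.8)
```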
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost of storing a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity, via two key techniques, without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
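The mixture-of-adapter idea can be sketched as several adapter "experts" that are routed stochastically during training and later collapsed into one adapter for serving; the specific choices below (uniform random routing, weight averaging) are assumptions for illustration, not AdaMix's exact two techniques.

```python
import random
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Several adapter experts: during training one expert is picked at random per forward pass
    (routing without extra parameters); for serving, the experts can be collapsed into a single
    adapter so inference cost matches one adapter. A generic sketch, not AdaMix's exact recipe."""

    def __init__(self, hidden_size: int, bottleneck: int = 16, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, hidden_size))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        expert = random.choice(self.experts) if self.training else self.experts[0]
        return x + expert(x)

    @torch.no_grad()
    def collapse(self):
        """Average all experts into expert 0, then keep only that one for inference."""
        for tensors in zip(*(e.parameters() for e in self.experts)):
            tensors[0].copy_(torch.stack(list(tensors)).mean(dim=0))
        self.experts = nn.ModuleList([self.experts[0]])
```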
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but they still require a relatively large number of parameters.
In this study, we propose AdapterBias, a surprisingly simple yet effective adapter architecture.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
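A minimal sketch of a token-dependent representation shift: one shift vector shared across tokens, scaled per token by a scalar from a tiny linear layer. Where exactly this module sits inside the transformer layer is an assumption here.

```python
import torch
import torch.nn as nn


class TokenDependentShift(nn.Module):
    """Token-dependent shift in the spirit of AdapterBias: a shared shift vector `v` plus a
    small linear layer that assigns each token a scalar weight for it."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_size))  # shared shift vector
        self.alpha = nn.Linear(hidden_size, 1)           # per-token scalar weight

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size); alpha(hidden): (batch, seq_len, 1)
        return hidden + self.alpha(hidden) * self.v
```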
- AdapterHub: A Framework for Adapting Transformers [148.6877231725939]
AdapterHub is a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages.
Our framework enables scalable and easy sharing of task-specific models.
arXiv Detail & Related papers (2020-07-15T15:56:05Z)
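A minimal usage sketch of the AdapterHub workflow with the accompanying adapter-transformers library; the class name, adapter identifier, and checkpoint shown follow older documentation and may differ in current library versions.

```python
# See https://adapterhub.ml for the up-to-date API; names below are illustrative.
from transformers import AutoModelWithHeads  # provided by the adapter-transformers fork

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
adapter_name = model.load_adapter("sentiment/sst-2@ukp")  # download a pre-trained task adapter
model.set_active_adapters(adapter_name)                   # "stitch in" the adapter for inference
```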
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.