On the Effectiveness of Adapter-based Tuning for Pretrained Language
Model Adaptation
- URL: http://arxiv.org/abs/2106.03164v1
- Date: Sun, 6 Jun 2021 16:10:12 GMT
- Title: On the Effectiveness of Adapter-based Tuning for Pretrained Language
Model Adaptation
- Authors: Ruidan He, Linlin Liu, Hai Ye, Qingyu Tan, Bosheng Ding, Liying Cheng,
Jia-Wei Low, Lidong Bing, Luo Si
- Abstract summary: Adapter-based tuning works by adding light-weight adapter modules to a pretrained language model (PrLM).
It adds only a few trainable parameters per new task, allowing a high degree of parameter sharing.
We demonstrate that adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks.
- Score: 36.37565646597464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapter-based tuning has recently arisen as an alternative to fine-tuning. It
works by adding light-weight adapter modules to a pretrained language model
(PrLM) and only updating the parameters of adapter modules when learning on a
downstream task. As such, it adds only a few trainable parameters per new task,
allowing a high degree of parameter sharing. Prior studies have shown that
adapter-based tuning often achieves comparable results to fine-tuning. However,
existing work only focuses on the parameter-efficient aspect of adapter-based
tuning while lacking further investigation on its effectiveness. In this paper,
we study the latter. We first show that adapter-based tuning better mitigates
forgetting issues than fine-tuning since it yields representations with less
deviation from those generated by the initial PrLM. We then empirically compare
the two tuning methods on several downstream NLP tasks and settings. We
demonstrate that 1) adapter-based tuning outperforms fine-tuning on
low-resource and cross-lingual tasks; 2) it is more robust to overfitting and
less sensitive to changes in learning rates.
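To make the setup concrete, below is a minimal, hypothetical PyTorch sketch of a bottleneck adapter attached to a frozen PrLM; the module names, bottleneck width, and attachment point are illustrative assumptions, not the exact configuration studied in the paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a residual."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the adapter close to an identity map at initialization.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def prepare_for_adapter_tuning(prlm: nn.Module, num_layers: int = 12,
                               hidden_size: int = 768) -> nn.ModuleList:
    """Freeze every PrLM parameter; only the returned adapters (and a task head) are trained."""
    for p in prlm.parameters():
        p.requires_grad = False
    # Hypothetical attachment: one adapter per Transformer layer; real implementations
    # typically insert adapters after the attention and/or feed-forward sub-layers.
    return nn.ModuleList(Adapter(hidden_size) for _ in range(num_layers))
```

Because only the adapter (and task-head) parameters are updated for each task, many tasks can share a single frozen copy of the PrLM, which is the parameter-sharing property the abstract refers to.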
Related papers
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA can achieve significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning across various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z)
- CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method designed specifically for SSL speech models, applying CNN adapters at the feature extractor.
We empirically found that adding CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters [25.958600375299735]
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters.
In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters.
Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions (a minimal sketch of this mechanism is given after this list).
arXiv Detail & Related papers (2022-10-18T15:20:44Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, based on two key techniques, to improve adapter capacity without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
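As referenced in the Tiny-Attention Adapter entry above, here is a minimal, hypothetical sketch of an attention adapter with an extremely small per-head dimensionality; the hidden size, head count, and residual placement are illustrative assumptions rather than that paper's exact design.

```python
import torch
import torch.nn as nn

class TinyAttentionAdapter(nn.Module):
    """Adapter whose bottleneck is an attention layer with a tiny per-head dimension."""
    def __init__(self, hidden_size: int = 768, num_heads: int = 12, head_dim: int = 1):
        super().__init__()
        inner = num_heads * head_dim  # e.g. 12 heads x 1 dim = 12-dimensional bottleneck
        self.num_heads, self.head_dim = num_heads, head_dim
        self.q = nn.Linear(hidden_size, inner)
        self.k = nn.Linear(hidden_size, inner)
        self.v = nn.Linear(hidden_size, inner)
        self.out = nn.Linear(inner, hidden_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_size)
        b, t, _ = h.shape
        def split(x):  # (b, t, inner) -> (b, heads, t, head_dim)
            return x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(h)), split(self.k(h)), split(self.v(h))
        # Each position attends to every other position, so the update is context-dependent.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        # Residual connection: the adapter only shifts the existing hidden states.
        return h + self.out(ctx)
```

Because the bottleneck is itself an attention layer, the per-position update depends on the whole context rather than on that position alone, which is the point the summary above makes.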