MerA: Merging Pretrained Adapters For Few-Shot Learning
- URL: http://arxiv.org/abs/2308.15982v1
- Date: Wed, 30 Aug 2023 12:10:17 GMT
- Title: MerA: Merging Pretrained Adapters For Few-Shot Learning
- Authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
- Abstract summary: We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion.
- Score: 71.44422347502409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapter tuning, which updates only a few parameters, has become a mainstream
method for fine-tuning pretrained language models to downstream tasks. However,
it often yields subpar results in few-shot learning. AdapterFusion, which
assembles pretrained adapters using composition layers tailored to specific
tasks, is a possible solution but significantly increases trainable parameters
and deployment costs. Despite this, our preliminary study reveals that even
single adapters can outperform AdapterFusion in few-shot learning, urging us to
propose Merging Pretrained Adapters (MerA), which efficiently
incorporates pretrained adapters into a single model through model fusion.
Extensive experiments on two PLMs demonstrate that MerA achieves substantial
improvements compared to both single adapters and AdapterFusion. To further
enhance the capacity of MerA, we also introduce a simple yet effective
technique, referred to as the "same-track" setting, that merges
adapters from the same track of pretraining tasks. With the implementation of
the "same-track" setting, we observe even more impressive gains,
surpassing the performance of both full fine-tuning and adapter tuning by a
substantial margin, e.g., 3.5% in MRPC and 5.0% in MNLI.
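As a concrete illustration of the idea in the abstract, the minimal sketch below merges several pretrained adapters into one by element-wise parameter averaging before few-shot tuning. This is only one possible fusion scheme and is not claimed to be the paper's exact procedure; the function and tensor names are illustrative.

```python
# Minimal sketch: merge several pretrained adapters into one by element-wise
# parameter averaging. This is one possible fusion scheme, not necessarily the
# paper's exact procedure; function and tensor names are illustrative.
from typing import Dict, List, Optional

import torch


def merge_adapters(adapter_state_dicts: List[Dict[str, torch.Tensor]],
                   weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Merge adapters that share an architecture into a single state dict."""
    if weights is None:
        weights = [1.0 / len(adapter_state_dicts)] * len(adapter_state_dicts)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, adapter_state_dicts))
        for name in adapter_state_dicts[0]
    }


# Usage: merge two pretrained adapters, then load the result into a single
# adapter before few-shot tuning. Per the abstract, gains grow when the merged
# adapters come from the same track of pretraining tasks ("same-track").
adapter_a = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}
adapter_b = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}
merged = merge_adapters([adapter_a, adapter_b])
```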
Related papers
- MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA can achieve significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
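The Polytropon entry above describes an inventory of adapters plus a routing function that selects a subset per task. The sketch below is a loose, hedged illustration of that idea using a soft (sigmoid) relaxation of subset selection; the class names, bottleneck design, and relaxation are assumptions, not the authors' implementation.

```python
# Hedged sketch of task-conditioned routing over an adapter inventory.
# A sigmoid relaxation stands in for discrete subset selection; names and
# the bottleneck design are illustrative, not the authors' code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))  # residual adapter


class RoutedAdapterLayer(nn.Module):
    """Mixes an inventory of adapters with learned per-task routing weights."""

    def __init__(self, d_model: int, n_adapters: int, n_tasks: int):
        super().__init__()
        self.inventory = nn.ModuleList(
            [BottleneckAdapter(d_model) for _ in range(n_adapters)])
        self.routing_logits = nn.Parameter(torch.zeros(n_tasks, n_adapters))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        gates = torch.sigmoid(self.routing_logits[task_id])  # per-task gates
        out = x
        for gate, adapter in zip(gates, self.inventory):
            out = out + gate * (adapter(x) - x)  # add each adapter's residual update
        return out


# Usage: route hidden states for task 0 through a 4-adapter inventory.
layer = RoutedAdapterLayer(d_model=768, n_adapters=4, n_tasks=8)
hidden = torch.randn(2, 16, 768)
out = layer(hidden, task_id=0)
```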
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
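The SparseAdapter entry above reframes adapter parameter-efficiency as network pruning at high sparse ratios. A minimal sketch of one such criterion, plain magnitude pruning of an adapter's projection matrices, follows; the function name and the choice of criterion are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch: magnitude-prune an adapter's weight matrices to a target
# sparsity, keeping only the largest-magnitude entries. The actual pruning
# criterion used by SparseAdapter may differ; this is illustrative only.
import torch


def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """In-place magnitude pruning: zero out the smallest-|w| entries."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_(weight.abs() > threshold)
    return weight


# Usage: prune both projections of a bottleneck adapter to an 80% sparse ratio.
down, up = torch.randn(64, 768), torch.randn(768, 64)
for w in (down, up):
    magnitude_prune_(w, sparsity=0.8)
print(f"down sparsity: {(down == 0).float().mean().item():.2f}")
```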
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity through two key techniques, without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain from a hundred million to a trillion parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
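The adaptable-adapters entry says different layers (and inputs) can end up with different activation functions. The hedged sketch below approximates this with a learned soft mixture over a few candidate activations per adapter layer; it is a simplification, and all names are illustrative.

```python
# Hedged sketch: an adapter whose non-linearity is a learned soft mixture over
# candidate activations, so each adapter layer can settle on its own activation.
# This is a simplification of the learnable activations described above.
import torch
import torch.nn as nn


class AdaptableActivation(nn.Module):
    def __init__(self):
        super().__init__()
        self.candidates = nn.ModuleList([nn.ReLU(), nn.GELU(), nn.Tanh()])
        self.logits = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.logits, dim=0)
        return sum(w * act(x) for w, act in zip(weights, self.candidates))


class AdaptableAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = AdaptableActivation()  # activation learned per adapter layer
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```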
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
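AdapterBias, as summarized above, adds a token-dependent representation shift with very few parameters. The sketch below shows one way to read that: a single shared shift vector scaled per token by a learned scalar; the exact formulation and the names are assumptions based on the summary.

```python
# Hedged sketch of a token-dependent representation shift: one shared vector v
# scaled per token by a learned scalar alpha(token). Names and the exact
# formulation are assumptions based on the one-line summary above.
import torch
import torch.nn as nn


class TokenDependentShift(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))  # shared shift direction
        self.alpha = nn.Linear(d_model, 1)           # per-token scaling factor

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); the per-token scale broadcasts over d_model
        return hidden + self.alpha(hidden) * self.v
```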
- On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation [36.37565646597464]
Adapter-based tuning works by adding lightweight adapter modules to a pretrained language model (PrLM).
It adds only a few trainable parameters per new task, allowing a high degree of parameter sharing.
We demonstrate that adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks.
arXiv Detail & Related papers (2021-06-06T16:10:12Z)
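The last entry describes the standard adapter-based tuning setup: small adapter modules added to a frozen PrLM so that only a few parameters are trained per task. A minimal sketch of that setup follows; the toy backbone and sizes are illustrative.

```python
# Hedged sketch of standard adapter-based tuning: a small bottleneck adapter
# with a residual connection, trained while the pretrained backbone stays
# frozen. The toy backbone and sizes are illustrative.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))


# Freeze the backbone so only the adapter's few parameters are trainable.
backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad_(False)
adapter = Adapter(d_model=768)

out = adapter(backbone(torch.randn(2, 16, 768)))
trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable params: {trainable} ({100 * trainable / total:.2f}% of the layer)")
```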