MerA: Merging Pretrained Adapters For Few-Shot Learning
- URL: http://arxiv.org/abs/2308.15982v1
- Date: Wed, 30 Aug 2023 12:10:17 GMT
- Title: MerA: Merging Pretrained Adapters For Few-Shot Learning
- Authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
- Abstract summary: We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion.
- Score: 71.44422347502409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapter tuning, which updates only a few parameters, has become a mainstream
method for fine-tuning pretrained language models to downstream tasks. However,
it often yields subpar results in few-shot learning. AdapterFusion, which
assembles pretrained adapters using composition layers tailored to specific
tasks, is a possible solution but significantly increases trainable parameters
and deployment costs. Despite this, our preliminary study reveals that even
single adapters can outperform AdapterFusion in few-shot learning, urging us to
propose Merging Pretrained Adapters (MerA), which efficiently
incorporates pretrained adapters into a single model through model fusion.
Extensive experiments on two PLMs demonstrate that MerA achieves substantial
improvements compared to both single adapters and AdapterFusion. To further
enhance the capacity of MerA, we also introduce a simple yet effective
technique, referred to as the "same-track" setting, that merges
adapters from the same track of pretraining tasks. With the implementation of
the "same-track" setting, we observe even more impressive gains,
surpassing the performance of both full fine-tuning and adapter tuning by a
substantial margin, e.g., 3.5% in MRPC and 5.0% in MNLI.
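As a concrete illustration of the idea in the abstract, the minimal sketch below merges several pretrained adapters into one by element-wise parameter averaging before few-shot tuning. This is only one possible fusion scheme and is not claimed to be the paper's exact procedure; the function and tensor names are illustrative.

```python
# Minimal sketch: merge several pretrained adapters into one by element-wise
# parameter averaging. This is one possible fusion scheme, not necessarily the
# paper's exact procedure; function and tensor names are illustrative.
from typing import Dict, List, Optional

import torch


def merge_adapters(adapter_state_dicts: List[Dict[str, torch.Tensor]],
                   weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Merge adapters that share an architecture into a single state dict."""
    if weights is None:
        weights = [1.0 / len(adapter_state_dicts)] * len(adapter_state_dicts)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, adapter_state_dicts))
        for name in adapter_state_dicts[0]
    }


# Usage: merge two pretrained adapters, then load the result into a single
# adapter before few-shot tuning. Per the abstract, gains grow when the merged
# adapters come from the same track of pretraining tasks ("same-track").
adapter_a = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}
adapter_b = {"down.weight": torch.randn(64, 768), "up.weight": torch.randn(768, 64)}
merged = merge_adapters([adapter_a, adapter_b])
```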
Related papers
- MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA can achieve significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
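The Polytropon entry above describes an inventory of adapters plus a routing function that selects a subset per task. The sketch below is a loose, hedged illustration of that idea using a soft (sigmoid) relaxation of subset selection; the class names, bottleneck design, and relaxation are assumptions, not the authors' implementation.

```python
# Hedged sketch of task-conditioned routing over an adapter inventory.
# A sigmoid relaxation stands in for discrete subset selection; names and
# the bottleneck design are illustrative, not the authors' code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))  # residual adapter


class RoutedAdapterLayer(nn.Module):
    """Mixes an inventory of adapters with learned per-task routing weights."""

    def __init__(self, d_model: int, n_adapters: int, n_tasks: int):
        super().__init__()
        self.inventory = nn.ModuleList(
            [BottleneckAdapter(d_model) for _ in range(n_adapters)])
        self.routing_logits = nn.Parameter(torch.zeros(n_tasks, n_adapters))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        gates = torch.sigmoid(self.routing_logits[task_id])  # per-task gates
        out = x
        for gate, adapter in zip(gates, self.inventory):
            out = out + gate * (adapter(x) - x)  # add each adapter's residual update
        return out


# Usage: route hidden states for task 0 through a 4-adapter inventory.
layer = RoutedAdapterLayer(d_model=768, n_adapters=4, n_tasks=8)
hidden = torch.randn(2, 16, 768)
out = layer(hidden, task_id=0)
```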
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
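The SparseAdapter entry above reframes adapter parameter-efficiency as network pruning at high sparse ratios. A minimal sketch of one such criterion, plain magnitude pruning of an adapter's projection matrices, follows; the function name and the choice of criterion are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch: magnitude-prune an adapter's weight matrices to a target
# sparsity, keeping only the largest-magnitude entries. The actual pruning
# criterion used by SparseAdapter may differ; this is illustrative only.
import torch


def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """In-place magnitude pruning: zero out the smallest-|w| entries."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_(weight.abs() > threshold)
    return weight


# Usage: prune both projections of a bottleneck adapter to an 80% sparse ratio.
down, up = torch.randn(64, 768), torch.randn(768, 64)
for w in (down, up):
    magnitude_prune_(w, sparsity=0.8)
print(f"down sparsity: {(down == 0).float().mean().item():.2f}")
```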
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity through two key techniques, without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain from a hundred million to a trillion parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
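The adaptable-adapters entry says different layers (and inputs) can end up with different activation functions. The hedged sketch below approximates this with a learned soft mixture over a few candidate activations per adapter layer; it is a simplification, and all names are illustrative.

```python
# Hedged sketch: an adapter whose non-linearity is a learned soft mixture over
# candidate activations, so each adapter layer can settle on its own activation.
# This is a simplification of the learnable activations described above.
import torch
import torch.nn as nn


class AdaptableActivation(nn.Module):
    def __init__(self):
        super().__init__()
        self.candidates = nn.ModuleList([nn.ReLU(), nn.GELU(), nn.Tanh()])
        self.logits = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.logits, dim=0)
        return sum(w * act(x) for w, act in zip(weights, self.candidates))


class AdaptableAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = AdaptableActivation()  # activation learned per adapter layer
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```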
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
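AdapterBias, as summarized above, adds a token-dependent representation shift with very few parameters. The sketch below shows one way to read that: a single shared shift vector scaled per token by a learned scalar; the exact formulation and the names are assumptions based on the summary.

```python
# Hedged sketch of a token-dependent representation shift: one shared vector v
# scaled per token by a learned scalar alpha(token). Names and the exact
# formulation are assumptions based on the one-line summary above.
import torch
import torch.nn as nn


class TokenDependentShift(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))  # shared shift direction
        self.alpha = nn.Linear(d_model, 1)           # per-token scaling factor

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); the per-token scale broadcasts over d_model
        return hidden + self.alpha(hidden) * self.v
```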
- On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation [36.37565646597464]
Adapter-based tuning works by adding lightweight adapter modules to a pretrained language model (PrLM).
It adds only a few trainable parameters per new task, allowing a high degree of parameter sharing.
We demonstrate that adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks.
arXiv Detail & Related papers (2021-06-06T16:10:12Z)
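The last entry describes the standard adapter-based tuning setup: small adapter modules added to a frozen PrLM so that only a few parameters are trained per task. A minimal sketch of that setup follows; the toy backbone and sizes are illustrative.

```python
# Hedged sketch of standard adapter-based tuning: a small bottleneck adapter
# with a residual connection, trained while the pretrained backbone stays
# frozen. The toy backbone and sizes are illustrative.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))


# Freeze the backbone so only the adapter's few parameters are trainable.
backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad_(False)
adapter = Adapter(d_model=768)

out = adapter(backbone(torch.randn(2, 16, 768)))
trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable params: {trainable} ({100 * trainable / total:.2f}% of the layer)")
```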