AdapterSoup: Weight Averaging to Improve Generalization of Pretrained
Language Models
- URL: http://arxiv.org/abs/2302.07027v3
- Date: Tue, 28 Mar 2023 13:37:54 GMT
- Title: AdapterSoup: Weight Averaging to Improve Generalization of Pretrained
Language Models
- Authors: Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse
Dodge
- Abstract summary: Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains.
A solution is to use a related-domain adapter for the novel domain at test time.
We introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains.
- Score: 127.04370753583261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models (PLMs) are trained on massive corpora, but often
need to specialize to specific domains. A parameter-efficient adaptation method
suggests training an adapter for each domain on the task of language modeling.
This leads to good in-domain scores but can be impractical for domain- or
resource-restricted settings. A solution is to use a related-domain adapter for
the novel domain at test time. In this paper, we introduce AdapterSoup, an
approach that performs weight-space averaging of adapters trained on different
domains. Our approach is embarrassingly parallel: first, we train a set of
domain-specific adapters; then, for each novel domain, we determine which
adapters should be averaged at test time. We present extensive experiments
showing that AdapterSoup consistently improves performance on new domains
without extra training. We also explore weight averaging of adapters trained on
the same domain with different hyper-parameters, and show that it preserves the
performance of a PLM on new domains while obtaining strong in-domain results.
We explore various approaches for choosing which adapters to combine, such as
text clustering and semantic similarity. We find that using clustering leads to
the most competitive results on novel domains.
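As a rough illustration of the two steps described above, the sketch below averages adapter weights in parameter space and picks which adapters to average by comparing text embeddings of a novel-domain sample against the training domains. The helper names, the use of plain PyTorch state dicts, and the cosine-similarity ranking (a simplified stand-in for the paper's text-clustering and semantic-similarity selection) are assumptions of this sketch, not the paper's released implementation.

```python
# Minimal sketch of AdapterSoup-style adapter selection and weight-space averaging.
from typing import Dict, List

import torch
import torch.nn.functional as F


def average_adapters(adapters: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Uniformly average adapter parameters key by key (the weight-space 'soup')."""
    return {k: torch.stack([a[k] for a in adapters]).mean(dim=0) for k in adapters[0]}


def select_domains(
    domain_centroids: Dict[str, torch.Tensor],  # training domain -> mean text embedding
    novel_centroid: torch.Tensor,               # mean embedding of a small novel-domain sample
    top_k: int = 3,
) -> List[str]:
    """Rank training domains by cosine similarity to the novel domain and keep the top-k.

    A simplified stand-in for the paper's clustering / semantic-similarity selection.
    """
    scores = {
        name: F.cosine_similarity(centroid, novel_centroid, dim=0).item()
        for name, centroid in domain_centroids.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Usage sketch (adapter_states, domain_centroids, novel_centroid, model are placeholders):
# chosen = select_domains(domain_centroids, novel_centroid, top_k=3)
# souped = average_adapters([adapter_states[d] for d in chosen])
# model.load_state_dict(souped, strict=False)  # assumes the averaged keys exist in the model
```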
Related papers
- Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation [40.667166043101076]
We propose a small adapter for rectifying diverse target domain styles to the source domain.
The adapter is trained to rectify the image features from diverse synthesized target domains to align with the source domain.
Our method achieves promising results on cross-domain few-shot semantic segmentation tasks.
arXiv Detail & Related papers (2024-04-16T07:07:40Z)
- Plug-and-Play Transformer Modules for Test-Time Adaptation [54.80435317208111]
We introduce PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy.
We pre-train a large set of modules, each specialized for different source domains.
We harness multiple of the most relevant source domains in a single inference call.
arXiv Detail & Related papers (2024-01-06T00:24:50Z)
- Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories [31.995033685838962]
Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain.
In this paper, we investigate whether we can adapt PLMs both effectively and efficiently by only tuning a few parameters.
Specifically, we decouple the feed-forward networks (FFNs) of the Transformer architecture into two parts: the original pre-trained FFNs to maintain the old-domain knowledge and our novel domain-specific adapters to inject domain-specific knowledge in parallel (a sketch of this parallel layout follows the related-papers list below).
arXiv Detail & Related papers (2023-06-08T17:54:36Z)
- UDApter -- Efficient Domain Adaptation Using Adapters [29.70751969196527]
We propose two methods to make unsupervised domain adaptation more parameter efficient.
The first method deconstructs UDA into a two-step process, first adding a domain adapter to learn domain-invariant information.
We come within 0.85% F1 on the natural language inference task by fine-tuning only a fraction of the full model parameters.
arXiv Detail & Related papers (2023-02-07T02:04:17Z)
- $m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter [128.69723410769586]
Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on a domain and language pair seen during training.
When an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically.
We propose $m^4Adapter$, which combines domain and language knowledge using meta-learning with adapters.
arXiv Detail & Related papers (2022-10-21T12:25:05Z)
- Efficient Hierarchical Domain Adaptation for Pretrained Language Models [77.02962815423658]
Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
arXiv Detail & Related papers (2021-12-16T11:09:29Z)
- Unsupervised Domain Adaptation with Adapter [34.22467238579088]
This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation.
Several trainable adapter modules are inserted into a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM.
Extensive experiments on two benchmark datasets demonstrate that our approach is effective across different tasks, dataset sizes, and domain similarities.
arXiv Detail & Related papers (2021-11-01T02:50:53Z)
- Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial resource scenario a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown promising capability in directly augmenting a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
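The Mixture-of-Domain-Adapters entry above describes keeping the pre-trained Transformer FFN frozen while a small domain-specific adapter runs in parallel. A minimal sketch of that layout, assuming a bottleneck adapter with illustrative hidden sizes rather than the paper's released code, could look like this:

```python
# Rough illustration of a parallel domain adapter next to a frozen pre-trained FFN,
# in the spirit of the Mixture-of-Domain-Adapters entry above. The module layout,
# hidden sizes, and activation below are assumptions of this sketch.
import torch
import torch.nn as nn


class ParallelAdapterFFN(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072, d_adapter: int = 64):
        super().__init__()
        # Pre-trained feed-forward block, frozen to preserve old-domain knowledge.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        for p in self.ffn.parameters():
            p.requires_grad = False
        # Small trainable bottleneck adapter, run in parallel to inject domain knowledge.
        self.adapter = nn.Sequential(
            nn.Linear(d_model, d_adapter), nn.GELU(), nn.Linear(d_adapter, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection plus the frozen FFN output plus the parallel adapter output.
        return x + self.ffn(x) + self.adapter(x)
```

Only the adapter parameters receive gradients in this layout, so specializing to a new domain updates a small fraction of the model while the frozen FFN keeps the pre-trained knowledge intact.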