AdapterSoup: Weight Averaging to Improve Generalization of Pretrained
Language Models
- URL: http://arxiv.org/abs/2302.07027v3
- Date: Tue, 28 Mar 2023 13:37:54 GMT
- Title: AdapterSoup: Weight Averaging to Improve Generalization of Pretrained
Language Models
- Authors: Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse
Dodge
- Abstract summary: Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains.
A solution is to use a related-domain adapter for the novel domain at test time.
We introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains.
- Score: 127.04370753583261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models (PLMs) are trained on massive corpora, but often
need to specialize to specific domains. A parameter-efficient adaptation method
suggests training an adapter for each domain on the task of language modeling.
This leads to good in-domain scores but can be impractical for domain- or
resource-restricted settings. A solution is to use a related-domain adapter for
the novel domain at test time. In this paper, we introduce AdapterSoup, an
approach that performs weight-space averaging of adapters trained on different
domains. Our approach is embarrassingly parallel: first, we train a set of
domain-specific adapters; then, for each novel domain, we determine which
adapters should be averaged at test time. We present extensive experiments
showing that AdapterSoup consistently improves performance on new domains
without extra training. We also explore weight averaging of adapters trained on
the same domain with different hyper-parameters, and show that it preserves the
performance of a PLM on new domains while obtaining strong in-domain results.
We explore various approaches for choosing which adapters to combine, such as
text clustering and semantic similarity. We find that using clustering leads to
the most competitive results on novel domains.
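As a rough illustration of the two steps described above, the sketch below averages adapter weights in parameter space and picks which adapters to average by comparing text embeddings of a novel-domain sample against the training domains. The helper names, the use of plain PyTorch state dicts, and the cosine-similarity ranking (a simplified stand-in for the paper's text-clustering and semantic-similarity selection) are assumptions of this sketch, not the paper's released implementation.

```python
# Minimal sketch of AdapterSoup-style adapter selection and weight-space averaging.
from typing import Dict, List

import torch
import torch.nn.functional as F


def average_adapters(adapters: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Uniformly average adapter parameters key by key (the weight-space 'soup')."""
    return {k: torch.stack([a[k] for a in adapters]).mean(dim=0) for k in adapters[0]}


def select_domains(
    domain_centroids: Dict[str, torch.Tensor],  # training domain -> mean text embedding
    novel_centroid: torch.Tensor,               # mean embedding of a small novel-domain sample
    top_k: int = 3,
) -> List[str]:
    """Rank training domains by cosine similarity to the novel domain and keep the top-k.

    A simplified stand-in for the paper's clustering / semantic-similarity selection.
    """
    scores = {
        name: F.cosine_similarity(centroid, novel_centroid, dim=0).item()
        for name, centroid in domain_centroids.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Usage sketch (adapter_states, domain_centroids, novel_centroid, model are placeholders):
# chosen = select_domains(domain_centroids, novel_centroid, top_k=3)
# souped = average_adapters([adapter_states[d] for d in chosen])
# model.load_state_dict(souped, strict=False)  # assumes the averaged keys exist in the model
```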
Related papers
- Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation [40.667166043101076]
We propose a small adapter for rectifying diverse target domain styles to the source domain.
The adapter is trained to rectify the image features from diverse synthesized target domains to align with the source domain.
Our method achieves promising results on cross-domain few-shot semantic segmentation tasks.
arXiv Detail & Related papers (2024-04-16T07:07:40Z)
- Plug-and-Play Transformer Modules for Test-Time Adaptation [54.80435317208111]
We introduce PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy.
We pre-train a large set of modules, each specialized for different source domains.
We harness multiple of the most relevant source domains in a single inference call.
arXiv Detail & Related papers (2024-01-06T00:24:50Z)
- Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories [31.995033685838962]
Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain.
In this paper, we investigate whether we can adapt PLMs both effectively and efficiently by only tuning a few parameters.
Specifically, we decouple the feed-forward networks (FFNs) of the Transformer architecture into two parts: the original pre-trained FFNs to maintain the old-domain knowledge and our novel domain-specific adapters to inject domain-specific knowledge in parallel (a sketch of this parallel layout follows the related-papers list below).
arXiv Detail & Related papers (2023-06-08T17:54:36Z)
- UDApter -- Efficient Domain Adaptation Using Adapters [29.70751969196527]
We propose two methods to make unsupervised domain adaptation more parameter efficient.
The first method deconstructs UDA into a two-step process, first adding a domain adapter to learn domain-invariant information.
We come within 0.85% F1 on the natural language inference task by fine-tuning only a fraction of the full model parameters.
arXiv Detail & Related papers (2023-02-07T02:04:17Z)
- $m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter [128.69723410769586]
Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on a domain and language pair seen during training.
When an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically.
We propose $m^4Adapter$, which combines domain and language knowledge using meta-learning with adapters.
arXiv Detail & Related papers (2022-10-21T12:25:05Z)
- Efficient Hierarchical Domain Adaptation for Pretrained Language Models [77.02962815423658]
Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
arXiv Detail & Related papers (2021-12-16T11:09:29Z)
- Unsupervised Domain Adaptation with Adapter [34.22467238579088]
This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation.
Several trainable adapter modules are inserted into a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM.
Extensive experiments on two benchmark datasets demonstrate that our approach is effective across different tasks, dataset sizes, and domain similarities.
arXiv Detail & Related papers (2021-11-01T02:50:53Z)
- Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial resource scenario a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown promising capability in directly augmenting a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
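The Mixture-of-Domain-Adapters entry above describes keeping the pre-trained Transformer FFN frozen while a small domain-specific adapter runs in parallel. A minimal sketch of that layout, assuming a bottleneck adapter with illustrative hidden sizes rather than the paper's released code, could look like this:

```python
# Rough illustration of a parallel domain adapter next to a frozen pre-trained FFN,
# in the spirit of the Mixture-of-Domain-Adapters entry above. The module layout,
# hidden sizes, and activation below are assumptions of this sketch.
import torch
import torch.nn as nn


class ParallelAdapterFFN(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072, d_adapter: int = 64):
        super().__init__()
        # Pre-trained feed-forward block, frozen to preserve old-domain knowledge.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        for p in self.ffn.parameters():
            p.requires_grad = False
        # Small trainable bottleneck adapter, run in parallel to inject domain knowledge.
        self.adapter = nn.Sequential(
            nn.Linear(d_model, d_adapter), nn.GELU(), nn.Linear(d_adapter, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection plus the frozen FFN output plus the parallel adapter output.
        return x + self.ffn(x) + self.adapter(x)
```

Only the adapter parameters receive gradients in this layout, so specializing to a new domain updates a small fraction of the model while the frozen FFN keeps the pre-trained knowledge intact.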