Plug-and-Play Transformer Modules for Test-Time Adaptation
- URL: http://arxiv.org/abs/2401.04130v3
- Date: Thu, 8 Feb 2024 22:13:45 GMT
- Title: Plug-and-Play Transformer Modules for Test-Time Adaptation
- Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler,
Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury
- Abstract summary: We introduce PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy.
We pre-train a large set of modules, each specialized for different source domains.
We harness multiple of the most relevant source domains in a single inference call.
- Score: 54.80435317208111
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual
Prompt Tuning (VPT) have found success in enabling adaptation to new domains by
tuning small modules within a transformer model. However, the number of domains
encountered during test time can be very large, and the data is usually
unlabeled. Thus, adaptation to new domains is challenging; it is also
impractical to generate customized tuned modules for each such domain. Toward
addressing these challenges, this work introduces PLUTO: a Plug-and-pLay
modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of
modules, each specialized for different source domains, effectively creating a
``module store''. Given a target domain with few-shot unlabeled data, we
introduce an unsupervised test-time adaptation (TTA) method to (1) select a
sparse subset of relevant modules from this store and (2) create a weighted
combination of selected modules without tuning their weights. This
plug-and-play nature enables us to harness multiple of the most relevant source
domains in a single inference call. Comprehensive evaluations demonstrate that
PLUTO uniformly outperforms alternative TTA methods and that selecting $\leq$5
modules suffices to extract most of the benefit. At a high level, our method
equips pre-trained transformers with the capability to dynamically adapt to new
domains, motivating a new paradigm for efficient and scalable domain
adaptation.
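The two-step procedure described above (sparse selection of modules from the store, then a tuning-free weighted combination) can be illustrated with a minimal sketch. The `ModuleStore` class, the cosine-similarity scoring against per-domain feature prototypes, and the softmax weighting below are illustrative assumptions made for this example, not the paper's exact algorithm.

```python
# Illustrative sketch only: a "module store" of frozen, domain-specialized
# adapters, a similarity-based top-k selection from few-shot unlabeled target
# features, and a tuning-free weighted combination of the selected modules.
# The names and the cosine-similarity scoring rule are assumptions, not the
# paper's exact formulation.
import torch
import torch.nn.functional as F


class ModuleStore:
    def __init__(self, modules, domain_prototypes):
        # modules: list of adapter-like nn.Modules, one per source domain (frozen)
        # domain_prototypes: (num_domains, d) mean feature of each source domain
        self.modules = modules
        self.prototypes = F.normalize(domain_prototypes, dim=-1)

    def select(self, target_feats, k=5):
        # Score each source domain by cosine similarity between the mean
        # target feature and the stored domain prototype, then keep the top-k.
        query = F.normalize(target_feats.mean(dim=0, keepdim=True), dim=-1)
        scores = (query @ self.prototypes.T).squeeze(0)      # (num_domains,)
        top_scores, idx = torch.topk(scores, k)
        weights = torch.softmax(top_scores, dim=0)           # convex combination
        return idx, weights

    def forward_combined(self, x, idx, weights):
        # Weighted sum of the selected modules' outputs; module parameters
        # stay frozen, so no per-target fine-tuning is performed.
        outs = torch.stack([self.modules[i](x) for i in idx.tolist()])
        shape = (-1,) + (1,) * (outs.dim() - 1)
        return (weights.view(shape) * outs).sum(dim=0)
```

In this reading, a single inference call adds only the weighted outputs of a handful of frozen modules to the frozen backbone, consistent with the finding that selecting $\leq$5 modules suffices.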
Related papers
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
arXiv Detail & Related papers (2024-06-01T02:41:34Z) - Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z) - Agile Multi-Source-Free Domain Adaptation [25.06352660046911]
The Bi-level ATtention ENsemble (Bi-ATEN) module learns both intra-domain weights and inter-domain ensemble weights to strike a fine balance between instance specificity and domain consistency.
It achieves comparable or even superior performance on the challenging DomainNet benchmark with less than 3% of parameters trained and 8 times the throughput of the SOTA method.
arXiv Detail & Related papers (2024-03-08T05:17:10Z) - Virtual Classification: Modulating Domain-Specific Knowledge for
Multidomain Crowd Counting [67.38137379297717]
Multidomain crowd counting aims to learn a general model for multiple diverse datasets.
Deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias.
We propose a Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting.
arXiv Detail & Related papers (2024-02-06T06:49:04Z) - TADA: Efficient Task-Agnostic Domain Adaptation for Transformers [3.9379577980832843]
In this work, we introduce TADA, a novel task-agnostic domain adaptation method.
Within TADA, we retrain embeddings to learn domain-aware input representations and tokenizers for the transformer encoder.
We conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases.
arXiv Detail & Related papers (2023-05-22T04:53:59Z) - AdapterSoup: Weight Averaging to Improve Generalization of Pretrained
Language Models [127.04370753583261]
Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains.
A solution is to use a related-domain adapter for the novel domain at test time.
We introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains (a minimal weight-averaging sketch appears after this list).
arXiv Detail & Related papers (2023-02-14T13:09:23Z) - Improving Transferability for Domain Adaptive Detection Transformers [34.61314708197079]
This paper aims to build a simple but effective baseline with a DETR-style detector on domain shift settings.
On one hand, mitigating the domain shift on the backbone and the decoder output features is key to obtaining favorable results.
On the other, advanced domain alignment methods applied to both parts further enhance the performance.
arXiv Detail & Related papers (2022-04-29T16:27:10Z) - Unsupervised Domain Adaptation with Adapter [34.22467238579088]
This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation.
Several trainable adapter modules are inserted in a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM.
Elaborated experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective with different tasks, dataset sizes, and domain similarities.
arXiv Detail & Related papers (2021-11-01T02:50:53Z) - Dynamic Transfer for Multi-Source Domain Adaptation [82.54405157719641]
We present dynamic transfer to address domain conflicts, where the model parameters are adapted to samples.
It breaks down source domain barriers and turns multi-source domains into a single-source domain.
Experimental results show that, without using domain labels, our dynamic transfer outperforms the state-of-the-art method by more than 3%.
arXiv Detail & Related papers (2021-03-19T01:22:12Z)
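As a companion to the AdapterSoup entry above, the following minimal sketch shows weight-space averaging of domain adapters. It assumes all adapters share one architecture and uses a plain uniform average; `average_adapters` is a hypothetical helper written for this example, and the paper additionally studies how to choose which adapters to average.

```python
# Minimal sketch of weight-space averaging in the spirit of AdapterSoup.
# Assumes all adapters share one architecture; uniform averaging is an
# assumption made for this example.
import copy
import torch


def average_adapters(adapters):
    """Return a new adapter whose parameters are the element-wise mean of
    the given adapters' parameters."""
    averaged = copy.deepcopy(adapters[0])
    with torch.no_grad():
        for name, param in averaged.named_parameters():
            stacked = torch.stack([dict(a.named_parameters())[name] for a in adapters])
            param.copy_(stacked.mean(dim=0))
    return averaged
```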