TADA: Efficient Task-Agnostic Domain Adaptation for Transformers
- URL: http://arxiv.org/abs/2305.12717v1
- Date: Mon, 22 May 2023 04:53:59 GMT
- Title: TADA: Efficient Task-Agnostic Domain Adaptation for Transformers
- Authors: Chia-Chien Hung, Lukas Lange, Jannik Strötgen
- Abstract summary: In this work, we introduce TADA, a novel task-agnostic domain adaptation method.
Within TADA, we retrain embeddings to learn domain-aware input representations and tokenizers for the transformer encoder.
We conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Intermediate training of pre-trained transformer-based language models on
domain-specific data leads to substantial gains for downstream tasks. To
increase efficiency and prevent the catastrophic forgetting caused by full
domain-adaptive pre-training, approaches such as adapters have been developed.
However, these require additional parameters for each layer, and are criticized
for their limited expressiveness. In this work, we introduce TADA, a novel
task-agnostic domain adaptation method which is modular, parameter-efficient,
and thus, data-efficient. Within TADA, we retrain the embeddings to learn
domain-aware input representations and tokenizers for the transformer encoder,
while freezing all other parameters of the model. Then, task-specific
fine-tuning is performed. We further conduct experiments with meta-embeddings
and newly introduced meta-tokenizers, resulting in one model per task in
multi-domain use cases. Our broad evaluation in 4 downstream tasks for 14
domains across single- and multi-domain setups and high- and low-resource
scenarios reveals that TADA is an effective and efficient alternative to full
domain-adaptive pre-training and adapters for domain adaptation, while not
introducing additional parameters or complex training steps.
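
The core idea in the abstract (retrain only the embeddings while every other parameter stays frozen) can be illustrated with a toy sketch. This is not the authors' implementation: the model is a hypothetical stand-in with one embedding table and a fixed weight vector in place of the transformer encoder, the squared-error objective is an assumption chosen for brevity (the abstract does not state the training objective), and the names `make_model`, `predict`, and `train_embeddings_only` are invented for this example.

```python
import random


def make_model(vocab_size, dim, seed=0):
    """Build a toy model: a trainable embedding table plus a fixed 'encoder'."""
    rng = random.Random(seed)
    return {
        # Trainable part: one small random vector per token id.
        "embeddings": [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)
        ],
        # Frozen part: a stand-in for the transformer encoder weights (assumption).
        "encoder": [((-1) ** d) * (d + 1) / dim for d in range(dim)],
    }


def predict(model, token_ids):
    """Mean-pool the token embeddings, then apply the frozen encoder weights."""
    dim = len(model["encoder"])
    pooled = [
        sum(model["embeddings"][t][d] for t in token_ids) / len(token_ids)
        for d in range(dim)
    ]
    return sum(p * w for p, w in zip(pooled, model["encoder"]))


def train_embeddings_only(model, data, lr=0.1, epochs=200):
    """Gradient descent on squared error that updates the embedding rows only."""
    w = model["encoder"]  # read, never written: this is the frozen part
    for _ in range(epochs):
        for token_ids, target in data:
            error = predict(model, token_ids) - target
            # For loss = error**2, the gradient w.r.t. an embedding row is
            # 2 * error * w / n for each occurrence of that token.
            scale = 2.0 * error / len(token_ids)
            for t in token_ids:
                row = model["embeddings"][t]
                for d in range(len(w)):
                    row[d] -= lr * scale * w[d]
    return model


# Toy "domain" data: (token ids, target value) pairs.
model = make_model(vocab_size=4, dim=4)
encoder_before = list(model["encoder"])
data = [([0, 1], 1.0), ([2, 3], -1.0)]
train_embeddings_only(model, data)
```

After training, the embedding rows have moved to fit the toy data while `model["encoder"]` is identical to `encoder_before`, which mirrors the freezing described in the abstract. In a real framework the same effect would typically come from disabling gradients for every non-embedding parameter before training.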
Related papers
- EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer [21.59850502993888]
Unsupervised domain adaptation (UDA) aims to mitigate the domain shift issue, where the distribution of training (source) data differs from that of testing (target) data.
Many models have been developed to tackle this problem, and recently vision transformers (ViTs) have shown promising results.
This paper introduces an efficient model that reduces trainable parameters and allows for adjustable complexity.
arXiv Detail & Related papers (2024-07-31T03:29:28Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient as it relies on high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation [48.039156140237615]
A Continual Test-Time Adaptation task is proposed to adapt the pre-trained model to continually changing target domains.
We design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-shared knowledge.
Our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-06-07T11:18:53Z)
- Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
arXiv Detail & Related papers (2022-09-30T03:40:10Z)
- Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation [11.960178399478718]
Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting.
Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs.
arXiv Detail & Related papers (2022-07-07T18:00:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- Unsupervised Domain Adaptation with Adapter [34.22467238579088]
This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation.
Several trainable adapter modules are inserted in a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM.
Elaborated experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective with different tasks, dataset sizes, and domain similarities.
arXiv Detail & Related papers (2021-11-01T02:50:53Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)