Related papers: TADA: Efficient Task-Agnostic Domain Adaptation for Transformers

TADA: Efficient Task-Agnostic Domain Adaptation for Transformers

URL: http://arxiv.org/abs/2305.12717v1
Date: Mon, 22 May 2023 04:53:59 GMT
Title: TADA: Efficient Task-Agnostic Domain Adaptation for Transformers
Authors: Chia-Chien Hung, Lukas Lange, Jannik Str\"otgen
Abstract summary: In this work, we introduce TADA, a novel task-agnostic domain adaptation method. Within TADA, we retrain embeddings to learn domain-aware input representations and tokenizers for the transformer encoder. We conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases.
Score: 3.9379577980832843
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent catastrophic forgetting alleviated from full domain-adaptive pre-training, approaches such as adapters have been developed. However, these require additional parameters for each layer, and are criticized for their limited expressiveness. In this work, we introduce TADA, a novel task-agnostic domain adaptation method which is modular, parameter-efficient, and thus, data-efficient. Within TADA, we retrain the embeddings to learn domain-aware input representations and tokenizers for the transformer encoder, while freezing all other parameters of the model. Then, task-specific fine-tuning is performed. We further conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases. Our broad evaluation in 4 downstream tasks for 14 domains across single- and multi-domain setups and high- and low-resource scenarios reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and adapters for domain adaptation, while not introducing additional parameters or complex training steps.

Related papers

GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts [58.95913531746308]
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training. We propose to generate multiple layer parameters on the fly during inference by a lightweight meta-learned transformer, which we call textitGeneralizeFormer.
arXiv Detail & Related papers (2025-02-15T10:10:49Z)
EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer [21.59850502993888]
Unsupervised domain adaptation (UDA) aims to mitigate the domain shift issue, where the distribution of training (source) data differs from that of testing (target) data. Many models have been developed to tackle this problem, and recently vision transformers (ViTs) have shown promising results. This paper introduces an efficient model that reduces trainable parameters and allows for adjustable complexity.
arXiv Detail & Related papers (2024-07-31T03:29:28Z)
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. Existing methods for model adaptation usually update all model parameters, which is inefficient as it relies on high computational costs. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation [48.039156140237615]
A Continual Test-Time Adaptation task is proposed to adapt the pre-trained model to continually changing target domains. We design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-shared knowledge. Our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-06-07T11:18:53Z)
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA. MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts. Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
arXiv Detail & Related papers (2022-09-30T03:40:10Z)
Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation [11.960178399478718]
Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting. Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs.
arXiv Detail & Related papers (2022-07-07T18:00:22Z)
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks. Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo. Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
Unsupervised Domain Adaptation with Adapter [34.22467238579088]
This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation. Several trainable adapter modules are inserted in a PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM. Elaborated experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective with different tasks, dataset sizes, and domain similarities.
arXiv Detail & Related papers (2021-11-01T02:50:53Z)
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks. A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains. Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec. PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks. We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.