Transformer Based Multi-Source Domain Adaptation
- URL: http://arxiv.org/abs/2009.07806v1
- Date: Wed, 16 Sep 2020 16:56:23 GMT
- Title: Transformer Based Multi-Source Domain Adaptation
- Authors: Dustin Wright and Isabelle Augenstein
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practical machine learning settings, the data on which a model must make
predictions often come from a different distribution than the data it was
trained on. Here, we investigate the problem of unsupervised multi-source
domain adaptation, where a model is trained on labelled data from multiple
source domains and must make predictions on a domain for which no labelled data
has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of
mixture of experts, where the predictions of multiple domain expert classifiers
are combined; as well as domain adversarial training, to induce a domain
agnostic representation space. Inspired by this, we investigate how such
methods can be effectively applied to large pretrained transformer models. We
find that domain adversarial training has an effect on the learned
representations of these models while having little effect on their
performance, suggesting that large transformer-based models are already
relatively robust across domains. Additionally, we show that mixture of experts
leads to significant performance improvements by comparing several variants of
mixing functions, including one novel mixture based on attention. Finally, we
demonstrate that the predictions of large pretrained transformer-based domain
experts are highly homogeneous, making it challenging to learn effective
functions for mixing their predictions.
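The abstract compares several variants of mixing functions for combining the domain experts' predictions, including a novel mixture based on attention. As a rough illustration of the general idea (not the authors' implementation: the function and parameter names below, such as `attention_mixture` and `domain_embeddings`, are illustrative assumptions), an attention-based mixture scores each domain expert against the input instance and uses the resulting weights to average the experts' predicted distributions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mixture(expert_logits, instance_repr, domain_embeddings):
    """Combine per-domain expert predictions with attention-style weights.

    expert_logits: (n_experts, n_classes) class logits, one row per domain expert
    instance_repr: (d,) encoding of the input instance (the attention query)
    domain_embeddings: (n_experts, d) learned key vectors, one per domain expert

    Returns the mixed class probabilities, shape (n_classes,).
    """
    d = instance_repr.shape[0]
    # Scaled dot-product scores between the instance and each domain key
    scores = domain_embeddings @ instance_repr / np.sqrt(d)  # (n_experts,)
    alpha = softmax(scores)                                  # mixing weights
    probs = softmax(expert_logits, axis=-1)                  # per-expert predictions
    return alpha @ probs                                     # weighted average
```

In a trained model the `domain_embeddings` would be learned jointly with the experts; the paper's finding that expert predictions are highly homogeneous suggests the learned weights `alpha` have limited room to improve on a simple average.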
Related papers
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177] (2024-07-01)
  We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
  We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
  We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of the Transformer is a simpler and, surprisingly, more efficient approach in practice, reaching the same performance level as SMoE.
- SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation [62.889835139583965] (2023-04-06)
  We introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data.
  As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data.
  Our experiments demonstrate that our method achieves better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.
- Improving Domain Generalization with Domain Relations [77.63345406973097] (2023-02-06)
  This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
  We propose a new approach called D$^3$G to learn domain-specific models.
  Our results show that D$^3$G consistently outperforms state-of-the-art methods.
- Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts [33.21435044949033] (2022-10-08)
  Most existing methods perform training on multiple source domains using a single model.
  We propose a novel framework for unsupervised test-time adaptation, which is formulated as a knowledge distillation process.
- Multiple-Source Domain Adaptation via Coordinated Domain Encoders and Paired Classifiers [1.52292571922932] (2022-01-28)
  We present a novel model for text classification under domain shift.
  It exploits updated representations to dynamically integrate domain encoders.
  It also employs a probabilistic model to infer the error rate in the target domain.
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902] (2021-09-06)
  Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabelled domain by transferring weak knowledge from a bag of source models.
  We propose to embed a Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
  Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% in their classification accuracies.
- Self-balanced Learning For Domain Generalization [64.99791119112503] (2021-08-31)
  Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
  Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class.
  We propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by the different distributions of the multi-domain source data.
- A Brief Review of Domain Adaptation [1.2043574473965317] (2020-10-07)
  This paper focuses on unsupervised domain adaptation, where labels are only available in the source domain.
  It presents some successful shallow and deep domain adaptation approaches that aim to deal with domain adaptation problems.
- Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016] (2020-07-06)
  A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
  In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test-time shifts.
  We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.