Transformer Based Multi-Source Domain Adaptation
- URL: http://arxiv.org/abs/2009.07806v1
- Date: Wed, 16 Sep 2020 16:56:23 GMT
- Title: Transformer Based Multi-Source Domain Adaptation
- Authors: Dustin Wright and Isabelle Augenstein
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practical machine learning settings, the data on which a model must make
predictions often come from a different distribution than the data it was
trained on. Here, we investigate the problem of unsupervised multi-source
domain adaptation, where a model is trained on labelled data from multiple
source domains and must make predictions on a domain for which no labelled data
has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of
mixture of experts, where the predictions of multiple domain expert classifiers
are combined; as well as domain adversarial training, to induce a domain
agnostic representation space. Inspired by this, we investigate how such
methods can be effectively applied to large pretrained transformer models. We
find that domain adversarial training has an effect on the learned
representations of these models while having little effect on their
performance, suggesting that large transformer-based models are already
relatively robust across domains. Additionally, we show that mixture of experts
leads to significant performance improvements by comparing several variants of
mixing functions, including one novel mixture based on attention. Finally, we
demonstrate that the predictions of large pretrained transformer-based domain
experts are highly homogeneous, making it challenging to learn effective
functions for mixing their predictions.
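The abstract compares several variants of mixing functions for combining the domain experts' predictions, including a novel mixture based on attention. As a rough illustration of the general idea (not the authors' implementation: the function and parameter names below, such as `attention_mixture` and `domain_embeddings`, are illustrative assumptions), an attention-based mixture scores each domain expert against the input instance and uses the resulting weights to average the experts' predicted distributions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mixture(expert_logits, instance_repr, domain_embeddings):
    """Combine per-domain expert predictions with attention-style weights.

    expert_logits: (n_experts, n_classes) class logits, one row per domain expert
    instance_repr: (d,) encoding of the input instance (the attention query)
    domain_embeddings: (n_experts, d) learned key vectors, one per domain expert

    Returns the mixed class probabilities, shape (n_classes,).
    """
    d = instance_repr.shape[0]
    # Scaled dot-product scores between the instance and each domain key
    scores = domain_embeddings @ instance_repr / np.sqrt(d)  # (n_experts,)
    alpha = softmax(scores)                                  # mixing weights
    probs = softmax(expert_logits, axis=-1)                  # per-expert predictions
    return alpha @ probs                                     # weighted average
```

In a trained model the `domain_embeddings` would be learned jointly with the experts; the paper's finding that expert predictions are highly homogeneous suggests the learned weights `alpha` have limited room to improve on a simple average.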
Related papers
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177] (2024-07-01)
  We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
  We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
  We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of the Transformer is a simpler and, surprisingly, more efficient approach in practice, reaching the same performance level as SMoE.
- SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation [62.889835139583965] (2023-04-06)
  We introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data.
  As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data.
  Our experiments demonstrate that our method achieves better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.
- Improving Domain Generalization with Domain Relations [77.63345406973097] (2023-02-06)
  This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
  We propose a new approach called D$^3$G to learn domain-specific models.
  Our results show that D$^3$G consistently outperforms state-of-the-art methods.
- Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts [33.21435044949033] (2022-10-08)
  Most existing methods perform training on multiple source domains using a single model.
  We propose a novel framework for unsupervised test-time adaptation, which is formulated as a knowledge distillation process.
- Multiple-Source Domain Adaptation via Coordinated Domain Encoders and Paired Classifiers [1.52292571922932] (2022-01-28)
  We present a novel model for text classification under domain shift.
  It exploits updated representations to dynamically integrate domain encoders.
  It also employs a probabilistic model to infer the error rate in the target domain.
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902] (2021-09-06)
  Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabelled domain by transferring weak knowledge from a bag of source models.
  We propose to embed a Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
  Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% in their classification accuracies.
- Self-balanced Learning For Domain Generalization [64.99791119112503] (2021-08-31)
  Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
  Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class.
  We propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by the different distributions of the multi-domain source data.
- A Brief Review of Domain Adaptation [1.2043574473965317] (2020-10-07)
  This paper focuses on unsupervised domain adaptation, where labels are only available in the source domain.
  It presents some successful shallow and deep domain adaptation approaches that aim to deal with domain adaptation problems.
- Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016] (2020-07-06)
  A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
  In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test-time shifts.
  We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.