DoGE: Domain Reweighting with Generalization Estimation
- URL: http://arxiv.org/abs/2310.15393v2
- Date: Mon, 5 Feb 2024 16:33:05 GMT
- Title: DoGE: Domain Reweighting with Generalization Estimation
- Authors: Simin Fan, Matteo Pagliardini, Martin Jaggi
- Abstract summary: We propose DOmain reweighting with Generalization Estimation (DoGE).
In our experiments, we extensively show how DoGE improves the generalization of the base model to any target data mixture.
DoGE can effectively identify inter-domain dependencies, and consistently achieves better test perplexity on the target domain.
- Score: 42.32000165235568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The coverage and composition of the pretraining data significantly impact
the generalization ability of Large Language Models (LLMs). Despite its
importance, recent LLMs still rely on heuristics and trial and error to
increase or reduce the influence of data-domains. We propose DOmain reweighting
with Generalization Estimation (DoGE), which optimizes the probability of
sampling from each domain (domain weights) in a principled way. Our approach is
a two-stage process consisting of (i) training a proxy model to obtain domain
weights using a bi-level optimization algorithm; (ii) training a larger base
model by sampling training domains according to the learned domain weights. In
our experiments, we extensively show how DoGE improves the generalization of
the base model to any target data mixture. On the SlimPajama dataset, our base
model achieves better perplexity and few-shot reasoning accuracy across 6 tasks
compared to baseline methods. Moreover, when aiming to generalize to out-of-domain
target tasks, which are unseen in the pretraining corpus (OOD domain), DoGE can
effectively identify inter-domain dependencies, and consistently achieves
better test perplexity on the target domain.
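The two-stage process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the domain names, the per-domain `scores`, and the multiplicative update rule are all assumptions chosen for illustration. The abstract only states that a proxy model learns domain weights through bi-level optimization and that the base model then samples training domains according to those weights.

```python
import math
import random


def update_domain_weights(weights, scores, step_size=1.0):
    """Illustrative stage (i) step: multiplicative update, then renormalize.

    `scores` is a hypothetical per-domain signal (higher means "sample this
    domain more"). How DoGE actually computes it is not given in the abstract.
    """
    unnormalized = [w * math.exp(step_size * s) for w, s in zip(weights, scores)]
    total = sum(unnormalized)
    # Renormalizing keeps the weights a valid probability distribution.
    return [u / total for u in unnormalized]


def sample_domains(weights, batch_size, rng):
    """Stage (ii): pick the source domain of each training example
    with probability equal to its domain weight."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


domains = ["web", "code", "books"]             # hypothetical domain names
weights = [1.0 / len(domains)] * len(domains)  # start from a uniform mixture
scores = [0.5, -0.2, 0.1]                      # made-up placeholder values

weights = update_domain_weights(weights, scores)

rng = random.Random(0)
batch = sample_domains(weights, batch_size=8, rng=rng)
batch_domains = [domains[i] for i in batch]
```

In this sketch the weights stay on the probability simplex after every update, so they can be passed directly to the sampler that builds each training batch for the larger base model.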
Related papers
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D³G to learn domain-specific models.
Our results show that D³G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- FIXED: Frustratingly Easy Domain Generalization with Mixup [53.782029033068675]
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains.
A popular strategy is to augment training data to benefit generalization through methods such as Mixup (Zhang et al., 2018).
We propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX).
Our approach significantly outperforms nine state-of-the-art related methods, beating the best performing baseline by 6.5% on average in terms of test accuracy.
arXiv Detail & Related papers (2022-11-07T09:38:34Z)
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)
- Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data [55.41644538483948]
Domain adaptation is a popular paradigm in modern machine learning.
We present a method called Domain Adaptation Principal Component Analysis (DAPCA).
DAPCA finds a linear reduced data representation useful for solving the domain adaptation task.
arXiv Detail & Related papers (2022-08-28T21:10:56Z)
- Low-confidence Samples Matter for Domain Adaptation [47.552605279925736]
Domain adaptation (DA) aims to transfer knowledge from a label-rich source domain to a related but label-scarce target domain.
We propose a novel contrastive learning method by processing low-confidence samples.
We evaluate the proposed method in both unsupervised and semi-supervised DA settings.
arXiv Detail & Related papers (2022-02-06T15:45:45Z)
- Improving Multi-Domain Generalization through Domain Re-labeling [31.636953426159224]
We study the important link between pre-specified domain labels and the generalization performance.
We introduce a general approach for multi-domain generalization, MulDEns, that uses an ERM-based deep ensembling backbone.
We show that MulDEns does not require tailoring the augmentation strategy or the training process specific to a dataset.
arXiv Detail & Related papers (2021-12-17T23:21:50Z)
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
- Adaptive Methods for Real-World Domain Generalization [32.030688845421594]
In our work, we investigate whether it is possible to leverage domain information from unseen test samples themselves.
We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model.
Our approach achieves state-of-the-art performance on various domain generalization benchmarks.
arXiv Detail & Related papers (2021-03-29T17:44:35Z)