Variance Matters: Improving Domain Adaptation via Stratified Sampling
- URL: http://arxiv.org/abs/2512.05226v1
- Date: Thu, 04 Dec 2025 20:01:04 GMT
- Title: Variance Matters: Improving Domain Adaptation via Stratified Sampling
- Authors: Andrea Napoli, Paul White
- Abstract summary: This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. Experiments on three domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.
- Score: 1.7188280334580195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. We consider two specific discrepancy measures -- correlation alignment and the maximum mean discrepancy (MMD) -- and derive ad hoc stratification objectives for these terms. We then present expected and worst-case error bounds, and prove that our proposed objective for the MMD is theoretically optimal (i.e., minimises the variance) under certain assumptions. Finally, a practical k-means style optimisation algorithm is introduced and analysed. Experiments on three domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.
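The variance problem described in the abstract can be illustrated with a small sketch. This is not the VaRDASS algorithm: it assumes a linear-kernel MMD, synthetic two-dimensional Gaussian clusters, and strata taken directly from the known cluster labels, whereas the paper derives its own stratification objectives and a k-means style optimiser. All function names and parameters here are hypothetical. The sketch only compares how much a mini-batch discrepancy estimate fluctuates under simple random sampling versus proportionally allocated stratified sampling.

```python
import numpy as np


def linear_mmd2(x, y):
    """Plug-in squared MMD with a linear kernel: ||mean(x) - mean(y)||^2."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)


def stratified_indices(rng, strata, batch_size):
    """Draw a batch with proportional allocation across strata.

    Assumes batch_size * (stratum size) / n is an integer for every stratum,
    so the rounded per-stratum counts sum to batch_size.
    """
    n = len(strata)
    parts = []
    for label in np.unique(strata):
        members = np.flatnonzero(strata == label)
        k = int(round(batch_size * len(members) / n))
        parts.append(rng.choice(members, size=k, replace=False))
    return np.concatenate(parts)


def make_domain(rng, centres, per_cluster, shift):
    """Synthetic domain: unit-variance Gaussian clusters around shifted centres."""
    points = [c + shift + rng.normal(size=(per_cluster, 2)) for c in centres]
    labels = np.repeat(np.arange(len(centres)), per_cluster)
    return np.vstack(points), labels


def estimator_variances(seed=0, batch_size=32, trials=500):
    """Return (variance under random batches, variance under stratified batches)."""
    rng = np.random.default_rng(seed)
    centres = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])
    src, src_strata = make_domain(rng, centres, 250, shift=0.0)
    tgt, tgt_strata = make_domain(rng, centres, 250, shift=0.5)

    rand, strat = [], []
    for _ in range(trials):
        # Simple random mini-batches from each domain.
        i = rng.choice(len(src), size=batch_size, replace=False)
        j = rng.choice(len(tgt), size=batch_size, replace=False)
        rand.append(linear_mmd2(src[i], tgt[j]))
        # Stratified mini-batches: the same number of points from every cluster.
        i = stratified_indices(rng, src_strata, batch_size)
        j = stratified_indices(rng, tgt_strata, batch_size)
        strat.append(linear_mmd2(src[i], tgt[j]))
    return np.var(rand), np.var(strat)


if __name__ == "__main__":
    var_rand, var_strat = estimator_variances()
    print(f"random: {var_rand:.4f}  stratified: {var_strat:.4f}")
```

In this toy setting most of the spread in a batch mean comes from which clusters happen to be drawn, so fixing the per-cluster counts removes that source of noise. Real features have no ground-truth strata, which is the gap a learned stratification is meant to fill.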
Related papers
- Target-specific Adaptation and Consistent Degradation Alignment for Cross-Domain Remaining Useful Life Prediction [24.676267074769537]
We propose a novel domain adaptation approach for cross-domain RUL prediction named TACDA. We develop a novel clustering and pairing strategy for consistent alignment between similar degradation stages. Our results demonstrate the remarkable performance of our proposed TACDA method.
arXiv Detail & Related papers (2025-12-02T10:15:14Z)
- Domain Adaptation via Feature Refinement [0.3867363075280543]
We propose Domain Adaptation via Feature Refinement (DAFR2), a simple yet effective framework for unsupervised domain adaptation under distribution shift. The proposed method combines three key components: adaptation of Batch Normalization statistics using unlabeled target data, feature distillation from a source-trained model and hypothesis transfer.
arXiv Detail & Related papers (2025-08-22T06:32:19Z)
- DIDS: Domain Impact-aware Data Sampling for Large Language Model Training [61.10643823069603]
We present Domain Impact-aware Data Sampling (DIDS) for large language models. DIDS groups training data based on learning effects, where a proxy language model and dimensionality reduction are employed. It achieves 3.4% higher average performance while maintaining comparable training efficiency.
arXiv Detail & Related papers (2025-04-17T13:09:38Z)
- Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift [9.387706860375461]
A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance.
The prediction interval serves as a crucial tool for characterizing uncertainties induced by their underlying distribution.
We propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain.
arXiv Detail & Related papers (2024-05-16T17:55:42Z)
- Unsupervised Domain Adaptation Based on the Predictive Uncertainty of Models [1.6498361958317636]
Unsupervised domain adaptation (UDA) aims to improve the prediction performance in the target domain under distribution shifts from the source domain.
We present a novel UDA method that learns domain-invariant features that minimize the domain divergence.
arXiv Detail & Related papers (2022-11-16T12:23:32Z)
- Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We propose effective gap estimation methods for guiding the selection of a better hypothesis for the target.
The other method is minimizing the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvement by 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Model-Based Domain Generalization [96.84818110323518]
We propose a novel approach for the domain generalization problem called Model-Based Domain Generalization.
Our algorithms beat the current state-of-the-art methods on the very-recently-proposed WILDS benchmark by up to 20 percentage points.
arXiv Detail & Related papers (2021-02-23T00:59:02Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)