Related papers: Bias as a Virtue: Rethinking Generalization under Distribution Shifts

Bias as a Virtue: Rethinking Generalization under Distribution Shifts

URL: http://arxiv.org/abs/2506.00407v1
Date: Sat, 31 May 2025 05:54:49 GMT
Title: Bias as a Virtue: Rethinking Generalization under Distribution Shifts
Authors: Ruixuan Chen, Wentao Li, Jiahui Xiao, Yuchen Li, Yimin Tang, Xiaonan Wang,
Abstract summary: Machine learning models often degrade when deployed on data distributions different from their training data.<n>We show that higher in-distribution (ID) bias can lead to better out-of-distribution (OOD) generalization.<n>Our work provides both a practical method for improving generalization and a theoretical framework for reconsidering the role of bias in robust machine learning.
Score: 7.389812496011288
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning models often degrade when deployed on data distributions different from their training data. Challenging conventional validation paradigms, we demonstrate that higher in-distribution (ID) bias can lead to better out-of-distribution (OOD) generalization. Our Adaptive Distribution Bridge (ADB) framework implements this insight by introducing controlled statistical diversity during training, enabling models to develop bias profiles that effectively generalize across distributions. Empirically, we observe a robust negative correlation where higher ID bias corresponds to lower OOD error--a finding that contradicts standard practices focused on minimizing validation error. Evaluation on multiple datasets shows our approach significantly improves OOD generalization. ADB achieves robust mean error reductions of up to 26.8% compared to traditional cross-validation, and consistently identifies high-performing training strategies, evidenced by percentile ranks often exceeding 74.4%. Our work provides both a practical method for improving generalization and a theoretical framework for reconsidering the role of bias in robust machine learning.

Related papers

Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.<n>We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
Learning Where to Learn: Training Distribution Selection for Provable OOD Performance [2.7309692684728617]
Out-of-distribution (OOD) generalization remains a fundamental challenge in machine learning.<n>This paper studies the design of training data distributions that maximize average-case OOD performance.
arXiv Detail & Related papers (2025-05-27T18:00:58Z)
Class-Conditional Distribution Balancing for Group Robust Classification [11.525201208566925]
Spurious correlations that lead models to correct predictions for the wrong reasons pose a critical challenge for robust real-world generalization.<n>We offer a novel perspective by reframing the spurious correlations as imbalances or mismatches in class-conditional distributions.<n>We propose a simple yet effective robust learning method that eliminates the need for both bias annotations and predictions.
arXiv Detail & Related papers (2025-04-24T07:15:53Z)
Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing [15.214861534330236]
We introduce Diffusing DeBias (DDB) as a plug-in for common methods of unsupervised model debiasing.<n>Specifically, our approach adopts conditional diffusion models to generate synthetic bias-aligned images.<n>By tackling the fundamental issue of bias-conflicting training samples in learning auxiliary models, our proposed method beats current state-of-the-art in multiple benchmark datasets.
arXiv Detail & Related papers (2025-02-13T18:17:03Z)
Enhancing Robust Fairness via Confusional Spectral Regularization [6.041034366572273]
We derive a robust generalization bound for the worst-class robust error within the PAC-Bayesian framework.<n>We propose a novel regularization technique to improve worst-class robust accuracy and enhance robust fairness.
arXiv Detail & Related papers (2025-01-22T23:32:19Z)
Mixture Data for Training Cannot Ensure Out-of-distribution Generalization [21.801115344132114]
We show that increasing the size of training data does not always lead to a reduction in the test generalization error. In this work, we quantitatively redefine OOD data as those situated outside the convex hull of mixed training data. Our proof of the new risk bound agrees that the efficacy of well-trained models can be guaranteed for unseen data.
arXiv Detail & Related papers (2023-12-25T11:00:38Z)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)
DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization [35.53270942633211]
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution. We propose a new method - DAFT - based on the intuition that adversarially robust combination of a large number of rich features should provide OOD robustness. We evaluate DAFT on standard benchmarks in the DomainBed framework, and demonstrate that DAFT achieves significant improvements over the current state-of-the-art OOD generalization methods.
arXiv Detail & Related papers (2022-08-19T03:48:17Z)
General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
Self-balanced Learning For Domain Generalization [64.99791119112503]
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics. Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class. We propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by different distributions of the multi-domain source data.
arXiv Detail & Related papers (2021-08-31T03:17:54Z)
Improved OOD Generalization via Adversarial Training and Pre-training [49.08683910076778]
In this paper, we theoretically show that a model robust to input perturbations generalizes well on OOD data. Inspired by previous findings that adversarial training helps improve input-robustness, we show that adversarially trained models have converged excess risk on OOD data.
arXiv Detail & Related papers (2021-05-24T08:06:35Z)
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization. It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.