Spuriosity Didn't Kill the Classifier: Using Invariant Predictions to
Harness Spurious Features
- URL: http://arxiv.org/abs/2307.09933v2
- Date: Wed, 8 Nov 2023 09:14:51 GMT
- Title: Spuriosity Didn't Kill the Classifier: Using Invariant Predictions to
Harness Spurious Features
- Authors: Cian Eastwood, Shashank Singh, Andrei Liviu Nicolicioiu, Marin Vlastelica, Julius von Kügelgen, Bernhard Schölkopf
- Abstract summary: Stable Feature Boosting (SFB) is an algorithm for learning a predictor that separates stable and conditionally-independent unstable features.
We show that SFB can learn an asymptotically-optimal predictor without test-domain labels.
Empirically, we demonstrate the effectiveness of SFB on real and synthetic data.
- Score: 19.312258609611686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To avoid failures on out-of-distribution data, recent works have sought to
extract features that have an invariant or stable relationship with the label
across domains, discarding "spurious" or unstable features whose relationship
with the label changes across domains. However, unstable features often carry
complementary information that could boost performance if used correctly in the
test domain. In this work, we show how this can be done without test-domain
labels. In particular, we prove that pseudo-labels based on stable features
provide sufficient guidance for doing so, provided that stable and unstable
features are conditionally independent given the label. Based on this
theoretical insight, we propose Stable Feature Boosting (SFB), an algorithm
for: (i) learning a predictor that separates stable and
conditionally-independent unstable features; and (ii) using the stable-feature
predictions to adapt the unstable-feature predictions in the test domain.
Theoretically, we prove that SFB can learn an asymptotically-optimal predictor
without test-domain labels. Empirically, we demonstrate the effectiveness of
SFB on real and synthetic data.
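The two-step recipe in the abstract can be made concrete with a small sketch. The code below is purely illustrative and is not the authors' implementation: the synthetic data, the assumed-known stable/unstable feature split, and the simple log-odds combination are all assumptions made for exposition, with the combination step relying on the conditional-independence condition stated above.

```python
# Minimal sketch of the SFB idea (not the authors' code): a stable-feature
# predictor pseudo-labels the unlabeled test domain, an unstable-feature head
# is refit to those pseudo-labels, and the two are combined.
# Everything below (data, feature split, parameters) is assumed for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(n, flip):
    """Binary label; column 0 is stable, column 1 is spurious/unstable."""
    y = rng.integers(0, 2, n)
    x_stable = y + 0.8 * rng.normal(size=n)            # P(y | x_stable) fixed across domains
    corr = np.where(rng.random(n) < flip, 1 - y, y)    # unstable correlation controlled by `flip`
    x_unstable = corr + 0.3 * rng.normal(size=n)
    return np.c_[x_stable, x_unstable], y

X_tr, y_tr = make_domain(5000, flip=0.1)   # training domain: spurious feature helps
X_te, y_te = make_domain(5000, flip=0.9)   # test domain: spurious correlation reversed

# (i) stable-only predictor trained on labeled source data
stable = LogisticRegression().fit(X_tr[:, [0]], y_tr)

# (ii) pseudo-label the unlabeled test domain with the stable predictor,
#      then refit an unstable-feature head to those pseudo-labels
pseudo = stable.predict(X_te[:, [0]])
unstable = LogisticRegression().fit(X_te[:, [1]], pseudo)

# Combine by summing log-odds; this is justified when the stable and unstable
# features are conditionally independent given the label (and classes are balanced).
logits = stable.decision_function(X_te[:, [0]]) + unstable.decision_function(X_te[:, [1]])
print("stable-only accuracy:", (stable.predict(X_te[:, [0]]) == y_te).mean())
print("adapted combination :", ((logits > 0).astype(int) == y_te).mean())
```

In this toy setup the stable-only baseline is unaffected by the reversed correlation, and the printed comparison shows whatever extra accuracy the pseudo-label-adapted unstable head contributes; the paper's claim is that such gains are achievable without any test-domain labels, which is what the sketch mimics.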
Related papers
- Adapting to Shifting Correlations with Unlabeled Data Calibration [6.84735357291896]
Distribution shifts between sites can seriously degrade model performance since models are prone to exploiting unstable correlations.
We propose Generalized Prevalence Adjustment (GPA), a flexible method that adjusts model predictions to the shifting correlations between prediction target and confounders.
GPA can infer the interaction between target and confounders in new sites using unlabeled samples from those sites.
arXiv Detail & Related papers (2024-09-09T18:45:43Z)
- Confident Sinkhorn Allocation for Pseudo-Labeling [40.883130133661304]
Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data.
This paper theoretically studies the role of uncertainty in pseudo-labeling and proposes Confident Sinkhorn Allocation (CSA).
CSA assigns pseudo-labels via optimal transport, and only to samples with high confidence scores.
arXiv Detail & Related papers (2022-06-13T02:16:26Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Unsupervised Robust Domain Adaptation without Source Data [75.85602424699447]
We study the problem of robust domain adaptation in the context of unavailable target labels and source data.
We show a consistent performance improvement of over 10% in accuracy against the tested baselines on four benchmark datasets.
arXiv Detail & Related papers (2021-03-26T16:42:28Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process (see the confidence-filtering sketch after this list).
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- Self-training Avoids Using Spurious Features Under Domain Shift [54.794607791641745]
In unsupervised domain adaptation, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
We identify and analyze a particular setting where the domain shift can be large: certain spurious features correlate with the label in the source domain but are independent of the label in the target.
arXiv Detail & Related papers (2020-06-17T17:51:42Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtle spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional-independence-test-based algorithm to separate causal variables, using a seed variable as a prior, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
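Several of the pseudo-labeling entries above (UPS, CSA) share a core mechanism: generate pseudo-labels, keep only those the model is confident about, and retrain. The sketch below, referenced from the UPS entry, illustrates that shared mechanism only; it is not the UPS implementation (which filters by an explicit uncertainty estimate rather than raw softmax confidence), and the function names and threshold are assumptions.

```python
# Generic confidence-filtered pseudo-labeling sketch (not the UPS code):
# keep pseudo-labels only where the model's predicted probability is high,
# then retrain on the labeled data plus the selected pseudo-labeled subset.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_pseudo_labels(model, X_unlabeled, conf_threshold=0.9):
    """Return (indices, labels) of unlabeled points with confident predictions."""
    probs = model.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) >= conf_threshold   # drop likely-noisy pseudo-labels
    idx = np.where(confident)[0]
    return idx, probs.argmax(axis=1)[idx]

def one_round_self_training(X_lab, y_lab, X_unl, conf_threshold=0.9):
    base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    idx, pseudo = select_pseudo_labels(base, X_unl, conf_threshold)
    X_aug = np.vstack([X_lab, X_unl[idx]])             # labeled + confident pseudo-labeled
    y_aug = np.concatenate([y_lab, pseudo])
    return LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```

UPS replaces the raw-confidence filter with an uncertainty-aware one, while CSA casts the allocation step as an optimal-transport problem, again restricted to high-confidence samples; both can be read as different choices of the selection step above.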
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.