Causally-motivated Shortcut Removal Using Auxiliary Labels
- URL: http://arxiv.org/abs/2105.06422v1
- Date: Thu, 13 May 2021 16:58:45 GMT
- Title: Causally-motivated Shortcut Removal Using Auxiliary Labels
- Authors: Maggie Makar, Ben Packer, Dan Moldovan, Davis Blalock, Yoni Halpern,
Alexander D'Amour
- Abstract summary: A key challenge to learning such risk-invariant predictors is shortcut learning.
We propose a flexible, causally-motivated approach to address this challenge.
We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors.
- Score: 63.686580185674195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness to certain distribution shifts is a key requirement in many ML
applications. Often, relevant distribution shifts can be formulated in terms of
interventions on the process that generates the input data. Here, we consider
the problem of learning a predictor whose risk across such shifts is invariant.
A key challenge to learning such risk-invariant predictors is shortcut
learning, or the tendency for models to rely on spurious correlations in
practice, even when a predictor based on shift-invariant features could achieve
optimal i.i.d. generalization in principle. We propose a flexible,
causally-motivated approach to address this challenge. Specifically, we propose
a regularization scheme that makes use of auxiliary labels for potential
shortcut features, which are often available at training time. Drawing on the
causal structure of the problem, we enforce a conditional independence between
the representation used to predict the main label and the auxiliary labels. We
show both theoretically and empirically that this causally-motivated
regularization scheme yields robust predictors that generalize well both
in-distribution and under distribution shifts, and does so with better sample
efficiency than standard regularization or weighting approaches.
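The abstract does not spell out how the conditional independence between the representation and the auxiliary labels is enforced. One common instantiation of such a penalty is a maximum mean discrepancy (MMD) term that compares the representation across auxiliary-label groups within each value of the main label. The sketch below is a minimal PyTorch version under that assumption; the architecture, the RBF kernel, and names such as `mmd_rbf` and `lambda_ci` are illustrative choices, not the authors' exact estimator.

```python
# Minimal sketch (not the authors' code): penalize dependence between the learned
# representation z and an auxiliary shortcut label v, conditional on the main label y,
# by adding an MMD term between the v=0 and v=1 groups within each y stratum.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mmd_rbf(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between samples a and b with an RBF kernel (biased estimator)."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()


class ShortcutRegularizedModel(nn.Module):
    """Encoder plus linear head; the penalty is applied to the encoder output."""
    def __init__(self, in_dim: int, rep_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))
        self.head = nn.Linear(rep_dim, 1)

    def forward(self, x):
        z = self.encoder(x)
        return self.head(z).squeeze(-1), z


def loss_fn(model, x, y, v, lambda_ci: float = 1.0):
    """Cross-entropy plus a penalty encouraging z to be independent of v given y."""
    logits, z = model(x)
    ce = F.binary_cross_entropy_with_logits(logits, y.float())
    penalty = torch.tensor(0.0, device=x.device)
    for y_val in (0, 1):  # compare the auxiliary-label groups within each main-label stratum
        g0 = z[(y == y_val) & (v == 0)]
        g1 = z[(y == y_val) & (v == 1)]
        if len(g0) > 1 and len(g1) > 1:
            penalty = penalty + mmd_rbf(g0, g1)
    return ce + lambda_ci * penalty
```

Here `lambda_ci` trades off in-distribution fit against invariance to the shortcut feature, the trade-off the abstract argues can be achieved with better sample efficiency than standard regularization or weighting approaches.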
Related papers
- Decision-Focused Evaluation of Worst-Case Distribution Shift [18.98504221245623]
We introduce a novel framework to identify worst-case distribution shifts in predictive resource allocation settings.
We show that the problem can be reformulated as a submodular optimization problem, enabling efficient approximations of worst-case loss.
Applying our framework to real data, we find empirical evidence that worst-case shifts identified by one metric often significantly diverge from worst-case distributions identified by other metrics.
arXiv Detail & Related papers (2024-07-04T01:00:53Z)
- Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them) [14.396431159723297]
We show that conformal prediction can theoretically be extended to any joint data distribution.
Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms.
arXiv Detail & Related papers (2024-05-10T17:40:24Z)
- STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization [18.811470043767713]
Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences.
We present a self-training method along with our proposed peer regularization, which penalizes the reward model for memorizing uninformative labels and encourages confident predictions.
arXiv Detail & Related papers (2023-07-19T00:31:58Z)
- On Regularization and Inference with Label Constraints [62.60903248392479]
We compare two strategies for encoding label constraints in a machine learning pipeline: regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
arXiv Detail & Related papers (2023-07-08T03:39:22Z)
- When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
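(A minimal sketch of such a confidence-threshold deferral rule appears after this list.)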
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z)
- Domain Adaptation meets Individual Fairness. And they get along [48.95808607591299]
We show that algorithmic fairness interventions can help machine learning models overcome distribution shifts.
In particular, we show that enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models.
arXiv Detail & Related papers (2022-05-01T16:19:55Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
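The entry above on confidence-based cascade deferral describes the rule only in prose; the following is a minimal sketch of one such rule, assuming a max-probability confidence measure and one threshold per non-final model (both illustrative choices, not that paper's API).

```python
# Minimal sketch of confidence-based cascade deferral: run models in order of cost
# and terminate as soon as the current model's confidence clears its threshold.
# The confidence measure (max predicted probability) and the thresholds are illustrative.
from typing import Callable, Sequence
import numpy as np


def cascade_predict(
    x: np.ndarray,
    models: Sequence[Callable[[np.ndarray], np.ndarray]],  # each returns class probabilities
    thresholds: Sequence[float],                            # one deferral threshold per non-final model
) -> int:
    for model, tau in zip(models[:-1], thresholds):
        probs = model(x)
        if probs.max() >= tau:          # confident enough: terminate prediction here
            return int(probs.argmax())
    return int(models[-1](x).argmax())  # otherwise defer to the last (most expensive) model
```

The rule inspects only the current model's confidence, which is the sense in which it is "oblivious to the structure of the cascade" in the entry above.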
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.