Diagnosing Model Performance Under Distribution Shift
- URL: http://arxiv.org/abs/2303.02011v4
- Date: Mon, 10 Jul 2023 20:46:14 GMT
- Title: Diagnosing Model Performance Under Distribution Shift
- Authors: Tiffany Tianhui Cai, Hongseok Namkoong, Steve Yadlowsky
- Abstract summary: Prediction models can perform poorly when deployed to target distributions different from the training distribution.
Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training.
- Score: 9.143551270841858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction models can perform poorly when deployed to target distributions
different from the training distribution. To understand these operational
failure modes, we develop a method, called DIstribution Shift DEcomposition
(DISDE), to attribute a drop in performance to different types of distribution
shifts. Our approach decomposes the performance drop into terms for 1) an
increase in harder but frequently seen examples from training, 2) changes in
the relationship between features and outcomes, and 3) poor performance on
examples infrequent or unseen during training. These terms are defined by
fixing a distribution on $X$ while varying the conditional distribution of $Y
\mid X$ between training and target, or by fixing the conditional distribution
of $Y \mid X$ while varying the distribution on $X$. In order to do this, we
define a hypothetical distribution on $X$ consisting of values common in both
training and target, over which it is easy to compare $Y \mid X$ and thus
predictive performance. We estimate performance on this hypothetical
distribution via reweighting methods. Empirically, we show how our method can
1) inform potential modeling improvements across distribution shifts for
employment prediction on tabular census data, and 2) help to explain why
certain domain adaptation methods fail to improve model performance for
satellite image classification.
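To make the reweighting step concrete, here is a minimal sketch (assuming scikit-learn) of how one might estimate a model's performance on a hypothetical shared distribution of $X$ by fitting a domain classifier and using overlap-style weights. The helper names (`overlap_weights`, `loss_fn`) are hypothetical, and this illustrates the general idea rather than the authors' DISDE implementation.
```python
# Rough sketch of reweighting-based evaluation on a shared distribution of X
# (an illustration with overlap-style weights; not the authors' DISDE code).
import numpy as np
from sklearn.linear_model import LogisticRegression

def overlap_weights(X_train, X_target):
    """Fit a domain classifier and weight each training point by
    P(domain = target | x), which emphasizes X values that have mass under
    both the training and target distributions (assumes comparable sample sizes)."""
    X = np.vstack([X_train, X_target])
    domain = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, domain)
    w = clf.predict_proba(X_train)[:, 1]   # P(target | x) on training points
    return w / w.mean()                    # normalize to mean 1

def reweighted_performance(per_example_losses, weights):
    """Average per-example training losses under the reweighted ("shared X") distribution."""
    return np.average(per_example_losses, weights=weights)

# Usage (hypothetical model and loss function):
# w = overlap_weights(X_train, X_target)
# perf_shared_x = reweighted_performance(loss_fn(y_train, model.predict(X_train)), w)
```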
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
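As a simplified illustration of the abstention paradigm described in this entry, the sketch below rejects inputs whose predicted confidence falls below a threshold; this is a generic confidence-threshold rule, not the density-ratio objective proposed in the paper.
```python
# Generic sketch of classification with rejection: abstain whenever the
# model's confidence falls below a threshold. Illustration only; not the
# density-ratio method of the paper.
import numpy as np

def predict_with_rejection(proba, threshold=0.8):
    """proba: (n, k) array of predicted class probabilities.
    Returns predicted class indices, with -1 marking abstentions."""
    confidence = proba.max(axis=1)
    preds = proba.argmax(axis=1)
    preds[confidence < threshold] = -1   # abstain on low-confidence inputs
    return preds

# Example: the second input is rejected because its top probability is only 0.55.
proba = np.array([[0.95, 0.05], [0.55, 0.45]])
print(predict_with_rejection(proba, threshold=0.8))   # -> [ 0 -1]
```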
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
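One common way to detect such a shift is a domain-classifier check: train a classifier to distinguish training data from deployment data, and treat a cross-validated AUC well above 0.5 as evidence of covariate shift. The sketch below (assuming scikit-learn, with a hypothetical helper name) illustrates this generic check rather than the specific procedure recommended in the brief.
```python
# Generic shift-detection sketch via a domain classifier two-sample check.
# Illustration only; not the brief's prescribed detection procedure.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def detect_covariate_shift(X_train, X_deploy):
    X = np.vstack([X_train, X_deploy])
    domain = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_deploy))])
    auc = cross_val_score(GradientBoostingClassifier(), X, domain,
                          cv=5, scoring="roc_auc").mean()
    return auc   # ~0.5: no detectable shift in X; closer to 1.0: strong shift
```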
arXiv Detail & Related papers (2024-05-23T05:29:36Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework [12.734559823650887]
In the presence of distribution shifts, fair machine learning models may behave unfairly on test data.
Existing algorithms require full access to the data and cannot be applied when only small batches are available.
This paper proposes the first distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph.
arXiv Detail & Related papers (2023-09-20T23:25:28Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Domain Adaptation meets Individual Fairness. And they get along [48.95808607591299]
We show that algorithmic fairness interventions can help machine learning models overcome distribution shifts.
In particular, we show that enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models.
arXiv Detail & Related papers (2022-05-01T16:19:55Z)
- Model Transferability With Responsive Decision Subjects [11.07759054787023]
We formalize the transferability of a model by studying how the performance of a model trained on the available source distribution translates to performance on the domain it induces.
We provide both upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution.
arXiv Detail & Related papers (2021-07-13T08:21:37Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
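A minimal sketch of the DoC statistic as described above: the gap between a classifier's average confidence on in-distribution data and on a shifted dataset, used as a proxy for the change in accuracy. The `predict_proba` interface is an assumption, and this is an illustration rather than the authors' code.
```python
# Minimal sketch of the difference-of-confidences (DoC) statistic.
import numpy as np

def difference_of_confidences(proba_source, proba_target):
    """proba_*: (n, k) arrays of predicted class probabilities."""
    conf_source = proba_source.max(axis=1).mean()
    conf_target = proba_target.max(axis=1).mean()
    return conf_source - conf_target   # larger DoC suggests a larger accuracy drop

# Usage (hypothetical model exposing predict_proba):
# doc = difference_of_confidences(model.predict_proba(X_source),
#                                 model.predict_proba(X_target))
```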
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training yields out-of-distribution performance substantially lower than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.