Diagnosing Model Performance Under Distribution Shift
- URL: http://arxiv.org/abs/2303.02011v4
- Date: Mon, 10 Jul 2023 20:46:14 GMT
- Title: Diagnosing Model Performance Under Distribution Shift
- Authors: Tiffany Tianhui Cai, Hongseok Namkoong, Steve Yadlowsky
- Abstract summary: Prediction models can perform poorly when deployed to target distributions different from the training distribution.
Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training.
- Score: 9.143551270841858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction models can perform poorly when deployed to target distributions
different from the training distribution. To understand these operational
failure modes, we develop a method, called DIstribution Shift DEcomposition
(DISDE), to attribute a drop in performance to different types of distribution
shifts. Our approach decomposes the performance drop into terms for 1) an
increase in harder but frequently seen examples from training, 2) changes in
the relationship between features and outcomes, and 3) poor performance on
examples infrequent or unseen during training. These terms are defined by
fixing a distribution on $X$ while varying the conditional distribution of $Y
\mid X$ between training and target, or by fixing the conditional distribution
of $Y \mid X$ while varying the distribution on $X$. In order to do this, we
define a hypothetical distribution on $X$ consisting of values common in both
training and target, over which it is easy to compare $Y \mid X$ and thus
predictive performance. We estimate performance on this hypothetical
distribution via reweighting methods. Empirically, we show how our method can
1) inform potential modeling improvements across distribution shifts for
employment prediction on tabular census data, and 2) help to explain why
certain domain adaptation methods fail to improve model performance for
satellite image classification.
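To make the reweighting step concrete, here is a minimal sketch (assuming scikit-learn) of how one might estimate a model's performance on a hypothetical shared distribution of $X$ by fitting a domain classifier and using overlap-style weights. The helper names (`overlap_weights`, `loss_fn`) are hypothetical, and this illustrates the general idea rather than the authors' DISDE implementation.
```python
# Rough sketch of reweighting-based evaluation on a shared distribution of X
# (an illustration with overlap-style weights; not the authors' DISDE code).
import numpy as np
from sklearn.linear_model import LogisticRegression

def overlap_weights(X_train, X_target):
    """Fit a domain classifier and weight each training point by
    P(domain = target | x), which emphasizes X values that have mass under
    both the training and target distributions (assumes comparable sample sizes)."""
    X = np.vstack([X_train, X_target])
    domain = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, domain)
    w = clf.predict_proba(X_train)[:, 1]   # P(target | x) on training points
    return w / w.mean()                    # normalize to mean 1

def reweighted_performance(per_example_losses, weights):
    """Average per-example training losses under the reweighted ("shared X") distribution."""
    return np.average(per_example_losses, weights=weights)

# Usage (hypothetical model and loss function):
# w = overlap_weights(X_train, X_target)
# perf_shared_x = reweighted_performance(loss_fn(y_train, model.predict(X_train)), w)
```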
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
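As a simplified illustration of the abstention paradigm described in this entry, the sketch below rejects inputs whose predicted confidence falls below a threshold; this is a generic confidence-threshold rule, not the density-ratio objective proposed in the paper.
```python
# Generic sketch of classification with rejection: abstain whenever the
# model's confidence falls below a threshold. Illustration only; not the
# density-ratio method of the paper.
import numpy as np

def predict_with_rejection(proba, threshold=0.8):
    """proba: (n, k) array of predicted class probabilities.
    Returns predicted class indices, with -1 marking abstentions."""
    confidence = proba.max(axis=1)
    preds = proba.argmax(axis=1)
    preds[confidence < threshold] = -1   # abstain on low-confidence inputs
    return preds

# Example: the second input is rejected because its top probability is only 0.55.
proba = np.array([[0.95, 0.05], [0.55, 0.45]])
print(predict_with_rejection(proba, threshold=0.8))   # -> [ 0 -1]
```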
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
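One common way to detect such a shift is a domain-classifier check: train a classifier to distinguish training data from deployment data, and treat a cross-validated AUC well above 0.5 as evidence of covariate shift. The sketch below (assuming scikit-learn, with a hypothetical helper name) illustrates this generic check rather than the specific procedure recommended in the brief.
```python
# Generic shift-detection sketch via a domain classifier two-sample check.
# Illustration only; not the brief's prescribed detection procedure.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def detect_covariate_shift(X_train, X_deploy):
    X = np.vstack([X_train, X_deploy])
    domain = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_deploy))])
    auc = cross_val_score(GradientBoostingClassifier(), X, domain,
                          cv=5, scoring="roc_auc").mean()
    return auc   # ~0.5: no detectable shift in X; closer to 1.0: strong shift
```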
arXiv Detail & Related papers (2024-05-23T05:29:36Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework [12.734559823650887]
In the presence of distribution shifts, fair machine learning models may behave unfairly on test data.
Existing algorithms require full access to the data and cannot be applied when only small batches are available.
This paper proposes the first distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph.
arXiv Detail & Related papers (2023-09-20T23:25:28Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Domain Adaptation meets Individual Fairness. And they get along [48.95808607591299]
We show that algorithmic fairness interventions can help machine learning models overcome distribution shifts.
In particular, we show that enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models.
arXiv Detail & Related papers (2022-05-01T16:19:55Z)
- Model Transferability With Responsive Decision Subjects [11.07759054787023]
We formalize the transferability of a model by studying how the performance of a model trained on the available source distribution translates to performance on the domain it induces.
We provide both upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution.
arXiv Detail & Related papers (2021-07-13T08:21:37Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
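A minimal sketch of the DoC statistic as described above: the gap between a classifier's average confidence on in-distribution data and on a shifted dataset, used as a proxy for the change in accuracy. The `predict_proba` interface is an assumption, and this is an illustration rather than the authors' code.
```python
# Minimal sketch of the difference-of-confidences (DoC) statistic.
import numpy as np

def difference_of_confidences(proba_source, proba_target):
    """proba_*: (n, k) arrays of predicted class probabilities."""
    conf_source = proba_source.max(axis=1).mean()
    conf_target = proba_target.max(axis=1).mean()
    return conf_source - conf_target   # larger DoC suggests a larger accuracy drop

# Usage (hypothetical model exposing predict_proba):
# doc = difference_of_confidences(model.predict_proba(X_source),
#                                 model.predict_proba(X_target))
```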
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training yields out-of-distribution performance substantially lower than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.