"Why did the Model Fail?": Attributing Model Performance Changes to
Distribution Shifts
- URL: http://arxiv.org/abs/2210.10769v3
- Date: Tue, 6 Jun 2023 07:05:12 GMT
- Title: "Why did the Model Fail?": Attributing Model Performance Changes to
Distribution Shifts
- Authors: Haoran Zhang, Harvineet Singh, Marzyeh Ghassemi, Shalmali Joshi
- Abstract summary: We introduce the problem of attributing performance differences between environments to distribution shifts in the underlying data generating mechanisms.
We derive an importance weighting method for computing the value of an arbitrary set of distributions.
We demonstrate the correctness and utility of our method on synthetic, semi-synthetic, and real-world case studies.
- Score: 17.381178048938068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models frequently experience performance drops under
distribution shifts. The underlying cause of such shifts may be multiple
simultaneous factors such as changes in data quality, differences in specific
covariate distributions, or changes in the relationship between label and
features. When a model does fail during deployment, attributing performance
change to these factors is critical for the model developer to identify the
root cause and take mitigating actions. In this work, we introduce the problem
of attributing performance differences between environments to distribution
shifts in the underlying data generating mechanisms. We formulate the problem
as a cooperative game where the players are distributions. We define the value
of a set of distributions to be the change in model performance when only this
set of distributions has changed between environments, and derive an importance
weighting method for computing the value of an arbitrary set of distributions.
The contribution of each distribution to the total performance change is then
quantified as its Shapley value. We demonstrate the correctness and utility of
our method on synthetic, semi-synthetic, and real-world case studies, showing
its effectiveness in attributing performance changes to a wide range of
distribution shifts.
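To make the recipe in the abstract concrete, here is a minimal, hedged sketch of the idea: the factors of the data-generating mechanism are treated as players in a cooperative game, the value of a coalition is the performance change when only those factors shift (estimated by importance weighting), and each factor's contribution is its Shapley value. This is an illustration under strong simplifying assumptions (binary X and Y, only two players P(X) and P(Y|X), probability tables known exactly in both environments so the density ratios are exact, and a fixed classifier), not the authors' implementation; all names below are invented for the example.

```python
# Minimal sketch of Shapley-based attribution of a performance change to
# distribution shifts. Assumptions: binary X and Y, players {P(X), P(Y|X)},
# exactly known probability tables, a fixed classifier that predicts Y = X.
from itertools import combinations
from math import factorial
import numpy as np

rng = np.random.default_rng(0)

# Source environment: covariate distribution and label mechanism.
p_x_src = np.array([0.7, 0.3])            # P(X=0), P(X=1)
p_y_given_x_src = np.array([[0.9, 0.1],   # P(Y=. | X=0)
                            [0.2, 0.8]])  # P(Y=. | X=1)
# Target environment: both factors have shifted.
p_x_tgt = np.array([0.4, 0.6])
p_y_given_x_tgt = np.array([[0.7, 0.3],
                            [0.4, 0.6]])

def sample(p_x, p_y_given_x, n):
    x = rng.choice(2, size=n, p=p_x)
    y = (rng.random(n) < p_y_given_x[x, 1]).astype(int)  # P(Y=1 | X=x)
    return x, y

def accuracy(x, y, w=None):
    return np.average((x == y).astype(float), weights=w)  # classifier: Y_hat = X

players = ["P(X)", "P(Y|X)"]
x_src, y_src = sample(p_x_src, p_y_given_x_src, 100_000)
acc_src = accuracy(x_src, y_src)

def coalition_value(shifted):
    """Change in accuracy when only the factors in `shifted` move to the
    target environment, estimated by importance-weighting source samples."""
    w = np.ones_like(x_src, dtype=float)
    if "P(X)" in shifted:
        w *= p_x_tgt[x_src] / p_x_src[x_src]
    if "P(Y|X)" in shifted:
        w *= p_y_given_x_tgt[x_src, y_src] / p_y_given_x_src[x_src, y_src]
    return accuracy(x_src, y_src, w) - acc_src

def shapley(player):
    """Exact Shapley value of one factor over the small set of players."""
    others = [p for p in players if p != player]
    n, total = len(players), 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (coalition_value(set(subset) | {player})
                               - coalition_value(set(subset)))
    return total

for p in players:
    print(f"{p:7s} contribution to the accuracy change: {shapley(p):+.3f}")
print(f"total accuracy change: {coalition_value(set(players)):+.3f}")
```

By the efficiency property of the Shapley value, the two attributions sum to the total accuracy change between environments; the exact enumeration above is only practical because the toy example has two players.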
Related papers
- Omitted Variable Bias in Language Models Under Distribution Shift [22.663393629883206]
We show how distribution shifts in language models can be separated into observable and unobservable components.
We introduce a framework that maps the strength of the omitted variables to bounds on the worst-case generalization performance of language models.
arXiv Detail & Related papers (2026-02-18T19:00:05Z) - Quantifying Uncertainty and Variability in Machine Learning: Confidence Intervals for Quantiles in Performance Metric Distributions [0.17265013728931003]
Machine learning models are widely used in applications where reliability and robustness are critical.
Model evaluation often relies on single-point estimates of performance metrics that fail to capture the inherent variability in model performance.
This contribution explores the use of quantiles and confidence intervals to analyze such distributions, providing a more complete understanding of model performance and its uncertainty.
arXiv Detail & Related papers (2025-01-28T13:21:34Z) - Proxy Methods for Domain Adaptation [78.03254010884783]
Proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables.
We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings.
arXiv Detail & Related papers (2024-03-12T09:32:41Z) - Explanation Shift: How Did the Distribution Shift Impact the Model? [23.403838118256907]
We study how explanation characteristics shift when affected by distribution shifts.
We analyze different types of distribution shifts using synthetic examples and real-world data sets.
We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-03-14T17:13:01Z) - Diagnosing Model Performance Under Distribution Shift [9.143551270841858]
Prediction models can perform poorly when deployed to target distributions different from the training distribution.
Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training.
arXiv Detail & Related papers (2023-03-03T15:27:16Z) - Robust Calibration with Multi-domain Temperature Scaling [86.07299013396059]
We develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains.
Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity across the training domains to improve calibration robustness under distribution shift.
arXiv Detail & Related papers (2022-06-06T17:32:12Z) - Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, such as some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts (a minimal sketch of this estimator appears after this list).
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Estimating Generalization under Distribution Shifts via Domain-Invariant
Representations [75.74928159249225]
We use a set of domain-invariant predictors as a proxy for the unknown, true target labels.
The error of the resulting risk estimate depends on the target risk of the proxy model.
arXiv Detail & Related papers (2020-07-06T17:21:24Z)