Evaluating Model Robustness and Stability to Dataset Shift
- URL: http://arxiv.org/abs/2010.15100v2
- Date: Mon, 15 Mar 2021 16:34:55 GMT
- Title: Evaluating Model Robustness and Stability to Dataset Shift
- Authors: Adarsh Subbaswamy, Roy Adams, Suchi Saria
- Abstract summary: We propose a framework for analyzing the stability of machine learning models.
We use the original evaluation data to determine distributions under which the algorithm performs poorly.
We estimate the algorithm's performance on the "worst-case" distribution.
- Score: 7.369475193451259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the use of machine learning in high-impact domains becomes widespread, the
importance of evaluating safety has increased. An important aspect of this is
evaluating how robust a model is to changes in setting or population, which
typically requires applying the model to multiple, independent datasets. Since
the cost of collecting such datasets is often prohibitive, in this paper, we
propose a framework for analyzing this type of stability using the available
data. We use the original evaluation data to determine distributions under
which the algorithm performs poorly, and estimate the algorithm's performance
on the "worst-case" distribution. We consider shifts in user defined
conditional distributions, allowing some distributions to shift while keeping
other portions of the data distribution fixed. For example, in a healthcare
context, this allows us to consider shifts in clinical practice while keeping
the patient population fixed. To address the challenges associated with
estimation in complex, high-dimensional distributions, we derive a "debiased"
estimator which maintains $\sqrt{N}$-consistency even when machine learning
methods with slower convergence rates are used to estimate the nuisance
parameters. In experiments on a real medical risk prediction task, we show this
estimator can be used to analyze stability and accounts for realistic shifts
that could not previously be expressed. The proposed framework allows
practitioners to proactively evaluate the safety of their models without
requiring additional data collection.
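As a concrete illustration of the starting point of this kind of stability analysis, the sketch below gives a simple plug-in estimate of worst-case performance over subpopulations of the evaluation data defined through user-chosen shift variables. It is a hedged approximation of the idea only, not the paper's debiased estimator or its exact shift model; the names `plugin_worst_case_loss`, `per_example_loss`, `shift_vars`, and `alpha` are illustrative assumptions.

```python
# A minimal, illustrative sketch of worst-case subpopulation evaluation.
# NOT the paper's debiased estimator or exact shift model; all names here
# (plugin_worst_case_loss, per_example_loss, shift_vars, alpha) are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def plugin_worst_case_loss(per_example_loss, shift_vars, alpha=0.2):
    """Plug-in estimate of the worst mean loss over any subpopulation that is
    defined through the shift variables Z and contains at least an `alpha`
    fraction of the evaluation data."""
    # 1. Regress the per-example loss on Z to estimate mu(z) = E[loss | Z = z].
    mu_model = GradientBoostingRegressor().fit(shift_vars, per_example_loss)
    mu = mu_model.predict(shift_vars)

    # 2. The worst subpopulation of mass >= alpha concentrates where mu(Z) is
    #    largest, so average the top alpha-fraction of fitted values
    #    (the conditional value-at-risk of mu(Z) at level alpha).
    k = max(1, int(np.ceil(alpha * len(mu))))
    return float(np.sort(mu)[-k:].mean())

# Hypothetical usage on held-out evaluation data:
# losses = (model.predict_proba(X_eval)[:, 1] - y_eval) ** 2   # per-patient Brier terms
# print("average loss:   ", losses.mean())
# print("worst-case loss:", plugin_worst_case_loss(losses, Z_eval, alpha=0.2))
```

A naive plug-in like this inherits the convergence rate of the regression used for E[loss | Z]; the estimator proposed in the paper is designed to remain $\sqrt{N}$-consistent even when that nuisance regression converges more slowly.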
Related papers
- Evidential time-to-event prediction model with well-calibrated uncertainty estimation [12.446406577462069]
We introduce an evidential regression model designed especially for time-to-event prediction tasks.
The most plausible event time is directly quantified by aggregated Gaussian random fuzzy numbers (GRFNs).
Our model achieves both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z) - How Reliable is Your Regression Model's Uncertainty Under Real-World
Distribution Shifts? [46.05502630457458]
We propose a benchmark of 8 image-based regression datasets with different types of challenging distribution shifts.
We find that while methods are well calibrated when there is no distribution shift, they all become highly overconfident on many of the benchmark datasets.
arXiv Detail & Related papers (2023-02-07T18:54:39Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Evaluating Predictive Uncertainty and Robustness to Distributional Shift
Using Real World Data [0.0]
We propose metrics for general regression tasks using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.