Characterizing Out-of-Distribution Error via Optimal Transport
- URL: http://arxiv.org/abs/2305.15640v3
- Date: Fri, 27 Oct 2023 21:33:27 GMT
- Title: Characterizing Out-of-Distribution Error via Optimal Transport
- Authors: Yuzhe Lu, Yilong Qin, Runtian Zhai, Andrew Shen, Ketong Chen, Zhenlin Wang, Soheil Kolouri, Simon Stepputtis, Joseph Campbell, Katia Sycara
- Abstract summary: Methods of predicting a model's performance on OOD data without labels are important for machine learning safety.
We introduce a novel method for estimating model performance by leveraging optimal transport theory.
We show that our approaches significantly outperform existing state-of-the-art methods, with up to 3x lower prediction error.
- Score: 15.284665509194134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Out-of-distribution (OOD) data poses serious challenges in deployed machine
learning models, so methods of predicting a model's performance on OOD data
without labels are important for machine learning safety. While a number of
methods have been proposed by prior work, they often underestimate the actual
error, sometimes by a large margin, which greatly impacts their applicability
to real tasks. In this work, we identify pseudo-label shift, or the difference
between the predicted and true OOD label distributions, as a key indicator to
this underestimation. Based on this observation, we introduce a novel method
for estimating model performance by leveraging optimal transport theory,
Confidence Optimal Transport (COT), and show that it provably provides more
robust error estimates in the presence of pseudo-label shift. Additionally, we
introduce an empirically-motivated variant of COT, Confidence Optimal Transport
with Thresholding (COTT), which applies thresholding to the individual
transport costs and further improves the accuracy of COT's error estimates. We
evaluate COT and COTT on a variety of standard benchmarks that induce various
types of distribution shift -- synthetic, novel subpopulation, and natural --
and show that our approaches significantly outperform existing state-of-the-art
methods, with up to 3x lower prediction error.
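A minimal sketch of the COT-style estimate follows. It is a hedged reading of the abstract rather than the authors' reference implementation: it assumes the error estimate is the optimal transport cost between the model's softmax outputs on the unlabeled target set and one-hot label vectors drawn according to the source label marginal, with half the L1 distance as the ground cost, so that for equal-size empirical samples the transport problem reduces to an assignment problem. Function and variable names are illustrative.

```python
# Hedged sketch of a COT-style error estimate (not the authors' reference code).
import numpy as np
from scipy.optimize import linear_sum_assignment

def cot_error_estimate(target_probs: np.ndarray, source_label_marginal: np.ndarray) -> float:
    """target_probs: (n, k) softmax outputs on unlabeled target samples.
    source_label_marginal: (k,) class frequencies estimated from labeled source data."""
    n, k = target_probs.shape
    # Build n one-hot "label" vectors whose class counts follow the source marginal.
    counts = np.floor(source_label_marginal * n).astype(int)
    counts[0] += n - counts.sum()                            # absorb rounding slack in one class
    one_hot = np.eye(k)[np.repeat(np.arange(k), counts)]     # (n, k)
    # Ground cost: half the L1 distance between a softmax vector and a one-hot vector.
    cost = 0.5 * np.abs(target_probs[:, None, :] - one_hot[None, :, :]).sum(-1)  # (n, n)
    rows, cols = linear_sum_assignment(cost)                 # balanced OT between equal-size samples
    return float(cost[rows, cols].mean())                    # average matched cost as the error estimate
```

COTT, as described above, would then threshold the individual matched costs rather than average them, e.g. reporting `(cost[rows, cols] > tau).mean()` for a threshold `tau` fit on held-out source data; `tau` is a hypothetical name for that threshold.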
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Distance Matters For Improving Performance Estimation Under Covariate Shift [18.68533487971233]
Under dataset shifts, confidence scores may become ill-calibrated if samples are too far from the training distribution.
We show that taking into account the distances of test samples to their expected training distribution can significantly improve performance estimation.
We demonstrate the effectiveness of this method on 13 image classification tasks, across a wide-range of natural and synthetic distribution shifts.
arXiv Detail & Related papers (2023-08-14T15:49:19Z) - Predicting Out-of-Distribution Error with Confidence Optimal Transport [17.564313038169434]
We present a simple yet effective method to predict a model's performance on an unknown distribution without any additional annotation.
We show that our method, Confidence Optimal Transport (COT), provides robust estimates of a model's performance on a target domain.
Despite its simplicity, our method achieves state-of-the-art results on three benchmark datasets and outperforms existing methods by a large margin.
arXiv Detail & Related papers (2023-02-10T02:27:13Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The resulting SCORE (self-consistent robust error) definition facilitates the reconciliation between robustness and accuracy while still handling worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence from labeled source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold (a minimal sketch follows).
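For reference, a hypothetical sketch of the ATC rule, assuming the confidence score is the max-softmax probability; `atc_predict_accuracy`, `src_conf`, `src_correct`, and `tgt_conf` are illustrative names rather than the authors' code.

```python
# Hedged sketch of the ATC rule summarized above (illustrative only).
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    # src_conf: (n,) max-softmax scores on held-out labeled source data
    # src_correct: (n,) 0/1 correctness of the model on those source points
    # tgt_conf: (m,) max-softmax scores on unlabeled target data
    src_acc = src_correct.mean()
    t = np.quantile(src_conf, 1.0 - src_acc)  # threshold so the above-threshold source fraction matches source accuracy
    return (tgt_conf > t).mean()              # predicted target accuracy
```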
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
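For concreteness, one simple instantiation of the DoC idea, assuming max-softmax confidence and the plain (non-regression) variant; the names below are illustrative, not the authors' code.

```python
# Hedged sketch of a difference-of-confidences (DoC) estimate (illustrative only).
import numpy as np

def doc_predict_accuracy(src_conf, src_acc, tgt_conf):
    # src_conf: (n,) confidences on held-out source data; src_acc: measured source accuracy
    # tgt_conf: (m,) confidences on the shifted target data
    doc = src_conf.mean() - tgt_conf.mean()   # difference of average confidences
    return src_acc - doc                      # predicted accuracy on the target distribution
```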
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets that (almost exactly) achieve the desired coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
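A loose, illustrative sketch of confidence-weighted transductive prototype refinement along the lines summarized above; `confidence_fn` is a placeholder for the meta-learned confidence, not the authors' model.

```python
# Hedged sketch of confidence-weighted prototype refinement (illustrative only).
import numpy as np

def refine_prototypes(prototypes, queries, confidence_fn, n_steps=1):
    # prototypes: (c, d) class prototypes from the support set
    # queries: (q, d) unlabeled query embeddings
    # confidence_fn: maps squared distances (q, c) to per-query, per-class weights (q, c),
    #   e.g. scipy.special.softmax(-d2, axis=1) as a crude stand-in for learned confidence
    for _ in range(n_steps):
        d2 = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (q, c)
        w = confidence_fn(d2)                    # per-query weights toward each class
        num = prototypes + w.T @ queries         # original prototype plus weighted query sum
        den = 1.0 + w.sum(axis=0)[:, None]       # matching normalization per class
        prototypes = num / den                   # confidence-weighted prototype update
    return prototypes
```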
arXiv Detail & Related papers (2020-02-27T10:22:17Z)