Distance Matters For Improving Performance Estimation Under Covariate Shift
- URL: http://arxiv.org/abs/2308.07223v1
- Date: Mon, 14 Aug 2023 15:49:19 GMT
- Title: Distance Matters For Improving Performance Estimation Under Covariate Shift
- Authors: Mélanie Roschewitz and Ben Glocker
- Abstract summary: Under dataset shifts, confidence scores may become ill-calibrated if samples are too far from the training distribution.
We show that taking into account distances of test samples to their expected training distribution can significantly improve performance estimation.
We demonstrate the effectiveness of this method on 13 image classification tasks, across a wide range of natural and synthetic distribution shifts.
- Score: 18.68533487971233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance estimation under covariate shift is a crucial component of safe
AI model deployment, especially for sensitive use-cases. Recently, several
solutions were proposed to tackle this problem, most leveraging model
predictions or softmax confidence to derive accuracy estimates. However, under
dataset shifts, confidence scores may become ill-calibrated if samples are too
far from the training distribution. In this work, we show that taking into
account distances of test samples to their expected training distribution can
significantly improve performance estimation under covariate shift. Precisely,
we introduce a "distance-check" to flag samples that lie too far from the
expected distribution, to avoid relying on their untrustworthy model outputs in
the accuracy estimation step. We demonstrate the effectiveness of this method
on 13 image classification tasks, across a wide range of natural and synthetic
distribution shifts and hundreds of models, with a median relative MAE
improvement of 27% over the best baseline across all tasks, and SOTA
performance on 10 out of 13 tasks. Our code is publicly available at
https://github.com/melanibe/distance_matters_performance_estimation.
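The abstract only names the idea, so here is a minimal sketch of what such a distance-check could look like, assuming a Mahalanobis distance in the model's feature space and a threshold calibrated on held-out training features; the exact distance measure, threshold rule, and handling of flagged samples in the paper may differ, and all function names below are illustrative rather than taken from the released code.

```python
import numpy as np

def fit_distance_check(train_feats, quantile=0.99):
    """Fit a simple Mahalanobis distance-check on training-set features.
    The threshold (a high quantile of training distances) is an assumed
    design choice, not necessarily the paper's rule."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    inv_cov = np.linalg.inv(cov)
    threshold = np.quantile(mahalanobis(train_feats, mu, inv_cov), quantile)
    return mu, inv_cov, threshold

def mahalanobis(feats, mu, inv_cov):
    """Per-sample Mahalanobis distance to the training distribution."""
    diff = feats - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

def estimate_accuracy_with_distance_check(test_feats, test_conf, mu, inv_cov, threshold):
    """Average-confidence accuracy estimate with a distance-check: samples
    that lie too far from the training distribution are not trusted and,
    as one possible handling, are counted as errors instead."""
    flagged = mahalanobis(test_feats, mu, inv_cov) > threshold
    return float(np.where(flagged, 0.0, test_conf).mean())
```

Without the check, the last function would reduce to the plain average-confidence baseline `test_conf.mean()`; the distance-check only changes how far-away samples contribute to the estimate.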
Related papers
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on the distance to the model classification boundary (i.e., margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies.
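As a rough illustration of margin-based pruning only: the sketch below scores each training sample by a plain logit margin (the gap between the true-class logit and the strongest competing logit) rather than PUMA's DeepFool-based margin, and leaves the choice of which end of the margin distribution to prune as an explicit argument, since that choice is part of what the paper studies.

```python
import numpy as np

def prune_by_margin(logits, labels, prune_fraction=0.2, prune_smallest=True):
    """Margin-based data pruning (illustrative only). The margin here is the
    logit gap between the true class and the best rival class, a cheap
    stand-in for the DeepFool distance used by PUMA."""
    idx = np.arange(len(labels))
    true_logit = logits[idx, labels]
    rival = logits.astype(float).copy()
    rival[idx, labels] = -np.inf
    margin = true_logit - rival.max(axis=1)
    order = np.argsort(margin)                       # smallest margin first
    n_prune = int(prune_fraction * len(labels))
    kept = order[n_prune:] if prune_smallest else order[: len(labels) - n_prune]
    return np.sort(kept)                             # indices of samples to keep
```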
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- Characterizing Out-of-Distribution Error via Optimal Transport [15.284665509194134]
Methods of predicting a model's performance on OOD data without labels are important for machine learning safety.
We introduce a novel method for estimating model performance by leveraging optimal transport theory.
We show that our approaches significantly outperform existing state-of-the-art methods with up to 3x lower prediction error.
arXiv Detail & Related papers (2023-05-25T01:37:13Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Predicting Out-of-Distribution Error with Confidence Optimal Transport [17.564313038169434]
We present a simple yet effective method to predict a model's performance on an unknown distribution without any additional annotation.
We show that our method, Confidence Optimal Transport (COT), provides robust estimates of a model's performance on a target domain.
Despite its simplicity, our method achieves state-of-the-art results on three benchmark datasets and outperforms existing methods by a large margin.
arXiv Detail & Related papers (2023-02-10T02:27:13Z)
- Labeling-Free Comparison Testing of Deep Learning Models [28.47632100019289]
We propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness.
Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, regardless of the dataset and distribution shift.
arXiv Detail & Related papers (2022-04-08T10:55:45Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
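A compact sketch of the ATC idea as summarized above, using maximum softmax probability as the confidence score (the paper also considers other scores such as negative entropy); the threshold is chosen on held-out labelled source data so that the mass above it matches the source accuracy.

```python
import numpy as np

def atc_fit_threshold(source_conf, source_correct):
    """Pick the threshold so that the fraction of held-out source samples
    with confidence above it equals the measured source accuracy."""
    source_acc = source_correct.mean()
    return np.quantile(source_conf, 1.0 - source_acc)

def atc_predict_accuracy(target_conf, threshold):
    """Predicted target accuracy: fraction of unlabeled target samples
    whose confidence exceeds the learned threshold."""
    return float((target_conf > threshold).mean())
```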
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
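In its simplest form the DoC estimate fits in a few lines, shown below; the paper also studies regression-based calibration of this quantity across many shifted datasets, which the sketch omits.

```python
import numpy as np

def doc_accuracy_estimate(source_accuracy, source_conf, target_conf):
    """Difference-of-confidences (DoC), simplest variant: the drop in mean
    confidence from source to target serves as a proxy for the drop in
    accuracy."""
    doc = source_conf.mean() - target_conf.mean()
    return source_accuracy - doc
```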
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
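A minimal PyTorch sketch of prediction-time batch normalization: BatchNorm layers are switched to training mode for the forward pass so they normalize with the statistics of the incoming test batch rather than the running statistics accumulated during training. Note that this also updates the running statistics as a side effect, and it assumes reasonably large test batches.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_test_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Forward pass using prediction-time batch normalization."""
    model.eval()
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for module in model.modules():
        if isinstance(module, bn_types):
            module.train()   # normalize with the current batch's statistics
    outputs = model(x)
    model.eval()             # restore standard inference behaviour
    return outputs
```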
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
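The sketch below shows the transductive prototype update this line of work builds on, with an ordinary distance-softmax standing in for the meta-learned confidence: each unlabeled query is folded back into the class prototypes with a weight given by its confidence. The meta-learning of that confidence, which is the paper's actual contribution, is not shown.

```python
import torch

def refine_prototypes(prototypes, query_embeddings, temperature=1.0):
    """Confidence-weighted transductive prototype refinement (sketch).
    prototypes: (n_classes, dim), query_embeddings: (n_queries, dim)."""
    # Softmax over negative squared distances plays the role of the
    # confidence (hand-crafted here, meta-learned in the paper).
    sq_dists = torch.cdist(query_embeddings, prototypes) ** 2
    confidence = torch.softmax(-sq_dists / temperature, dim=1)   # (n_queries, n_classes)
    weighted_queries = confidence.t() @ query_embeddings          # (n_classes, dim)
    total_weight = confidence.sum(dim=0, keepdim=True).t()        # (n_classes, 1)
    # Treat each original prototype as carrying unit weight.
    return (prototypes + weighted_queries) / (1.0 + total_weight)
```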
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.